Chapter 16 part 2

Genomics & Transposition

Genome Annotation and Genomic Analysis

This study guide covers the principles and methods of genome annotation and genomic analysis, focusing on how genomes are sequenced, annotated, and analyzed to understand gene function, regulation, and evolution. These topics are central to modern genetics and genomics.

How to Sequence and Analyze a Genome

Overview of Genome Analysis Steps

Whole Genome Shotgun Sequencing: Randomly fragmenting genomic DNA into small pieces, sequencing the fragments, and computationally reassembling the genome from overlapping reads.
Experimental Annotation: Using laboratory experiments (such as cDNA sequencing) to identify genes and their functions.
Computational Annotation: Using bioinformatics tools to predict genes, regulatory elements, and other functional regions in the genome.
Functional Experimentation: Testing gene function through experiments such as gene knockouts or overexpression studies.
Evolutionary Analysis: Comparing genomes across species to identify conserved elements and infer evolutionary relationships.
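
The reassembly step of shotgun sequencing can be illustrated with a toy example. The sketch below is a minimal greedy overlap merger, assuming error-free reads from a single strand and a made-up 15-base "genome"; the function names and the min_len parameter are illustrative, and real assemblers use far more sophisticated methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical reads sampled from the toy sequence ATGGCGTACGTTAGC
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

With these three reads the overlaps are unambiguous, so the merger recovers the original toy sequence as a single contig. Repeated sequences in real genomes break this simple approach.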

Genome Annotation

Creating cDNA from mRNA

cDNA (complementary DNA): DNA synthesized from a messenger RNA (mRNA) template using the enzyme reverse transcriptase.
Steps in cDNA Synthesis:
1. Isolate mRNA from cells.
2. Add oligo dT primers to bind the poly-A tail of mRNA.
3. Synthesize the first strand of cDNA using reverse transcriptase.
4. Partially degrade the mRNA using RNase H.
5. Synthesize the second strand of cDNA using DNA polymerase and remaining mRNA fragments as primers.
Application: The resulting cDNA can be sequenced to identify expressed genes. Because introns are absent from cDNA, aligning the cDNA sequence to genomic DNA reveals each gene's exon-intron structure.
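
The strand relationships in cDNA synthesis can be sketched in code. This is a minimal illustration, assuming sequences written 5' to 3' and a made-up mRNA with a short poly-A tail; the function names are hypothetical and the enzymology (primers, RNase H) is not modeled.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First cDNA strand (5'->3'): the DNA reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


def second_strand_cdna(first_strand):
    """Second cDNA strand (5'->3'): reverse complement of the first strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand))


# Hypothetical mRNA: a tiny coding region followed by a short poly-A tail
mrna = "AUGGCUUAA" + "AAAAAA"
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # begins with a run of T, where the oligo dT primer pairs with the poly-A tail
print(second)  # same sequence as the mRNA, with T in place of U
```

The second strand matches the mRNA sequence (with T for U), which is why sequencing cDNA reports the transcript sequence directly.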

Using Transcript Sequencing to Annotate Genomes

Expressed Sequence Tags (ESTs): Short sub-sequences of cDNA used to identify gene transcripts.
Full-Length cDNA: Provides complete information about the exon-intron structure of genes.
Annotation Process: Align ESTs and full-length cDNA sequences to the genomic DNA to define exon-intron boundaries, start and stop codons, and untranslated regions (UTRs).
Splice Site Consensus Sequences: Conserved sequences at exon-intron boundaries (e.g., GT-AG rule in eukaryotes).
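
Once transcripts are aligned to the genome, the GT-AG rule can be checked at each inferred intron. The sketch below assumes exon coordinates are already known (0-based, end-exclusive) and uses a made-up sequence; check_splice_sites is a hypothetical helper, not part of any annotation tool.

```python
def check_splice_sites(genome, exons):
    """For each intron between consecutive exons, report its first and last
    two bases and whether they follow the GT-AG rule."""
    results = []
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        donor = genome[exon_end:exon_end + 2]          # first 2 bases of intron
        acceptor = genome[next_start - 2:next_start]   # last 2 bases of intron
        ok = donor == "GT" and acceptor == "AG"
        results.append((exon_end, next_start, donor, acceptor, ok))
    return results


# Hypothetical two-exon gene with one intron
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GGATAA"
genome = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(genome))]
for start, end, donor, acceptor, ok in check_splice_sites(genome, exons):
    print(f"intron {start}-{end}: {donor}...{acceptor} canonical={ok}")
```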

Computational Annotation and Predicting Open Reading Frames (ORFs)

Open Reading Frame (ORF): A stretch of DNA that could potentially encode a protein, beginning with a start codon and continuing in the same reading frame, without interruption, until a stop codon.
Computational Prediction: Algorithms scan the genome for ORFs in all six reading frames (three per DNA strand) and predict likely protein-coding regions.
Example: Highlighted regions in a nucleotide sequence that correspond to possible genes based on uninterrupted stretches of codons.
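
A six-frame ORF scan can be sketched in a few lines. This is a minimal illustration, assuming ATG as the only start codon, the three standard stop codons, and no introns; the min_len cutoff and the example sequence are made up, and real gene predictors use additional evidence such as codon usage and splice signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=9):
    """Scan all six reading frames and return (strand, frame, orf_sequence)
    for every ATG...stop stretch of at least min_len nucleotides."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open an ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None                   # close it and keep scanning
    return orfs


# Hypothetical sequence with one short ORF in frame 2 of the forward strand
print(find_orfs("CCATGAAATTTTAGGG"))
```

An ATG that is never followed by an in-frame stop codon is not reported, which matches the definition of an ORF as running from a start codon to a stop codon.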

Identifying Functional Non-coding Sequences Using Evolutionary Conservation

Conserved Non-coding Sequences (CNS): Non-coding regions that are highly conserved across species, indicating functional importance (e.g., regulatory elements).
Comparative Genomics: Comparing sequences from different species to identify conserved exons, introns, and regulatory elements.
Example: CNS in the intron of the LMBR1 gene regulates the SHH gene; mutations can cause polydactyly in mice and humans.
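
Conservation is often summarized as percent identity in a sliding window along an alignment. The sketch below assumes two sequences that are already aligned, gap-free, and of equal length; the window size and toy sequences are illustrative, and real comparisons rely on dedicated alignment tools.

```python
def percent_identity(a, b):
    """Fraction of aligned positions at which the two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_identity(a, b, window):
    """Identity of every window of the given size, sliding one base at a time."""
    return [percent_identity(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]


# Hypothetical aligned non-coding sequences from two species
species_1 = "AAAACCCC"
species_2 = "AAAAGGGG"
print(window_identity(species_1, species_2, window=4))
# [1.0, 0.75, 0.5, 0.25, 0.0]
```

Windows with identity well above the surrounding background are candidates for conserved non-coding sequences; the threshold for calling a region conserved is a judgment call that depends on the species compared.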

Annotating a Newly Sequenced Genome

Protein Coding Sequences: Easiest to annotate due to multiple sources of information (ORFs, transcription data, evolutionary conservation).
Transcribed Non-coding RNAs: Identified by transcription and conservation, but harder to annotate than protein-coding genes.
Transcriptional Regulatory Sequences (e.g., Enhancers): Hardest to annotate due to limited information, often identified by evolutionary conservation.

Genomic Analysis

Genome Size, Gene Number, and Organization

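Genome Size vs. Gene Number: Genome size is a poor predictor of gene number. For example, the human genome is about 17 times larger than the Drosophila melanogaster genome (3,101 vs. 180 Mb) but contains only about 1.5 times as many genes (20,709 vs. 13,937).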
Gene Density: Varies widely among species; for example, Escherichia coli has high gene density, while Homo sapiens has lower gene density and more introns per gene.

Species	Genome Size (Mb)	# Genes	Genes/Mb	Introns/Gene
Escherichia coli	4.64	4,200	905	0
Saccharomyces cerevisiae	12	6,607	552	0.05
Arabidopsis thaliana	126	27,428	200	4
Drosophila melanogaster	180	13,937	82	3.2
Homo sapiens	3,101	20,709	6.7	9.0
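
Gene density is simply gene number divided by genome size. As a quick check, the sketch below recomputes the Genes/Mb column for the first and last rows of the table; the function name is illustrative. The values shown for the other species may not be reproduced exactly from the rounded genome sizes listed.

```python
def gene_density(num_genes, genome_size_mb):
    """Genes per megabase."""
    return num_genes / genome_size_mb


# (genome size in Mb, number of genes), taken from the table above
genomes = {
    "Escherichia coli": (4.64, 4200),
    "Homo sapiens": (3101, 20709),
}
for species, (size_mb, genes) in genomes.items():
    print(f"{species}: {gene_density(genes, size_mb):.1f} genes/Mb")
```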

Additional info: The functions of many human genes remain poorly characterized.

RNA-Sequencing (RNA-Seq) and Gene Expression Analysis

RNA-Seq: A high-throughput method to measure the expression levels of all genes in the genome by sequencing cDNA derived from mRNA.
Process: cDNA is synthesized from mRNA, sequenced, and aligned to the genome. The number of reads mapping to each gene, once normalized for gene length and sequencing depth, reflects its expression level.
Application: Allows quantification of gene expression and identification of differentially expressed genes under various conditions.
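
The read-counting step can be sketched as follows. This minimal example assumes each read has already been aligned and assigned to one gene, and it normalizes only for sequencing depth (counts per million); gene-length normalization is omitted, and the gene names are made up.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """Count reads per gene and scale to counts per million mapped reads."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical assignments: one gene name per mapped read
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] * 1
for gene, cpm in counts_per_million(reads).items():
    print(gene, cpm)
```

Scaling by the total number of mapped reads makes samples sequenced to different depths comparable, which is a prerequisite for identifying differentially expressed genes.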

High-Throughput Gene Expression and Cell Cycle Regulation

Gene Expression Profiling: Measuring expression of thousands of genes simultaneously to study patterns across the cell cycle or other biological processes.
Heat Maps: Visual representations where each row is a gene and colors indicate relative expression levels (e.g., green for higher, red for lower expression relative to a reference point).
Clustering: Genes with similar expression patterns can be grouped, revealing co-regulated genes and potential regulatory sequences.
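
Grouping genes by expression pattern depends on a similarity measure between profiles. The sketch below uses Pearson correlation and a fixed threshold to find genes that track a chosen gene; this is a simplified stand-in for full clustering methods, and the profiles, gene names, and threshold are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed(profiles, seed, threshold=0.9):
    """Genes whose profile correlates with the seed gene at or above threshold."""
    return [gene for gene, profile in profiles.items()
            if gene != seed and pearson(profiles[seed], profile) >= threshold]


# Hypothetical expression levels at four time points in the cell cycle
profiles = {
    "geneA": [1, 2, 4, 8],
    "geneB": [2, 4, 8, 16],   # rises with geneA
    "geneC": [8, 4, 2, 1],    # falls as geneA rises
    "geneD": [5, 5, 6, 5],    # nearly flat
}
print(coexpressed(profiles, "geneA"))
```

Only geneB passes the threshold here. Genes grouped this way are candidates for sharing regulatory sequences, which can then be searched for upstream of each gene.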

Evolutionary Genomics

Gene Order and Genome Organization Evolves

Synteny: The conservation of blocks of gene order on chromosomes between different species.
Conserved Synteny: Orthologous genes lie on the same chromosome in both species, even if their order along the chromosome has changed, indicating evolutionary conservation.
Genome Rearrangements: Inversions, translocations, and duplications can alter gene order over evolutionary time.
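
Conserved blocks of gene order can be found by walking along one genome and checking whether neighboring genes are also neighbors in the other. The sketch below assumes each gene has exactly one ortholog, identified by a shared name, on a single chromosome per species; the gene names and the min_genes cutoff are illustrative.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Runs of genes that are adjacent in both species, in the same or the
    inverted orientation."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks = []
    current = []   # block currently being extended
    step = 0       # +1 same orientation, -1 inverted, 0 not yet known
    for gene in order_a:
        if gene in pos_b and current:
            delta = pos_b[gene] - pos_b[current[-1]]
            if delta in (1, -1) and step in (0, delta):
                current.append(gene)
                step = delta
                continue
        # The run is broken: keep it if long enough, then start a new one.
        if len(current) >= min_genes:
            blocks.append(current)
        current = [gene] if gene in pos_b else []
        step = 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical gene orders; species B carries an inversion of g4-g6
species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g1", "g2", "g3", "g6", "g5", "g4"]
print(syntenic_blocks(species_a, species_b))
# [['g1', 'g2', 'g3'], ['g4', 'g5', 'g6']]
```

The inversion splits the chromosome into two blocks: gene order is preserved within each block, but the second block runs in the opposite direction in species B.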

Gene Duplication and Loss

Gene Duplication: The process by which a region of DNA containing a gene is duplicated, resulting in two copies of that gene in the genome.
Divergence: Duplicated genes can accumulate mutations and evolve new functions or become pseudogenes.
Example: Yeast species show patterns of gene duplication and divergence across their genomes.

Phylogenetic Trees and Gene Relationships

Homologs: Genes related by descent from a common ancestral DNA sequence.
Orthologs: Homologous genes in different species that evolved from a common ancestral gene by speciation.
Paralogs: Homologous genes within a species that arose by gene duplication.
Phylogenetic Trees: Diagrams that depict evolutionary relationships among genes or species.
Example: The globin gene family and the hedgehog gene family illustrate orthologs and paralogs.
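
On a gene tree, the relationship between two genes follows from the event at their most recent common ancestor: speciation gives orthologs, duplication gives paralogs. The sketch below encodes a simplified globin tree as nested tuples; the tree shape, event labels, and gene names are illustrative assumptions, not data from the text.

```python
def leaves(node):
    """Set of gene names below a node (a leaf is just a string)."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene_a, gene_b):
    """'orthologs' if the last common ancestor of the two genes is a
    speciation node, 'paralogs' if it is a duplication node."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaves(tree):
        raise ValueError("need two different genes that are both in the tree")
    event, left, right = tree
    for child in (left, right):
        if {gene_a, gene_b} <= leaves(child):
            return relationship(child, gene_a, gene_b)
    return "orthologs" if event == "speciation" else "paralogs"


# Simplified, hypothetical tree: an ancient duplication produced the alpha and
# beta globin lineages, and each lineage later split at the human-mouse speciation.
globin_tree = (
    "duplication",
    ("speciation", "human_alpha", "mouse_alpha"),
    ("speciation", "human_beta", "mouse_beta"),
)
print(relationship(globin_tree, "human_alpha", "mouse_alpha"))  # orthologs
print(relationship(globin_tree, "human_alpha", "human_beta"))   # paralogs
```

Under this tree-based view, human_alpha and mouse_beta also come out as paralogs, because the duplication that separates them predates the speciation event.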

Genomic Variation

Types of Genomic Variation within a Species

Single Nucleotide Polymorphisms (SNPs): Variations at a single nucleotide position; two human genomes differ at approximately one site per 1,000 base pairs.
Copy Number Variants (CNVs): Large regions of the genome that are duplicated or deleted, leading to variation in the number of copies of particular genes.
Chromosomal Variants: Large-scale changes such as trisomy (extra chromosome) or monosomy (missing chromosome).
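
SNPs are the simplest variants to detect: they are positions where an individual's sequence differs from the reference by a single base. The sketch below assumes two sequences of equal length that are already aligned without gaps; the sequences and the function name are made up.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch,
    using 1-based positions."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, ref_base, alt_base)
            for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
            if ref_base != alt_base]


# Hypothetical 10-base reference and one individual's sequence
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
print(find_snps(reference, sample))
# [(5, 'A', 'T'), (10, 'C', 'A')]
```

At a rate of about one SNP per 1,000 base pairs, a genome of roughly 3,100 Mb would be expected to carry on the order of 3 million such differences.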

Surveying Genomic Variation Across Human Populations

Population Genomics: The study of genetic variation within and between populations using genome-wide data.
Types of Variants Surveyed: Insertions, deletions, copy number variants, and inversions, each identified by comparison with the reference allele.
Application: Understanding human evolution, migration, and disease susceptibility.