Genome Evolution: Structure, Variation, and Mechanisms
Study Guide - Smart Notes
Genome Evolution
Introduction to Genome Evolution
Genome evolution refers to the changes in genome structure, content, and organization over time, driven by various genetic mechanisms. Understanding genome evolution is essential for interpreting genetic diversity, organismal complexity, and evolutionary relationships.
Genome content includes coding sequences (genes), non-coding regions (introns, repetitive sequences), and other elements.
Comparative genomics allows us to study how genomes evolve by comparing content between species.
Key mechanisms: mutation, genetic drift, natural selection, duplication, and divergence.
Genome Content and Organization
Components of the Genome
The genome is composed of various types of DNA sequences, each contributing to its structure and function.
Coding sequences: Regions that encode proteins.
Non-coding sequences: Includes introns, repetitive DNA, and other intergenic regions.
Repetitive DNA: Can be highly or moderately repetitive, such as transposons and tandem repeats.
Example: Human Genome Composition
Component | Percentage |
|---|---|
Transposons | 45% |
Introns | 24% |
Other intergenic DNA | 22% |
Large duplications | 5% |
Simple repeats | 3% |
Exons (coding) | 1% |
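These percentages can be converted to absolute sizes in megabases. This sketch assumes a haploid genome of ~3,000 Mb (the human figure given later in these notes). It also treats the categories as the non-overlapping slices shown in the table. That is a simplification: many transposon copies, for example, lie inside introns.

```python
# Approximate human genome composition (% of total), taken from the table above
composition = {
    "Transposons": 45,
    "Introns": 24,
    "Other intergenic DNA": 22,
    "Large duplications": 5,
    "Simple repeats": 3,
    "Exons (coding)": 1,
}

GENOME_SIZE_MB = 3000  # assumption: ~3,000 Mb haploid human genome


def to_megabases(percentages, genome_mb):
    """Convert percentage shares of a genome into approximate sizes in Mb."""
    return {name: genome_mb * pct / 100 for name, pct in percentages.items()}


sizes = to_megabases(composition, GENOME_SIZE_MB)
for name, mb in sizes.items():
    print(f"{name:<22} ~{mb:,.0f} Mb")

# In this breakdown, everything except exons is non-coding
noncoding_pct = 100 - composition["Exons (coding)"]
print(f"Non-coding share: ~{noncoding_pct}%")
```

On these figures, only about 30 Mb of sequence is protein-coding exon. Transposon-derived DNA accounts for roughly 1,350 Mb.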
Genomic Complexity and Gene Number
Minimum Gene Number and Organism Complexity
The minimum number of genes required for an organism increases with its biological complexity.
Simple organisms (e.g., parasitic bacteria): ~500 genes
Free-living bacteria: ~1200 genes
Unicellular eukaryotes: ~5000 genes
Multicellular eukaryotes: ~13,000 genes
Higher plants: ~25,000 genes
Mammals: ~25,000 genes
Genomic Complexity
Genomic complexity is determined by the presence of genes for essential functions, cell compartments, multicellularity, development, and specialized systems (e.g., nervous and immune systems).
Genome Size and Gene Number Variation
Prokaryotes
In prokaryotes, genome size and gene number are roughly proportional because most of the genome is coding DNA, with little space between genes.
Genome size: 490 kbp – 9,106 kbp
Gene number: 480 – 6,700
Average: ~950 genes per Mb
Table: Genome Size and Gene Number in Prokaryotes
Organism | Genome Size (kb) | Genes |
|---|---|---|
H. influenzae | 1,830 | 1,700 |
E. coli | 4,639 | 4,288 |
B. subtilis | 4,214 | 4,100 |
M. tuberculosis | 4,411 | 4,000 |
S. coelicolor | 8,667 | 7,825 |
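Because prokaryotic DNA is mostly coding, dividing gene number by genome size gives a nearly constant gene density. The sketch below runs that check on the table's own values; the helper name is illustrative.

```python
# Genome size (kb) and gene count for the prokaryotes in the table above
prokaryotes = {
    "H. influenzae": (1830, 1700),
    "E. coli": (4639, 4288),
    "B. subtilis": (4214, 4100),
    "M. tuberculosis": (4411, 4000),
    "S. coelicolor": (8667, 7825),
}


def genes_per_mb(size_kb, genes):
    """Gene density: number of genes per megabase (1 Mb = 1,000 kb)."""
    return genes / (size_kb / 1000)


densities = {org: genes_per_mb(*vals) for org, vals in prokaryotes.items()}
for org, d in densities.items():
    print(f"{org:<16} {d:6.0f} genes/Mb")
```

Every entry falls between about 900 and 975 genes per Mb. This is consistent with the ~950 genes/Mb average quoted above.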
Eukaryotes
Eukaryotic genomes are much larger and more variable in size, and gene number correlates only loosely with organismal complexity.
Genome size: 12 Mb (yeast) to billions of bp (plants)
Gene number: 10,000 – 60,000
Among the largest known plant genomes: Paris japonica (~149 billion bp)
Table: Eukaryotic Gene Number and Genome Size
Organism | Genome Size (Mb) | Genes |
|---|---|---|
S. cerevisiae | 12 | 6,000 |
D. melanogaster | 180 | 13,600 |
Arabidopsis | 125 | 25,000 |
Mouse | 2,800 | 30,000 |
Human | 3,000 | 25,000 |
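Applying the same gene-density calculation to the eukaryotes shows how loosely gene number tracks genome size. Density drops from about 500 genes/Mb in yeast to under 10 genes/Mb in human. Arabidopsis has nearly twice as many genes as Drosophila in a smaller genome. The values are taken directly from the table; the sketch is only arithmetic.

```python
# Genome size (Mb) and approximate gene number, from the table above
eukaryotes = {
    "S. cerevisiae": (12, 6000),
    "D. melanogaster": (180, 13600),
    "Arabidopsis": (125, 25000),
    "Mouse": (2800, 30000),
    "Human": (3000, 25000),
}

# Gene density in genes per Mb
density = {org: genes / size_mb for org, (size_mb, genes) in eukaryotes.items()}

# Print from densest to sparsest genome
for org, d in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{org:<16} {d:7.1f} genes/Mb")
```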
Gene Structure and Genome Organization
Interrupted Genes and mRNA Types
Gene prediction is harder in eukaryotes because many of their genes are interrupted: the coding exons are separated by non-coding introns.
Monocistronic mRNA: Encodes one polypeptide (common in eukaryotes).
Polycistronic mRNA: Encodes multiple polypeptides (common in prokaryotes).
Prokaryotic genes are colinear with their protein products; many eukaryotic genes are interrupted by introns.
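The difference between a colinear and an interrupted gene can be sketched as a splicing step: the mature mRNA is the concatenation of the exons, with the introns removed. The function name, sequence and coordinates below are hypothetical. Real splicing is carried out by the spliceosome using sequence signals that this toy ignores.

```python
def splice(gene_seq, exons):
    """Join the exon segments of an interrupted gene into a mature transcript.

    exons: list of (start, end) pairs, 0-based and end-exclusive (Python slice
    convention), in 5'->3' order. Everything between them is treated as intron.
    """
    return "".join(gene_seq[start:end] for start, end in exons)


# Hypothetical toy gene: three exons (uppercase) separated by two introns (lowercase)
gene = "ATGGCC" + "gtaagtctag" + "AAATTT" + "gtgagcatag" + "GGCTAA"
exons = [(0, 6), (16, 22), (32, 38)]

mrna = splice(gene, exons)
print(mrna)  # ATGGCCAAATTTGGCTAA
```

For a colinear (uninterrupted) gene, the exon list is a single interval covering the whole coding region, so the output equals the input.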
Non-coding Regions and Repetitive DNA
Types of Repetitive DNA
Repetitive DNA is classified based on sequence length and copy number.
Non-repetitive DNA: One copy per genome.
Moderately repetitive DNA: Sequences repeated ~10–1,000 times (e.g., transposons, rRNA genes); the repeat units are generally much longer than those of highly repetitive DNA.
Highly repetitive DNA: Very short sequences (<100 bp), repeated thousands of times (e.g., satellite DNA).
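The classification above amounts to a decision based on copy number and, for the highly repetitive class, repeat-unit length. The sketch below uses the cutoffs quoted in these notes. The function name and the "unclassified" fallback are hypothetical. Real repeat classes overlap more than these hard boundaries suggest.

```python
def classify_repeat(copy_number, unit_length_bp):
    """Assign a sequence to a repetition class using the rough cutoffs in these notes.

    The boundaries are illustrative assumptions; real classes overlap.
    """
    if copy_number < 1:
        raise ValueError("copy_number must be >= 1")
    if copy_number == 1:
        return "non-repetitive"
    if copy_number > 1000 and unit_length_bp < 100:
        return "highly repetitive"  # e.g., satellite DNA
    if 10 <= copy_number <= 1000:
        return "moderately repetitive"  # e.g., transposons, rRNA genes
    return "unclassified"  # 2-9 copies, or >1,000 copies of a long unit


print(classify_repeat(1, 1500))       # non-repetitive
print(classify_repeat(300, 7000))     # moderately repetitive
print(classify_repeat(1_000_000, 5))  # highly repetitive
```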
Table: Genome Size vs. Repetitive DNA
Organism | Highly Repetitive | Moderately Repetitive | Non-Repetitive |
|---|---|---|---|
Bacteria | Low | Low | High |
Nematode | Low | Moderate | Moderate |
Insect | Low | Moderate | Moderate |
Mouse | Moderate | High | Low |
Amphibian | High | High | Low |
Plant | High | High | Low |
Essential vs Non-essential Genes
Gene Essentiality and Redundancy
Essential genes are required for survival; non-essential genes may be dispensable or redundant.
Loss of essential genes is lethal or causes sterility.
Redundancy: Multiple genes with similar functions can compensate for one another.
A redundant gene can therefore be non-essential on its own, even though losing every gene that shares its function may be lethal.
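The logic of redundancy can be made explicit with a toy model. Each gene performs a set of functions. A gene counts as essential only if no other gene covers one of its essential functions. The gene names, functions and helper are hypothetical. Real essentiality also depends on dosage, growth conditions and partial functional overlap.

```python
def essential_genes(gene_functions, essential_functions):
    """Return genes whose single knockout would remove an essential function.

    gene_functions: dict mapping gene -> set of functions it can perform.
    essential_functions: set of functions the organism needs to survive.
    A gene is essential here only if it is the sole provider of some
    essential function, so redundant genes come out as dispensable.
    """
    essential = set()
    for gene, functions in gene_functions.items():
        for func in functions & essential_functions:
            providers = [g for g, fs in gene_functions.items() if func in fs]
            if providers == [gene]:
                essential.add(gene)
    return essential


# Hypothetical genome: geneB and geneC are redundant for "repair"
genome = {
    "geneA": {"replication"},
    "geneB": {"repair"},
    "geneC": {"repair"},
    "geneD": {"pigment"},
}
needed = {"replication", "repair"}
print(sorted(essential_genes(genome, needed)))  # ['geneA']

# After geneB is lost, its former partner geneC becomes essential
without_b = {g: fs for g, fs in genome.items() if g != "geneB"}
print(sorted(essential_genes(without_b, needed)))  # ['geneA', 'geneC']
```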
Unique Genes vs Gene Families
Gene Families and Homology
Gene families are groups of related genes that evolved from a common ancestor through duplication events.
Members share sequence homology but may have different functions.
The proportion of genes that belong to multi-member families tends to increase with organismal complexity.
Table: Gene Families Across Species
Species | Unique genes | Genes in families of 2–4 members | Genes in families of >4 members |
|---|---|---|---|
H. influenzae | 89% | 10% | 1% |
S. cerevisiae | 72% | 19% | 9% |
D. melanogaster | 72% | 14% | 14% |
C. elegans | 55% | 20% | 26% |
A. thaliana | 35% | 24% | 41% |
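The table reports percentages of each genome's genes. Read that way, the share of genes in any multi-member family is the sum of the last two columns. Each row totals about 100%; C. elegans sums to 101%, presumably from rounding. A short check:

```python
# Percent of genes per category, from the table above:
# (unique, in families of 2-4 members, in families of >4 members)
family_data = {
    "H. influenzae": (89, 10, 1),
    "S. cerevisiae": (72, 19, 9),
    "D. melanogaster": (72, 14, 14),
    "C. elegans": (55, 20, 26),
    "A. thaliana": (35, 24, 41),
}

# Share of genes that belong to any multi-member family
in_families = {sp: small + large for sp, (_, small, large) in family_data.items()}

for species, row in family_data.items():
    print(f"{species:<16} in multi-member families: {in_families[species]:3d}%"
          f"  (row total {sum(row)}%)")
```

The share rises from 11% in H. influenzae to 65% in A. thaliana.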
Mechanisms of Genome Evolution
Mutation and Allele Frequency
Mutations are changes in DNA sequence that create genetic variation. Each mutation can be considered a new allele.
Allele frequency: Proportion of a specific allele in a population.
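Allele frequency can rise or fall over generations; an allele is fixed when its frequency reaches 1 (100%) and lost when it reaches 0.
Polymorphism: A locus is conventionally called polymorphic when its less common allele has a frequency above 1% in the population.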
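Allele frequencies are usually estimated by counting alleles in a sample of diploid genotypes. The sketch below handles one locus with two alleles, A and a. The genotype counts are made up for illustration. The 1% cutoff follows the polymorphism convention described above.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): frequencies of alleles A and a in a diploid sample.

    Each individual carries two allele copies, so the sample holds 2N copies;
    homozygotes contribute two copies of one allele, heterozygotes one of each.
    """
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / total_alleles
    q = (2 * n_aa + n_Aa) / total_alleles
    return p, q


def is_polymorphic(p, q, threshold=0.01):
    """Conventional criterion: the rarer allele exceeds 1% frequency."""
    return min(p, q) > threshold


p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)  # hypothetical counts
print(p, q, is_polymorphic(p, q))
```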
Genetic Drift
Genetic drift is the random change in allele frequencies, especially significant in small populations.
Slow and non-directional.
Can lead to fixation or loss of alleles.
Impact decreases as population size increases.
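Drift is commonly modeled with the Wright-Fisher model. Each generation, the 2N allele copies of a diploid population are drawn at random from the previous generation's allele pool. The sketch below is a minimal, neutral (no selection) version; the function name and parameters are illustrative. In runs like this, the N = 10 population typically reaches fixation or loss well within 200 generations. The N = 1000 population usually stays far from 0 or 1.

```python
import random


def wright_fisher(p0, pop_size, generations, rng=None):
    """Simulate neutral genetic drift at one biallelic locus (Wright-Fisher model).

    Each generation draws 2N allele copies at random from the previous
    generation's allele pool. Returns the list of allele-A frequencies.
    """
    if rng is None:
        rng = random.Random()
    n_copies = 2 * pop_size  # diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling: each of the 2N copies is allele A with probability p
        count_A = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count_A / n_copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed: drift cannot change it further
            break
    return trajectory


rng = random.Random(1)  # fixed seed so runs are reproducible
small = wright_fisher(0.5, pop_size=10, generations=200, rng=rng)
large = wright_fisher(0.5, pop_size=1000, generations=200, rng=rng)
print("N=10:   final p =", small[-1], "after", len(small) - 1, "generations")
print("N=1000: final p =", large[-1], "after", len(large) - 1, "generations")
```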
Natural Selection
Natural selection is the non-random change in allele frequencies due to differences in fitness (viability and reproduction).
Fast and directional.
Advantageous alleles increase in frequency; disadvantageous alleles decrease.
Dominant alleles are expressed, and therefore exposed to selection, in both homozygotes and heterozygotes; recessive alleles are exposed to selection only in homozygotes.
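A standard deterministic model of selection at one locus makes the dominance effect concrete. It assumes random mating, no drift, and genotype fitnesses w_AA, w_Aa and w_aa. The sketch tracks a deleterious allele a with selection coefficient s = 0.1, an arbitrary illustrative value, and compares the cases where a is dominant and where it is recessive. When a recessive allele is rare, most of its copies sit in heterozygotes, where selection cannot act on them. Its decline therefore slows down.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """One generation of selection at a biallelic locus.

    p is the frequency of allele A; returns p' = (p^2 w_AA + p q w_Aa) / mean fitness.
    """
    q = 1 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def deleterious_freq_after(q0, s, generations, dominant):
    """Frequency of deleterious allele a after repeated selection with coefficient s.

    If a is dominant, heterozygotes pay the full fitness cost;
    if a is recessive, only aa homozygotes do.
    """
    w_AA, w_Aa, w_aa = (1.0, 1 - s, 1 - s) if dominant else (1.0, 1.0, 1 - s)
    p = 1 - q0
    for _ in range(generations):
        p = next_freq(p, w_AA, w_Aa, w_aa)
    return 1 - p


s = 0.1
for gens in (10, 50, 100):
    dom = deleterious_freq_after(0.5, s, gens, dominant=True)
    rec = deleterious_freq_after(0.5, s, gens, dominant=False)
    print(f"{gens:3d} generations: dominant a = {dom:.3f}, recessive a = {rec:.3f}")
```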
Gene Duplication and Divergence
Mechanisms of Gene Duplication
Gene duplication can occur through errors in replication, recombination, or repair. Duplicated genes can evolve new functions or become nonfunctional (pseudogenes).
Duplication can be small (single gene) or large (chromosomal region).
Divergence: Accumulation of mutations in duplicated genes can lead to new functions.
Gene clusters: Multiple duplications can create clusters of related genes.
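Divergence between duplicated copies is often summarized as the proportion of sites that differ in an alignment, known as the p-distance. The sequences below are hypothetical and already aligned. Real analyses use alignment tools and substitution models that correct for multiple substitutions at the same site.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


# Hypothetical paralogs descended from one ancestral copy
paralog_1 = "ATGGCTAAACTGGTTGAA"  # copy that has stayed unchanged
paralog_2 = "ATGGCAAAGCTGGTCGAA"  # copy that has accumulated substitutions
print(round(p_distance(paralog_1, paralog_2), 3))  # 0.167
```

Here 3 of 18 aligned sites differ.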
Homologs, Paralogs, and Orthologs
Gene relationships are defined by their evolutionary origin:
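Homologs: Genes descended from a common ancestral gene, usually recognized by sequence similarity; the term covers both paralogs and orthologs.
Paralogs: Homologs that diverged through a gene duplication event, typically found within the same genome (e.g., members of one gene family).
Orthologs: Homologs that diverged through a speciation event, found in different species.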
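These definitions depend on the event that separated two genes, so they can be read off a gene tree. Find the last common ancestor of the two genes and check whether that node was a speciation or a duplication. The tree, node labels and gene names below are hypothetical. In practice, the events must first be inferred, for example by reconciling the gene tree with a species tree.

```python
# Hypothetical gene tree: child -> parent, plus the event at each internal node
parent = {
    "human_A": "spec_A", "mouse_A": "spec_A",
    "human_B": "spec_B", "mouse_B": "spec_B",
    "spec_A": "dup_root", "spec_B": "dup_root",
}
event = {"spec_A": "speciation", "spec_B": "speciation", "dup_root": "duplication"}


def ancestors(node):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def relationship(gene1, gene2):
    """Classify two distinct homologs by the event at their last common ancestor."""
    anc2 = set(ancestors(gene2))
    lca = next(n for n in ancestors(gene1) if n in anc2)
    return "orthologs" if event[lca] == "speciation" else "paralogs"


print(relationship("human_A", "mouse_A"))  # orthologs
print(relationship("human_A", "human_B"))  # paralogs
print(relationship("human_A", "mouse_B"))  # paralogs (their split was the duplication)
```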
Functional Significance of Gene Families
Gene Family Expression and Function
Gene families can have divergent expression patterns and functions, often regulated during development or in response to environmental changes.
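Example: Genes in the hemoglobin (globin) gene clusters are expressed at different developmental stages (embryonic, fetal, adult), and the hemoglobins they produce have different oxygen affinities; fetal hemoglobin, for instance, binds oxygen more tightly than adult hemoglobin.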
Summary Table: Key Concepts in Genome Evolution
Concept | Description |
|---|---|
Genome Content | Coding, non-coding, repetitive DNA |
Gene Number | Varies with complexity, not always with genome size |
Mutation | Creates genetic variation |
Genetic Drift | Random allele frequency changes |
Natural Selection | Directional allele frequency changes |
Gene Duplication | Source of new genes and functions |
Gene Families | Groups of homologous genes |
Key Equations
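Allele frequencies (Hardy-Weinberg): $p + q = 1$, where $p$ and $q$ are the frequencies of the two alleles (A and a) at a locus.
Genotype frequencies: $AA = p^2$, $Aa = 2pq$, $aa = q^2$, so that $p^2 + 2pq + q^2 = 1$.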
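Under Hardy-Weinberg equilibrium, genotype frequencies are expected to be p², 2pq and q². Comparing these expectations with observed genotype counts is the usual test for equilibrium. The sketch below assumes a single biallelic locus with both alleles present, and uses hypothetical counts. It computes the common chi-square goodness-of-fit statistic, which has 1 degree of freedom for two alleles. Values above about 3.84 indicate a departure at the 5% level.

```python
def hardy_weinberg_expected(p, n):
    """Expected genotype counts (AA, Aa, aa) for n individuals at allele frequency p."""
    q = 1 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing observed genotypes with Hardy-Weinberg expectations.

    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency estimated from the sample
    expected = hardy_weinberg_expected(p, n)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(hardy_weinberg_expected(0.6, 1000))
print(hw_chi_square(360, 480, 160))  # matches expectations: statistic near 0
print(hw_chi_square(500, 200, 300))  # heterozygote deficit: large statistic
```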
Additional info:
Gene clusters and pseudogenes are important in evolutionary innovation.
Genome evolution is a dynamic process involving both random and selective forces.