Genome Evolution: Structure, Variation, and Mechanisms

Genome Evolution

Introduction to Genome Evolution

Genome evolution refers to the changes in genome structure, content, and organization over time, driven by various genetic mechanisms. Understanding genome evolution is essential for interpreting genetic diversity, organismal complexity, and evolutionary relationships.

  • Genome content includes coding sequences (genes), non-coding regions (introns, repetitive sequences), and other elements.

  • Comparative genomics allows us to study how genomes evolve by comparing content between species.

  • Key mechanisms: mutation, genetic drift, natural selection, duplication, and divergence.

Genome Content and Organization

Components of the Genome

The genome is composed of various types of DNA sequences, each contributing to its structure and function.

  • Coding sequences: Regions that encode proteins.

  • Non-coding sequences: Include introns, repetitive DNA, and intergenic regions.

  • Repetitive DNA: Can be highly or moderately repetitive, such as transposons and tandem repeats.

Example: Human Genome Composition

| Component | Percentage |
|---|---|
| Transposons | 45% |
| Introns | 24% |
| Other intergenic DNA | 22% |
| Large duplications | 5% |
| Simple repeats | 3% |
| Exons (coding) | 1% |

Genomic Complexity and Gene Number

Minimum Gene Number and Organism Complexity

The minimum number of genes required for an organism increases with its biological complexity.

  • Simple organisms (e.g., parasitic bacteria): ~500 genes

  • Free-living bacteria: ~1200 genes

  • Unicellular eukaryotes: ~5000 genes

  • Multicellular eukaryotes: ~13,000 genes

  • Higher plants: ~25,000 genes

  • Mammals: ~25,000 genes

Genomic Complexity

Genomic complexity is determined by the presence of genes for essential functions, cell compartments, multicellularity, development, and specialized systems (e.g., nervous and immune systems).

Genome Size and Gene Number Variation

Prokaryotes

In prokaryotes, genome size and gene number are proportional due to the high percentage of coding DNA.

  • Genome size: 490 kbp – 9,106 kbp

  • Gene number: 480 – 6,700

  • Average: ~950 genes per Mb

Table: Genome Size and Gene Number in Prokaryotes

| Organism | Genome Size (kb) | Genes |
|---|---|---|
| H. influenzae | 1,830 | 1,700 |
| E. coli | 4,639 | 4,288 |
| B. subtilis | 4,214 | 4,100 |
| M. tuberculosis | 4,411 | 4,000 |
| S. coelicolor | 8,667 | 7,825 |
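The proportionality between genome size and gene number can be checked directly from the table. The short Python sketch below recomputes gene density for each organism. The dictionary name and helper function are illustrative, and the figures are the rounded values from the table.

```python
# Genome size (kb) and gene count, copied from the prokaryote table.
PROKARYOTES = {
    "H. influenzae": (1830, 1700),
    "E. coli": (4639, 4288),
    "B. subtilis": (4214, 4100),
    "M. tuberculosis": (4411, 4000),
    "S. coelicolor": (8667, 7825),
}


def genes_per_mb(size_kb, genes):
    """Gene density in genes per megabase (1 Mb = 1,000 kb)."""
    return genes / (size_kb / 1000)


densities = {name: genes_per_mb(*row) for name, row in PROKARYOTES.items()}
for name, density in densities.items():
    print(f"{name:16s} {density:6.0f} genes/Mb")

mean_density = sum(densities.values()) / len(densities)
print(f"{'mean':16s} {mean_density:6.0f} genes/Mb")
```

Every organism in the table falls between roughly 900 and 975 genes per Mb, close to the ~950 genes per Mb average quoted for prokaryotes, even though the genomes differ almost five-fold in size.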

Eukaryotes

Eukaryotic genomes are much larger and more variable than prokaryotic genomes, and neither genome size nor gene number correlates tightly with organismal complexity.

  • Genome size: 12 Mb (yeast) to billions of bp (plants)

  • Gene number: roughly 6,000 (yeast) to 60,000

  • Largest known plant genome: Paris japonica (149 billion bp)

Table: Eukaryotic Gene Number and Genome Size

| Organism | Genome Size (Mb) | Genes |
|---|---|---|
| S. cerevisiae | 12 | 6,000 |
| D. melanogaster | 180 | 13,600 |
| Arabidopsis | 125 | 25,000 |
| Mouse | 2,800 | 30,000 |
| Human | 3,000 | 25,000 |
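A quick calculation on the table values shows how weakly gene number tracks genome size in eukaryotes. This is a rough Python sketch using only the five rows above, and the variable names are illustrative.

```python
# Genome size (Mb) and gene count, copied from the eukaryote table.
EUKARYOTES = {
    "S. cerevisiae": (12, 6000),
    "D. melanogaster": (180, 13600),
    "Arabidopsis": (125, 25000),
    "Mouse": (2800, 30000),
    "Human": (3000, 25000),
}

sizes = [size for size, _ in EUKARYOTES.values()]
gene_counts = [genes for _, genes in EUKARYOTES.values()]

# Fold difference between the largest and smallest value in each column.
size_fold = max(sizes) / min(sizes)              # 3000 / 12
gene_fold = max(gene_counts) / min(gene_counts)  # 30000 / 6000

# Gene density in genes per Mb for each organism.
densities = {name: genes / size for name, (size, genes) in EUKARYOTES.items()}

print(f"Genome size spans {size_fold:.0f}-fold; gene number spans {gene_fold:.0f}-fold")
for name, density in densities.items():
    print(f"{name:16s} {density:7.1f} genes/Mb")
```

Genome size varies 250-fold across these organisms while gene number varies only 5-fold, so gene density drops from 500 genes per Mb in yeast to fewer than 10 genes per Mb in human.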

Gene Structure and Genome Organization

Interrupted Genes and mRNA Types

Gene prediction is complicated by introns, which interrupt the coding sequence of many genes, especially in eukaryotes.

  • Monocistronic mRNA: Encodes one polypeptide (common in eukaryotes).

  • Polycistronic mRNA: Encodes multiple polypeptides (common in prokaryotes).

  • Prokaryotic genes are colinear with their protein products; eukaryotic genes are typically interrupted by introns.
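The difference between a colinear gene and an interrupted gene can be sketched in a few lines of Python. The toy sequence and exon coordinates below are hypothetical, chosen only so the intron is easy to spot, and the `splice` helper is an illustration rather than a gene-prediction method.

```python
def splice(gene, exons):
    """Join the exon segments of a gene, dropping the introns between them.

    exons is an ordered list of (start, end) positions, 0-based and
    half-open, like Python slices.
    """
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical interrupted gene: exons in uppercase, intron in lowercase.
interrupted_gene = "ATGGCC" + "gtaagtttcag" + "GATTAA"
exon_coordinates = [(0, 6), (17, 23)]
spliced = splice(interrupted_gene, exon_coordinates)
print(spliced)  # ATGGCCGATTAA

# A colinear gene is the special case of a single exon covering the whole gene.
colinear_gene = "ATGGCCGATTAA"
print(splice(colinear_gene, [(0, len(colinear_gene))]) == colinear_gene)  # True
```

A gene finder working on the interrupted gene has to locate the exon boundaries before it can read the coding sequence, which is why introns make prediction harder.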

Non-coding Regions and Repetitive DNA

Types of Repetitive DNA

Repetitive DNA is classified based on sequence length and copy number.

  • Non-repetitive DNA: One copy per genome.

  • Moderately repetitive DNA: Short sequences, repeated 10–1000 times (e.g., transposons, rRNA genes).

  • Highly repetitive DNA: Very short sequences (<100 bp), repeated thousands of times (e.g., satellite DNA).

Table: Genome Size vs. Repetitive DNA

| Organism | Highly Repetitive | Moderately Repetitive | Non-Repetitive |
|---|---|---|---|
| Bacteria | Low | Low | High |
| Nematode | Low | Moderate | Moderate |
| Insect | Low | Moderate | Moderate |
| Mouse | Moderate | High | Low |
| Amphibian | High | High | Low |
| Plant | High | High | Low |

Essential vs Non-essential Genes

Gene Essentiality and Redundancy

Essential genes are required for survival; non-essential genes may be dispensable or redundant.

  • Loss of essential genes is lethal or causes sterility.

  • Redundancy: Multiple genes with similar functions can compensate for each other.

  • Redundant genes are non-essential if other genes can fulfill their function.

Unique Genes vs Gene Families

Gene Families and Homology

Gene families are groups of related genes that evolved from a common ancestor through duplication events.

  • Members share sequence homology but may have different functions.

  • The number of gene families, and the number of members per family, tends to increase with organismal complexity.

Table: Gene Families Across Species

| Species | Unique Genes | Families with 2–4 Members | Families with >4 Members |
|---|---|---|---|
| H. influenzae | 89% | 10% | 1% |
| S. cerevisiae | 72% | 19% | 9% |
| D. melanogaster | 72% | 14% | 14% |
| C. elegans | 55% | 20% | 26% |
| A. thaliana | 35% | 24% | 41% |

Mechanisms of Genome Evolution

Mutation and Allele Frequency

Mutations are changes in DNA sequence that create genetic variation. Each mutation can be considered a new allele.

  • Allele frequency: Proportion of a specific allele in a population.

  • Allele frequency can rise over generations until the allele is fixed (frequency of 1) or fall until it is lost (frequency of 0).

  • Polymorphism: A variant allele present at a frequency >1% in a population.
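Allele frequencies are usually computed from genotype counts. Here is a minimal Python sketch for one locus with two alleles, A and a, in a diploid population. The genotype counts are made up for illustration, and the function names are not from the source.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of alleles A and a.

    Each diploid individual carries two gene copies, so homozygotes
    contribute two copies of their allele and heterozygotes one of each.
    """
    total_copies = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / total_copies
    q = (2 * n_aa + n_Aa) / total_copies
    return p, q


def is_polymorphic(p, q, threshold=0.01):
    """Apply the 1% rule: the rarer allele must exceed the threshold."""
    return min(p, q) > threshold


# Hypothetical sample of 1,000 individuals.
p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)
print(f"p = {p:.2f}, q = {q:.2f}, polymorphic: {is_polymorphic(p, q)}")
```

With these counts the sample carries 1,200 copies of A and 800 copies of a out of 2,000, so p = 0.6 and q = 0.4. A population containing only AA individuals gives p = 1, which is fixation of A and loss of a.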

Genetic Drift

Genetic drift is the random change in allele frequencies, especially significant in small populations.

  • Slow and non-directional.

  • Can lead to fixation or loss of alleles.

  • Impact decreases as population size increases.
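The effect of population size on drift can be explored with a small simulation. The sketch below assumes a Wright-Fisher style model (constant population size, non-overlapping generations, random sampling of gene copies), which is a standard textbook simplification and not something specified in these notes. The population sizes and seed are arbitrary.

```python
import random


def wright_fisher(pop_size, p0, generations, rng):
    """Track the frequency of allele A under pure drift.

    Each generation, the 2N gene copies of the next generation are drawn
    at random, each being A with probability equal to the current frequency.
    The run stops early if A is lost (0.0) or fixed (1.0).
    """
    n_copies = 2 * pop_size  # diploid: two gene copies per individual
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        count_A = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count_A / n_copies
        trajectory.append(p)
        if p == 0.0 or p == 1.0:
            break
    return trajectory


rng = random.Random(1)  # fixed seed so a run can be repeated
small = wright_fisher(pop_size=10, p0=0.5, generations=200, rng=rng)
large = wright_fisher(pop_size=1000, p0=0.5, generations=200, rng=rng)
print(f"N=10:   final frequency {small[-1]:.2f} after {len(small) - 1} generations")
print(f"N=1000: final frequency {large[-1]:.2f} after {len(large) - 1} generations")
```

Across repeated runs with different seeds, the small population typically drifts to loss or fixation within tens of generations, while the large population usually stays much closer to its starting frequency over the same period. The direction of change is random in both cases.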

Natural Selection

Natural selection is the non-random change in allele frequencies due to differences in fitness (viability and reproduction).

  • Fast and directional.

  • Advantageous alleles increase in frequency; disadvantageous alleles decrease.

  • Dominant alleles manifest in both homozygotes and heterozygotes; recessive alleles are selected only in homozygotes.
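The contrast between dominant and recessive alleles under selection can be made concrete with the standard one-locus, two-allele selection model, in which the new frequency of A is its fitness-weighted share of the population. The model choice and the 10% fitness advantage below are assumptions made for illustration, not values from these notes.

```python
def select(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of selection.

    w_AA, w_Aa and w_aa are the relative fitnesses of the three genotypes.
    Genotypes are assumed to form at random from allele frequencies p and q.
    """
    q = 1 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def generations_to_reach(target, p0, w_AA, w_Aa, w_aa, max_generations=100_000):
    """Count generations until the frequency of A reaches the target."""
    p, generation = p0, 0
    while p < target and generation < max_generations:
        p = select(p, w_AA, w_Aa, w_aa)
        generation += 1
    return generation


# Hypothetical advantageous allele A starting at a frequency of 1%.
# Dominant: heterozygotes share the advantage. Recessive: only AA benefits.
dominant = generations_to_reach(0.5, 0.01, w_AA=1.1, w_Aa=1.1, w_aa=1.0)
recessive = generations_to_reach(0.5, 0.01, w_AA=1.1, w_Aa=1.0, w_aa=1.0)
print(f"Dominant advantage:  {dominant} generations to reach 50%")
print(f"Recessive advantage: {recessive} generations to reach 50%")
```

The recessive allele takes far longer to spread because, while it is rare, almost all copies sit in heterozygotes where selection cannot act on them.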

Gene Duplication and Divergence

Mechanisms of Gene Duplication

Gene duplication can occur through errors in replication, recombination, or repair. Duplicated genes can evolve new functions or become nonfunctional (pseudogenes).

  • Duplication can be small (single gene) or large (chromosomal region).

  • Divergence: Accumulation of mutations in duplicated copies can lead to new functions or to loss of function (pseudogenes).

  • Gene clusters: Multiple duplications can create clusters of related genes.
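Duplication followed by divergence can be illustrated with a toy simulation. The sketch below copies a short made-up sequence and lets each copy pick up random point substitutions independently. It ignores insertions, deletions, and selection, and the sequence, mutation counts, and function names are all hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, n_mutations, rng):
    """Apply n_mutations random point substitutions to a DNA sequence."""
    seq = list(seq)
    for _ in range(n_mutations):
        position = rng.randrange(len(seq))
        # Always change to a different base so each hit is a real substitution.
        seq[position] = rng.choice([b for b in BASES if b != seq[position]])
    return "".join(seq)


def identity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences match."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


rng = random.Random(42)
ancestor = "ATGGCCGATTAACCGGTTAA"  # hypothetical 20 bp gene

# After duplication, each copy accumulates mutations independently.
copy_a = mutate(ancestor, 3, rng)
copy_b = mutate(ancestor, 3, rng)
print(f"copy A vs ancestor: {identity(ancestor, copy_a):.2f}")
print(f"copy B vs ancestor: {identity(ancestor, copy_b):.2f}")
print(f"copy A vs copy B:   {identity(copy_a, copy_b):.2f}")
```

Because both copies change independently, the two copies generally differ from each other at least as much as either differs from the ancestor. This is the pattern that lets related members of a gene family be recognized by sequence comparison.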

Homologs, Paralogs, and Orthologs

Gene relationships are defined by their evolutionary origin:

  • Homologs: Genes that share a common ancestor, usually recognized by sequence similarity; the term covers both paralogs and orthologs.

  • Paralogs: Genes separated by a duplication event within a genome; paralogs in one species belong to the same gene family.

  • Orthologs: Genes separated by speciation, found in different species.

Functional Significance of Gene Families

Gene Family Expression and Function

Gene families can have divergent expression patterns and functions, often regulated during development or in response to environmental changes.

  • Example: Genes within the hemoglobin gene clusters are expressed at different developmental stages, and the resulting hemoglobins differ in oxygen affinity.

Summary Table: Key Concepts in Genome Evolution

| Concept | Description |
|---|---|
| Genome Content | Coding, non-coding, repetitive DNA |
| Gene Number | Varies with complexity, not always with genome size |
| Mutation | Creates genetic variation |
| Genetic Drift | Random allele frequency changes |
| Natural Selection | Directional allele frequency changes |
| Gene Duplication | Source of new genes and functions |
| Gene Families | Groups of homologous genes |

Key Equations

  • Allele Frequency (Hardy-Weinberg): $p + q = 1$, where $p$ and $q$ are allele frequencies.

  • Genotype Frequency: $p^2$, $2pq$, $q^2$, with $p^2 + 2pq + q^2 = 1$
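As a worked example, the Hardy-Weinberg relationship for two alleles with frequencies p and q = 1 − p predicts genotype frequencies of p² (AA), 2pq (Aa), and q² (aa). The Python sketch below evaluates these for an arbitrary example value of p. The function name and the genotype labels are illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for allele frequencies p (A) and q = 1 - p (a)."""
    if not 0 <= p <= 1:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.6)  # example value, so q = 0.4
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2f}")
print(f"total: {sum(freqs.values()):.2f}")
```

With p = 0.6 the expected frequencies are 0.36 (AA), 0.48 (Aa), and 0.16 (aa), which sum to 1. In a sample of 1,000 individuals, that corresponds to about 360, 480, and 160 individuals of each genotype.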

Additional info:

  • Gene clusters and pseudogenes are important in evolutionary innovation.

  • Genome evolution is a dynamic process involving both random and selective forces.
