Genomics and Genome Evolution: Structure, Function, and Comparative Analysis

Study Guide - Smart Notes

Genomics and Genome Evolution

Overview

Genomics is the study of entire genomes, including their structure, function, evolution, and mapping. Advances in DNA sequencing and bioinformatics have revolutionized our understanding of genetic information, genome organization, and evolutionary relationships among species.

Genome Structure and Content

Protein-Coding vs. Noncoding DNA

  • Human Genome Composition: About 98.5% of the human genome is noncoding DNA; only ~1.5% codes for proteins or functional RNAs.

  • Gene Number Variation: Gene numbers vary widely among species (e.g., Escherichia coli: ~4,400 genes; humans: ~20,000; corn: ~32,000).

  • Gene Density: Prokaryotes have high gene density; eukaryotes have lower gene density due to abundant noncoding DNA.

Types of Noncoding DNA

  • Introns: Noncoding sequences within genes, removed during RNA processing.

  • Regulatory Sequences: Control gene expression (e.g., promoters, enhancers).

  • Repetitive DNA: Includes transposable elements, simple sequence repeats, and segmental duplications.

  • Pseudogenes: Nonfunctional gene copies arising from duplication and mutation.

Repetitive DNA and Transposable Elements

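  • Transposable Elements: DNA sequences that can move or be copied to new locations within the genome. They fall into two classes: transposons, which move by means of a DNA intermediate, and retrotransposons, which move by means of an RNA intermediate that is reverse-transcribed back into DNA.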

  • Major Families: Alu elements (~10% of human genome), LINE-1 (L1) elements (~17%).

  • Simple Sequence DNA: Short tandem repeats (STRs) and other repeats, important for genetic profiling and chromosome structure.
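
As a rough illustration of what a short tandem repeat looks like in sequence data, the sketch below scans a DNA string for a 2 to 6 bp unit repeated back to back at least three times, using a regular-expression backreference. The function name `find_strs`, the unit-length range, the repeat threshold, and the example sequence are all illustrative assumptions, not values from these notes; real STR profiling relies on dedicated genotyping tools.

```python
import re

def find_strs(seq, unit_len_range=(2, 6), min_repeats=3):
    """Return (start, unit, copies) for each tandem run of a short unit.

    Hypothetical helper for illustration only. The lazy quantifier makes
    the regex try the shortest repeat unit first at each position.
    """
    lo, hi = unit_len_range
    # \1 is a backreference: it matches further copies of the captured unit.
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}" % (lo, hi, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Made-up example sequence: four copies of GATA flanked by unique sequence.
example = "TTGC" + "GATA" * 4 + "CCGT"
print(find_strs(example))
```

Because individuals differ in the number of copies at a given STR locus, the `copies` value is the quantity a genetic profile would compare.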

Genome Sequencing and Bioinformatics

The Human Genome Project and Sequencing Technologies

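  • Human Genome Project (HGP): International effort (1990–2003) to sequence the human genome. The 2003 sequence still contained gaps, mostly in highly repetitive regions; an essentially gapless (telomere-to-telomere) sequence was published in 2022.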

  • Sequencing Strategies:

    • Methodical Approach: First maps the order of DNA fragments along each chromosome (genetic and physical mapping), then sequences the ordered fragments.

    • Whole-Genome Shotgun Approach: Randomly fragments DNA, sequences pieces, and assembles the genome computationally.

  • Technological Advances: Next-generation sequencing enables rapid, cost-effective sequencing (from $500 million to <$600 per genome).

  • Metagenomics: Sequencing DNA from environmental samples to study mixed microbial communities.
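
The computational assembly step of the whole-genome shotgun approach can be sketched with a toy example. The code below repeatedly merges the two reads with the longest suffix-to-prefix overlap until no overlaps remain. This greedy strategy, the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are assumptions chosen for illustration; production assemblers use far more sophisticated graph-based methods and must cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair until none remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three overlapping, made-up reads sampled from one short sequence.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

Repetitive DNA is what makes this hard in practice: when the same sequence occurs in many places, overlaps alone cannot tell the assembler which copy a read came from.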

Bioinformatics and Data Analysis

  • Bioinformatics: Integrates computer science, mathematics, and biology to analyze genomic data.

  • Key Resources:

    • GenBank: NCBI's primary DNA sequence database.

    • BLAST: Tool for comparing DNA/protein sequences to identify similarities.

    • Protein Data Bank: Repository for 3D protein structures.

  • Gene Annotation: Identifying genes and predicting their functions using computational and experimental methods (e.g., RNA-seq, CRISPR-Cas9 knockouts).

  • Systems Biology: Integrates genomics, proteomics, and computational modeling to study gene/protein networks and cellular processes.

  • AI and Machine Learning: Automate data analysis, pattern recognition, and prediction in genomics and medical research.
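
One of the simplest computational steps in gene annotation is scanning a sequence for open reading frames (ORFs): a start codon followed in the same reading frame by a stop codon. The sketch below does this on the forward strand only. It is a deliberately simplified stand-in, not the method used by any particular annotation pipeline; the function name `find_orfs`, the `min_codons` cutoff, and the example sequence are illustrative assumptions, and eukaryotic gene prediction must additionally account for introns.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start/end are 0-based, end-exclusive indices into seq (stop codon included).
    min_codons counts codons before the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

# Made-up sequence containing one short ORF: ATG GCT GAA TTT TAA
example = "CCATGGCTGAATTTTAAGG"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

Candidate ORFs found this way are only predictions; experimental evidence such as RNA-seq is what confirms that a region is actually expressed.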

Genome Organization and Evolution

Genome Size, Gene Number, and Density

  • Genome Size: Prokaryotes: 1–6 Mb; Eukaryotes: highly variable (e.g., yeast: 12 Mb; humans: 3,000 Mb; some plants: >100,000 Mb).

  • Gene Number: Prokaryotes: 1,500–7,500; Eukaryotes: 5,000–40,000+.

  • Gene Density: Higher in prokaryotes (e.g., E. coli: ~950 genes/Mb) than in eukaryotes (humans: ~7 genes/Mb).

  • Alternative Splicing: Increases protein diversity; ~90% of human multi-exon genes are alternatively spliced.

  • Post-Translational Modifications: Further diversify polypeptides (e.g., cleavage, glycosylation).
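
The gene density figures above are simply gene count divided by genome size. The short sketch below reproduces them; the human values come from these notes, while the E. coli genome size of 4.6 Mb is an assumption added for the calculation (the notes give only its gene count and density).

```python
def gene_density(gene_count, genome_size_mb):
    """Genes per megabase (Mb) of genome."""
    return gene_count / genome_size_mb

genomes = {
    # name: (approximate gene count, approximate genome size in Mb)
    "E. coli": (4_400, 4.6),    # 4.6 Mb is an assumed size, not from the notes
    "Human": (20_000, 3_000),   # both values from the notes
}

for name, (genes, size_mb) in genomes.items():
    print(f"{name}: {gene_density(genes, size_mb):.1f} genes/Mb")
```

The more than hundredfold gap between the two results reflects how much of the human genome is introns, repetitive DNA, and other noncoding sequence.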

Multigene Families

  • Definition: Groups of related genes with similar sequences and functions.

  • Types:

    • Identical Sequences: Often in tandem clusters (e.g., rRNA genes).

    • Nonidentical Sequences: E.g., alpha- and beta-globin gene families, expressed at different developmental stages.

  • Pseudogenes: Nonfunctional gene copies within families, evidence of gene duplication and divergence.

Genome Evolution Mechanisms

  • Mutation: Fundamental source of genetic variation.

  • Gene and Genome Duplication: Polyploidy (whole-genome duplication), segmental duplications, and unequal crossing over increase genetic material for evolution.

  • Chromosomal Rearrangements: Fusions, inversions, and translocations can alter genome structure and drive speciation (e.g., human chromosome 2 fusion).

  • Exon Duplication and Shuffling: Create new proteins by rearranging coding regions (exons) within or between genes.

  • Transposable Elements: Promote recombination, gene disruption, and exon movement, contributing to genome plasticity.

Table: Comparison of Prokaryotic and Eukaryotic Genomes

| Feature | Prokaryotes | Eukaryotes |
| --- | --- | --- |
| Genome Size | 1–6 Mb | 12 Mb – >100,000 Mb |
| Gene Number | 1,500–7,500 | 5,000–40,000+ |
| Gene Density | High (~950 genes/Mb) | Low (~7 genes/Mb in humans) |
| Noncoding DNA | Minimal | Extensive (introns, repetitive DNA) |
| Introns | Rare | Common |
| Multigene Families | Few | Many |

Comparative Genomics and Evolutionary Insights

Comparing Genomes Across Species

  • Evolutionary Relationships: Sequence similarity reflects common ancestry; more similar genomes indicate more recent divergence.

  • Conserved Genes: Essential genes are often highly conserved across domains (e.g., Bacteria, Archaea, Eukarya).

  • Human and Chimpanzee Genomes: Differ by ~1% in single nucleotide substitutions; additional differences from insertions, deletions, and duplications.

  • Genetic Markers: SNPs, CNVs, and STRs are used to study human evolution, population history, and disease associations.

Developmental Genes and Evo-Devo

  • Homeotic Genes: Specify body segment identity. They contain a conserved DNA sequence, the homeobox, which encodes the DNA-binding homeodomain of the transcription factors these genes produce.

  • Hox Genes: Homologous across animals; regulate development and pattern formation.

  • Gene Regulation: Differences in regulatory sequences, not gene sequences, often underlie morphological diversity.

  • FOXP2 Gene: Critical for speech and vocalization; mutations cause speech disorders in humans and affect vocalization in other vertebrates.

Table: Examples of Genome Evolution Mechanisms

| Mechanism | Description | Example |
| --- | --- | --- |
| Polyploidy | Whole-genome duplication | Common in flowering plants |
| Gene Duplication | Extra copies of genes via unequal crossing over | Alpha- and beta-globin gene families |
| Exon Shuffling | Mixing of exons between genes | Tissue plasminogen activator (TPA) gene |
| Transposable Elements | Movement of DNA sequences within genome | Alu and LINE-1 elements in humans |

Applications and Implications

Medical and Research Applications

  • Cancer Genomics: Identifying tumor-specific mutations for targeted therapies.

  • Personalized Medicine: Using genomic data to tailor treatments to individual genetic profiles.

  • Noninvasive Prenatal Testing: Detecting chromosomal abnormalities in fetal DNA.

  • Ethical Considerations: Privacy, data security, and potential for genetic discrimination.

Key Projects and Databases

  • ENCODE: Systematic identification of functional elements in the human genome.

  • Roadmap Epigenomics: Mapping epigenetic features across tissues.

  • Cancer Genome Atlas: Systems biology approach to cancer genomics.

  • GOLD (Genomes OnLine Database): Catalogs genome and metagenome sequencing projects and their associated metadata.

Key Equations and Concepts

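  • Percent Identity (Amino Acid Sequence): $\text{Percent identity} = \dfrac{\text{number of identical aligned positions}}{\text{total number of aligned positions}} \times 100$

  • Gene Expression Flow (Central Dogma): $\text{DNA} \xrightarrow{\text{transcription}} \text{RNA} \xrightarrow{\text{translation}} \text{protein}$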
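
Percent identity between two aligned amino acid sequences can be computed as the number of positions where the residues match, divided by the total number of aligned positions, times 100. The sketch below assumes the two sequences are already aligned and of equal length; the function name `percent_identity` and the two short peptide strings are made up for illustration. Tools such as BLAST may define the denominator differently (for example, over the aligned region only).

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two sequences are identical.

    Assumes seq_a and seq_b are already aligned, so they have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * identical / len(seq_a)

# Two made-up 10-residue peptides that differ at 3 positions.
print(percent_identity("MVLSPADKTN", "MVLSGEDKSN"))  # 70.0
```

Higher percent identity between the same protein in two species is consistent with a more recent common ancestor.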

Summary Table: Human Genome Composition (Approximate)

| Component | Percentage of Genome | Description |
| --- | --- | --- |
| Protein-coding genes | ~1.5% | Exons of genes encoding proteins |
| RNA-coding genes (rRNA, tRNA) | <1% | Genes for functional RNAs |
| Introns and regulatory sequences | ~24% | Noncoding regions within and around genes |
| Repetitive DNA (transposable elements, etc.) | ~44% | Transposons, retrotransposons, simple sequence repeats |
| Other noncoding DNA | ~30% | Pseudogenes, unique noncoding sequences |
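
To make these percentages concrete, the sketch below converts them into approximate amounts of DNA, assuming the human genome size of about 3,000 Mb given earlier in these notes. The RNA-coding row is left out because the table gives only an upper bound (<1%) rather than a point value; the names `composition_pct` and `percent_to_mb` are illustrative.

```python
HUMAN_GENOME_MB = 3_000  # approximate human genome size from these notes

# Approximate percentages from the summary table (RNA-coding genes omitted
# because only an upper bound, <1%, is given).
composition_pct = {
    "Protein-coding genes": 1.5,
    "Introns and regulatory sequences": 24,
    "Repetitive DNA": 44,
    "Other noncoding DNA": 30,
}

def percent_to_mb(pct, genome_mb=HUMAN_GENOME_MB):
    """Convert a percentage of the genome into megabases."""
    return pct * genome_mb / 100

composition_mb = {name: percent_to_mb(pct) for name, pct in composition_pct.items()}

for name, mb in composition_mb.items():
    print(f"{name}: ~{mb:.0f} Mb")
```

On these figures, protein-coding exons amount to only about 45 Mb, while repetitive DNA alone accounts for well over 1,000 Mb.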

Key Takeaways

  • The vast majority of the human genome is noncoding DNA, much of which has regulatory or structural roles.

  • Genome size, gene number, and gene density vary widely across organisms and are not directly correlated with organismal complexity.

  • Genome evolution is driven by mutation, duplication, rearrangement, and the activity of transposable elements.

  • Comparative genomics reveals evolutionary relationships and the molecular basis of development and diversity.

  • Bioinformatics and systems biology are essential for interpreting large-scale genomic data and understanding complex biological systems.

Additional info: Some percentages and values were inferred based on standard textbook data where the original notes had placeholders or missing values.
