Genomics: Structure, Function, and Evolution of Genomes

Introduction to Genomics

Genomics is the comprehensive study of the structure, function, evolution, and mapping of genomes. It encompasses the sequencing and analysis of genomes from a wide variety of organisms, providing insights into gene content, organization, and evolutionary relationships. Genomics is divided into several major subfields, including structural genomics, functional genomics, and evolutionary genomics.

Structural Genomics

Sequencing Methods

Structural genomics focuses on sequencing entire genomes and annotating the sequences within them. The development of high-throughput sequencing technologies has revolutionized this field, making genome sequencing faster and more affordable.

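  • Whole Genome Shotgun (WGS) Sequencing: The preferred method for sequencing genomes today. Genomic DNA is fragmented at random, the fragments are sequenced, and the overlapping reads are assembled computationally into longer contiguous sequences. It contrasts with the older hierarchical (clone-by-clone) approach.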

  • Next-Generation Sequencing (NGS) Platforms: Modern sequencing is performed on platforms such as Illumina, PacBio, and Oxford Nanopore, each varying in read length, throughput, and cost.

  • Sequencing Cost: The cost to sequence a human genome has dropped dramatically, from nearly $3 billion during the Human Genome Project to less than $1,000 today.
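
The fragment-and-assemble idea behind WGS sequencing can be illustrated with a toy sketch. The function names, the greedy merging strategy, and the example reads are assumptions made for illustration only. Production assemblers use graph-based methods and must cope with sequencing errors and repeats, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler: repeatedly merge the two contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        # Drop any contig that is fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Invented 20 bp "genome" and four overlapping reads, given out of order.
genome = "ATGCGTACGTTAGCCTAGGA"
reads = [genome[10:18], genome[0:8], genome[14:20], genome[5:13]]
print(greedy_assemble(reads))
```

With these error-free reads the sketch recovers the original sequence as a single contig. Real data rarely behave this well, which is why read length and sequencing depth matter so much for assembly quality.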

Figures: Comparison of hierarchical and whole-genome shotgun sequencing approaches; DNA sequencing cost per genome over time.

Human Genome Project

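The Human Genome Project (HGP) was a landmark international scientific effort to sequence the entire human genome. Initiated in 1990, it announced a first draft in 2000 and was declared essentially complete in 2003. The remaining gaps were not closed until a separate consortium published a telomere-to-telomere assembly in 2022.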

  • Key Outcomes: Improved estimates of gene number, identification of structural and sequence diversity, and evidence of historic interbreeding with archaic humans.

  • Applications: Human genome sequencing is now widely used in medicine, evolutionary biology, and population genetics.

Figures: The complete sequence of a human genome; Wanted poster for Human Genome Project volunteers; Gene tally: estimates of protein-coding genes in the human genome.

Diversity of Sequenced Genomes

Genome sequencing has expanded to a wide range of organisms, from bacteria to plants and animals. Projects like the Vertebrate Genomes Project aim to sequence all known vertebrate species, providing a comprehensive resource for comparative genomics.

  • Genome Size Variation: Genome sizes vary dramatically among organisms, with some plants and amphibians having extremely large genomes.

  • Sequencing Output: Modern sequencing centers can process thousands of genomes per year.

Figures: Examples of sequenced genomes (table); Loblolly pine (Pinus taeda); Axolotl (Ambystoma mexicanum); The Vertebrate Genomes Project; Phases of the Vertebrate Genomes Project.

Next Generation Sequencing Platforms

Illumina Sequencing

Illumina sequencing is the most widely used NGS platform, producing short but highly accurate reads at a low cost per base. It is suitable for large-scale genome projects and clinical applications.

  • Read Length: 75-300 base pairs (bp)

  • Output: Up to 8 trillion bases per run

  • Accuracy: 99.9%

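  • Pooled/Paired-End Sequencing: Sequencing both ends of each DNA fragment (paired-end reads) yields read pairs a known approximate distance apart, which improves alignment and assembly, particularly across repetitive regions.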
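
The accuracy and output figures above can be turned into two quick calculations. The Phred quality scale (Q = -10 log10 of the error probability) is a standard convention for reporting base accuracy, but it is not mentioned in these notes. The 3.1 billion bp human genome size and 30x coverage target are likewise assumptions chosen for illustration.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred quality score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


def genomes_per_run(bases_per_run: float, genome_size: float, coverage: float) -> int:
    """Whole genomes that fit in one run at the requested average coverage."""
    return int(bases_per_run // (genome_size * coverage))


# 99.9% accuracy corresponds to roughly Q30 (one error per 1,000 bases).
print(round(phred_quality(0.999)))

# Assumed values: 3.1e9 bp human genome, 30x coverage, 8e12 bases per run.
print(genomes_per_run(8e12, 3.1e9, 30))
```

Under those assumptions a single maximal run would cover on the order of dozens of human genomes, which is one reason the per-genome cost has fallen so sharply.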

Figures: Illumina MiSeq sequencing instrument; Illumina flow cell; Paired-end sequencing.

Pacific Biosciences (PacBio) Sequencing

PacBio platforms produce long sequence reads (average 10,000-15,000 bp, up to 100,000 bp), which are valuable for resolving complex genomic regions and structural variants. The cost per base is higher than for Illumina, but with high-fidelity (HiFi) reads the accuracy is now comparable, and throughput has improved substantially.

Oxford Nanopore Sequencing

Oxford Nanopore sequencing offers the longest read lengths, limited only by the length of the DNA molecule. It is highly flexible and portable, with real-time sequencing capabilities, though accuracy is slightly lower than other platforms (about 99%).

Genome Annotation

Definition and Methods

Genome annotation is the process of identifying genes, regulatory sequences, and other functional elements within a genome sequence. Annotation can be experimental or computational.

  • Experimental Annotation: Uses cDNA libraries and transcriptome sequencing to identify exons and expressed genes.

  • Computational Annotation: Searches for open reading frames (ORFs), conserved motifs, and structural features such as centromeres and telomeres.

  • Functional Annotation: Assigns biological functions to genes by comparing sequences to known genes in databases.
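
As a minimal sketch of the computational step, the function below scans a DNA sequence for open reading frames. It is a simplified illustration, not a real gene finder: it checks only the forward strand, assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, and ignores introns. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF in the three forward frames.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon, inclusive. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # look for the next start codon
    return orfs


# Invented example sequence with one short ORF in the third frame.
print(find_orfs("CCATGAAATTTTAGGG"))
```

Real annotation pipelines also scan the reverse strand and combine ORF predictions with transcript evidence, since an ORF alone does not prove that a sequence is an expressed gene.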

Figure: Genome annotation example.

Evolutionary and Comparative Genomics

Comparative Genomics

Comparative genomics involves comparing genomes within and among species to understand evolutionary relationships and gene function. It overlaps with phylogenomics, which uses genome-scale data to infer evolutionary history.

  • Interspecific Comparisons: Identify conserved sequences and synteny (conservation of gene order), providing evidence for common ancestry.

  • Intraspecific Comparisons: Identify genetic polymorphisms within populations, informing studies of genetic diversity and adaptation.
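
Synteny can be made concrete with a small sketch that finds runs of genes appearing consecutively and in the same order in two genomes. The gene names and the `synteny_blocks` function are hypothetical. The sketch assumes each gene name is unique and already matched to its ortholog, and it ignores inverted blocks.

```python
def synteny_blocks(order_a: list[str], order_b: list[str],
                   min_genes: int = 2) -> list[list[str]]:
    """Find maximal runs of genes that are adjacent and co-linear in both genomes."""
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    blocks: list[list[str]] = []
    current: list[str] = []

    def close_block() -> None:
        # Keep the run only if it is long enough to count as a block.
        if len(current) >= min_genes:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_in_b:
            close_block()  # gene has no counterpart in genome B
            continue
        if current and position_in_b[gene] == position_in_b[current[-1]] + 1:
            current.append(gene)  # extends the current co-linear run
        else:
            close_block()
            current.append(gene)  # starts a new candidate run
    close_block()
    return blocks


# Invented gene orders: genome B carries the same two blocks, swapped.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "g1", "g2", "g3"]
print(synteny_blocks(genome_a, genome_b))
```

Here the two genomes share two three-gene blocks even though a rearrangement has changed their relative positions, which is the kind of pattern interspecific comparisons look for.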

Figure: Comparative genomics and synteny.

Gene Content Analysis and Evolution

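New genes can arise through several mechanisms, including gene duplication, exon shuffling, reverse transcription of mRNA followed by reinsertion into the genome (retrotransposition), and horizontal gene transfer (HGT). Genes can also be lost through deletion or by decaying into pseudogenes. HGT is especially important in prokaryotes, where it complicates phylogenetic analysis because different genes in the same genome can have different evolutionary histories.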

  • Gene Duplication: The most common mechanism for new gene formation in eukaryotes.

  • Horizontal Gene Transfer: Transfer of genes between organisms by routes other than parent-to-offspring inheritance, often between distantly related species. It is common in bacteria and archaea.

Figure: The net of prokaryotic life due to horizontal gene transfer.

Phylogenetics Using Whole Genomes

Whole-genome data allow for robust phylogenetic analysis, using hundreds to thousands of independent loci to infer evolutionary relationships. In prokaryotes, HGT creates a network-like evolutionary history, while in eukaryotes, independent loci provide multiple lines of evidence for species divergence.
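
One simple way to see how many loci contribute to a single comparison is to average a pairwise distance over all of them. The sketch below uses the p-distance (proportion of differing aligned sites) as a deliberately crude measure. The taxon labels and sequences are invented, and real phylogenomic analyses use explicit substitution models and tree-building methods rather than raw averages.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_distance(loci: list[dict[str, str]], taxon_a: str, taxon_b: str) -> float:
    """Average the p-distance between two taxa across independent loci."""
    distances = [p_distance(locus[taxon_a], locus[taxon_b]) for locus in loci]
    return sum(distances) / len(distances)


# Two invented aligned loci for three hypothetical taxa.
loci = [
    {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "taxon3": "ACCTAGGA"},
    {"taxon1": "GGGCCC", "taxon2": "GGGCCC", "taxon3": "GGACCT"},
]
print(mean_distance(loci, "taxon1", "taxon2"))
print(mean_distance(loci, "taxon1", "taxon3"))
```

In this toy data, taxon1 is closer to taxon2 than to taxon3 at both loci, so the loci agree. With HGT, individual loci can disagree, which is what produces a network-like history.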

Functional Genomics

Genome-Wide Association Studies (GWAS)

GWAS are used to identify associations between genetic variants and phenotypic traits or diseases. They involve genotyping large populations at thousands to millions of loci and analyzing correlations with traits of interest.

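  • Manhattan Plots: Visualize GWAS results. Each dot is one SNP, plotted by its genomic position (x-axis) against the strength of its association with the trait (y-axis, typically -log10 of the p-value), so strongly associated regions stand out as peaks.

  • Odds Ratio (OR): Compares the odds of the trait in carriers of a variant with the odds in non-carriers. An OR above 1 indicates increased risk, and an OR below 1 indicates a protective association.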
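
Both quantities reduce to short formulas, sketched below. The counts are invented, the function names are hypothetical, and the 5e-8 significance threshold is a widely used convention for human GWAS that is not stated in these notes.

```python
import math

# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_THRESHOLD = 5e-8


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds of carrying the variant among cases divided by the odds among controls."""
    case_odds = case_carriers / case_noncarriers
    control_odds = control_carriers / control_noncarriers
    return case_odds / control_odds


def manhattan_height(p_value: float) -> float:
    """Y-axis value of a SNP on a Manhattan plot: -log10 of its p-value."""
    return -math.log10(p_value)


# Invented 2x2 table: 30 of 100 cases and 15 of 100 controls carry the variant.
print(odds_ratio(30, 70, 15, 85))

# A SNP with p = 1e-8 sits above the line drawn at the threshold.
print(manhattan_height(1e-8) > manhattan_height(GENOME_WIDE_THRESHOLD))
```

In the example table the odds ratio is about 2.4, meaning the odds of carrying the variant are roughly 2.4 times higher in cases than in controls.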

Figure: Manhattan plot for GWAS.

Microarrays and Transcriptomics

Microarrays and transcriptome sequencing (RNA-seq) are used to study gene expression patterns under different conditions. Microarrays use DNA probes to detect labeled copies of mRNA by hybridization, so they measure only transcripts for which a probe exists. RNA-seq sequences the transcripts directly, providing a quantitative and comprehensive analysis of all transcripts present in a cell or tissue.

  • Transcriptome: The complete set of mRNA transcripts in a cell or organism, reflecting gene expression at a given time.
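
To compare expression between genes, raw RNA-seq read counts are usually normalized for transcript length and sequencing depth. The sketch below computes transcripts per million (TPM), one common normalization that these notes do not name. The gene names, counts, and lengths are invented.

```python
def tpm(counts: dict[str, float], lengths_kb: dict[str, float]) -> dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM).

    Step 1 divides each count by transcript length in kilobases.
    Step 2 rescales the length-normalized rates so they sum to one million.
    """
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# geneB has three times the reads of geneA but is also three times as long.
example_counts = {"geneA": 100, "geneB": 300}
example_lengths_kb = {"geneA": 1.0, "geneB": 3.0}
print(tpm(example_counts, example_lengths_kb))
```

After normalization the two genes receive equal TPM values, showing that the longer gene's higher raw count reflects its length, not higher expression.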

Research Examples in Genomics

Great Ape Genomes

Comparative genomics of great apes, including humans, chimpanzees, and orangutans, reveals structural genomic variation and gene expression differences that may underlie species-specific traits.

Minimal Genome Project (Synthia)

Researchers have synthesized minimal genomes to determine the essential genetic requirements for life. For example, the JCVI-syn3.0 genome contains only 473 genes, and the function of 149 of them (about a third) is still unknown.

Summary: Key Points

  • Genomics integrates sequencing, annotation, and comparative analysis to understand genome structure, function, and evolution.

  • Technological advances have made genome sequencing rapid and affordable, enabling large-scale projects across diverse organisms.

  • Functional genomics tools such as GWAS, microarrays, and transcriptomics provide insights into gene function and regulation.

  • Comparative genomics and phylogenomics reveal evolutionary relationships and mechanisms of gene and genome evolution.
