Genomics: Structure, Function, and Evolution of Genomes
Study Guide - Smart Notes
Introduction to Genomics
Genomics is the comprehensive study of the structure, function, evolution, and mapping of genomes. It encompasses the sequencing and analysis of genomes from a wide variety of organisms, providing insights into gene content, organization, and evolutionary relationships. Genomics is divided into several major subfields, including structural genomics, functional genomics, and evolutionary genomics.
Structural Genomics
Sequencing Methods
Structural genomics focuses on sequencing entire genomes and annotating the sequences within them. The development of high-throughput sequencing technologies has revolutionized this field, making genome sequencing faster and more affordable.
Whole Genome Shotgun (WGS) Sequencing: The preferred method for sequencing genomes today. Genomic DNA is randomly fragmented, the fragments are sequenced, and the overlapping reads are computationally assembled into longer contiguous sequences (contigs).
Next-Generation Sequencing (NGS) Platforms: Modern sequencing is performed on platforms such as Illumina, PacBio, and Oxford Nanopore, each varying in read length, throughput, and cost.
Sequencing Cost: The cost to sequence a human genome has dropped dramatically, from nearly $3 billion during the Human Genome Project to less than $1,000 today.
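The fragmentation-and-assembly idea behind shotgun sequencing can be sketched with a toy greedy assembler. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); the function names, the example sequence, and the `min_len` parameter are hypothetical, and real assemblers use far more sophisticated graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 25 bp "genome" cut into four overlapping reads.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[0:10], genome[6:16], genome[12:22], genome[17:25]]
contigs = greedy_assemble(reads)
print(contigs)
```

In this small example the four reads overlap by 4 to 5 bases, so the greedy merges reconstruct the original sequence as a single contig. With repeats or sequencing errors a greedy strategy can misassemble, which is one reason longer reads are valuable.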

Human Genome Project
The Human Genome Project (HGP) was a landmark international scientific effort to sequence the entire human genome. Launched in 1990, it announced a first draft in 2000 and was declared complete in 2003, although some highly repetitive regions remained unresolved until later.
Key Outcomes: Improved estimates of gene number and identification of structural and sequence diversity. Later comparisons with archaic genomes also provided evidence of historic interbreeding with archaic humans.
Applications: Human genome sequencing is now widely used in medicine, evolutionary biology, and population genetics.

Diversity of Sequenced Genomes
Genome sequencing has expanded to a wide range of organisms, from bacteria to plants and animals. Projects like the Vertebrate Genomes Project aim to sequence all known vertebrate species, providing a comprehensive resource for comparative genomics.
Genome Size Variation: Genome sizes vary dramatically among organisms, with some plants and amphibians having extremely large genomes.
Sequencing Output: Modern sequencing centers can process thousands of genomes per year.

Next Generation Sequencing Platforms
Illumina Sequencing
Illumina sequencing is the most widely used NGS platform, producing short but highly accurate reads at a low cost per base. It is suitable for large-scale genome projects and clinical applications.
Read Length: 75-300 base pairs (bp)
Output: Up to 8 trillion bases per run
Accuracy: 99.9%
Pooled/Paired-End Sequencing: Sequencing both ends of each DNA fragment yields read pairs separated by an approximately known distance, which improves alignment and assembly accuracy.

Pacific Biosciences (PacBio) Sequencing
PacBio platforms produce long sequence reads (average 10,000-15,000 bp, up to 100,000 bp), which are valuable for resolving complex genomic regions and structural variants. The cost per base is higher than Illumina, but the accuracy of its high-fidelity (HiFi) reads is now comparable, and throughput has increased substantially.
Oxford Nanopore Sequencing
Oxford Nanopore sequencing offers the longest read lengths, limited only by the length of the DNA molecule. It is highly flexible and portable, with real-time sequencing capabilities, though accuracy is slightly lower than other platforms (about 99%).
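Accuracy figures such as 99.9% and 99% are commonly expressed as Phred quality scores, Q = -10 log10(p), where p is the per-base error probability. The sketch below converts the accuracies quoted above into Phred scores and estimates the expected number of errors per read. The read lengths used (150 bp and 10,000 bp) are illustrative assumptions, not figures from these notes.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred quality score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Expected number of miscalled bases in one read of the given length."""
    return (1.0 - accuracy) * read_length


# Accuracies taken from the platform descriptions; read lengths are assumed.
print(round(accuracy_to_phred(0.999), 1))        # 99.9% accuracy
print(round(accuracy_to_phred(0.99), 1))         # about 99% accuracy
print(round(expected_errors(0.999, 150), 2))     # short read, assumed 150 bp
print(round(expected_errors(0.99, 10_000), 2))   # long read, assumed 10,000 bp
```

A 99.9% accurate base call corresponds to Q30 and a 99% accurate call to Q20, so a seemingly small difference in accuracy is a tenfold difference in error rate.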
Genome Annotation
Definition and Methods
Genome annotation is the process of identifying genes, regulatory sequences, and other functional elements within a genome sequence. Annotation can be experimental or computational.
Experimental Annotation: Uses cDNA libraries and transcriptome sequencing to identify exons and expressed genes.
Computational Annotation: Searches for open reading frames (ORFs), conserved motifs, and structural features such as centromeres and telomeres.
Functional Annotation: Assigns biological functions to genes by comparing sequences to known genes in databases.
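As a rough illustration of the computational step, the following sketch scans a DNA sequence for open reading frames, defined here as an ATG start codon followed in the same frame by a stop codon. It is a simplified assumption-laden example: it checks only the three forward-strand frames, ignores introns, and uses a hypothetical `min_codons` cutoff. Real gene finders also scan the reverse complement and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, orf_sequence) for forward-strand ORFs.

    `start` is the 0-based index of the ATG and `end` is exclusive,
    so the returned sequence includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical sequence containing one short ORF in the third frame.
example = "CCATGAAATTTGGGTAACC"
print(find_orfs(example))
```

For the example sequence the only ORF found starts at index 2 and spans five codons including the stop.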
Evolutionary and Comparative Genomics
Comparative Genomics
Comparative genomics, which overlaps with phylogenomics (the use of genome-scale data to infer phylogeny), involves comparing genomes within and among species to understand evolutionary relationships and gene function.
Interspecific Comparisons: Identify conserved sequences and synteny (conservation of gene order), providing evidence for common ancestry.
Intraspecific Comparisons: Identify genetic polymorphisms within populations, informing studies of genetic diversity and adaptation.
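Synteny can be illustrated with a small sketch that finds runs of genes appearing consecutively and in the same order in two genomes. This is a deliberately simple approach with stated assumptions: each gene name is unique and stands for a one-to-one ortholog, inverted blocks are not detected, and the gene names and `min_genes` cutoff are hypothetical.

```python
def syntenic_blocks(order_a: list[str], order_b: list[str],
                    min_genes: int = 2) -> list[list[str]]:
    """Find runs of genes that are adjacent and in the same order in both genomes."""
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    blocks: list[list[str]] = []
    current: list[str] = []

    def close_block() -> None:
        # Keep the current run only if it is long enough to count as a block.
        if len(current) >= min_genes:
            blocks.append(list(current))
        current.clear()

    for gene in order_a:
        if gene not in position_in_b:
            close_block()  # gene absent from genome B interrupts the run
            continue
        if current and position_in_b[gene] == position_in_b[current[-1]] + 1:
            current.append(gene)  # extends the collinear run
        else:
            close_block()
            current.append(gene)
    close_block()
    return blocks


# Hypothetical gene orders: genome B carries a rearrangement relative to A.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_b = ["g5", "g6", "g7", "x1", "g1", "g2", "g3"]
print(syntenic_blocks(genome_a, genome_b))
```

Here the two genomes share two conserved blocks (g1 to g3 and g5 to g7) even though their relative positions differ, which is the kind of pattern interspecific comparisons look for.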
Gene Content Analysis and Evolution
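New genes can arise through various mechanisms, including gene duplication, exon shuffling, retrotransposition (reverse transcription of an mRNA followed by insertion of the copy into the genome), and horizontal gene transfer (HGT), while existing genes can be lost through deletion or inactivating mutations. HGT is especially important in prokaryotes, complicating phylogenetic analysis.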
Gene Duplication: The most common mechanism for new gene formation in eukaryotes.
Horizontal Gene Transfer: Transfer of genes between organisms by routes other than parent-to-offspring inheritance, often between distantly related species; common in bacteria and archaea.
Phylogenetics Using Whole Genomes
Whole-genome data allow for robust phylogenetic analysis, using hundreds to thousands of independent loci to infer evolutionary relationships. In prokaryotes, HGT creates a network-like evolutionary history, while in eukaryotes, independent loci provide multiple lines of evidence for species divergence.
Functional Genomics
Genome-Wide Association Studies (GWAS)
GWAS are used to identify associations between genetic variants and phenotypic traits or diseases. They involve genotyping large populations at thousands to millions of loci and analyzing correlations with traits of interest.
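Manhattan Plots: Visualize GWAS results, with each dot representing a SNP plotted by genomic position (x-axis) against the strength of its association with the trait, usually shown as -log10 of the p-value (y-axis).
Odds Ratio (OR): The ratio of the odds of the trait among carriers of a variant to the odds among non-carriers. An OR above 1 indicates increased risk, an OR below 1 indicates decreased risk, and an OR of 1 indicates no association.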
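For a single SNP, the odds ratio and the value plotted on a Manhattan plot can be worked out from a 2x2 table of carriers and non-carriers among cases and controls. The sketch below uses made-up counts and a Pearson chi-square test with one degree of freedom and no continuity correction; this test is chosen here for simplicity, and actual GWAS pipelines typically use regression models with covariates.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds of carrying the variant in cases divided by the odds in controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def chi_square_test(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square tail probability equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls genotyped at one SNP.
a, b = 300, 700   # cases: carriers, non-carriers
c, d = 200, 800   # controls: carriers, non-carriers

or_value = odds_ratio(a, b, c, d)
chi2, p_value = chi_square_test(a, b, c, d)
print(f"OR = {or_value:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
print(f"-log10(p) = {-math.log10(p_value):.2f}")  # height on a Manhattan plot
```

With these counts the odds ratio is 12/7, roughly 1.71, meaning the odds of being a case are about 1.7 times higher among carriers than non-carriers.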
Microarrays and Transcriptomics
Microarrays and transcriptome sequencing (RNA-seq) are used to study gene expression patterns under different conditions. Microarrays detect transcripts by hybridization to DNA probes fixed on a chip, so they measure only the sequences represented on the array, while RNA-seq provides quantitative and comprehensive analysis of all transcripts present in a cell or tissue.
Transcriptome: The complete set of RNA transcripts, including mRNAs, in a cell or organism, reflecting gene expression at a given time.
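To make RNA-seq read counts comparable across genes of different lengths, counts are usually normalized. One widely used measure is transcripts per million (TPM), sketched below as an example; the gene names, counts, and lengths are invented for illustration, and these notes do not specify which normalization a given study uses.

```python
def transcripts_per_million(counts: dict[str, int],
                            lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts to TPM.

    Step 1: divide each count by the gene length in kilobases.
    Step 2: rescale the length-normalized rates so they sum to one million.
    """
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical read counts and transcript lengths for three genes.
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 1000}

tpm = transcripts_per_million(counts, lengths_bp)
for gene, value in tpm.items():
    print(gene, round(value))
```

Although geneB has twice as many reads as geneA, it is also twice as long, so both receive the same TPM; geneC has three times the reads of geneA at the same length and therefore three times the TPM.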
Research Examples in Genomics
Great Ape Genomes
Comparative genomics of great apes, including humans, chimpanzees, and orangutans, reveals structural genomic variation and gene expression differences that may underlie species-specific traits.
Minimal Genome Project (Synthia)
Researchers have synthesized minimal genomes to determine the essential genetic requirements for life. For example, the JCVI-syn3.0 genome contains only 473 genes, 149 of which still have no known function.
Summary: Key Points
Genomics integrates sequencing, annotation, and comparative analysis to understand genome structure, function, and evolution.
Technological advances have made genome sequencing rapid and affordable, enabling large-scale projects across diverse organisms.
Functional genomics tools such as GWAS, microarrays, and transcriptomics provide insights into gene function and regulation.
Comparative genomics and phylogenomics reveal evolutionary relationships and mechanisms of gene and genome evolution.