Genomics: Genetics from a Whole-Genome Perspective (Chapter 16) – Study Notes

Introduction to Genomics

Genomics is the study of the structure, function, evolution, and mapping of entire genomes. It provides insights into the organization, content, and function of genetic material in various organisms, with a focus on large-scale, comparative, and functional analyses.

Genome Size and Gene Content

Variation in Genome Size

  • Eukaryote genomes are generally larger and contain more genes than those of eubacteria and archaea.

  • Multicellular organisms tend to have larger genomes and more genes, but there is extreme variation in genome size, especially among plants due to polyploidy.

  • Gene number for multicellular eukaryotes ranges from ~10,000 to ~100,000, with higher numbers often due to polyploidy.

  • Most genome size variation is not due to differences in gene number, but rather to repetitive elements and non-coding DNA.

[Figures: Table of sequenced genomes and gene counts; Genes vs. genome size in multicellular eukaryotes]

Genes and mRNAs

  • Genes are composed of exons and introns.

  • Exons are retained in the mature mRNA, while introns are removed during RNA splicing. Not all exon sequence is coding.

  • Untranslated regions (UTRs) are parts of exons that are not translated into protein.

  • The coding sequence (CDS) is the portion of exons that is translated into protein.
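
The relationship between exons, UTRs, and the CDS can be sketched in a few lines of Python. This is a toy illustration, not an annotation tool: the sequence, the exon coordinates, and the function names `splice` and `extract_cds` are all invented for the example, and the CDS is assumed to run from the first ATG to the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) to form the mature mRNA."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def extract_cds(mrna):
    """Split an mRNA into (5' UTR, CDS, 3' UTR).

    Simplifying assumption: the CDS starts at the first ATG and ends at
    the first in-frame stop codon (the stop codon is included in the CDS).
    """
    start = mrna.find("ATG")
    if start == -1:
        return mrna, "", ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    return mrna[:start], mrna[start:], ""

# Hypothetical transcript: exon 1, one intron, exon 2 (DNA alphabet).
exon1, intron, exon2 = "GGCATGGCC", "GTAAGTCCAG", "AAATGATTC"
pre_mrna = exon1 + intron + exon2
exons = [(0, 9), (19, 28)]

mrna = splice(pre_mrna, exons)          # "GGCATGGCCAAATGATTC"
utr5, cds, utr3 = extract_cds(mrna)     # "GGC", "ATGGCCAAATGA", "TTC"
```

Both exons contribute to the CDS here, and both also carry untranslated sequence, which is why exon length and CDS length are not the same thing.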

[Figures: CDS and mRNA totals, Drosophila vs. human; CDS totals, Drosophila vs. human]

Genome Composition

  • Large amounts of genomic DNA are not transcribed into mRNA (e.g., ~67% in humans, ~40% in Drosophila).

  • CDS makes up a very small proportion of the genome and mRNA.

  • Introns can be very large, especially in primates, while plants generally have small introns.

  • Most genome size variation is due to repetitive elements, not differences in CDS.

Reasons for Genome Size Variation

  • Repetitive elements, transposons, and transposon fossils (junk DNA) are the major contributors to genome size variation.

  • Polyploidy and genome duplication also increase genome size.

  • Introns vary in size between species.

  • Differences in gene number are a minor reason for genome size variation.

DNA Sequencing Methods

Parameters of Sequencing Methods

  • Cost per base and per run

  • Clonal (requires a purified, single-template DNA sample) vs. parallel (can sequence mixtures of templates)

  • Accuracy, read length, and reads per run

Common methods include Sanger sequencing, Illumina SBS, Pacific Biosciences, and Oxford Nanopore, each with different strengths and applications.

Genome Sequencing and Assembly

  • Sequencing reads are much shorter than chromosomes, making assembly challenging, especially in repetitive regions.

  • Paired-end sequencing and long-read technologies (e.g., PacBio, Oxford Nanopore) have improved assembly quality.

  • Genome assembly involves organizing reads into contigs and scaffolds.
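
The idea of organizing overlapping reads into contigs can be illustrated with a minimal greedy overlap merge. This is a teaching sketch under strong assumptions (error-free reads, one strand, no read contained inside another); real assemblers use far more elaborate graph-based methods. The reads and the function names `overlap` and `greedy_assemble` are made up for the example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]

# Hypothetical reads sampled from the toy sequence ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)  # ["ATGCGTACGTTAGC"]
```

When a repeat is longer than the reads, several merges have equally good overlaps and a greedy choice can join the wrong copies. That is the ambiguity that paired-end and long reads help resolve.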

[Figures: Sequencing by primer walking and shotgun sequencing; Assembly ambiguity due to repeats; Paired-end sequencing and scaffold assembly]

Genome Assembly Metrics

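  • L50: The smallest number of scaffolds whose combined length, counting from the longest scaffold down, covers at least 50% of the assembly.

  • N50: The length of the scaffold at the L50 rank, i.e., 50% of the assembly is contained in scaffolds of this length or longer.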

  • Lower L50 and higher N50 indicate better assembly quality.
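
As a worked example, N50 and L50 can be computed from a list of scaffold lengths by sorting from longest to shortest and accumulating until half of the total is reached. The function name `n50_l50` and the scaffold lengths below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the shortest scaffold needed to reach 50%,
            # 'rank' is how many scaffolds it took to get there.
            return length, rank
    raise ValueError("no scaffolds given")

# Two hypothetical assemblies of the same 300-unit genome.
contiguous = [80, 70, 50, 40, 30, 20, 10]
fragmented = [30] * 10

n50_l50(contiguous)  # (70, 2): two scaffolds cover half the assembly
n50_l50(fragmented)  # (30, 5): five scaffolds are needed
```

The two assemblies have the same total length, but the first has a higher N50 and a lower L50, which is what a more contiguous assembly looks like.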

[Figures: Cuscuta campestris v0.1 genome assembly (poor); Cuscuta campestris v0.32 genome assembly (improved)]

Genome Annotation

Annotation Process

  • Annotation identifies genes, exons, introns, UTRs, promoters, enhancers, and repeats.

  • Experimental methods (e.g., RNA-seq) and computational predictions are used.

  • Challenges include alternative splicing, alternative start sites, and uncertain CDS boundaries.
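
One of the simplest forms of computational prediction is an open reading frame (ORF) scan. The sketch below is an illustration only: it searches the forward strand in three frames for ATG-to-stop stretches and ignores introns, the reverse strand, and alternative start sites, which is part of why real CDS boundaries remain uncertain. The sequence and the function name `find_orfs` are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; each ORF runs from the
    first ATG in a frame to the next in-frame stop codon (inclusive).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)

# Hypothetical 24-base sequence containing two short ORFs.
seq = "CCATGAAATTTTAGGATGCCCTGA"
orfs = find_orfs(seq)  # [(2, 14, 2), (15, 24, 0)]
```

Experimental evidence such as RNA-seq is what confirms or corrects candidates like these.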

[Figure: Gene annotation using cDNA and consensus sequences]

Functional Annotation

  • Assigns function to genes based on experiments or homology.

  • Many proteins remain unclassified or of unknown function.

[Figure: Pie chart of Arabidopsis gene functions]

Genome Organization and Synteny

Genome Organization

  • Genes and repetitive elements are interspersed throughout the genome.

  • Gene density and intron number vary widely among species and chromosomes.

[Figures: Gene density and intron number across species; Genes and repeats in the human genome]

Synteny

  • Synteny is the conserved order of genes among related species.

  • It is more easily detected in animals than in plants due to fewer polyploidy events in animals.
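
Conserved gene order can be detected, in its simplest form, by looking for runs of genes that are adjacent and in the same order in two species. The sketch below assumes each gene has exactly one ortholog in the other species and considers only the same orientation; the gene names and the function name `syntenic_blocks` are hypothetical.

```python
def syntenic_blocks(order_a, order_b, min_genes=2):
    """Return runs of genes that are consecutive and collinear in both orders."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
        else:
            if len(current) >= min_genes:
                blocks.append(current)
            # Start a new run only if the gene exists in species B.
            current = [gene] if gene in pos_b else []
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks

# Hypothetical gene orders along one chromosome in two related species.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
species_b = ["g5", "g6", "g7", "g1", "g2", "g3", "g9"]

blocks = syntenic_blocks(species_a, species_b)
# [["g1", "g2", "g3"], ["g5", "g6", "g7"]]
```

Here a rearrangement has swapped two blocks and g4 is missing from species B, yet gene order within each block is preserved. After polyploidy followed by gene loss, such runs become shorter and harder to find, consistent with the point about plants above.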

[Figure: Synteny in Saccharomyces species]

Evolution of Genomes

How New Genes Are Born

  • Exon shuffling, retrotransposition, and horizontal gene transfer contribute to new gene formation.

  • Transposable elements (TEs) can be co-opted into gene function.

  • Gene duplication can lead to pseudogenes, subfunctionalization, or neofunctionalization.

[Figures: Gene duplication and fates of duplicated genes; Gene duplication outcomes]

Genome Variation and Copy Number Variation

  • Heterozygosity, copy number variants, and aneuploidies complicate genome assembly and analysis.

  • Mispairing of tandem repeats during meiosis can generate copy number variation.

[Figures: Copy number variation by mispairing of repeats; Frequency and size of sequence variants]

Comparative Genomics and Phylogenetic Footprinting

Phylogenetic Footprinting

  • Comparing syntenic regions across species reveals conserved sequences, especially in coding regions (CDS).

  • Phylogenetic footprinting is effective in animals but less so in plants due to polyploidy and genome fractionation.
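
The core idea, that conserved stretches stand out against diverged background, can be sketched as a sliding-window identity score over a pairwise alignment. This is a simplified stand-in: actual footprinting typically uses multi-species alignments and statistical models. The aligned sequences, window settings, 0.8 cutoff, and function name `window_identity` are assumptions made for the example.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Return (start, fraction identical) for each window of an alignment.

    Both inputs must be aligned (equal length); '-' marks a gap and
    never counts as a match.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append((start, matches / window))
    return scores

# Hypothetical syntenic region: conserved flanks, diverged middle.
seq_a = "ATGGCCAAGT" + "ACGTACGTAC" + "TTGACCGGTA"
seq_b = "ATGGCCAAGT" + "TGCATGCATG" + "TTGACCGGTA"

scores = window_identity(seq_a, seq_b, window=10, step=10)
# [(0, 1.0), (10, 0.0), (20, 1.0)]
conserved = [start for start, identity in scores if identity >= 0.8]
# [0, 20]
```

Windows that stay highly similar outside known CDS are the candidates for conserved non-coding elements.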

[Figures: Conservation in CDS and non-coding regions; Conservation among primate genomes; Genome duplication and loss of microsynteny]

Applications of Comparative Genomics

  • Identification of regulatory elements (e.g., enhancers) by conserved non-coding sequences.

  • Understanding evolutionary relationships and genome evolution.

[Figure: Synteny between human and mouse chromosomes]
