
Genomics, Sequencing Technologies, and Haplotype Analysis: Study Notes


Genomics and the Human Genome Project

Types of Genomics Studies

Genomics is the study of the structure, function, evolution, and mapping of genomes. Different types of genomics studies include:

  • Whole Genome Sequencing (WGS): Sequencing the entire DNA content of an organism.

  • Whole Exome Sequencing (WES): Sequencing only the protein-coding regions (exons) of the genome.

  • Transcriptome Analysis (RNA-seq): Sequencing all RNA transcripts present in a cell or tissue at a given time.

Example: WGS is used to identify all genetic variants in a patient with a rare disease.

The Human Genome Project (HGP)

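The Human Genome Project was an international effort to sequence the entire human genome. It relied on shotgun sequencing: DNA is broken into small fragments, the fragments are sequenced, and the overlapping sequences are assembled computationally. The publicly funded project applied this to large, previously mapped clones, an approach known as hierarchical (clone-by-clone) shotgun sequencing.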

  • Genome size: ~3.2 billion base pairs (haploid)

  • Protein-coding genes: ~20,000–25,000; their coding exons make up only about 2% of the genome

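  • Average gene size: ~3,000 bp (a commonly cited figure; gene sizes vary enormously, and the largest human genes span more than 2 million bp)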

  • Exon average size: 150 bp

  • Introns: Can be >100 kb, much larger than exons

  • Repetitive DNA: roughly half of the genome

  • Genetic similarity: Two unrelated people share 99.5% of their DNA sequence

Example: The HGP revealed that only a small fraction of the genome codes for proteins, with many regions consisting of non-coding or repetitive DNA.
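The genome statistics above can be turned into rough absolute numbers. The short Python sketch below does only one thing: it multiplies the quoted genome size (~3.2 billion bp) by the quoted percentages, which are ~2% protein-coding and ~0.5% differing between two unrelated people. Integer arithmetic keeps the results exact for these inputs, although the inputs themselves are approximations.

```python
# Approximate figures quoted in these notes (haploid human genome).
GENOME_BP = 3_200_000_000  # ~3.2 billion base pairs

# ~2% of the genome is protein-coding sequence.
coding_bp = GENOME_BP * 2 // 100

# Two unrelated people share ~99.5% of their sequence, so ~0.5% differs.
differing_bp = GENOME_BP * 5 // 1000

print(f"Protein-coding sequence: ~{coding_bp:,} bp")                  # ~64,000,000 bp
print(f"Differing positions between two people: ~{differing_bp:,}")  # ~16,000,000
```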

Shotgun Sequencing and Contig Assembly

Shotgun sequencing involves randomly breaking up DNA sequences into small pieces, sequencing them, and then assembling the overlapping fragments into a continuous sequence (contig).

  • Contig: A set of overlapping DNA segments that together represent a consensus region of DNA.

  • Genomic library: A collection of DNA fragments cloned into vectors (e.g., plasmids) for sequencing.

Why clone DNA fragments into plasmid vectors?

  • To provide a universal primer binding site for sequencing

  • To amplify each fragment so it can be sequenced reproducibly

  • To facilitate assembly of overlapping sequences
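The overlap-and-merge idea behind contig assembly can be sketched as a greedy algorithm. It repeatedly finds the pair of sequences with the longest suffix/prefix overlap and merges them, and stops when no overlaps remain; whatever sequences are left are the contigs.

This is a toy sketch built on several assumptions: reads are error-free, come from one strand, and no read is contained inside another. The function names `overlap` and `greedy_assemble` are hypothetical. Real assemblers typically use overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest hit that matches through the end of a is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    seqs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # remaining sequences are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Choosing the longest overlap first is a simple heuristic. It works on this toy input, but it can mis-join reads when a repeat is longer than the overlaps.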

DNA Sequencing Technologies

Sanger Sequencing

Sanger sequencing, also known as chain-termination sequencing, is a method for determining the nucleotide sequence of DNA. It was the primary technology used in the HGP.

  • Relies on incorporation of dideoxynucleotides (ddNTPs) to terminate DNA synthesis at specific bases

  • Produces DNA fragments of varying lengths, which are separated by electrophoresis

  • The sequence is read from the pattern of terminated fragments

Limitation: Sanger sequencing is low-throughput and best for sequencing small DNA fragments.
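The chain-termination logic can be mimicked in a few lines. Each fragment ends at the position where a ddNTP was incorporated. Sorting the fragments by length (what electrophoresis does) and reading their terminal bases therefore spells out the newly synthesized strand.

This sketch rests on several assumptions:
- The template is hypothetical and written 3' to 5'.
- Every position is terminated at least once.
- Primers, dye labels, and polymerase behavior are ignored.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_products(template_3to5: str) -> list[tuple[int, str]]:
    """One terminated product per position: (length, ddNTP at its 3' end).

    Assumes the template is written 3'->5', so the new strand grows 5'->3'
    and base i of the new strand pairs with template base i.
    """
    return [(i + 1, COMPLEMENT[base]) for i, base in enumerate(template_3to5)]


def read_gel(products: list[tuple[int, str]]) -> str:
    """Electrophoresis orders fragments by size; shortest to longest reads the new strand 5'->3'."""
    return "".join(ddntp for _, ddntp in sorted(products))


template = "TACGGAT"  # hypothetical template, written 3'->5'
products = chain_termination_products(template)
random.Random(0).shuffle(products)  # fragments are unordered before the gel
print(read_gel(products))  # ATGCCTA
```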

Next-Generation Sequencing (NGS)

NGS allows massively parallel sequencing of millions of DNA fragments; the most widely used NGS chemistry is sequencing by synthesis.

  • DNA fragments are attached to beads or a solid surface and amplified, so that each fragment yields many identical copies

  • Nucleotides are added one at a time; each incorporation produces a signal (typically fluorescence) that is detected and recorded by the instrument's computer

  • Short read lengths (300–700 bp) require computational assembly

  • Challenges: repetitive regions, insertions/deletions, and the computational work of reassembling millions of short reads

Example: Illumina sequencing is a widely used NGS platform.
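A toy simulation shows why sequencing by synthesis is massively parallel and why reads are short. Every cycle adds one complementary base to every growing strand at once, so read length is capped by the number of cycles run.

The sketch assumes reversible-terminator chemistry with exactly one base per cycle. The dye colors, template fragments, and function name are hypothetical, and real instruments differ in labeling scheme and error behavior.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye colors; real instruments use their own labeling schemes.
DYE = {"A": "red", "C": "blue", "G": "green", "T": "yellow"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}


def sequence_by_synthesis(templates: list[str], cycles: int) -> list[str]:
    """Simulate reversible-terminator cycles on many templates in parallel.

    Each cycle adds exactly one complementary base to every growing strand,
    records one color per cluster, then (implicitly) removes the terminator.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        # One imaging step records a signal from every cluster at once.
        image = [DYE[COMPLEMENT[t[cycle]]] if cycle < len(t) else None for t in templates]
        for k, color in enumerate(image):
            if color is not None:  # template exhausted -> no signal
                reads[k] += BASE_FOR_DYE[color]
    return reads


# Two hypothetical template fragments, written 3'->5'; only 5 cycles are run.
print(sequence_by_synthesis(["TACG", "GGCATT"], cycles=5))  # ['ATGC', 'CCGTA']
```

The second template is six bases long, but only five cycles are run. Its read therefore stops at five bases, which is the same reason real NGS reads are short.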

Third-Generation Sequencing

Third-generation sequencing, such as nanopore sequencing, reads single DNA molecules without amplification or cloning.

  • DNA is passed through a nanopore; changes in electrical current indicate the identity of each base

  • Allows for much longer read lengths (tens of kilobases)

  • Can detect modified bases (e.g., methylated CpG)

  • Reduces the need for sequence assembly

Advantages over NGS:

  • No need for shotgun cloning

  • Longer reads simplify assembly

  • Direct detection of base modifications
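To see why longer reads simplify assembly, the sketch below counts how many reads of a given length would match more than one place in a toy genome.

Assumptions:
- The genome is hypothetical: a 10-bp poly-T repeat appearing twice, flanked by unique sequence.
- Reads are taken at every start position on one strand.
- The function name `ambiguous_fraction` is hypothetical.

Reads no longer than the repeat can fall entirely inside it and are ambiguous. A 12-bp read always extends into unique flanking sequence, and in this toy genome every such read occurs exactly once.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose read_len-long sequence occurs more than once.

    Such reads cannot be placed uniquely, which is what makes repeats hard to assemble.
    """
    positions = len(genome) - read_len + 1
    counts = Counter(genome[i:i + read_len] for i in range(positions))
    ambiguous = sum(n for n in counts.values() if n > 1)
    return ambiguous / positions


# Hypothetical toy genome: unique flanks around two copies of a 10-bp repeat.
repeat = "T" * 10
genome = "ACGCA" + repeat + "GCAGC" + repeat + "CAGAC"

for read_len in (4, 8, 12):
    print(read_len, round(ambiguous_fraction(genome, read_len), 2))
# 12-bp reads (longer than the repeat) are all unique here, so their fraction is 0.0
```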

Gene Annotation and Functional Genomics

Gene Annotation

Gene annotation is the process of identifying the locations and functions of genes within a genome.

  • Distinguishes between genes and pseudogenes (non-functional gene copies)

  • Determines gene structure: exons, introns, regulatory elements

  • Describes gene expression patterns and functional roles

Functional genomics combines molecular biology, cell biology, and biochemistry to study gene function on a genome-wide scale.

Finding Genes in the Genome

Genes are identified by characteristic sequence features:

  • Promoters (e.g., TATA box, CAAT box)

  • Exons and introns

  • Splice sites (GT/AG rule)

  • Transcription start and termination sites

  • Polyadenylation signals

Example: Computational gene prediction algorithms scan for these features to annotate genes.
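The simplest form of feature scanning is exact motif matching. The sketch below searches for textbook consensus sequences and checks the GT/AG rule for a candidate intron.

Assumptions:
- The motif patterns (CAAT box GG(C/T)CAATCT, TATA box TATA(A/T)A(A/T), polyadenylation signal AATAAA) are simplified consensus forms.
- The test sequence and function names are hypothetical.

Real gene-prediction programs combine many such signals statistically rather than relying on exact matches.

```python
import re

# Simplified textbook consensus motifs; real promoters vary considerably.
MOTIFS = {
    "CAAT box": r"GG[CT]CAATCT",
    "TATA box": r"TATA[AT]A[AT]",
    "polyA signal": r"AATAAA",
}


def find_features(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, feature name, matched sequence) for every motif hit, sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), name, m.group()))
    return sorted(hits)


def follows_gt_ag(intron: str) -> bool:
    """GT/AG rule: a candidate intron should begin with GT and end with AG."""
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


seq = "GGCCAATCT" + "GC" + "TATAAAA" + "GCC" + "AATAAA"  # hypothetical sequence
for hit in find_features(seq):
    print(hit)
print(follows_gt_ag("GTAAGTCCTTTCAG"))  # True
```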

The -Omics Revolution

Major -Omics Fields

  • Genomics: Study of entire genomes

  • Transcriptomics: Study of all RNA transcripts (the transcriptome)

  • Metagenomics: Study of genetic material recovered directly from environmental samples

  • Other fields: proteomics, pharmacogenomics, metabolomics, glycomics, epigenomics, toxicogenomics, interactomics

All -omics fields rely heavily on bioinformatics for data analysis.

Bioinformatics and Genomic Data Analysis

Bioinformatics

Bioinformatics combines biology and computer science to analyze and interpret biological data, especially large datasets from sequencing projects.

  • BLAST (Basic Local Alignment Search Tool): Compares nucleotide or protein sequences to sequence databases and calculates statistical significance

  • OMIM (Online Mendelian Inheritance in Man): Database of human genes and genetic disorders

Example: BLAST is used to identify homologous genes in different species.
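BLAST avoids comparing every position against every other by using a "seed and extend" heuristic. It finds short exact word matches, then extends them without gaps and stops when the score drops too far below its best value. The sketch below illustrates that idea only; it is not the BLAST implementation.

Assumptions:
- Scoring is +1 for a match and -1 for a mismatch.
- The word size is 4 and the drop-off limit (X-drop) is 2.
- The function names are hypothetical.

The sketch omits gapped extension, substitution matrices, and the significance statistics (E-values) that BLAST reports.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Exact word matches of length w as (query position, subject position)."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j) for i in range(len(query) - w + 1) for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, qi, sj, w, match=1, mismatch=-1, xdrop=2):
    """Ungapped extension of a seed in both directions, stopping once the score falls
    more than xdrop below the best seen. Returns (score, aligned query segment)."""

    def walk(q_positions, s_positions):
        score = best = best_len = 0
        for step, (a, b) in enumerate(zip(q_positions, s_positions), start=1):
            score += match if query[a] == subject[b] else mismatch
            if score > best:
                best, best_len = score, step
            elif best - score > xdrop:
                break
        return best, best_len

    right_gain, right_len = walk(range(qi + w, len(query)), range(sj + w, len(subject)))
    left_gain, left_len = walk(range(qi - 1, -1, -1), range(sj - 1, -1, -1))
    score = w * match + left_gain + right_gain
    return score, query[qi - left_len:qi + w + right_len]


def best_hit(query: str, subject: str, w: int = 4) -> tuple[int, str]:
    """Highest-scoring extended seed, or (0, '') if no word is shared."""
    hits = [extend_seed(query, subject, qi, sj, w) for qi, sj in find_seeds(query, subject, w)]
    return max(hits, default=(0, ""))


# Hypothetical query and database sequence differing at one position.
print(best_hit("GATTACAGG", "CCGATTTCAGGTT"))  # (7, 'GATTACAGG')
```

The extension continues past the single mismatch because the score recovers before falling more than `xdrop` below its best value.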

Genotyping and Haplotype Analysis

Pre-Genome Sequencing Genotyping Methods

  • Cytogenetics: Chromosome staining to visualize chromosomal abnormalities

  • Single gene sequencing: Using Sanger sequencing for specific genes

  • Haplotype mapping: Identifying patterns of genetic variation using RFLP or DNA arrays

Haplotypes and Haplogroups

  • Haplotype: A group of alleles or SNPs inherited together from a single parent

  • Haplogroup: A group of similar haplotypes that share a common ancestor

Haplotype Mapping and DNA Markers

Haplotype mapping uses DNA markers, such as SNPs, to create detailed maps of genetic variation.

  • Any two copies of the human genome differ at roughly one SNP per ~1,000 bp; across human populations, over 13 million SNPs have been catalogued

  • Tag SNP: A representative SNP in a region of high linkage disequilibrium, whose alleles predict those of nearby linked SNPs

  • Used in genetic screening and association studies
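Tag-SNP selection can be sketched in two steps. First, measure linkage disequilibrium between every pair of SNPs. Second, greedily pick SNPs until each SNP is strongly correlated with at least one pick.

Assumptions:
- LD is measured with r², the squared correlation between alleles at two loci. The notes do not name a specific LD measure.
- The cutoff r² ≥ 0.8 is a common illustrative choice.
- Greedy selection is only one simple strategy.
- The haplotypes and function names are hypothetical.

```python
def r_squared(haplotypes: list[tuple[int, ...]], i: int, j: int) -> float:
    """LD (r^2) between SNPs i and j from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b  # D = p_AB - p_A * p_B
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else 0.0  # monomorphic SNPs carry no LD information


def pick_tag_snps(haplotypes: list[tuple[int, ...]], threshold: float = 0.8) -> list[int]:
    """Greedily choose SNPs until every SNP has r^2 >= threshold with some chosen tag."""
    n_snps = len(haplotypes[0])
    covers = {
        s: {t for t in range(n_snps) if t == s or r_squared(haplotypes, s, t) >= threshold}
        for s in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP that tags the most still-untagged SNPs (lowest index on ties).
        best = max(range(n_snps), key=lambda s: (len(covers[s] & untagged), -s))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical phased haplotypes over 4 SNPs: SNPs 0/1 always co-occur, as do SNPs 2/3.
haplotypes = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
print(pick_tag_snps(haplotypes))  # [0, 2]
```

Here genotyping two SNPs is enough to recover all four, which is the saving that makes tag SNPs useful on genotyping arrays.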

Array-Based Genotyping (DNA Microarrays)

DNA microarrays are used to genotype thousands to hundreds of thousands of SNPs simultaneously.

  • The solid support carries ssDNA probes complementary to the sequences surrounding each tag SNP

  • Sample DNA is fragmented, labeled, and hybridized to the array

  • Fluorescence at each probe indicates which SNP alleles are present

  • Computer scans and maps fluorescence across the chip

Step | Description
--- | ---
Probe attachment | ssDNA probes fixed to chip at known locations
Sample preparation | Genomic DNA fragmented and labeled
Hybridization | Labeled DNA binds to complementary probes
Detection | Fluorescence measured to determine SNP presence

Example: 23andMe uses a proprietary DNA array to genotype 600,000 SNPs for direct-to-consumer genetic testing.
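The probe and detection steps above can be sketched for a single SNP.

Assumptions:
- Each allele has its own probe, made as the reverse complement of the target sequence carrying that allele. Commercial arrays use a variety of probe designs.
- A genotype is called from the share of total fluorescence coming from the second allele's probe.
- The flanking sequence, intensities, thresholds, and function names are all hypothetical.

Production genotype callers cluster intensities across many samples rather than applying fixed cutoffs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def allele_probes(left: str, right: str, alleles: tuple[str, str]) -> tuple[str, ...]:
    """One probe per allele: the reverse complement of the target carrying that allele."""
    return tuple((left + allele + right).translate(COMPLEMENT)[::-1] for allele in alleles)


def call_genotype(signal_a: float, signal_b: float, min_total: float = 100.0,
                  low: float = 0.2, high: float = 0.8) -> str:
    """Call a genotype from the two allele-probe signals (illustrative thresholds).

    "A" and "B" name the SNP's two alleles, not nucleotide bases.
    """
    total = signal_a + signal_b
    if total < min_total:
        return "no call"  # too little hybridization signal to decide
    frac_b = signal_b / total
    if frac_b < low:
        return "AA"
    if frac_b > high:
        return "BB"
    return "AB"


print(allele_probes("ACG", "TTA", ("C", "T")))  # ('TAAGCGT', 'TAAACGT')
signals = {"snp1": (950, 40), "snp2": (480, 510), "snp3": (30, 870), "snp4": (20, 15)}
for name, (a, b) in signals.items():
    print(name, call_genotype(a, b))
```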

Summary Table: Sequencing Technologies

Technology | Key Features | Read Length | Cloning Required?
--- | --- | --- | ---
Sanger Sequencing | Chain-termination, low-throughput | ~700 bp | Yes
Next-Generation Sequencing | Massively parallel, short reads | 300–700 bp | No
Third-Generation Sequencing | Single-molecule, long reads, direct detection | 10,000+ bp | No

Key Terms and Definitions

  • Contig: Overlapping DNA segments that together represent a consensus region

  • Pseudogene: A gene sequence that resembles a gene but is non-functional

  • Gene annotation: Process of identifying gene locations and functions

  • Bioinformatics: Application of computational tools to analyze biological data

  • Haplotype: Set of DNA variations inherited together

  • Tag SNP: Representative SNP used to infer the presence of other linked SNPs

Equations and Concepts

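  • Genetic similarity between individuals: $\text{similarity} = \left(1 - \frac{d}{L}\right) \times 100\%$, where $d$ is the number of differing base positions and $L$ is the number of positions compared; at 99.5% similarity over $L \approx 3.2 \times 10^9$ bp, $d \approx 1.6 \times 10^7$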

  • Linkage disequilibrium (LD): Non-random association of alleles at different loci
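LD is defined above only in words. One standard way to quantify it for two biallelic loci, with alleles A/a and B/b, is the coefficient $D$ and its normalized form $r^2$. The notation below is a common textbook convention, not something taken from these notes:

```latex
\begin{aligned}
D   &= p_{AB} - p_A\,p_B \\
r^2 &= \frac{D^2}{p_A(1 - p_A)\,p_B(1 - p_B)}
\end{aligned}
```

Here $p_A$ and $p_B$ are allele frequencies and $p_{AB}$ is the frequency of haplotypes carrying both A and B. With no LD, $p_{AB} = p_A p_B$, so $D = 0$. If A and B always occur together and $p_A = p_B = 0.5$, then $p_{AB} = 0.5$, $D = 0.25$, and $r^2 = 1$.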

