Recombinant DNA Technology, DNA Sequencing, and Genomic Analysis

Study Guide - Smart Notes


Recombinant DNA Technology and Genetically Modified Organisms (GMOs)

Introduction to Recombinant DNA and GMOs

Recombinant DNA technology enables the combination of DNA from different sources to create genetically modified organisms (GMOs). This technology is foundational in modern genetics, biotechnology, and medicine.

Recombinant DNA: DNA molecules formed by laboratory methods of genetic recombination to bring together genetic material from multiple sources.
Genetically Modified Organisms (GMOs): Organisms whose genetic material has been altered using recombinant DNA technology.
Applications: Agriculture (GMO crops), medicine (insulin production), research (gene function studies).

Figure: How are GMO plants made?

DNA Amplification and Analysis

Polymerase Chain Reaction (PCR)

PCR is a technique used to amplify specific DNA segments, making millions of copies from a small initial sample. It is essential for genetic analysis, cloning, and sequencing.

Key Steps: Denaturation, annealing, and extension.
Applications: Genetic testing, forensics, cloning, and sequencing.
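
The "millions of copies" figure follows from the target roughly doubling on each cycle. The sketch below works through that arithmetic. The function name pcr_copies and the efficiency parameter are illustrative assumptions, not part of the notes, and real reactions plateau once primers and dNTPs run low.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of PCR.

    efficiency = 1.0 models ideal doubling on every cycle (an assumption);
    lower values model cycles in which only some templates are copied.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_templates * (1.0 + efficiency) ** cycles


# Starting from a single template molecule under ideal doubling.
for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies")
```

Under ideal doubling, 20 cycles already yield just over one million copies from a single template, which is why a very small starting sample is enough.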

Gel Electrophoresis

Gel electrophoresis separates DNA, RNA, or proteins by size and charge so that the molecules can be visualized and analyzed.

Principle: Negatively charged DNA fragments move through a gel matrix toward the positive electrode under an electric field; smaller fragments migrate faster and therefore travel farther.
Applications: DNA fingerprinting, checking PCR products, sequencing analysis.
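
When checking a PCR product, the size of an unknown band is usually estimated against a ladder of known fragments. A common approximation (assumed here, not stated in the notes) is that the logarithm of fragment size falls roughly linearly with migration distance. The ladder values and the function name estimate_size_bp below are hypothetical.

```python
import math


def estimate_size_bp(distance_mm, ladder):
    """Estimate fragment size from how far its band migrated.

    ladder: list of (size_bp, distance_mm) pairs for the marker bands.
    Assumes log10(size) is linear in distance between neighbouring
    marker bands, and that the marker distances are all distinct.
    """
    # Sort by distance so consecutive markers bracket the unknown band.
    points = sorted(ladder, key=lambda p: p[1])
    for (size1, d1), (size2, d2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            t = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(size1) + t * (math.log10(size2) - math.log10(size1))
            return 10 ** log_size
    raise ValueError("distance lies outside the ladder range")


# Hypothetical ladder: smaller fragments have travelled farther.
ladder = [(1000, 20.0), (500, 30.0), (100, 50.0)]
print(round(estimate_size_bp(25.0, ladder)))
```

A band halfway between the 1,000 bp and 500 bp markers comes out near 707 bp rather than 750 bp, because the interpolation is done on the logarithm of size.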

DNA Sequencing Technologies

Overview of DNA Sequencing

DNA sequencing determines the precise order of nucleotides in a DNA molecule. Sequencing technologies have evolved through three generations, each with distinct features and applications.

Goal: Identify the complete sequence of nucleotide bases (A, T, C, G) in a DNA sample.
Challenge: No machine can sequence an entire genome in one piece; DNA must be fragmented and sequenced in parts.

First Generation: Sanger Sequencing

Sanger sequencing, also known as chain-termination sequencing, was the first widely used method for DNA sequencing. It uses dideoxynucleotides (ddNTPs) to terminate DNA synthesis at specific bases.

Principle: Incorporation of ddNTPs during DNA synthesis terminates elongation, producing fragments of varying lengths.
Detection: Fragments are separated by size using gel electrophoresis, and the sequence is read from the pattern of bands.
Limitations: Low throughput; best suited to sequencing single genes or small DNA fragments.

Figures: Original Sanger sequencing gel and band pattern; structural difference between dNTP and ddNTP.

Key Steps in Sanger Sequencing

DNA is amplified and denatured.
A mixture of dNTPs and fluorescently labeled ddNTPs is added.
Chain termination occurs at each base where a ddNTP is incorporated.
Fragments are separated by capillary gel electrophoresis.
Laser excitation and detection produce a chromatogram for sequence reading.
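
The logic of these steps can be sketched as a toy simulation: chain termination yields one fragment per position, size separation orders the fragments, and the labeled terminal base of each fragment spells out the sequence. The function names and the example strand are hypothetical, and the model ignores signal noise and real chemistry.

```python
import random


def chain_terminated_fragments(new_strand: str):
    """One fragment per position where a ddNTP could stop synthesis."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_from_fragments(fragments):
    """Recover the sequence from an unordered pool of fragments."""
    # Capillary electrophoresis separates fragments by length, shortest first.
    ordered = sorted(fragments, key=len)
    # The fluorescent ddNTP at the end of each fragment reports one base.
    return "".join(fragment[-1] for fragment in ordered)


strand = "ATGCCGTA"  # hypothetical newly synthesized strand
pool = chain_terminated_fragments(strand)
random.shuffle(pool)  # fragments are mixed together in the reaction tube
print(read_from_fragments(pool))  # prints ATGCCGTA
```

The read comes out in the order of the synthesized strand, which is complementary to the template.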

Figures: PCR with fluorescent, chain-terminating ddNTPs; size separation by capillary gel electrophoresis; laser excitation and detection by sequencing machine.

Second Generation: Next Generation Sequencing (NGS)

NGS technologies, such as Illumina sequencing, allow massively parallel sequencing of millions of short DNA fragments, greatly increasing throughput and reducing cost per base.

Principle: Sequencing by synthesis, where each nucleotide addition is detected cycle by cycle (in Illumina sequencing, using fluorescently labeled reversible terminators).
Features: Short reads, high throughput, cost-effective, suitable for whole-genome and transcriptome sequencing.
Applications: Genome sequencing, RNA-seq, exome sequencing, metagenomics.

Figures: Massively parallel DNA sequencing; Illumina sequencing steps.

Third Generation: Single-Molecule Sequencing

Third-generation sequencing technologies, such as PacBio and Oxford Nanopore, sequence single DNA molecules in real time, producing much longer reads than previous methods.

Principle: Direct sequencing of single DNA molecules without amplification.
Features: Long reads (up to tens of thousands of bases), real-time data, ability to resolve complex genomic regions.
Applications: De novo genome assembly, structural variant detection, epigenetic modification analysis.

Figures: Single-molecule DNA sequencing (Nanopore); PacBio SMRT sequencing workflow.

Comparison of Sequencing Technologies

Generation	Technology	Read Length	Throughput	Key Features
First	Sanger	500-1,000 bp	Low	Accurate, single gene
Second	Illumina (NGS)	50-500 bp	High	Massively parallel, short reads
Third	PacBio, Nanopore	10,000+ bp	Very High	Long reads, real-time

Genomic Analysis and Applications

What is Genomics?

Genomics is the study of the complete set of DNA (genome) in an organism. It encompasses the structure, function, evolution, and mapping of genomes.

Structure/Mapping: Determining the DNA sequence and location of genes.
Function: Understanding the roles of genomic elements and gene regulation.
Functional Annotation: Assigning biological functions to genes and regulatory elements.

Genome Sequencing and Assembly

Whole Genome Sequencing (WGS) involves sequencing the entire genome and assembling the sequence reads into a complete genome.

DNA Libraries: Collections of DNA fragments representing the genome, prepared for sequencing.
Genomic Libraries: Contain all DNA, including coding and noncoding regions.
cDNA Libraries: Contain only expressed genes (mRNA-derived).
Assembly: Overlapping sequence reads are merged to form contigs and scaffolds, which are then mapped to chromosomes.

The Human Genome Project (HGP)

The HGP was an international effort to sequence the human genome and map all of its genes. It revolutionized genetics and enabled comparative genomics and personalized medicine.

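Timeline: Draft announced in 2000 and declared complete in 2003; the first gapless assembly was published in 2022 by the Telomere-to-Telomere (T2T) Consortium, after the HGP itself had ended.
Methods: BAC/YAC cloning and Sanger sequencing; NGS replaced Sanger sequencing in later genome projects.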
Findings: ~20,000 protein-coding genes, 99.9% similarity among humans, identification of SNPs and CNVs.

Genome Assembly Process

Contigs: Overlapping sequence reads assembled into continuous sequences.
Scaffolds: Groups of contigs ordered and oriented using additional data (e.g., genetic maps).
Reference Genome: Assembled from pooled genomes of multiple individuals.
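
How overlapping reads become a contig can be sketched with a toy greedy merge: repeatedly join the two reads with the longest suffix-to-prefix overlap. This is a teaching sketch under simplifying assumptions (error-free reads, a single strand, no repeats, no read contained inside another), not how production assemblers work. The function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # leftover reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical short reads
print(greedy_assemble(reads))  # prints ['ATGGCGTACCTGA']
```

Reads that share no overlap are returned as separate contigs. Ordering and orienting those contigs is the job of scaffolding, which needs additional data.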

Genetic vs. Physical Mapping

Type	Basis	Unit	Resolution
Genetic Map	Recombination frequency	centimorgan (cM)	Low
Physical Map	Physical distance	base pairs (bp, kb, Mb)	High
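
Genetic map distance comes from counting recombinant offspring: a recombination frequency of 1% corresponds to 1 cM. A minimal sketch of that conversion follows. The function name and the offspring counts are hypothetical, and the simple percentage is only a good estimate for closely linked genes, since double crossovers go undetected over longer distances.

```python
def map_distance_cm(recombinant: int, total: int) -> float:
    """Recombination frequency as a percentage, read as centimorgans."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return 100.0 * recombinant / total


# Hypothetical test cross: 12 recombinant offspring out of 200 scored.
print(map_distance_cm(12, 200), "cM")  # prints 6.0 cM
```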

Features of the Human Genome

Genome Size: ~3.1 billion nucleotides
Protein-Coding Genes: ~20,000 (2% of genome)
Genetic Diversity: SNPs, CNVs
ENCODE Project: Catalogs functional elements in the genome
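
The headline figures in these notes can be turned into rough absolute numbers. The back-of-the-envelope sketch below uses only the rounded values quoted in the notes (3.1 billion nucleotides, 2% protein-coding, 99.9% similarity among humans), so the results are order-of-magnitude estimates, not measured values.

```python
GENOME_SIZE_BP = 3.1e9   # ~3.1 billion nucleotides (from the notes)
CODING_FRACTION = 0.02   # ~2% protein-coding (from the notes)
SIMILARITY = 0.999       # 99.9% similarity among humans (from the notes)

# Amount of sequence that codes for protein.
coding_bp = GENOME_SIZE_BP * CODING_FRACTION

# Positions expected to differ between two people.
differing_bp = GENOME_SIZE_BP * (1 - SIMILARITY)

print(f"Protein-coding sequence: ~{coding_bp / 1e6:.0f} Mb")
print(f"Differences between two people: ~{differing_bp / 1e6:.1f} million positions")
```

This works out to roughly 62 Mb of coding sequence and about 3.1 million differing positions, which shows that a 0.1% difference still leaves a lot of variation to catalog as SNPs and CNVs.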

Multi-Omics Technologies

Overview of Omics

Multi-omics integrates data from genomics, transcriptomics, epigenomics, proteomics, and metabolomics to provide a comprehensive view of biological systems.

Genomics: DNA sequence analysis
Transcriptomics: RNA expression analysis
Epigenomics: DNA/histone modifications
Proteomics: Protein expression and modification
Metabolomics: Small molecule/metabolite analysis

Comparative Genomics

Comparative genomics compares genome sequences across species to identify conserved and unique features, gene functions, and evolutionary relationships.

Applications: Disease gene identification, evolutionary studies, adaptation research.
Example: Neanderthal genome sequencing revealed 1–4% of non-African human DNA is inherited from Neanderthals.

Metagenomics

Metagenomics involves sequencing DNA from entire microbial communities in natural environments, bypassing the need for culturing.

Applications: Discovery of new species, understanding microbial diversity, health and disease studies.
Example: Human Microbiome Project sequenced genomes of 600–1000 human-associated microbes.

Functional Genomics

Functional genomics aims to determine the function of genes and regulatory elements using genome-wide approaches.

Transcriptome: All RNA molecules transcribed from the genome.
Epigenome: All chemical modifications to DNA and histones.
Proteome: All proteins encoded by the genome.
Methods: RNA-seq, microarrays, ChIP-seq, ATAC-seq.

Transcriptomics

Transcriptomics studies gene expression at the RNA level, both qualitatively and quantitatively.

Bulk RNA-Seq: Measures average gene expression in a population of cells.
Single-Cell RNA-Seq: Resolves gene expression in individual cells, revealing tissue heterogeneity.
Applications: Cancer diagnostics, developmental biology, disease research.
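
The difference between bulk and single-cell measurements can be seen with a toy example: a bulk value is an average, so it can report an expression level that no individual cell actually has. The counts and the function name bulk_expression are hypothetical.

```python
from statistics import mean


def bulk_expression(per_cell_counts):
    """Average expression of one gene across all cells in the sample."""
    return mean(per_cell_counts)


# Hypothetical tissue: half the cells are silent for this gene, half express it strongly.
per_cell = [0, 0, 100, 100]

print("Bulk RNA-seq view:", bulk_expression(per_cell))
print("Single-cell view: ", sorted(set(per_cell)))
```

The bulk value is 50 even though every cell is at either 0 or 100. Single-cell RNA-seq is what resolves the two subpopulations.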

Epigenomics

Epigenomics analyzes genome-wide epigenetic marks, such as DNA methylation and histone modifications, which regulate gene expression without altering DNA sequence.

Methods: Whole genome bisulfite sequencing (WGBS), ChIP-seq, ATAC-seq.
Applications: Cancer research, developmental biology, environmental studies.

Proteomics

Proteomics is the large-scale study of proteins, including their expression, structure, and function.

Methods: LC-MS/MS, Western blotting, ELISA.
Applications: Disease biomarker discovery, cancer research, functional annotation.

Summary Table: Multi-Omics Technologies

Omics Field	Analyte	Key Methods	Applications
Genomics	DNA	WGS, NGS	Gene discovery, disease genetics
Transcriptomics	RNA	RNA-seq	Gene expression, diagnostics
Epigenomics	DNA/histone modifications	WGBS, ChIP-seq	Gene regulation, cancer
Proteomics	Proteins	LC-MS/MS	Biomarker discovery
Metabolomics	Metabolites	MS, NMR	Metabolic profiling