Microbial Systems Biology and Genome Evolution: Study Notes
Microbial Systems Biology and Genomics
Introduction to Systems Biology
Systems biology is an integrative field that seeks to understand the complex interactions within biological systems, moving beyond the study of individual pathways to analyze networks of genes, proteins, and metabolites. This approach provides a holistic view of how organisms respond to their environment and adapt to changing conditions.
Systems biology uses data from genomics, transcriptomics, proteomics, and metabolomics to map and model biological processes.
Traditional microbiology focused on single pathways, but systems biology reveals the dynamic, interconnected nature of cellular functions.
Genomics is foundational, enabling the use of other 'omics' approaches and providing targets for drug and vaccine development.
Applications include monitoring disease outbreaks, discovering uncultured organisms, and identifying virulence factors.

Genomics: Sequencing, Assembly, and Annotation
Genome Sequencing
Genome sequencing determines the precise order of nucleotides in DNA or RNA. Modern sequencing technologies have revolutionized microbiology by enabling rapid and comprehensive analysis of microbial genomes.
Sequencing: Determining the nucleotide sequence of DNA fragments.
Genome assembly: Piecing together short DNA sequences into longer, continuous sequences (contigs and scaffolds).
Genome annotation: Identifying genes and functional elements within the assembled genome.
Bioinformatics: Computational analysis of sequence data to predict gene function and structure.
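To make the annotation step concrete, here is a minimal sketch of ORF detection. It is a deliberate simplification, not how production annotation tools work: it assumes ATG is the only start codon, scans the forward strand only, and uses a hypothetical `min_codons` cutoff chosen for illustration.

```python
# Toy ORF finder (illustrative assumptions: forward strand only, ATG start,
# standard stop codons, arbitrary minimum length).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        # Count the codons before the stop (start codon included).
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)

example = "CCATGAAACCCGGGTAACC"   # made-up sequence
print(find_orfs(example))
```

Real annotation pipelines also scan the reverse strand, accept alternative start codons, and use statistical models to separate true genes from chance ORFs.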
Sanger Sequencing and Next-Generation Sequencing
The Sanger method, developed by Fred Sanger, was the first widely used DNA sequencing technique. It relies on chain-terminating dideoxynucleotides (ddNTPs) to generate DNA fragments of varying lengths, which are then separated and analyzed to determine the sequence.
Sanger sequencing is limited to ~800 nucleotides per reaction, so longer templates require "primer walking", in which each new primer is designed from the end of the previous read.
Next-generation sequencing (NGS) technologies, such as pyrosequencing, nanopore, and Illumina sequencing, allow for massively parallel sequencing without prior knowledge of the sequence.
Most NGS platforms produce large numbers of short reads that must be computationally assembled; single-molecule platforms such as nanopore produce much longer reads.
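The chain-termination logic of the Sanger method can be sketched in a few lines. This is an idealized model, not a simulation of real chemistry: it assumes exactly one terminated fragment per position and ignores the primer. The function names and the template are made up for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template):
    """Strand built by the polymerase: reverse complement of the template, 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def chain_terminated_fragments(new_strand):
    """One fragment per position, recorded as (length, terminal ddNTP base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_sequence(fragments):
    """Sort fragments by length (the role of electrophoresis) and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))

template = "TACGGATC"                      # hypothetical template strand
new_strand = synthesized_strand(template)
fragments = chain_terminated_fragments(new_strand)
random.shuffle(fragments)                  # the reaction yields an unordered mixture
recovered = read_sequence(fragments)
print(recovered)
```

Sorting by fragment length is what recovers the order of bases; the identity of each base comes from which ddNTP ended the fragment.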

Comparison of Sequencing Methods
Sequencing technologies are classified by generation, each with distinct methods and features.
| Generation | Method | Features |
|---|---|---|
| First | Sanger dideoxy method | Read length: 700–900 bases; used for the Human Genome Project |
| Second | Pyrosequencing, Illumina, SOLiD | Shorter reads (35–700 bases); high throughput; used for large-scale projects |
| Third | Helicos, PacBio | Single-molecule sequencing; longer reads (up to 15 kb) |
| Fourth | Oxford Nanopore | Ultra-long reads (up to 900 kb); portable devices |

Genome Assembly and Annotation
Genome assembly involves aligning overlapping short reads to reconstruct the original DNA sequence. Annotation identifies open reading frames (ORFs) and other functional elements.
Computers merge overlapping sequences into contigs, which are further organized into scaffolds.
Annotation is often the bottleneck in a genome project, because genes, regulatory regions, and non-coding RNAs must all be identified and assigned functions.
Prokaryotic genomes are typically compact, with ORFs separated by short regulatory regions.
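The merging of overlapping reads into contigs can be illustrated with a greedy overlap strategy. This is a teaching sketch under strong assumptions (error-free reads, a single strand, no repeats); real assemblers use overlap or de Bruijn graphs and handle sequencing errors. The reads and the `min_len` cutoff are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]

reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]   # made-up overlapping reads
print(greedy_assemble(reads))
```

If some reads share no sufficient overlap, the function returns several contigs, which mirrors the gaps that scaffolding later tries to bridge.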

Comparative Genomics: Genome Size and Gene Content
Genome Size and ORF Content
Comparative genomics examines similarities and differences in genome size and gene content across organisms. In prokaryotes, there is a strong correlation between genome size and the number of ORFs.
Each megabase pair (Mbp) of prokaryotic DNA encodes approximately 1,000 ORFs.
Gene content increases proportionally with genome size in prokaryotes.
Eukaryotic genomes contain large amounts of noncoding DNA, so gene density is lower.
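The rule of thumb above translates into simple arithmetic. The sketch below applies the ~1,000 ORFs per Mbp figure from these notes; the genome sizes and gene counts are hypothetical round numbers chosen for illustration, not measurements.

```python
ORFS_PER_MBP = 1000  # approximate prokaryotic gene density quoted in the notes

def estimate_orfs(genome_size_bp):
    """Rough ORF count for a prokaryotic genome of the given size."""
    return round(genome_size_bp / 1_000_000 * ORFS_PER_MBP)

def gene_density(gene_count, genome_size_bp):
    """Genes per Mbp, for comparing genomes of different sizes."""
    return gene_count / (genome_size_bp / 1_000_000)

# Hypothetical prokaryotes of 0.58 Mbp and 4.6 Mbp.
print(estimate_orfs(580_000), estimate_orfs(4_600_000))

# Hypothetical eukaryote: 6,000 genes in 12 Mbp, i.e. a lower density.
print(gene_density(6_000, 12_000_000))
```

A density well below 1,000 genes per Mbp is the numerical signature of the extra noncoding DNA described for eukaryotes.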

Gene Content and Functional Categories
The distribution of gene functions varies with genome size. Larger genomes have more genes for regulatory and environmental adaptation functions.
Core cellular processes (DNA replication, transcription, translation) show minor variation in gene number.
Genes for signal transduction and transcriptional regulation increase with genome size, enabling metabolic versatility.

Organelle Genomes
Mitochondria and Chloroplasts
Mitochondria and chloroplasts are organelles with their own genomes, derived from endosymbiotic bacteria. Their genomes encode essential components for energy metabolism and gene expression.
Chloroplast genomes: Circular DNA, 120–160 kbp, encode rRNAs, tRNAs, and proteins for photosynthesis and gene expression.
Mitochondrial genomes: Encode proteins for oxidative phosphorylation, rRNAs, and tRNAs; often smaller and may be circular or linear.
Many organelle proteins are encoded by nuclear genes, reflecting gene transfer during evolution.

Genome Evolution: Gene Families, Duplications, and Horizontal Gene Transfer
Gene Families and Duplications
Gene families are groups of homologous genes that arise through gene duplication events. Duplications allow one gene copy to retain its original function while the other evolves new functions.
Homologs: Genes related by evolutionary ancestry.
Paralogs: Genes within the same organism that arose by duplication.
Orthologs: Genes in different organisms that descend from a single gene in their common ancestor (separated by speciation rather than duplication).
Gene duplication is a major driver of evolutionary innovation.

Horizontal Gene Transfer (HGT) and the Mobilome
Horizontal gene transfer is the movement of genetic material between organisms by routes other than vertical inheritance from parent to offspring. It is a key mechanism for microbial evolution and adaptation.
HGT occurs via transformation, transduction, and conjugation.
Mobile genetic elements (the mobilome) include plasmids, prophages, integrons, insertion sequences, and transposons.
HGT can introduce new metabolic capabilities or virulence factors.

Core Genome, Pan Genome, and Chromosomal Islands
Core Genome and Pan Genome
The core genome consists of genes shared by all strains of a species, while the pan genome includes the core plus all accessory genes found in some but not all strains. This concept explains the genetic diversity within microbial species.
HGT and mobile elements contribute to the expansion of the pan genome.
Strains of the same species can differ greatly in gene content and capabilities.
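The core and pan genome definitions map directly onto set operations: the core genome is the intersection of the strains' gene sets, and the pan genome is their union. The strain names and gene assignments below are invented for illustration only.

```python
def core_and_pan(strains):
    """Return (core, pan, accessory) gene sets from a dict of strain -> gene list."""
    gene_sets = [set(genes) for genes in strains.values()]
    core = set.intersection(*gene_sets)   # genes present in every strain
    pan = set.union(*gene_sets)           # genes present in at least one strain
    accessory = pan - core                # genes present in some but not all strains
    return core, pan, accessory

# Hypothetical presence/absence data for three strains of one species.
strains = {
    "strain_A": ["dnaA", "rpoB", "gyrB", "toxin1"],
    "strain_B": ["dnaA", "rpoB", "gyrB", "plasmid_gene"],
    "strain_C": ["dnaA", "rpoB", "gyrB", "toxin1", "phage_gene"],
}

core, pan, accessory = core_and_pan(strains)
print(sorted(core))
print(sorted(accessory))
```

Adding a new strain can only shrink or preserve the core and can only grow or preserve the pan genome, which is why the pan genome keeps expanding as more strains are sequenced.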

Chromosomal Islands and Pathogenicity Islands
Chromosomal islands are large DNA segments in the chromosome that contain clusters of genes for specialized functions, such as virulence, symbiosis, or pollutant degradation. A pathogenicity island is a type of chromosomal island that encodes virulence factors.
Chromosomal islands often differ from the rest of the chromosome in GC content and codon usage, suggesting horizontal acquisition.
They may be flanked by repeat sequences and can carry integrase genes.
Strains that carry pathogenicity islands have larger genomes and greater virulence than closely related strains that lack them.
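Because islands tend to differ in GC content from the surrounding chromosome, a sliding-window GC scan is one simple way to flag candidate regions. The sketch below is a toy version: the window size, step, and threshold are arbitrary assumptions, and the sequence is synthetic. Real analyses use windows of thousands of bases and combine GC content with codon usage and other signals.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def flag_atypical_windows(genome, window=10, step=10, threshold=0.15):
    """Return (start, gc) for windows whose GC differs from the genome mean by more than `threshold`."""
    mean_gc = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - mean_gc) > threshold:
            flagged.append((start, gc))
    return flagged

# Synthetic "chromosome": 40% GC background with one GC-rich insert.
genome = "ATGCATGCAT" * 3 + "GCGCGCGCGC" + "ATGCATGCAT" * 3
print(flag_atypical_windows(genome))
```

A flagged window is only a candidate; atypical composition can have other causes, so flanking repeats and integrase genes are usually checked as well.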
