Bioinformatics and Systems Biology: Analyzing Genomes and Their Functions
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Bioinformatics: Tools and Resources for Genome Analysis
Centralized Resources for Analyzing Genome Sequences
Bioinformatics is the application of computational tools to the storage, organization, and analysis of biological data, especially large-scale DNA and protein sequences. The Human Genome Project catalyzed the development of centralized databases and analytical software, making genomic data accessible to researchers worldwide.
GenBank: The NCBI's primary DNA sequence database, containing over 240 million fragments of genomic DNA as of 2023.
NCBI (National Center for Biotechnology Information): Provides access to databases, software, and information on genomics and related topics.
Comparable sequence databases are maintained by the European Molecular Biology Laboratory (EMBL), the DNA Data Bank of Japan (DDBJ), and BGI in China.
Specialized databases and software exist for focused research, such as cancer genomics.
Making these resources freely available accelerates research and discovery in genomics and related fields.
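GenBank and similar databases let users download sequences in FASTA format: a header line beginning with `>` followed by one or more lines of sequence. The sketch below assumes the FASTA text has already been downloaded. It parses the records into a dictionary and computes GC content. The IDs and sequences are made up, and for real work a maintained parser such as Biopython's `SeqIO` module would be the safer choice.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may wrap; join them and normalize to upper case
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical records (IDs and sequences are made up)
example = """>seq1 hypothetical example
ATGCGC
GCAT
>seq2 another example
atatat
"""
for record_id, seq in parse_fasta(example).items():
    print(record_id, len(seq), round(gc_content(seq), 2))
# seq1 10 0.6
# seq2 6 0.0
```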
Key Bioinformatics Tools
BLAST (Basic Local Alignment Search Tool): Compares a query DNA or protein sequence with the sequences in a database such as GenBank to find regions of local similarity, which helps identify genes and proteins.
Other programs allow for comparison of predicted protein sequences, identification of conserved domains, and visualization of three-dimensional protein structures.
Software can align and compare collections of sequences, constructing evolutionary trees based on sequence relationships.
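To illustrate how a tree can be built from sequence relationships, here is a minimal sketch of UPGMA, one of the simplest distance-based clustering methods. It assumes the sequences are already aligned to equal length and uses the proportion of differing sites (p-distance) as the distance. The four sequences are made up. UPGMA assumes a roughly constant rate of evolution across lineages, so practical phylogenetics software generally uses other methods (such as neighbor joining or maximum likelihood) with explicit substitution models.

```python
from itertools import combinations
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(aligned: Dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return a Newick string without branch lengths."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters (labels sorted so output is deterministic)
        a, b = sorted(min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)]))
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster = size-weighted average of the old distances
        for c in sizes:
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Made-up aligned sequences (10 sites each)
aligned = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAA",
    "mouse": "ACGAACTTAA",
    "chicken": "TCAAAGTTGA",
}
print(upgma(aligned))  # (((chimp,human),mouse),chicken);
```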

Figure: Example of BLAST search results and 3D protein domain visualization, highlighting conserved WD40 domains in a muskmelon protein.
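BLAST's core task is local alignment: finding the best-matching region between a query and a database sequence. BLAST itself uses a fast heuristic (it seeds on short exact word matches and extends them). The sketch below instead uses the Smith-Waterman algorithm, the exact dynamic-programming method that BLAST approximates for speed. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions. Real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix; local alignment floors every cell at 0
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j

    # Trace back from the highest-scoring cell until a zero cell is reached
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical query found inside a longer "database" sequence
print(smith_waterman("GATTACA", "TTGATTACATT"))  # (14, 'GATTACA', 'GATTACA')
```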
Protein Structure Databases
Protein Data Bank (PDB): A global repository of experimentally determined three-dimensional protein structures, allowing researchers to visualize and analyze protein architecture.
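Structures downloaded from the PDB are text files in which each atom's coordinates (in angstroms) appear in fixed columns of `ATOM` records. The sketch below reads alpha-carbon (CA) coordinates from the legacy PDB format and measures the distance between two residues. The three-line snippet is made up. The parser ignores `HETATM` records, alternate conformations, and multiple models. The PDB now uses PDBx/mmCIF as its standard archive format, and libraries such as Biopython's `Bio.PDB` handle these cases.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def parse_ca_atoms(pdb_text: str) -> Dict[Tuple[str, int], Coord]:
    """Map (chain ID, residue number) -> (x, y, z) for alpha-carbon atoms.

    Relies on the fixed-column layout of ATOM records in the legacy PDB format;
    the Python slices below are the 0-based equivalents of PDB columns
    13-16 (atom name), 22 (chain), 23-26 (residue number) and 31-54 (x, y, z).
    """
    ca_atoms: Dict[Tuple[str, int], Coord] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        ca_atoms[(chain, residue)] = xyz
    return ca_atoms


# Made-up snippet: an N atom and the CA atoms of two consecutive residues
pdb_snippet = """\
ATOM      1  N   GLY A   1      -1.000   1.000   0.000
ATOM      2  CA  GLY A   1       0.000   0.000   0.000
ATOM      3  CA  ALA A   2       3.800   0.000   0.000
"""
ca = parse_ca_atoms(pdb_snippet)
print(sorted(ca))  # [('A', 1), ('A', 2)]
print(round(math.dist(ca[("A", 1)], ca[("A", 2)]), 2))  # 3.8
```

The made-up coordinates place the two alpha carbons 3.8 Å apart, roughly the spacing between consecutive alpha carbons in a real polypeptide chain.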
Understanding the Functions of Protein-Coding Genes
Gene Identification and Functional Analysis
After identifying new genes through DNA sequencing, researchers use bioinformatics to predict their functions by comparing their sequences to known genes and proteins from other organisms.
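Protein sequences are often more conserved than DNA sequences because the genetic code is redundant: many nucleotide changes, especially at the third position of a codon, are synonymous and leave the encoded amino acid unchanged.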
Functional domains (e.g., WD40 domains) can be identified by sequence similarity, providing clues to protein function.
Predictions are then confirmed experimentally, for example by using RNA-seq to verify that the predicted gene is actually expressed.
Unknown gene functions may require biochemical studies (e.g., determining 3D structure, binding sites) and functional studies (e.g., gene knockout using CRISPR-Cas9).
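The effect of genetic-code redundancy can be seen directly by translating two coding sequences that differ only by synonymous substitutions. The sketch below builds the standard codon table, translates both sequences, and compares identity at the DNA and protein levels. Both sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding sequences differing only by synonymous substitutions
dna_1 = "ATGGCTCTGAAA"
dna_2 = "ATGGCCTTAAAG"
print(translate(dna_1), translate(dna_2))           # MALK MALK
print(identity(dna_1, dna_2))                        # 0.6666666666666666 (8 of 12 bases)
print(identity(translate(dna_1), translate(dna_2)))  # 1.0
```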

Figure: Example of gene mapping on human chromosome 17, illustrating the identification of gene locations relevant to functional genomics.
Systems-Level Analysis of Genes and Gene Expression
Genomics and the ENCODE Project
Bioinformatics enables the study of entire sets of genes and their interactions, leading to insights into genome organization, gene regulation, development, and evolution.
ENCODE (Encyclopedia of DNA Elements): A large-scale project to identify all functional elements in the human genome, including protein-coding genes, noncoding RNAs, and regulatory sequences (enhancers, promoters).
ENCODE also characterized epigenetic features such as DNA and histone modifications and chromatin structure.
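Findings: About 75% of the human genome is transcribed, but less than 2% codes for proteins. ENCODE assigned at least one biochemical activity (such as being transcribed or bound by proteins) to at least 80% of the genome, although whether all of this activity reflects biological function is still debated.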
Model organisms (e.g., Caenorhabditis elegans, Drosophila melanogaster) are used to test the function of DNA elements.
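A figure such as "75% of the genome is transcribed" can be computed as the fraction of bases covered by at least one annotated element. Annotations overlap (alternative transcripts of one gene, for instance), so overlapping intervals must be merged before their lengths are summed, or some bases would be counted twice. The sketch below shows that calculation on a made-up 1,000-bp region. It is not ENCODE's actual pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base-pair coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, non-overlapping list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(intervals: List[Interval], region_length: int) -> float:
    """Fraction of the region covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / region_length


# Hypothetical annotations on a toy 1,000-bp region (not real ENCODE data)
region_length = 1000
transcribed = [(0, 300), (250, 600), (700, 800)]
coding = [(100, 110), (400, 405)]
print(coverage_fraction(transcribed, region_length))  # 0.7
print(coverage_fraction(coding, region_length))       # 0.015
```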
Epigenomics and Disease
Roadmap Epigenomics Project: Characterized the epigenome of hundreds of human cell types and tissues, including stem cells and diseased tissues (e.g., cancer, neurodegenerative disorders).
Epigenomic profiling can identify the tissue of origin for metastatic cancers, aiding diagnosis and treatment.
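One simple way to picture tissue-of-origin prediction is nearest-centroid classification: compare a tumor's epigenomic profile with reference profiles from known tissues and pick the most similar. The sketch below uses Pearson correlation on made-up methylation levels at four CpG sites. Published classifiers use many thousands of sites and trained statistical models, so this only illustrates the idea.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined (division by zero) if a profile is constant


def predict_tissue(sample: List[float], references: Dict[str, List[float]]) -> str:
    """Return the reference tissue whose profile is most correlated with the sample."""
    return max(references, key=lambda tissue: pearson(sample, references[tissue]))


# Hypothetical methylation levels (0 = unmethylated, 1 = fully methylated) at four CpG sites
references = {
    "liver": [0.9, 0.1, 0.8, 0.2],
    "lung": [0.1, 0.9, 0.2, 0.8],
    "colon": [0.5, 0.5, 0.9, 0.9],
}
metastasis = [0.85, 0.2, 0.75, 0.3]
print(predict_tissue(metastasis, references))  # liver
```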
Systems Biology and Proteomics
Proteomics
Proteomics is the large-scale study of all proteins (the proteome) expressed by a cell or organism, including their abundance, modifications, and interactions.
Proteins carry out most cellular activities; understanding their expression and interaction networks is crucial for understanding cell function.
Systems biology integrates genomics and proteomics to model the dynamic behavior of biological systems.
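As a small example of modeling dynamic behavior, consider a hypothetical gene whose mRNA (m) is transcribed at a constant rate and decays, and whose protein (p) is translated from that mRNA and also decays. Setting both rates of change to zero gives the steady state m* = k_m / d_m and p* = k_p * m* / d_p. The sketch integrates the model with the forward Euler method. The rate constants are arbitrary assumptions. Real systems biology models couple many such equations and are usually solved with adaptive ODE solvers.

```python
from typing import Tuple


def simulate_expression(k_m: float, d_m: float, k_p: float, d_p: float,
                        t_end: float, dt: float = 0.01) -> Tuple[float, float]:
    """Integrate a two-variable gene expression model with forward Euler steps.

    dm/dt = k_m - d_m * m        mRNA: constant transcription, first-order decay
    dp/dt = k_p * m - d_p * p    protein: translation from mRNA, first-order decay
    Starts from m = p = 0 and returns (m, p) at time t_end.
    """
    m = p = 0.0
    for _ in range(round(t_end / dt)):
        dm = k_m - d_m * m
        dp = k_p * m - d_p * p
        m, p = m + dm * dt, p + dp * dt
    return m, p


# Hypothetical rate constants in arbitrary units
m, p = simulate_expression(k_m=2.0, d_m=0.5, k_p=1.0, d_p=0.1, t_end=200.0)
# Compare with the analytic steady state: m* = k_m / d_m = 4, p* = k_p * m* / d_p = 40
print(round(m, 3), round(p, 3))
```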
Application to Medicine: Cancer Genomics
Systems biology approaches are transforming cancer research and treatment.
The Cancer Genome Atlas: Analyzed gene mutations and expression patterns in various cancers, identifying new therapeutic targets.
High-throughput sequencing and gene expression analysis (e.g., microarrays, RNA-seq) allow for personalized medicine, tailoring treatments to individual genetic profiles.
Gene expression profiling distinguishes cancer subtypes, improving treatment strategies (e.g., breast cancer subtypes).
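A basic first step in expression profiling is to compare each gene's level in a tumor with a reference, often as a log2 fold change. A value of +1 means twofold higher, and -1 means twofold lower. The sketch below computes log2 fold changes with a small pseudocount to avoid dividing by zero and flags genes that change at least twofold. The values and gene names are placeholders. Real microarray and RNA-seq studies use biological replicates and statistical testing (for example, with packages such as limma or DESeq2) rather than a fixed cutoff.

```python
import math
from typing import Dict


def log2_fold_changes(tumor: Dict[str, float], normal: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((tumor + pseudocount) / (normal + pseudocount)) for genes measured in both."""
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in sorted(tumor.keys() & normal.keys())
    }


def call_changes(lfc: Dict[str, float], threshold: float = 1.0) -> Dict[str, str]:
    """Label genes 'up' or 'down' when |log2 fold change| >= threshold (1.0 = twofold)."""
    calls: Dict[str, str] = {}
    for gene, value in lfc.items():
        if value >= threshold:
            calls[gene] = "up"
        elif value <= -threshold:
            calls[gene] = "down"
    return calls


# Hypothetical normalized expression values; gene names are placeholders
normal = {"GENE_A": 10.0, "GENE_B": 50.0, "GENE_C": 20.0}
tumor = {"GENE_A": 40.0, "GENE_B": 50.0, "GENE_C": 5.0}
lfc = log2_fold_changes(tumor, normal)
print(call_changes(lfc))  # {'GENE_A': 'up', 'GENE_C': 'down'}
```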

Figure: A gene microarray chip, used to analyze gene expression patterns in medical research and diagnostics.
Summary Table: Major Bioinformatics Resources and Their Functions
| Resource | Main Function | Example Application |
|---|---|---|
| GenBank | DNA sequence database | Gene identification, evolutionary studies |
| BLAST | Sequence similarity search | Finding homologous genes/proteins |
| Protein Data Bank (PDB) | 3D protein structures | Protein function and drug design |
| ENCODE | Functional genome annotation | Identifying regulatory elements |
| Roadmap Epigenomics | Epigenome mapping | Cancer diagnosis, tissue identification |
Key Terms and Concepts
Bioinformatics: The use of computational tools to analyze biological data.
Genomics: The study of whole genomes, including gene mapping, sequencing, and analysis.
Proteomics: The study of the entire set of proteins produced by an organism.
Systems Biology: An integrated approach to understanding the interactions and dynamics of biological systems.
Epigenomics: The study of epigenetic modifications across the genome.
Microarray: A technology for measuring the expression levels of many genes simultaneously.
Concept Check
What role does the Internet play in current genomics and proteomics research?
Answer: It provides access to centralized databases, analytical tools, and up-to-date information, enabling rapid data sharing and collaboration among researchers worldwide.
Explain the advantage of the systems biology approach to studying cancer versus studying a single gene at a time.
Answer: Systems biology analyzes the interactions among many genes and proteins. This gives a more comprehensive picture of cancer mechanisms and can reveal therapeutic targets that might be missed by focusing on one gene at a time.
The ENCODE pilot project found that at least 75% of the genome is transcribed into RNAs, far more than could be accounted for by protein-coding genes. Suggest some roles that these RNAs might play.
Answer: Noncoding RNAs may regulate gene expression, modify chromatin structure, participate in RNA processing, or have structural and catalytic roles in the cell.