Ch. 16 - Genomics: Genetics from a Whole-Genome Perspective
Sanders - Genetic Analysis: An Integrated Approach, 3rd Edition (ISBN: 9780135564172)
Chapter 16, Problem 15

In the course of the Drosophila melanogaster genome project, the following genomic DNA sequences were obtained. Try to assemble the sequences into a single contig.
5' TTCCAGAACCGGCGAATGAAGCTGAAGAAG 3'
5' GAGCGGCAGATCAAGATCTGGTTCCAGAAC 3'
5' TGATCTGCCGCTCCGTCAGGCATAGCGCGT 3'
5' GGAGAATCGAGATGGCGCACGCGCTATGCC 3'
5' CCATCTCGATTCTCCGTCTGCGGGTCAGAT 3'
Go to the URL provided in Problem 14, and using the sequence you have just assembled, perform a blastn search in the 'Nucleotide collection (nr/nt)' database. Does the search produce sequences similar to your assembled sequence, and if so, what are they? Can you tell if your sequence is transcribed, and if it represents protein-coding sequence? Perform a tblastx search, first choosing the 'Nucleotide collection (nr/nt)' database and then limiting the search to human sequences by typing Homo sapiens in the organism box. Are homologous sequences found in the human genome? Annotate the assembled sequence.

Verified step by step guidance
Begin by examining each provided DNA sequence carefully, noting their 5' to 3' orientation. Since these are genomic fragments, your goal is to find overlapping regions between the sequences to assemble them into a continuous stretch, called a contig.
Compare the end of one sequence with the beginning of another to identify overlaps. For example, look for a substring at the 3' end of one sequence that matches the 5' start of another sequence. This overlap indicates that these sequences are adjacent in the genome.
Once you identify overlapping sequences, merge them by aligning the overlapping regions, ensuring that the overlapping nucleotides match perfectly. Because genomic fragments can be read from either strand, also compare the reverse complement of each fragment; some of these fragments overlap the others only after being reverse-complemented. Continue this process iteratively, joining sequences step by step until all sequences are assembled into a single contig.
After assembling the contig, use the assembled sequence to perform a blastn search against the 'Nucleotide collection (nr/nt)' database. This will help you find similar sequences and determine whether your assembled sequence corresponds to known genomic regions, genes, or other elements. Matches to mRNA or cDNA entries indicate that the sequence is transcribed.
Next, perform a tblastx search with your assembled sequence, first against the entire nucleotide database and then limiting the search to Homo sapiens sequences. This translates your nucleotide sequence in all six reading frames and compares the translations with the six-frame translations of the database sequences, helping you identify potential protein-coding regions and homologous sequences in humans. Use these results to annotate your assembled sequence, indicating whether it is transcribed and whether it likely encodes a protein.
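The overlap-and-merge steps above can be sketched in a few lines of Python. This is a minimal greedy illustration, not how a production assembler works: the function names (reverse_complement, overlap, extend, assemble) and the 8-nucleotide minimum overlap are assumptions chosen for this example, and only perfect matches are accepted. Exact duplicate fragments, if any, are dropped before assembly.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def overlap(left, right, min_len=8):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend(contig, read, min_len=8):
    """Join `read` (in either orientation) to one end of `contig`, or return None."""
    for cand in (read, reverse_complement(read)):
        k = overlap(contig, cand, min_len)
        if k:
            return contig + cand[k:]            # read extends the 3' end
        k = overlap(cand, contig, min_len)
        if k:
            return cand[:len(cand) - k] + contig  # read extends the 5' end
    return None


def assemble(reads, min_len=8):
    """Greedy assembly: grow one contig until every read has been placed."""
    pending = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    contig = pending.pop(0)
    while pending:
        for read in pending:
            merged = extend(contig, read, min_len)
            if merged is not None:
                contig = merged
                pending.remove(read)
                break
        else:
            raise ValueError("no remaining read overlaps the contig")
    return contig


# The distinct fragments from the problem, written 5' to 3'.
reads = [
    "TTCCAGAACCGGCGAATGAAGCTGAAGAAG",
    "GAGCGGCAGATCAAGATCTGGTTCCAGAAC",
    "TGATCTGCCGCTCCGTCAGGCATAGCGCGT",
    "GGAGAATCGAGATGGCGCACGCGCTATGCC",
    "CCATCTCGATTCTCCGTCTGCGGGTCAGAT",
]

contig = assemble(reads)
print(len(contig), contig)
```

The contig comes out in the orientation of the first fragment in the list; starting from a fragment on the other strand would give its reverse complement, which is an equally valid query for BLAST.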

Key Concepts

Here are the essential concepts you must grasp in order to answer the question correctly.

Sequence Assembly and Contig Formation

Sequence assembly involves aligning and merging overlapping DNA fragments to reconstruct the original sequence, forming a continuous stretch called a contig. Understanding how to identify overlaps and correctly order sequences is essential for genome projects and accurate downstream analysis.

BLAST Searches and Sequence Similarity

BLAST (Basic Local Alignment Search Tool) compares a query sequence against databases to find regions of local similarity, which can indicate homology or a shared function. Different BLAST programs serve specific purposes: blastn compares a nucleotide query with nucleotide sequences, whereas tblastx compares the six-frame translation of a nucleotide query with the six-frame translations of a nucleotide database. Together they help identify related sequences and infer function.
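To make the translated search concrete, the sketch below shows only the six-frame translation step that tblastx applies before aligning; it does not perform any database search or alignment. The helper names (translate, six_frame_translations) are hypothetical, and the standard genetic code is assumed. The example input is the second fragment from the problem.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def translate(seq):
    """Translate complete codons starting at position 0; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their conceptual translations."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


fragment = "GAGCGGCAGATCAAGATCTGGTTCCAGAAC"
for label, protein in six_frame_translations(fragment).items():
    print(label, protein)
```

Comparing translations rather than nucleotides lets a search tolerate synonymous changes, which is why a translated search can detect distant homologs, such as human sequences, that a nucleotide comparison might miss.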

Gene Annotation and Functional Inference

Gene annotation involves identifying features like coding regions, transcriptional activity, and homologous genes within a sequence. By analyzing BLAST results and sequence characteristics, one can determine if a sequence is transcribed, protein-coding, and conserved across species, providing insights into its biological role.
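One simple sequence characteristic used in annotation is the absence of stop codons in a reading frame. The sketch below, with hypothetical helper names (longest_open_run, open_runs_by_frame), counts the longest run of consecutive non-stop codons in each of the six frames. It is a rough indicator only: it assumes the standard stop codons, ignores introns and start codons, and is not very informative on a sequence as short as a single 30-nucleotide fragment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def longest_open_run(seq):
    """Longest run of consecutive non-stop codons, reading from position 0."""
    best = run = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def open_runs_by_frame(seq):
    """Map frame labels (+1..+3, -1..-3) to the longest stop-free run, in codons."""
    runs = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            runs[f"{sign}{offset + 1}"] = longest_open_run(strand[offset:])
    return runs


# Second fragment from the problem; a full contig would be more informative.
print(open_runs_by_frame("GAGCGGCAGATCAAGATCTGGTTCCAGAAC"))
```

A frame that stays open across the whole contig is consistent with protein-coding sequence, but it should be weighed together with the blastn and tblastx hits rather than taken as proof on its own.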
Related Practice
Textbook Question

What is a reference genome? How can it be used to survey genetic variation within a species?

Textbook Question

The two-hybrid method facilitates the discovery of protein–protein interactions. How does this technique work? Can you think of reasons for obtaining a false-positive result, that is, where the proteins encoded by two clones interact in the two-hybrid system but do not interact in the organism in which they naturally occur? Can you think of reasons you might obtain a false-negative result, in which the two proteins interact in vivo but fail to interact in the two-hybrid system?

Textbook Question

Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi and follow the links to nucleotide BLAST. Type in the sequence below; it is broken up into codons to make it easier to copy.

5' ATG TTC GTC AAT CAG CAC CTT TGT GGT TCT CAC CTC GTT GAA GCT TTG TAC CTT GTT TGC GGT GAA CGT GGT TTC TTC TAC ACT CCT AAG ACT TAA 3'

As you will note on the BLAST page, there are several options for tailoring your query to obtain the most relevant information. Some are related to which sequences to search in the database. For example, the search can be limited taxonomically (e.g., restricted to mammals) or by the type of sequences in the database (e.g., cDNA or genomic). For our search, we will use the broadest database, the 'Nucleotide collection (nr/nt).' This is the nonredundant (nr) database of all nucleotide data (nt) in GenBank and can be selected in the 'Database' dialogue box. Other parameters can also be adjusted to make the search more or less sensitive to mismatches or gaps. For our purposes, we will use the default setting, which is automatically presented. Press 'BLAST' to search. What can you say about the DNA sequence?

Textbook Question

Consider the phylogenetic trees below pertaining to three related species (A, B, and C) that share a common ancestor (last common ancestor, or LCA). The lineage leading to species A diverges before the divergence of species B and C.

For gene X, no gene duplications have occurred in any lineage, and each gene X is derived from the ancestral gene X via speciation events. Are genes AX, BX, and CX orthologous, paralogous, or homologous?

Textbook Question

Consider the phylogenetic trees below pertaining to three related species (A, B, and C) that share a common ancestor (last common ancestor, or LCA). The lineage leading to species A diverges before the divergence of species B and C.

For gene Y, a gene duplication occurred in the lineage leading to A after it diverged from the lineage leading to B and C. Are genes AY1 and AY2 orthologous or paralogous? Are genes AY1 and BY orthologous or paralogous? Are genes BY and CY orthologous or paralogous?

Textbook Question

Consider the phylogenetic trees below pertaining to three related species (A, B, and C) that share a common ancestor (last common ancestor, or LCA). The lineage leading to species A diverges before the divergence of species B and C.

For gene Z, gene duplications have occurred in all species. Define orthology and paralogy relationships for the different Z genes.
