15. Genomes and Genomics
Bioinformatics
1
concept
Bioinformatics
8m
Hi. In this video, we're going to be talking about bioinformatics. Bioinformatics is the study of the information found within the genome. So if you took a guess, what kind of information does the genome hold? It holds information about genes, RNA binding sites, noncoding RNAs, regulatory sites: all of this genetic information. Annotation is the process of marking these functional elements in the genome. Generally this is done through online software. There are big databases that scientists have developed where you can look at a specific sequence of DNA and figure out: are there regulatory sites here? Is this a gene? Is this where a protein binds? And so on.
This is what it looks like for one of these. You can see that there are these contigs up here, so we know these sequences. We can say that there's something at this region, potentially a coding region. All of these different colors represent something different about the gene. You don't need to know what these mean. If you were to go on, you'd see something like this, with different markers and different colors all representing what is in this region of the genome.
So bioinformatics is a great tool for figuring out which parts of the genome are functional and what they are being used for. It can be used to determine where protein-coding genes are. That collection of where the protein-coding genes are and what they do is called the proteome: the inventory of all proteins that are encoded by an organism's genome. How it does this is by trying to identify open reading frames. So what are open reading frames? These are just sequences that have characteristics of genes. So what are some characteristics of genes that you're probably familiar with already?
They have five prime ends. They have three prime sequences. Genes have introns, they have exons, and they have splice sites, for instance. All of these characteristics can be put into a software program, and that program can read a whole genome and say: here are the potential open reading frames that contain all these different characteristics.
Another thing that bioinformatics can do is identify an organism's codon bias. So what is codon bias? So far we've told you that different codons can code for the same amino acid, but usage is actually not equally distributed: some organisms prefer to use one codon over another. For instance, fruit flies have two choices when they code for the amino acid cysteine. They can use UGC or they can use UGU, but generally they prefer UGC: 73% of all cysteines are coded by UGC and not UGU. That's called codon bias. So in regions of the genome that show codon bias, we start to say, okay, this is probably a protein-coding region, because these codons aren't distributed equally. Bioinformatics has the ability to do that, to predict these protein-coding regions.
When you have a candidate open reading frame, you think it's a real one, but you don't actually know until you do further studies. The way you can confirm it is with cDNA sequences, which can be used to confirm ORFs. So what is cDNA? cDNA comes from mRNA. Remember, this is messenger RNA. And what is messenger RNA used for? It's going to be translated into a protein. So this is a sequence that is only coding region; all the introns have been removed. It codes for a protein, which means that unless something happens between the mRNA and translation into a protein, this is going to be expressed, meaning that it's really a gene.
So if you can isolate the mRNA (take out everything else, remove the protein, remove all the DNA, and get a solution of just the mRNA expressed in a cell), you can reverse transcribe it into DNA. You do that with reverse transcriptase, an enzyme you might be familiar with, which takes RNA and turns it into DNA. When you take messenger RNA that's going to be made into a protein and reverse transcribe it into DNA, that is called cDNA. cDNA has unique characteristics compared to normal DNA because the introns are removed, so it is the exact coding sequence. If you can isolate a matching cDNA sequence, then you know that the ORF you found, the open reading frame, is actually a gene encoding a protein.
There are these huge collections of sequences called expressed sequence tags. These are short cDNA sequences, and there are large data sets of them. Usually you collect a ton at a time: every mRNA that's expressed in the cell at the time, you can turn into cDNA, and you get these expressed sequence tags that say these are the genes being expressed at this certain time. You can determine what genes are being expressed and where the boundaries are. It's super important for confirming whether or not those open reading frames are in fact genes.
So here's an example. Bioinformatics is hard to get great pictures of, but here's an example of an open reading frame. There's a start codon here. There's a transcription start site. Eventually there's going to be a stop codon way down here. All of these different characteristics will tell the computer this is likely an open reading frame. So bioinformatics can do other things too.
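To make the open reading frame and codon bias ideas concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions, not how real annotation software works: it scans the forward strand only, treats an ORF as a start codon (ATG) followed in the same frame by a stop codon, ignores introns and splice sites, and writes codons in the DNA alphabet (TGC and TGT stand in for UGC and UGU). The function names `find_orfs` and `codon_fractions` and the example sequences are made up for this sketch.

```python
from collections import Counter

# Assumption: standard start and stop codons, written as DNA (T instead of U).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop stretch.

    Only the three forward-strand reading frames are scanned; `end` is
    exclusive and the returned sequence includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


def codon_fractions(coding_seq, synonymous=("TGC", "TGT")):
    """Return the share of each synonymous codon in an in-frame sequence.

    The default pair is the two cysteine codons from the fruit fly example.
    """
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    counts = Counter(c for c in codons if c in synonymous)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous}


# Hypothetical toy sequence: one ORF that starts at index 2.
print(find_orfs("CCATGTGCTGTTAAGG"))  # [(2, 14, 'ATGTGCTGTTAA')]

# Hypothetical coding sequence with two TGC codons and one TGT codon.
print(codon_fractions("ATGTGCTGCTGTTAA"))
```

On a real genome you would compare fractions like these against the organism's known preference, such as the 73% UGC figure for fruit flies, rather than reading anything into a sequence this short.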
It can predict DNA binding sites or protein binding sites, that is, protein-DNA binding sites. Again through computer software, it'll search through a genome and look for predicted sequences. Promoters sometimes have similar sequences; it'll look for splice sites and so on, and say, okay, these sequences are consensus sequences, or they're conserved, so this is likely a promoter or a transcription start site or an enhancer or a splice site or whatever you're looking for.
And then finally, bioinformatics can also be used to study evolution and DNA similarity. A common search that's done is called a BLAST search. This is on the NCBI website; you can just google NCBI BLAST and it'll come up. If you have a sequence and you have no idea what it is, you can BLAST it. You can BLAST a nucleotide or protein sequence. It'll spit out all these different organisms with similar sequences and all these different proteins with similar sequences, to give you an idea of which organism it comes from and what the function of that specific gene sequence is.
So here is an example. You have this human sequence here, and this is protein. I know it's protein because these are the short codes for each amino acid. You can see that it looks through mice, falcons, worms, sea urchins, and so on, and it says how similar these are between organisms. To be honest, between a sea urchin and a human, this is probably a very conserved gene, just because there's not a ton of changes. But there are changes there, and you can use these big searches to look at how similar these genes are between different organisms.
So that's bioinformatics: looking at the information content of genes, where the genes are, what is protein coding, where things are binding, how everything has evolved. All this sort of information content of genes. So with that, let's now move on.
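The "how similar are these between organisms" comparison can be sketched as a percent-identity calculation. This is not BLAST itself, which also searches a database and builds the alignment; the sketch below assumes the two sequences are already aligned to the same length, with "-" marking gaps. The function name `percent_identity` and the short protein sequences are hypothetical, invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions where both sequences share a residue.

    Assumes the inputs are pre-aligned (same length, '-' for gaps).
    Gap positions never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned protein fragments (one-letter amino acid codes)
# that differ at a single position out of eight.
human_like = "MKTAYIAK"
urchin_like = "MKTAYLAK"
print(percent_identity(human_like, urchin_like))  # 87.5
```

A high value across distantly related organisms is the kind of signal that suggests a conserved gene, as with the human and sea urchin comparison described above.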
2
Problem
Which of the following is NOT a piece of information that bioinformatics can analyze?
A
Location of DNA-Protein binding sites
B
Identifying all the proteins expressed in a skin cell
C
A list of all introns in the genome
D
The function of one gene
3
Problem
Which of the following can be used to identify an open-reading frame?
A
cDNA sequences
B
Introns
C
Enhancer locations
D
Exons
Additional resources for Bioinformatics
PRACTICE PROBLEMS AND ACTIVITIES (11)
- Go to the National Institute for Child Health and Human Development (http://www.nichd.nih.gov), locate the sea...
- What are community-based genetic screening programs? What is the intent of such screening programs? Why are me...
- What is bioinformatics, and why is this discipline essential for studying genomes? Provide two examples of bio...
- You are designing algorithms for the bioinformatic prediction of gene sequences. How might algorithms differ f...
- Do you think it is important that participation in community-based genetic screening be entirely voluntary? Wh...
- BLAST searches and related applications are essential for analyzing gene and protein sequences. Define BLAST, ...
- Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi and follow the links to nucleotide blast. Type in the sequence b...
- In the course of the Drosophila melanogaster genome project, the following genomic DNA sequences were obtained...
- DNA footprint protection (described in Research Technique 8.1) is a method that determines whether proteins bi...
- DNA footprint protection (described in Research Technique 8.1) is a method that determines whether proteins bi...
- DNA footprint protection (described in Research Technique 8.1) is a method that determines whether proteins bi...