Date: 2022-08-01
Hi, in this video we're going to be talking about sequencing the genome. Before we can study the genome and what the functions of all these different pieces of DNA are, we have to know its sequence. So I'm going to give a brief overview of how sequencing works. Obviously there are different techniques with different minute details, but this is just a general overview.
Sequencing genomes uses a few main steps that are common to many of the different sequencing methods. First, you have genomic DNA, or at least the majority of the genomic DNA. You have a bunch of DNA and you want to sequence it. The first thing you have to do is process it, and how you process it is that you chop it up into a bunch of pieces. These pieces have to be random, and they have to be overlapping, which means that not every piece is unique. Some pieces share part of their sequence with other pieces. So if you have this little piece of DNA, some of it will overlap here, maybe another piece will overlap here, and something else will go like this and overlap here, here, and here. All these different pieces have to be overlapping, and we'll figure out why in a minute.
But first, how do you chop up the DNA? You can chop up the DNA using a special type of protein called a restriction enzyme. These are proteins that cut DNA. There are a bunch of them, and each one has a specific sequence or two that it recognizes and cuts. So you can use combinations of restriction enzymes to get these overlapping segments and chop the entire DNA into these small fragments. These fragments are given a special name: they're called reads.
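To make the chopping step concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions that are not from the video: the DNA string is made up, only one strand is modeled, and the two enzymes (EcoRI, which cuts G^AATTC, and BamHI, which cuts G^GATCC) are simply well-known examples. The function name `digest` is hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of the recognition `site`.

    The cut falls `cut_offset` bases into the site. Only one strand
    is modeled here, which is a simplification.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # whatever is left after the last cut
    return fragments


# Made-up example sequence containing two EcoRI sites and one BamHI site.
dna = "ACGAATTCTTGGATCCAAGAATTCGT"

ecori_fragments = digest(dna, "GAATTC", 1)  # EcoRI cuts G^AATTC
bamhi_fragments = digest(dna, "GGATCC", 1)  # BamHI cuts G^GATCC

print(ecori_fragments)  # ['ACG', 'AATTCTTGGATCCAAG', 'AATTCGT']
print(bamhi_fragments)  # ['ACGAATTCTTG', 'GATCCAAGAATTCGT']
```

Notice that each enzyme on its own gives pieces that sit end to end. The overlap comes from cutting separate copies of the same DNA with different enzymes: the EcoRI piece `AATTCTTGGATCCAAG` spans the spot where BamHI cut, so it shares sequence with both BamHI pieces.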
These reads can vary depending on how you chop the DNA up and which restriction enzymes you use; they're generally between 100 and 5,000 base pairs long. Reads are super important and we'll talk about those in a second, but those are the overlapping fragments. So here we have DNA, this blue and pink, and you can see there's a sequence here. This is a restriction enzyme that comes in, and it chops here and it chops here. So now we have these fragments of DNA, and they do have these overlapping segments, which can be useful. But mainly what you need to know is that you generate fragments of DNA, you do this for the entire genome, and you generate millions of fragments, if not more.
So then you have all these fragments and you have to sequence them. There are many different ways this happens. The one I'm going to talk about, which is mentioned in your book, is called pyrosequencing. What happens in pyrosequencing is you take each read, you attach it to a bead, and you amplify it, which means that you make multiple copies of that read. So you have multiple copies of that sequence, and that means you have enough to actually be able to detect the signal from it. If you just have one copy, whatever signal you're using is going to be really faint. If you have multiple copies, you can really amplify that signal.
The signal used in pyrosequencing is actually light. How this is done is you have a machine, and like I said, you've attached the sequence to a bead, so it's attached to some kind of molecule sitting on a plate in the machine. This machine will take each of the nucleotides, A, T, C, and G, and run them individually, one at a time, across this plate where all your sequences are. Now these are special nucleotides, and they contain a special molecule on them.
So when that nucleotide binds, it releases that molecule, which is called a pyrophosphate. When it's released, it interacts with other chemicals in there, and that converts it to a light signal. So let's say you have a sequence and it's all Ts here. I mean, this is not really going to happen, but if it did and the machine puts in an A, that A will bind because it's complementary, and when it does, it releases a molecule that gives off a light signal. Because you're doing this in a machine, there's a camera, and that camera detects the light signal. And because these nucleotides are passing by one at a time, it knows which nucleotide caused the light signal, and it will say, okay, this is the complementary sequence.
So here's an example of what a printout of this might look like. Each of these peaks represents a light signal. You can see there are lots of Gs. They've been running some nucleotides over and over, and I realize it's more complicated than what I'm making it, which is why the x-axis isn't just A, T, C, and G. But generally you can see that right here, this G resulted in a light signal. That G is going to be complementary to the actual sequence, so we know the sequence here is C. You can do this over and over again throughout the whole sequence, however long it is, 100 base pairs or 5,000 base pairs, and eventually the computer will spit out what the sequence is.
Now, like I said, this is one way of doing this. There are a lot of different ways, shotgun sequencing and some newer techniques, but this is really the one that's highlighted in your book.
So once you have the sequence, you know the sequence of each of these reads. Remember, you probably have millions of these reads. You use computer software to overlap the sequences. Remember, when we originally designed the sequencing step, we made overlapping reads.
So you use software to figure out where these overlapping sections are for every single read. What the computer does is it finds those overlapping segments and it says, okay, this is the sequence here and this is the sequence on either side of it. That software continues to go through and read each segment until it finally connects all the overlapping segments. This is called sequence assembly. So this is slowly taking each individual read, finding where it overlaps with all the rest of the reads, and forming it into one sequence, which is a consensus sequence.
Now, we may have gone over consensus sequences before; a consensus sequence is different from a conserved sequence. A conserved sequence is something that is exact between species. But a consensus sequence doesn't have to be exact, which is important, right? Because if we're sequencing, for instance, the human genome, and we take my genetic material to sequence, that's not necessarily going to be completely representative of the human genome. Not everyone is a clone of me. Not everyone has blonde hair, has my eye color, is my height. So there are individual differences between where you get the genetic material, say from me, and what other members of the species and their genetic material might look like. It has to be a consensus sequence because this is close, but there might be single nucleotides that are different between me and you and other people. So the individual differences prevent a single sequence, my sequence, from truly representing the entire human genome.
Another thing this requires is generally multiple reads of each base pair. For example, if you read that there's been tenfold coverage of the genome, that means each base pair is represented, on average, in about 10 individual reads, so 10 individual fragments. That makes the number of reads even larger, because you have to have so many reads covering the whole genome.
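Coverage is easy to work out once you know where each read sits on the genome. Here is a small sketch in Python; the 10-base "genome" and the read positions are made-up numbers for illustration, and the function name `per_base_coverage` is hypothetical.

```python
def per_base_coverage(genome_length, reads):
    """Count how many reads cover each position.

    `reads` is a list of (start, length) placements on the genome.
    """
    depth = [0] * genome_length
    for start, length in reads:
        for i in range(start, min(start + length, genome_length)):
            depth[i] += 1
    return depth


# Made-up example: a 10-base genome covered by four reads.
reads = [(0, 5), (3, 5), (5, 5), (0, 10)]
depth = per_base_coverage(10, reads)

average = sum(depth) / len(depth)
print(depth)       # [2, 2, 2, 3, 3, 3, 3, 3, 2, 2]
print(average)     # 2.5
print(min(depth))  # 2
```

So this toy example has 2.5-fold coverage on average, and every base is seen at least twice. A quick way to estimate the average without placing the reads is number of reads times read length divided by genome length.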
So here's what assembly looks like. It actually looks a lot more complicated than this, because you're dealing with millions and billions of reads, but this is essentially what it looks like. So let's see what this is. The red parts are things that we know, these are the overlapping regions, and the blue part is the unknown sequence. You get these red parts here and you find out, okay, where are these overlapping? And we can say, well, this represents this part of the genome. Now we have this whole sequence that we can compare and use to create a consensus sequence, so that when we finally get the full genome, which is this, we know that each base pair has been represented multiple times. We know that these overlapping segments are located in the proper locations and that we can construct the entire genome based on these reads.
So, like I said, that's an overview of sequencing. There are many different technologies that do this in slightly different ways, but that's generally how they all do it, even though some of the minor details might be different. So with that, let's now turn the page.
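For anyone who wants to see the overlap idea in code, here is a minimal sketch of a greedy overlap assembler in Python. This is a toy under stated assumptions, not how any particular assembly software works: the reads are made up, they contain no sequencing errors, no read sits entirely inside another, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_length):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best match is shorter than `min_length`.
    """
    for size in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up reads taken from the sequence ATGCGTACGTTAGC.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
assembled = greedy_assemble(reads)
print(assembled)  # ['ATGCGTACGTTAGC']
```

The reads go in out of order and come back as one sequence, because `ATGCGTAC` and `GTACGTTA` share `GTAC`, and `GTACGTTA` and `GTTAGC` share `GTTA`. With real data, repeats and errors make this much harder, which is part of why you want every base covered by multiple reads.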

