Date: 2022-08-01

Hello everyone. In this lesson, we are going to be learning about the difficulties that come along with assembling an entire genome, also called whole-genome assembly. Okay, so the entire genome can be particularly difficult to sequence, because genomes have certain characteristics that are hard to track. For example, much of our DNA (around half of the human genome, by common estimates) is made up of repetitive sequences that are just AT, AT, AT, AT, repeated for thousands of base pairs. They don't code for anything in particular. But if we're trying to sequence the entire genome, we need to know those sequences, where they belong, and how they line up with the sequences around them. This poses a problem because it's difficult to know where a repetitive sequence of DNA begins, where it ends, and where it overlaps with other repetitive sequences of DNA. Repeats make a genome very difficult to assemble because repetitive DNA sequences are often much longer than the stretches of DNA we can actually read (the reads), and repetitive sequence makes up far more of our genome than coding sequence does. That makes it difficult to determine where the overlaps begin, where they end, and where a giant string of A's and T's, or G's and C's, actually came from. The way we combat this issue is with paired-end reads. Paired-end reads are a technique we use to put these repetitive sequences in the correct location and in the correct alignment, and alignment is very important, as I'll explain in just a second. Paired-end reads are pairs of sequences that are read from opposite ends of the same genomic insert, that is, the same fragment of DNA.
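To make the repeat problem concrete, here is a minimal Python sketch on a made-up toy sequence; the sequence and every function name are hypothetical, chosen only for illustration. It shows that a short read taken from inside an AT repeat matches many positions, while a read from the unique flanking sequence matches exactly one. It also models a paired-end read as the two ends of one fragment, assuming the common forward/reverse convention in which the second read comes from the opposite strand.

```python
# Toy illustration (not real data): a 40 bp AT repeat flanked by two unique stretches.
LEFT_FLANK = "GATTACAG"
REPEAT = "AT" * 20
RIGHT_FLANK = "CCGTGCA"
GENOME = LEFT_FLANK + REPEAT + RIGHT_FLANK

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs in `genome`, overlaps included."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Read `read_len` bases from each end of one DNA fragment (insert).

    Assumes the forward/reverse layout: read 1 is the fragment's left end as
    written, and read 2 is read from the other strand, so it is the reverse
    complement of the fragment's right end.
    """
    if read_len > len(fragment):
        raise ValueError("read length cannot exceed fragment length")
    return fragment[:read_len], reverse_complement(fragment[-read_len:])


if __name__ == "__main__":
    # An 8 bp read from inside the repeat fits in many places...
    print(len(find_all(GENOME, "ATATATAT")))  # 17
    # ...while an 8 bp read from a unique flank fits in exactly one.
    print(find_all(GENOME, LEFT_FLANK))  # [0]
    # Reading both ends of the whole fragment puts each read in unique sequence.
    print(paired_end_reads(GENOME, 8))  # ('GATTACAG', 'TGCACGGA')
```

Neither read of the pair lands inside the repeat, so both can be placed unambiguously, and the known fragment length tells us roughly how much repeat lies between them.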
So basically, we have this giant repetitive sequence, and then we have known sequences of DNA on either end of it. Paired-end reads can span that gap and help us work out how the two contigs on either side connect and roughly how far apart they are. If we look at this particular diagram, what I want you to know is that the known sequence is represented by the arrows, and the unknown sequence, which is usually the repetitive one, is represented by the line. As you can see in our key, the line means roughly known length but not known sequence. So we have a general idea that there are maybe 1,000 base pairs between these two known sequences, but we don't know the exact sequence, because it's probably repetitive, and for this purpose we don't particularly need to know it. Each unit with two arrows and an unknown piece in between is one fragment, and we know the sequence represented by the arrows. Now let me show you why this information is useful, especially when you're trying to align the DNA. Let's say we have this fragment of DNA here, with the two arrows and the unknown piece in the middle. If I line it up against the matching region of a normal (reference) genome, the two match up perfectly: the known reads, represented by the arrows, line up, and the unknown stretch between them, which is probably repetitive, is the same length. So that is a match; this is normal. But you can also see when sequence has been deleted, inserted, inverted, or duplicated. That makes paired-end reads very helpful for determining how a chromosome has changed over time, whether it's had an insertion, deletion, inversion, rearrangement, duplication, or anything like that.
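Because the fragment length is roughly known even though the sequence in the middle is not, read pairs whose two ends land on different contigs can be used to estimate how big the gap between those contigs is. The Python sketch below shows that arithmetic under the simplifying assumptions stated in its docstring; the function name, the 1,000 bp insert size, and the example coordinates are all hypothetical.

```python
from statistics import mean


def estimate_gap(contig_a_len: int, pairs: list[tuple[int, int]], insert_size: int) -> float:
    """Estimate the number of unknown bases between contig A and contig B.

    Assumes contig A comes before contig B, both in the same orientation, and
    that every pair is one fragment of roughly `insert_size` bases whose read 1
    starts at 0-based position `start_a` on contig A and whose read 2 ends at
    position `end_b` (exclusive) on contig B.

    Each fragment covers (contig_a_len - start_a) bases of A, then the gap,
    then end_b bases of B, so: gap = insert_size - (contig_a_len - start_a) - end_b
    """
    if not pairs:
        raise ValueError("need at least one read pair spanning the gap")
    gaps = [insert_size - (contig_a_len - start_a) - end_b for start_a, end_b in pairs]
    # Fragment sizes vary from pair to pair, so average over all spanning pairs.
    return mean(gaps)


if __name__ == "__main__":
    # Hypothetical numbers: a 5,000 bp contig A and fragments of about 1,000 bp.
    spanning_pairs = [(4500, 310), (4600, 390), (4450, 250)]
    print(estimate_gap(5000, spanning_pairs, insert_size=1000))  # 200
```

The three pairs give individual estimates of 190, 210, and 200 bases; averaging smooths out the natural spread in fragment length, which is why more spanning pairs give a better estimate.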
If you guys see a deletion, this is what it's probably going to look like. You still have your known sequences, but some of the unknown sequence between them has been deleted, because it's not as long anymore. Compared with the normal sequence on top, part of that unknown region is missing, so there has been a deletion here; it used to be this particular size. Okay guys, you can also see when there's been an inversion, because with an inversion, the known sequences change direction. If a read on one end normally points in one direction and that sequence has been completely flipped, there has been an inversion. So this is what an inversion looks like: you have the normal sequence here, and then you have the normal read on one end, the unknown sequence, and then the known sequence on the other end pointing in the wrong direction, so an inversion has happened in this read. That tells you an inversion has happened in this particular area of the chromosome. We're not sure exactly where in here it started, but we know that it happened. And since the unknown region is repetitive, we may never really know exactly where in the DNA the inversion starts and ends. Now, you can also see things like duplications. Let me scroll down a little so we have some more room. Let's say this one right here is normal, and then you have this one, which is much longer. It looks like the deletion example in reverse: comparing just these two, one could have a deletion or the other could have a duplication. You have to know the normal length of this particular sequence to know whether part of it was deleted or part of it was duplicated. Here, we would say that a duplication of the unknown sequence happened in this particular strand of DNA.
That's because the two known sequences have moved farther apart, so there is now more unknown sequence between them; some sort of duplication happened here. You can also see things like repeat insertions. I'll just draw that for you really quickly so you know what it might look like. This would be the normal one. Then you have what looks like a normal one again, but with another sequence added on. Let me get out of the way so you can see it. We have our unknown region with our reads, and then another unknown region and another read. This could be a repeat insertion, where a sequence of DNA was duplicated and then inserted into the DNA. So basically, paired-end reads are used to better understand DNA alignment and what may have happened to the DNA at some point in time: a deletion, inversion, duplication, or insertion. They're used to understand regions of the genome that contain highly repetitive, unknown sequences of DNA flanked by known sequences of DNA. We simply sequence up to a particular point, until we hit the repetitive DNA, and then we don't sequence any further, but we know the general length of that stretch. This helps us combat some of the difficulties that come along with sequencing the entire genome, which is made up of a lot of repetitive sequence. Okay, everyone, let's go on to our next lesson.
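The deletion, inversion, and duplication/insertion pictures can be turned into simple rules. The sketch below assumes each sample fragment has a roughly known length (the library's insert size) and that its two reads have been mapped to a normal reference genome. If the reads sit farther apart in the reference than the fragment is long, the sample is missing sequence (a deletion); if they sit closer together, the sample carries extra sequence (a duplication or insertion); if both reads point the same way, one end was flipped (an inversion). The function name, strand labels, and 100 bp tolerance are hypothetical. Real tools also require several supporting pairs and use additional signals, so treat this as an illustration of the lesson's rules only.

```python
def classify_pair(strand_1: str, strand_2: str, reference_span: int,
                  insert_size: int, tolerance: int = 100) -> str:
    """Classify one read pair by comparing it with the normal (reference) genome.

    strand_1, strand_2: "+" or "-", the direction each read maps in the reference.
    reference_span: distance between the outer ends of the two reads when mapped
        to the reference genome.
    insert_size: the roughly known fragment length in the sample.
    tolerance: hypothetical allowance for normal variation in fragment length.
    """
    # In a normal pair the two reads face each other, one on each strand.
    # If both point the same way, one end has been flipped: an inversion.
    if strand_1 == strand_2:
        return "inversion"
    # Reads farther apart in the reference than the fragment is long:
    # the sample is missing sequence between them.
    if reference_span > insert_size + tolerance:
        return "deletion"
    # Reads closer together in the reference than expected:
    # the sample carries extra sequence between them.
    if reference_span < insert_size - tolerance:
        return "duplication or insertion"
    return "normal"


if __name__ == "__main__":
    # Hypothetical mapped pairs from ~1,000 bp fragments: (strand_1, strand_2, reference_span)
    examples = [("+", "-", 1010), ("+", "-", 1490), ("+", "+", 980), ("+", "-", 620)]
    for s1, s2, span in examples:
        print(s1, s2, span, "->", classify_pair(s1, s2, span, insert_size=1000))
```

As in the lesson, the distance rules only work because we know the normal length: without `reference_span` there would be no way to tell a deletion in one fragment from a duplication in another.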

