Genetics

3. Extensions to Mendelian Inheritance

Chi Square Analysis

1
concept

Chi Square Analysis

Hi. In this video, we're going to be talking about chi-square analysis. Chi-square is a statistical test, so we're about to get into a lot of math. I'm really sorry, but it's genetics; it has to happen. What the chi-square test checks is whether the expected results are very similar to the observed results. The expected is what you expected to get, and the observed is what you actually see. The reason we need this test is that in genetics it's never perfect. If you mate organisms together and you get, say, 2,000 offspring, flies for instance, you're not going to get a perfect 3-to-1 ratio. It may be very close, but it's not going to be perfect; life just never works that way. You also won't get a perfect 9-to-3-to-3-to-1 ratio. Life isn't perfect, and genetics isn't perfect. So say we're doing this experiment today: we have 6,000 flies and we counted them all. We have these nice ratios, and it's something like 2.96 to 1. Is that actually close enough to 3 to 1 to say, okay, this is so close to expected that this is Mendelian inheritance? That's what the chi-square test is for. It's used to check whether the numbers you got from your experiment are close enough to the expected numbers to say that it's Mendelian inheritance. The important numbers you have to know to do a chi-square analysis are the observed numbers, which are the numbers you actually get in your own experiment, and the expected numbers, which are the numbers you are expected to get: the perfect ratio, the 3 to 1, the perfect Punnett square. And of course, because we're doing math, there's going to be a formula, and it looks like this. I realize it's kind of confusing.
Here's your chi-square; that's what it looks like, that's the notation for it. If you don't remember from math, this symbol means sum, and then you have these O's and E's. What do they stand for? O means observed numbers, the numbers you get, and E, which I've written in green, means expected numbers, the perfect-ratio numbers. So that's an overview. Let's now move on to a practice question and use chi-square on an actual question you might get. So let's move on.
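The formula shown on screen is not captured in the transcript. Going by how it is read out later in the lesson (O minus E, squared, over E, summed over every class), it can be written as below. The index i over phenotype classes is notation added here for clarity and is an assumption, not something shown in the video.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```

Here O_i is the observed count and E_i is the expected count for phenotype class i.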
2
example

Chi Square Analysis

Okay, so in this practice problem we're going to walk through the steps of using a chi-square test. Here's the example question. You have a purple plant and you think it's heterozygous. You don't know, right? You think so. If it's not heterozygous, what else could it be? It could also be homozygous, right? Aa, uppercase and lowercase, or AA, all uppercase. But you think it's heterozygous, and you want to know for sure. So what you want to do is breed it with a homozygous recessive. This has a fancy name: it's called a test cross. You may see that in a similar question, where it says to do a test cross. A test cross means you mate whatever you have with the homozygous recessive. So this is what you're doing. You have this purple plant that you think is heterozygous, so you take a white plant that you know is homozygous recessive and mate them together. And it says that after you do that mating, you produce 120 offspring: 55 are purple and 65 are white. Was your plant heterozygous? So how do we actually go about doing this using a chi-square? Well, these are the observed numbers, right? Because you observed this: you did an experiment and you have 120 offspring, 55 purple and 65 white. So these are your O's. Remember the formula for chi-square; I'll show it down here again, but this is it: O minus E, squared, over E. So the problem gave you these numbers. But the very first thing you have to do is determine what the expected numbers are going to be. If you cross a heterozygous with a homozygous recessive, what do you get? The first thing to do is a Punnett square. Here's your heterozygous, here's your homozygous, and each allele gets its own column or row. Now, when you mate these together, one half are heterozygous.
And these would be purple. And one half are homozygous, and these would be white. So if you have 120 offspring, which is what the question gave us, how many are going to be purple and how many will be white? Expected, mind you; remember, we're working with expected. Well, the Punnett square tells us that half will be purple and half will be white. Half of 120, so 120 divided by 2, is 60. So here are the expected numbers. We got these from the Punnett square: if we used a heterozygous and a homozygous, half are going to be purple and half are going to be white. So out of 120, 60 are purple and 60 are white. These are your expected numbers, and that's the very first step of doing a chi-square. So this is great. We have our observed, 55 and 65, and we have our expected, 60 and 60. Now we can use the chi-square formula. Remember, chi-square equals O minus E, squared, over E. You may say, okay, I have this formula, but do I use 55? Do I use 65? How do I do this? Well, you actually do it for each class. Unfortunately I wrote "red" here for some reason, but what I'm trying to say is purple. And you do it for every number. For purple, 55 was observed and 60 was expected: we actually observed 55 purple, and we expected from the Punnett square to get 60. If we put that in here, O minus E, squared, is 25. And if we divide it by E, that's 0.42. Then, because this formula has the sum in it, we do the same thing for the white: 65 is observed, 60 was expected, so that's 25 and then 0.42. And then we add these together to get our chi-square value of 0.84. Does that make sense? Did you get everything in that step? The problem gave us our observed, we calculated our expected with a Punnett square, and then we plugged these numbers into a chart like this so that we can solve this formula. Now, you're welcome to make your own chart.
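The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson: the function name `chi_square` and the list layout are assumptions made for this example.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every phenotype class."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Test cross from the example: 120 offspring, 55 purple and 65 white.
observed = [55, 65]
# A heterozygous x homozygous recessive cross predicts a 1:1 split.
expected = [sum(observed) / 2] * 2   # 60 purple, 60 white

chi2 = chi_square(observed, expected)
print(round(chi2, 2))
```

Without intermediate rounding the sum is 25/60 + 25/60, about 0.83. The video rounds each term to 0.42 before adding, which is why it reports 0.84; either value lands in the same place on the table.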
If you're given this problem on a test or quiz, I really highly suggest that you make your own chart so you don't get confused. So you solve the problem, and now we have our chi-square value. What do you do with that chi-square value? What does 0.84 actually mean? Well, when you have this number, there's a special table called a chi-square distribution table. If you're given a question like this on a test or quiz, you will be given this table; you don't have to memorize it. You use the chi-square value to determine whether or not your hypothesis is true. Remember, our hypothesis, what we thought, was that the plant was heterozygous. So how do you use this table? This is what the table looks like, and it looks a little overwhelming. The first thing you do is calculate what is called the degrees of freedom, and that's here. That determines which row you're going to find the 0.84 on. How do you calculate it? The formula is: degrees of freedom, or df, equals the number of variables minus one. In this question there are two variables. Why two? Because we have purple and white; we're looking at two colors. So in this question our degrees of freedom is 2 minus 1, the number of variables minus 1, and that equals 1. So now we have our degrees of freedom, and we know we are going to be using this row. If our degrees of freedom were six, we'd use this one. If it were nine, we'd use this one. But it's not; it's one. So this row right here is the row we're looking at. Now you have your degrees of freedom and you know which row you're using, so you use your chi-square value, which, if you remember, was 0.84. What you do is look along this row and figure out where 0.84 would be.
So 0.84 is not between the values at the start of the row, and it's nowhere near 10.83 at the far end. Where 0.84 falls is somewhere between 0.46 and 1.07. So this is where it fits in. These are the two values you're looking at right here. And you say, okay, now I have so many numbers; what do 0.46 and 1.07 stand for? You don't actually need to know these numbers. They're really not important for anything right now. What you need to know, finally, is what the p-value is. These two values line up with these two p-values, 0.5 and 0.3. You say, okay, here's another number; what do I do with this? Well, what 0.5 and 0.3 mean is 50% and 30%. So what does that mean? For this problem the probability is between 50% and 30%. I don't want to mess you up, so I'm not going to use the explanation I thought I was going to use; instead we're going to move on to the fourth step. I get that you're confused, so here's a quick review of step three: you started with your chi-square value, you calculated your degrees of freedom, which was one, and you looked along row one to see where 0.84 would fit. You found these two numbers, you looked to see which p-values they sit between, and you found 0.5 and 0.3, which equals 50% to 30%. Now, what do you do with 50% to 30%? What does that actually mean? This is when you determine whether you accept or reject your null hypothesis. The null hypothesis is not "I think this is heterozygous." Instead, the null hypothesis states that there is no difference between measured and predicted, that is, between observed and expected. So that means the 55 that we got is really no different than 60. I mean, it is different, but it's not statistically, not mathematically, different. It means that the observed ratio we got is no different than the ratio we expected.
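The table lookup in step three can be sketched as follows. The row of critical values is an assumption: it uses commonly published values for one degree of freedom, and the handout used in the video may round them differently (for example 0.46 instead of 0.455). The names `P_VALUES`, `DF1_ROW`, `degrees_of_freedom`, and `p_value_range` are made up for this illustration.

```python
# Row for df = 1 of a typical chi-square distribution table.
# Assumption: standard published values; a course handout may round differently.
P_VALUES = [0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001]
DF1_ROW = [0.004, 0.016, 0.064, 0.148, 0.455, 1.074, 1.642, 2.706, 3.841, 6.635, 10.828]


def degrees_of_freedom(num_classes):
    """Number of phenotype classes minus one."""
    return num_classes - 1


def p_value_range(chi2, row=DF1_ROW, p_values=P_VALUES):
    """Return (upper_p, lower_p) for the two table columns that bracket chi2."""
    if chi2 < row[0]:
        return (1.0, p_values[0])
    for i in range(len(row) - 1):
        if row[i] <= chi2 < row[i + 1]:
            return (p_values[i], p_values[i + 1])
    return (p_values[-1], 0.0)


df = degrees_of_freedom(2)       # two classes: purple and white
print(df, p_value_range(0.84))   # 0.84 sits between 0.455 and 1.074
```

With this row, 0.84 is bracketed by the columns for p = 0.50 and p = 0.30, matching the 50% to 30% range read off the table in the video.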
It might be a little bit off, but it's close enough that you can say it's the same thing. So this is your null hypothesis. Now, the null hypothesis in every experiment you do is going to be exactly the same: it's going to say that the observed and the expected are not different from each other. So in this problem our null hypothesis is that the 55 purple and 65 white plants (I changed colors here) are close enough to the 60 purple and 60 white plants, and therefore they are not different. Step four is to determine whether that statement is true. Is it true that they're not different, or is it false and they are different? Generally, for this, you accept the null hypothesis. Now, I'm saying "accept." If you are a math nerd, or a statistician, or you have a really strict statistician-geneticist professor, they're not going to want you to use the word "accept." I use it here because I think it's easier to understand. What they're going to want you to use is "fail to reject." Remember, these are the exact same things. I can't determine which one your professor wants you to use, so I suggest you actually ask them: do they prefer "accept" or "fail to reject"? I'm going to use "accept" because I think it's much less confusing than "fail to reject." But just know that in other statistics classes, and in certain professors' classes, they're not going to want you to use that term. For me, I'm using them interchangeably. So is it true that our observed and expected values are pretty much the same? I accept that statement, I say yes, the observed and the expected are close enough, if my probability, the p-value from above, is greater than 5%. Ours is 30% to 50%, so that's much greater than 5%. So in this problem we are accepting the null hypothesis that they are not different.
You would reject it if it's less than 5%. You would say, okay, no, these are very different numbers; 55 and 60 are way too far apart. This null hypothesis is false, and I'm rejecting it. But, like I said, for this question it is 30% to 50%; therefore we are accepting the null hypothesis. The fifth thing is to remember what that actually means. A lot of times people power through the math, they know whether they're accepting or rejecting the null hypothesis, and they're like, great, I'm accepting this null hypothesis, and then they forget what the question was asking. So this is a super important step. A lot of people will actually miss a question on a test or a quiz because they skip this step. They just say, okay, I'm accepting the null hypothesis, but you have to go one step further and remember what you were doing in the first place. Remember, the null hypothesis means that the observed and the expected aren't different; they're essentially the same. In this question we were talking about whether the purple plant was heterozygous or homozygous, and we predicted that it was heterozygous; we thought it was this genotype. So if we accept the null hypothesis, that means we are 95% confident (I'll explain how I got this number in one second) that the purple plant was indeed heterozygous, because our observed values from our mating, 55 and 65 out of 120, were not different from the expected values of 60 and 60. If they were different, we would say no, this isn't heterozygous. But because these values are considered not different statistically, we are confident that this plant is heterozygous. So this is the super important step; do not forget it. Whether you accept or reject the null hypothesis, go back to the question.
Go back to the very original question, remember what you were testing to begin with, and write your answer that way. Not just "I accept" or "I reject the null," because you'll likely be marked wrong for that. Now, for those who are interested or who have professors who are really math heavy, notice that I said 95% confident. I'm not 100% confident; of course, in math you're never 100%. But I am 95% confident. So why did I put 95%? If you're interested, keep listening. If you're not, you can totally tune out; that's fair. The reason I used 95 is because of these numbers here. I said that we accept the null if the p-value is greater than 5% and reject the null if it is less than 5%. Now, 95 plus 5 equals 100. So this is where we're not getting 100%. We're saying that there's a 5% chance I could be wrong, because I left that room for error. I said I accepted this if the probability was greater than 5%. I could have accepted it if it was greater than 2%, and in that case I would have been 98% confident. So I'm giving myself a little wiggle room here. Because I put 5%, which is typically what people do, I'm saying that I could be wrong 5% of the time, but 95% is good enough, and so that's what I'm going to use. Now, I could use 0.0001 here and be really, super confident. But if I do that, I may actually miss something; I may say these are totally different things when they're not, just because I didn't give myself the room for error that you need when dealing with naturally occurring things. There could just be an extra plant, and it doesn't mean anything. But if I were too restrictive, I'd say, oh, that's one plant too many, that's it, they're not the same, I reject this null, when really I should have accepted it, and I'd be wrong. So people like to give themselves 5% room for error; they like to give their plants, their experiments, that 5% error.
And so this is why I said I was 95% confident. 95 is close enough; it's actually very good. If you're 95% confident, you're fairly confident. So that is how you do a chi-square. I realize it's a lot of steps. It's five steps, and I would say the fifth is the most important. Don't do all the math and forget this last step. I get that it's a long question, but you're going to see questions like this on a test; I guarantee you'll see a chi-square question. So make sure you really understand this before moving on to other things. So with that, let's now move on.
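Steps four and five can be sketched as a small decision function. This is an illustration under stated assumptions: the function name `decide`, the wording of the returned strings, and the way the p-value range is passed in as an (upper, lower) pair are all invented here; the 5% cutoff is the one used in the lesson.

```python
def decide(p_range, hypothesis, alpha=0.05):
    """Apply the 5% cutoff to a p-value range, then restate the original question.

    p_range is (upper_p, lower_p) as read from a chi-square table.
    """
    upper, lower = p_range
    if lower >= alpha:
        # Whole range is above the cutoff: observed and expected are not different.
        return "accept (fail to reject) the null", f"data fit: {hypothesis}"
    if upper <= alpha:
        # Whole range is at or below the cutoff: observed and expected differ.
        return "reject the null", f"data do not fit: {hypothesis}"
    raise ValueError("p-value range straddles the cutoff; use a finer table")


decision, answer = decide((0.50, 0.30), "the purple plant is heterozygous")
print(decision)
print(answer)
```

Returning the restated hypothesis alongside the decision mirrors the point of step five: the answer to the question is about the plant's genotype, not just about the null hypothesis.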
3
example

Step 1

Okay. So this question says that you have done some type of monohybrid cross, and you observed these F2 phenotypes in the cross. It looks like you were crossing some type of flowers, red or white, and the F2 generation ended up with around 900 red and around 300 white. The question asks which of the following null hypotheses is best for using the chi-square test. The chi-square test is used to determine whether your expected values are the same as your observed values. In this cross you got these offspring, about 900 red and 300 white. So it's asking which of these ratios you expected for this cross; which one do you want to test to see if the genetics are working in that fashion? So 900 to 300 is what we have, and we want to write it as a ratio, a ratio you're used to seeing, one of the ratios here. Is this most like a 3-to-1 ratio, a 2-to-2 ratio, a 9-to-3-to-3-to-1 ratio, or a 3-to-2 ratio? I think there should be an obvious one, but if you're still confused or not sure, there are some we can easily cancel out. We can easily cancel out 9 to 3 to 3 to 1 because there aren't four phenotypes. Each one of these numbers would have to be a different phenotype, but we only have two phenotypes, red and white. So this one obviously can't be it. The second one we can mark out is 2 to 2, because that means they would be equal: either both would be 900 or both would be 300. That's not what we're seeing. We're seeing a 900-to-300 ratio, so they're not equal, and B can't be it. So the last two options are 3 to 1 or 3 to 2. The best way to decide is just to do regular math. If you divide 300 into 900, how many times does it go in? It would be three, right?
So 300 times 3 is 900, and therefore 300 is the 1. If it were a 3-to-2 ratio, what you would see is 900 to 600, which is not what we saw. You can do this through regular division: divide 900 by 300, get 3, check that 300 times 3 equals 900, and therefore it's a 3-to-1 ratio. So the answer here is A. If you got these phenotypes, that's the null hypothesis you would want to test: you want to see if the values you got are equal to the expected values of a 3-to-1 ratio. So with that, let's now move on.
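The ratio reasoning above can be checked with a short sketch. The helper names `simplest_ratio` and `expected_counts` are invented for this illustration, and the total of 1,200 offspring is an assumption obtained by adding the rounded counts of 900 and 300.

```python
from math import gcd


def simplest_ratio(a, b):
    """Reduce two counts to their simplest whole-number ratio."""
    divisor = gcd(a, b)
    return a // divisor, b // divisor


def expected_counts(total, ratio):
    """Split a total number of offspring according to an expected ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


print(simplest_ratio(900, 300))        # the rounded counts from the question
print(expected_counts(1200, (3, 1)))   # what a 3:1 ratio predicts
print(expected_counts(1200, (3, 2)))   # what a 3:2 ratio would predict instead
```

The video compares 3:2 by holding the red count at 900 and asking what white would be (600). The sketch instead holds the total fixed, which gives 720 and 480; either way the 3:2 prediction is far from what was observed.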
4
example

Step 2

Okay, so using the exact same data as before, this question asks which of the following represents the degrees of freedom for this problem. How many degrees of freedom are in this experiment if you were to do a chi-square analysis? Our answers are 1, 2, 3, and 4. Which one do you think it is? Right, the answer here is actually 1. The reason is that we have two phenotypes, red and white. (That says "white," in case you can't read my horrible handwriting.) Remember, the formula for degrees of freedom is the number of phenotypes, which is 2, minus 1. So the degrees of freedom for this experiment would be A, which is 1. So with that, let's now move on.
5
example

Step 3

Okay, so in this question we are actually going to calculate the chi-square value for this data. Remember, the chi-square value is given this interesting symbol here, and the formula is observed (these values are the observed) minus expected, squared, over expected. What would the expected be? It was a 3-to-1 ratio, so the expected values will be these, an exact 3-to-1 ratio. And remember, this is the sum over the whole thing, so for each one of these we have to do a calculation. Let me disappear. First we'll do red. We have 892 minus 900, squared, over the expected value of 900, plus the white, which is the observed 294 minus the expected 300, squared, over 300. Now I'm going to give you a second; you can pause if you want. Go ahead, put this in your calculator. What do you get? It equals the chi-square value, which is one of these values here. It takes a little bit of time, so I'll pause for a second and give you time to punch it into your calculator. Hopefully you've had enough time; if not, go ahead and pause while you finish, because I'm about to give the answer. The answer here is B. This is 0.191, and that is your chi-square value. So with that, let's now move on.
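Written out, the arithmetic described above is as follows. The intermediate values 64/900 and 36/300 are not read out in the video; they are filled in here from the stated observed counts (892 and 294) and expected counts (900 and 300).

```latex
\chi^2 = \frac{(892 - 900)^2}{900} + \frac{(294 - 300)^2}{300}
       = \frac{64}{900} + \frac{36}{300}
       \approx 0.071 + 0.120
       = 0.191
```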
6
example

Step 4

All right, so now that we've calculated the chi-square value, we know it's 0.191, and we have our degrees of freedom, which we know is 1 (that was answer A). The next step is to determine the range of p-values. So what are the p-values for this experiment? To do this, you're going to need a chi-square distribution table. Feel free to use the one in the handout, which you probably already have up. If you don't, go ahead and take your phone or computer, open a new tab, and just google "chi-square distribution table." There are a ton of them. To read this table, the first thing you do is look at your degrees of freedom. There should be a line called degrees of freedom, and down here it's going to be listed with different numbers. The one you're interested in is this line here, because it's the degrees of freedom of 1. In here you're going to have a bunch of different numbers, and what you're looking for is the two numbers that 0.191 sits between. If you're using the table from the handout, I'll have the exact numbers. If you're using a table from Google, the numbers may be different, but essentially it's going to be the same thing; no matter which chi-square distribution table you use, it's not going to mess up how you solve a chi-square problem. If you're using the table I provided, we're going to match perfectly. If you're not, it's okay, you're still going to be correct; the numbers I'm about to say may not match perfectly, but they'll be close enough. On the table in the PDFs that I provided, you're going to come across the numbers 0.15 and, I believe, 0.46. If we were to place 0.191, it would fit right in between these numbers. And this is fantastic.
So now that you have these two, circle them if you want, and go all the way to the bottom, where the p-values sit. It's going to give you a bunch of different numbers, but you're interested in the ones lined up in the same columns as these two numbers, and those are going to be 0.70 and 0.50. So the answer to this question is A. Remember, if you're not using the same table as me, it may not be exactly this; it may be 0.75 to 0.55, or slightly off. Essentially, pick the closest one, which for this problem is 0.70 to 0.50. When you're doing this in a classroom setting or on a quiz or a test, they're going to provide you with the exact same table as everyone else, so there won't be this kind of confusion. But here I want to give you a chance to practice with different chi-square distribution tables so you can figure out how to read different ones. For this question, the answer is A, 0.70 to 0.50. So let's figure out what that means in the next question.
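As a cross-check on the table reading, the p-value for one degree of freedom can be computed directly. This is a side note rather than part of the lesson: it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its right-tail probability is erfc(sqrt(x / 2)). The function name `p_value_df1` is invented here, and the shortcut does not apply to other degrees of freedom.

```python
from math import erfc, sqrt


def p_value_df1(chi2):
    """Right-tail probability of a chi-square value when df = 1 only."""
    return erfc(sqrt(chi2 / 2))


p = p_value_df1(0.191)
print(round(p, 2))   # lands inside the 0.70 to 0.50 range from the table
```

For 0.191 this gives a p-value of roughly 0.66, which sits between 0.70 and 0.50 as the table lookup says.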
7
example

Step 5

Okay, so this is the next step of this problem. If a chi-square value has led you to a p-value range of 0.70 to 0.50, that is, 70% to 50%, will you accept or reject the null hypothesis? First, do you remember that there's a certain threshold the p-value has to be above or below for you to determine whether you accept? That threshold is 5%, or 0.05. We obviously got much higher than that. So if it's larger than that, what does that mean? It means we accept the null hypothesis. We're going to talk about what accepting the null hypothesis means for this question next. So with that, let's move on.
8
example

Step 6

Okay, so which of the following statements is true when we accept a null hypothesis? We accepted the null hypothesis in the previous question, so what does that mean for our experiment? It could mean that the observed and the expected values are different. It could mean that we are 95% confident that our observed and expected values are different. It could mean that we are 95% confident that our observed and expected values are the same. Or it could mean that we are 50% confident that our observed and expected values are the same. The first thing we know is that when we accept a null hypothesis, we're saying that our observed and expected values are essentially the same. So automatically we can mark out A and B, which said they were different. Our choices now are C and D, and the difference between these two is whether we're 95% confident or 50% confident. Which one do you think it is? Okay, so the answer here is 95% confident. The reason is that, if you remember, when we decided whether to accept or reject our null hypothesis, we set a threshold, and that threshold was 5%, or 0.05. Our values were more than that, so we accepted the null hypothesis. Because we set the threshold at 5%, we are 95% confident, and that's just taken from 100 minus 5 equals 95. We're not 100% confident, because we gave ourselves this 5% range of error in accepting the null hypothesis. So we are 95% confident that the observed and the expected values are the same. In this problem, remember, we were looking at red and white flowers in the F2 offspring, and we got really close. I don't remember exactly what it was, something like 894 and 294, somewhere around there.
But essentially, we were testing whether these are close enough to 900 and 300 to say that this trait shows a 3-to-1 ratio and is therefore Mendelian. Because we didn't know, right? We just did this experiment; we had no idea. We were looking at these red and white flowers, we got this number of offspring, and we asked: is this a 3-to-1 ratio? Is this close to the Mendelian ratio we would have expected? We went through all these steps, and finally we figured out that yes, it is, because we're 95% confident that the values we observed, the numbers of red- and white-flowered offspring we actually got, were close enough to the expected values of 900 and 300 to be a 3-to-1 ratio and therefore Mendelian, so they are the same. So C is the answer here. I realize that this problem is a lot; there are a lot of different steps. So make sure you understand all of them, because I guarantee you will have to solve a chi-square analysis problem on a test at some point in your genetics career. Make sure you understand what's going on in each one of these steps. So with that, let's move on.