4 & 5. Statistics, Quality Assurance and Calibration Methods

Hypothesis Testing (t-Test)

The t-Test is used to measure the similarities and differences between two populations.

The t-test

1

concept

t-Test

Video duration:

6m

So here we say that the t-test is used to test the mean, or average, between populations, one of which could be a standard. To test the similarities and differences between these populations, we use the t score, and to find our t score we employ the t score formula. We use the t score formula when we don't know the population standard deviation. Remember, your sample standard deviation is just s; as the total number of measurements increases, it transitions into the population standard deviation, which is sigma. This t score formula is used when the sample size is less than 30; a sample size greater than 30 would call for the z-test. We won't worry about the z-test here, because with data sets that large we'd normally use software like Microsoft Excel to do the calculations. Now, our t score formula is t equals our sample average minus mu sub zero, which represents our population average, divided by our standard deviation s over the square root of the number of measurements, n. How does the t score tell us about similarities and differences between our populations? The larger the t score, the more different the populations are from one another, and the smaller the t score, the more similar they are. We use this t score formula when we're looking at one population and just trying to figure out basic information about it. When we're comparing different populations to one another, we can use three variations of a t calculated value: t calculated for equal variances, for unequal variances, and for paired data. Remember, your variance is just your standard deviation squared.
When the standard deviation is equal for both populations, we use these two equations here. t calculated equals the absolute value of the average of population one minus the average of population two, divided by s pooled, times the square root of the numbers of measurements multiplied together divided by the numbers of measurements added together. s pooled has its own formula: it is the square root of the standard deviation of population one squared times its number of measurements minus 1, plus the standard deviation of population two squared times its number of measurements minus 1, all divided by the number of measurements of population one plus the number of measurements of population two minus 2. That bottom portion also represents the degrees of freedom. Now, when the variances are not equal between my two populations, I use these two equations instead; again, one of them gives us t calculated. Notice the differences in the formula depending on whether the variances are equal or not. You're not going to be expected to memorize these. Normally you'd be given a formula sheet, but always check with your professor just in case. These formulas can get very complicated, so it's best just to give them to you here. The degrees of freedom also get much more complicated: to figure out our degrees of freedom for unequal variances, we have to use this large formula here. Again, we're still dealing with the standard deviations of populations one and two and the measurements of populations one and two, and on the bottom you have their measurements minus 1. Now, paired data. Paired data is used when we have two sets of measurements made by completely different methods. Let's say you're trying to analyze the reactivity or fluorescence of some type of material, and you have a bunch of methods available that are very different from one another.
In that case we rely on paired data to figure out t calculated; it's really just a comparison of the two methods. For the first two, we may be looking at two different populations, but we're using the same type of software and the same types of methods to figure out t calculated. Now, the whole point of figuring out t calculated is to then compare it to our t table. Remember, we looked at the t table when we were figuring out confidence intervals; it's that same exact table. If t calculated is greater than t table, you would say there is a significant difference between the averages, or means, of the two populations you're examining. If t calculated is less, you would say there is no significant difference between the means of the two populations. So again, for the first two, where we're looking at variances, you determine whether the variances of the two populations are equal or not in order to decide which of the two to use. For these first two, this one and this one, you're using the same type of method to analyze your populations. Maybe you're using some type of mass spectrometry to analyze the mass of an object in two different populations. Since you're using the same method for both, you'd rely on these first two. And for the final one, paired data, you're using vastly different methods to analyze the mass. Maybe you're relying on mass spectrometry to figure out the mass of one set of samples, and using just a basic balance to figure out the mass of a second set of samples. Since the methods are different, you would then compare them to one another.
So just remember the subtle differences in calculating t calculated for each of these cases. Once you have your t calculated, you can use the t table to see whether there is a significant difference between the means of these populations or not.
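
For reference, the formulas described verbally in this video can be written out as follows. This is a sketch in my own notation rather than a copy of the formula sheet shown on screen: the symbols \bar{x}, \mu_0, s, n, s_pooled, \bar{d} and s_d are assumed names for the sample mean, population mean, standard deviation, number of measurements, pooled standard deviation, mean difference and standard deviation of the differences. The unequal-variance degrees of freedom follow the verbal description given in the first worked example.

```latex
% One-sample t score
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

% Two samples, equal variances
t_{\text{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}}
                  \sqrt{\frac{n_1 n_2}{n_1 + n_2}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}},
\qquad
\text{dof} = n_1 + n_2 - 2

% Two samples, unequal variances
t_{\text{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}},
\qquad
\text{dof} = \frac{\left( s_1^2 / n_1 + s_2^2 / n_2 \right)^2}
                  {\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}

% Paired data (d_i is the difference for sample i)
t_{\text{calc}} = \frac{|\bar{d}|}{s_d} \sqrt{n},
\qquad
s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}},
\qquad
\text{dof} = n - 1
```

In every case the decision rule is the same: a t calculated greater than the t table value indicates a significant difference between the means.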

2

example

t-Test Calculations

Video duration:

13m

So here it states: a student wishing to calculate the amount of arsenic in cigarettes decides to run two separate methods in her analysis. The results, in parts per million, are shown below. We have five samples for each method, and it asks us: is there a significant difference between the two analytical methods at a 95% confidence interval? We're dealing with confidence intervals, so we know we're going to be dealing with t calculated in some way. Realize that we're not dealing with just one method with one set of measurements, so we can't simply use our t score formula to find t calculated and compare it to our t table. Because we're dealing with two different methods, each with its own set of samples, we have to rely on the t-test. With the t-test, if our t calculated, which is what we're going to figure out, is greater than our t table, there is a significant difference between the means for these two methods. If we calculate t and see that it is less than our t table, there is no significant difference. Now, how do we figure out t calculated? Remember from the previous page that t calculated can be determined in three different ways: we have equal variances and use that set of equations, we have unequal variances and use a different set of equations, or we have paired data and use yet another set of equations. Here they're talking about two separate methods, so two different runs under different conditions. Therefore this is not paired data. That means we have to determine whether the variances of these two methods are equal or not, and that will determine which formulas we use for t calculated and for our degrees of freedom. All right, so we have to figure out what their variances are.
Remember, variance is just your standard deviation squared. We first have to figure out what our means are for each method. For Method 1, our mean, or average, equals the sum of the measurements divided by the total number of measurements, so divided by five. When we do that, we get 92.1 for the mean of our first method. For the second method we do the same thing: add up each one of the measurements and divide by the total number of measurements, so divide by five. This equals 92.06. Now that we've determined that, we're going to find the standard deviations of the two, and from that information we'll be able to determine whether the variances are equal or not. That will determine which set of equations we should use to figure out t calculated. We're going to need room, guys, so let me take myself out of the image. All right, let's look at Method 1. Remember from previous videos that the standard deviation, which we'll label s, equals the square root of the summation of each measurement minus the mean, squared, divided by the number of measurements minus 1. So for Method 1, the standard deviation equals the square root of 110.5 minus the average we found for Method 1, 92.1, squared, plus 93.1 minus 92.1 squared, plus 63.0 minus 92.1 squared, plus 72.3 minus 92.1 squared, and finally plus 121.6 minus 92.1 squared, all divided by n minus 1, where n is the number of measurements, so 5 minus 1. When we do all that, we get a standard deviation of 24.742. Next, Method 2. Its standard deviation is the same exact process: 104.7 minus 92.06 squared, plus 95.8 minus 92.06 squared, plus 71.2 minus 92.06 squared, plus 69.9 minus 92.06 squared, and finally plus 118.7 minus 92.06 squared, divided by the number of measurements minus 1, which is 5 minus 1. When we do that, our standard deviation comes out to 21.27.
Remember, your variance is just your standard deviation squared. So for Method 1 it is 24.742 squared, which gives a value of 612.167, and for Method 2 it is 21.27 squared, which equals 452.413. So here we have our variances for both, and we can see that they are very different values. We say the variances are not equal to one another, and that tells us which formula to use to figure out t calculated. As a rough guide here, variances that are equal to one another differ by less than one; these differ by well over 100. So we know we're dealing with unequal variances, and we use the formula for t calculated when the variances are not equal. t calculated equals the mean of Method 1 minus the mean of Method 2, in absolute value brackets, divided by the square root of standard deviation one squared over n one plus standard deviation two squared over n two. That's the formula we're going to use; all we do now is input the values we found. So that's the absolute value of 92.1 minus 92.06, divided by the square root of 24.742 squared over five plus 21.27 squared over five. The top portion is 0.04. For the bottom portion, let's plug in the values and see what we get. The first term comes out to 122.433 and the second term comes out to 90.4826, and we take the square root of their sum. When we plug that in, we get 0.002741 for t calculated.
That's our t calculated for now; we could always double-check our numbers. Now we have to compare that t calculated to our t table value. We know we're dealing with a 95% confidence interval, so we know what percentage we're dealing with, but we still can't use the t table yet because we're also missing our degrees of freedom. Associated with this t calculated for unequal variances is its own formula for degrees of freedom, and it can be kind of complicated. It's a pretty big equation. Again, you're not going to be expected to memorize this; you'd be given it on a formula sheet, so don't freak out too much about its length. Here our degrees of freedom would be: standard deviation one squared divided by the number of measurements for Method 1, plus standard deviation two squared divided by the number of measurements for Method 2, all squared. That is divided by: standard deviation one squared divided by the number of measurements for Method 1, squared, divided by the number of measurements for Method 1 minus 1, plus standard deviation two squared divided by the number of measurements for Method 2, squared, divided by the number of measurements for Method 2 minus 1. You can see that it's a big mess of numbers. If you do this correctly, for the top portion, when you plug all this in and take the square, you should get 45333.2. On the bottom we get 3747.48 for this portion plus 2046.77 for this portion. That comes out to 7.8, or roughly 8. Remember, our degrees of freedom needs to be a whole number, so we just round up to 8. Now go back a few pages and look at the t table. We have 8 degrees of freedom and a 95% confidence interval, so we find where those two meet.
If you look at it correctly, you'll see that the t value according to our t table equals 2.306. So that's our t table value. Now we come back up here and remember the two conditions: whether t calculated is greater or less than t table. We found that t table is 2.306, and our t calculated is 0.002741. So our t table value is a bigger number than our t calculated value. What does that mean? It means there is no significant difference in the means between the two methods, or two populations in this case. So remember, for a question like this, when they're talking about two sets of data, each set represents a population, and we need the t-test to test the means of those two populations. We first have to figure out the averages for both, and from those we can determine their standard deviations. From the standard deviations you can calculate your variances. Because the variances were unequal, we used this method to find our answer. If the variances had been equal, we would have used the other set of equations to find t calculated, s pooled, and our degrees of freedom, and then still compared to our t table to see whether there is a significant difference or not. So just remember the steps we employed here, the use of the t table, and the formulas from the previous page whenever we're dealing with confidence intervals and the t-test.
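
The arithmetic in this example can be checked with a short script. The sketch below is not part of the video; it uses the ten measurements read out during the standard deviation step, and the function name `unequal_variance_t` and the constant `T_TABLE` are illustrative names of my own. The t table value is the one quoted above, not something the script looks up.

```python
import math
import statistics

# Arsenic results in ppm, as read out in the worked example.
method_1 = [110.5, 93.1, 63.0, 72.3, 121.6]
method_2 = [104.7, 95.8, 71.2, 69.9, 118.7]


def unequal_variance_t(a, b):
    """Return (t_calculated, degrees_of_freedom) for the unequal-variance case."""
    # statistics.variance is the sample variance (divides by n - 1).
    term_a = statistics.variance(a) / len(a)  # s1^2 / n1
    term_b = statistics.variance(b) / len(b)  # s2^2 / n2
    t_calc = abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(term_a + term_b)
    dof = (term_a + term_b) ** 2 / (
        term_a ** 2 / (len(a) - 1) + term_b ** 2 / (len(b) - 1)
    )
    return t_calc, dof


t_calc, dof = unequal_variance_t(method_1, method_2)
T_TABLE = 2.306  # value quoted in the example for 8 degrees of freedom at 95%

print(f"t calculated = {t_calc:.6f}")
print(f"degrees of freedom = {dof:.2f} (rounded: {round(dof)})")
print("significant difference" if t_calc > T_TABLE else "no significant difference")
```

Working from the unrounded variances keeps the intermediate values slightly more precise than the hand calculation, but the conclusion is the same.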

3

example

t-Test Calculations

Video duration:

8m

So in this question it states: you want to determine if concentrations of hydrocarbons in seawater measured by fluorescence are significantly different from concentrations measured by a second method, specifically gas chromatography/flame ionization detection, which is labeled GC-FID. You measure the concentration of a certified standard reference material, which is 100 micromolar, by both methods seven times. Specifically, you first measure each sample by fluorescence and then measure the same sample by GC-FID. The concentrations determined by the two methods are shown below. So we have the same seven samples being measured by two completely different methods, which we're then going to compare to one another, and that tells us this is a paired data test. That in turn tells us which formula we need to use for our standard deviation as well as our t calculated value. Now, here it says to calculate the appropriate statistic to compare the two sets of measurements. Again, we're looking at two entirely different methods, and because they're totally different methods, we're going to compare them using the paired data steps. Following paired data, t calculated equals the mean difference, in absolute terms, divided by the standard deviation, times the square root of the number of measurements, n. And our standard deviation equals the square root of the summation of each difference minus the mean difference, squared, divided by the number of measurements minus 1. Now, how do we figure out our differences? We're going to come up with another column, which is our difference. What we do is take each pair of numbers and subtract one from the other.
So this is 100.2 minus 101.1, which equals negative 0.9. Then 100.9 minus 100.5, so that's 0.4. Then 99.9 minus 100.2, which gives me negative 0.3. Then we have 100.1 minus 100.2, which is negative 0.1; 100.1 minus 99.8, which equals 0.3; 101.1 minus 100.7, which equals 0.4; and then 100 minus 99.9, which equals 0.1. All we've done is figure out the differences by subtracting these values from one another. Now that we have that, we have to figure out our mean difference. Just like any mean, we take each one of the differences, add them together, and divide by the number of measurements. So it's negative 0.9 plus 0.4 plus negative 0.3 plus negative 0.1 plus 0.3 plus 0.4 plus 0.1, divided by the number of measurements, which is seven. When we do that, we get negative 0.014 as the mean difference. Next is our standard deviation. You take each difference and subtract the mean difference: negative 0.9 minus a minus 0.014, squared, plus 0.4 minus a minus 0.014, squared, plus negative 0.3 minus a minus 0.014, squared, plus negative 0.1 minus a minus 0.014, squared. You can see that this is very tedious, but you have to make sure you plug them in correctly. Then plus 0.3 minus a minus 0.014, squared, plus 0.4 minus a minus 0.014, squared, plus 0.1 minus a minus 0.014, squared, all divided by the number of measurements minus 1, which is 7 minus 1, and then we take the square root. Realize that because we're dealing with differences, all we're really paying attention to are these difference values. The original numbers were just there to help us figure out the differences; we don't pay attention to them anymore. When we plug it all in, we get a standard deviation of 0.47. Now that we have that, we can figure out our t calculated.
So t calculated equals, again, the mean difference in absolute terms divided by the standard deviation, times the square root of the number of measurements. That equals the absolute value of negative 0.014, divided by 0.47, times the square root of seven, since we're dealing with these seven measurements. When we do that, we get a t calculated of 0.08. Let's assume we're dealing with a 95% confidence interval, because that's quite a common percentage to look at. So our t calculated is 0.08. All you have to do now is go back to your Student's t table, and for that we need our degrees of freedom. Your degrees of freedom, which I'll abbreviate as DOF, is n minus one. The number of differences is seven, so seven minus one gives six degrees of freedom. Look at your Student's t table, find six degrees of freedom, then move over to the right to the 95% confidence interval and see where the two meet. They meet where t table equals 2.447. So t calculated is less than t table. Because of that, even though we used two separate methods, we say there doesn't appear to be any significant difference between them. Whether you're using fluorescence or GC-FID, either method gives more or less similar means for the two sample populations. So again, we used paired data here because we're analyzing populations by two completely different methods: one was fluorescence and the other was GC-FID. When we're using completely different methods, we rely on the paired data approach. If the methods are the same and we're testing two populations, then we have to look to see whether their variances are equal or not. Determining that helps decide which set of equations to use to figure out t calculated, your standard deviation, and your degrees of freedom.
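
The paired calculation can be reproduced in a few lines. This sketch is not from the video; it starts from the seven differences stated in the transcript rather than the raw concentrations, and `T_TABLE` is again just the quoted table value for six degrees of freedom at 95%.

```python
import math
import statistics

# Differences (fluorescence minus GC-FID) as stated in the worked example.
differences = [-0.9, 0.4, -0.3, -0.1, 0.3, 0.4, 0.1]

n = len(differences)
mean_diff = statistics.mean(differences)  # mean difference
s_d = statistics.stdev(differences)       # sample standard deviation (n - 1)
t_calc = abs(mean_diff) / s_d * math.sqrt(n)
dof = n - 1

T_TABLE = 2.447  # value quoted in the example for 6 degrees of freedom at 95%

print(f"mean difference = {mean_diff:.3f}")
print(f"standard deviation of differences = {s_d:.2f}")
print(f"t calculated = {t_calc:.2f} with {dof} degrees of freedom")
print("significant difference" if t_calc > T_TABLE else "no significant difference")
```

Only the differences enter the calculation, which matches the point made above that the original concentrations are no longer needed once the difference column exists.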

4

example

t-Test Calculations

Video duration:

3m

So in this question, it says a sample of size n equals 100 produced a sample mean of 16. Assuming the population standard deviation is three, compute a 95% confidence interval for the population mean. We're talking about populations, so we're going well above our normal number of measurements. Remember, we said that if we go above 30, that usually means we're not using the t-test but instead the z-test. Now, because I don't give you a z value here, go to your Student's t table. We're dealing with a 95% confidence interval, so look at the column for 95%. Since we're dealing with a data set that is very large, we treat the degrees of freedom as being equal to infinity. If you line up infinity with your 95% confidence interval, you'll see that your z score is 1.960. That's the logic we use when we're dealing with very large data sets like the one in this question. So the interval is our mean plus or minus our z score, times our population standard deviation over the square root of the number of measurements. Remember, your sample standard deviation is s, and when we transition to a population standard deviation it becomes sigma, so it's sigma over the square root of n. Look at the similarities this has with a typical confidence interval, which would be the mean plus or minus your t score times your standard deviation divided by the square root of n. Again, we've transitioned our standard deviation to the population standard deviation, and because we're dealing with a much larger data set than normal, t has transitioned into z. Other than that, we plug in the values and we'll have our answer.
So our mean here is 16, plus or minus 1.960 times the standard deviation, which is three, divided by the square root of 100. When we plug all this into our calculator, the second part gives us 0.588, so this is 16 plus or minus 0.588. What does that mean? We have 16 minus 0.588 and 16 plus 0.588, so we're 95% confident that the population mean lies between 15.412 and 16.588. That is our confidence interval for this particular question. So remember, when we go beyond 30 measurements, we transition from a t score to a z score. Here we treat the degrees of freedom as infinity, and because we're dealing with a 95% confidence interval, lining that up on the t table gives us a score of 1.960. Now that you've seen this one, attempt the practice question at the bottom. Once you have, come back and see how I approach that same practice question.
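
As a quick check on the numbers in this example, the interval can be computed directly. This is a sketch, not the method shown in the video: instead of reading 1.960 from the infinity row of the t table, it asks Python's standard-library `statistics.NormalDist` for the two-sided 95% z value, which should agree to three decimal places. The variable names are my own.

```python
import math
from statistics import NormalDist

n = 100             # sample size
sample_mean = 16.0  # sample mean
sigma = 3.0         # population standard deviation (given)
confidence = 0.95

# Two-sided critical value: 2.5% in each tail for a 95% interval.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.3f}")
print(f"margin = {margin:.3f}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

Changing `confidence` to another level, such as 0.99, gives the corresponding z value without needing the table.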

5

Problem

The average height of the US male is approximately 68 inches. What is the probability of selecting a group of males with average height of 72 inches or greater with a standard deviation of 5 inches?