4 & 5. Statistics, Quality Assurance and Calibration Methods

Detection of Gross Errors

When dealing with data sets, it becomes important to identify and eliminate outliers in order to have the most accurate mean and standard deviation.

1

concept

Grubbs Test vs. Q Test

So our two tests on this page are basically methods that we can use to determine if a value within our given data set should be ignored or not. Now, we're going to look first at the Grubbs test. The Grubbs test is used to detect a single outlier in a single-variable data set that follows an approximately normal distribution. For the Grubbs test, we first have to calculate our G calculated. Here, we have our questionable value, so our potential outlier, minus the mean or average, in absolute brackets, divided by the standard deviation. In other words, G calculated = |questionable value - mean| / s.

Now, we're going to compare our G calculated to our G table. Here we have our number of observations, and then we have our G table, sometimes called G critical, based on a particular confidence level: 90%, 95%, or 99% confidence. If our G table value happens to be less than our G calculated, that means the outlier needs to be discarded, and we need to recalculate the standard deviation and the mean with the remaining data. If your G table is greater than your G calculated, that means the value is fine. It's within the chosen level of confidence, so we can retain it and hold on to our mean and standard deviation.

Our Q test is another method that's usually not talked about as much. It's just another method for finding outliers in very small, normally distributed data sets. Here, the number of measurements is normally between 3 and 7 values. It can exceed that, but the Q test is usually reserved for very few measurements. Now, here we're going to say Q calculated equals your gap divided by your range. What does that mean? Well, your gap is, in absolute brackets, x_1 minus x_(n+1). x_1 is just the suspected outlier, the number we're trying to decide whether to ignore, and x_(n+1) is the next closest data point, the measurement that's closest to that outlier. Your range is just the largest value minus the smallest value in your data set. For the Q test, you need to take all your measurements and organize them from smallest to largest, and then your range is just that largest value minus your smallest value. We'll see how to use this later on, when we do a question on the Q test.

Now, just like the Grubbs test, we compare our Q calculated to our Q table. Again, we have the number of measurements, which you can compare at different levels of confidence. If your Q table value is lower than your Q calculated, we disregard that value: it is an outlier and it cannot be included with our measurements. If your Q table, or Q critical, value happens to be greater than your Q calculated, then we can hold on to that suspected outlier and say that it does belong with the other measurements.

Again, the Grubbs test is the more commonly used test to find an outlier. The Q test is normally not discussed as much, and it's usually reserved for very small numbers of measurements. So just remember these two tests, which are great at finding an outlier within a given data set.
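The two calculated statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not a full statistical routine: the function names `grubbs_statistic`, `q_statistic`, and `is_outlier` are hypothetical, the table (critical) value still has to be looked up by hand for the number of measurements and confidence level, and the Q sketch assumes the suspect is whichever end of the sorted data sits farther from its neighbor.

```python
import statistics


def grubbs_statistic(values, suspect):
    """G calculated = |suspect - mean| / s, using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # n - 1 in the denominator
    return abs(suspect - mean) / s


def q_statistic(values):
    """Return (suspect, Q calculated), where Q calculated = gap / range.

    Assumption: the suspect is the smallest or largest value, whichever
    is farther from its nearest neighbor in the sorted data.
    """
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical, so Q is undefined")
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    if gap_low >= gap_high:
        return ordered[0], gap_low / data_range
    return ordered[-1], gap_high / data_range


def is_outlier(calculated, table_value):
    """Discard the suspect when the table value is less than the calculated value."""
    return calculated > table_value
```

Both tests end with the same comparison, so a single `is_outlier` helper serves for G and for Q.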

2

example

Q Test

So here it says: wishing to measure the amount of caffeine in a cup of coffee, you pour 10 cups. From the data provided, perform a Q test to determine if the outlier can be retained or disregarded. All right. So what we need to do here is organize this list of measurements from smallest to largest. We can see that the smallest number is 72. The next smallest number is 77, then 78, then 78 again, then 79, then 81, 81 again, 82 twice, and then finally 83. So I've organized it from smallest measurement to largest measurement. Remember, that's important because that will give us the range that we need.

All right, so now we're going to figure out our Q calculated. Remember, our Q calculated will be the outlier that we're investigating minus the number that's closest to it, in absolute brackets, divided by our range. So remember, that's gap divided by range. I'll actually rewrite this: Q calculated is gap divided by range.

So the number that we're looking at is the outlier, the one that's farthest from everyone else. Let's look at the differences between neighboring values. The difference between these two is five. The difference between these two is one. These are zero, this is one, this here is zero. The difference between these two is two. The difference between these two is one. The difference between these two is zero, and the difference between these two is one. The outlier is the one that's farthest from all the other measurements, and we can see that would be 72. Its difference is five away from 77, so that's the outlier we're investigating.

So that's 72 minus the value that's closest to it, which is 77, in absolute brackets, divided by the range. Remember, the range is your largest value minus your smallest value. Your largest value is 83 and your smallest value is 72, so Q calculated = |72 - 77| / (83 - 72) = 5/11, which comes out to 0.455.

Now, if you take a look at the Q table that we have on the previous page, let's say we want to look at it in terms of 99% confidence. We want 99% confidence in whether this should be kept or not. We have 10 measurements, so looking at that table, we find 10 measurements and scroll all the way to the right until we get to the 99% confidence value. There we see that Q table, or Q critical, equals 0.568. So comparing our Q calculated to our Q table, what can we say? Our Q table is a larger value than our Q calculated. Therefore, you have to retain the value. So that number, 72, we keep at 99% confidence, based on what we found from our Q table.

So that's all we have to do: figure out our Q calculated and then compare it to our Q table. Now that you've seen this one, look to see if you can figure out example 2 at the bottom of the page. Don't worry if you get stuck; just come back and see how I approach that same example question. And just remember some of the techniques we used here for the Q test.
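The arithmetic in this example can be checked with a short script. This is only a sketch of the worked steps: the ten values and the table value of 0.568 are taken from the example, while the variable names are made up for illustration.

```python
# Caffeine measurements from the example (any order; they are sorted below).
caffeine = [72, 77, 78, 78, 79, 81, 81, 82, 82, 83]

ordered = sorted(caffeine)
suspect = ordered[0]    # 72, the value farthest from its neighbor
neighbor = ordered[1]   # 77, the closest value to the suspect

gap = abs(suspect - neighbor)          # |72 - 77| = 5
data_range = ordered[-1] - ordered[0]  # 83 - 72 = 11
q_calculated = gap / data_range        # 5 / 11

q_table = 0.568  # table value quoted in the example for 10 measurements at 99% confidence

# Retain the suspect when the table value is larger than the calculated value.
retain = q_table > q_calculated

print(round(q_calculated, 3), retain)  # prints: 0.455 True
```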

3

example

Grubbs Test

So here it says: white blood cells are the defending cells of the human immune system and fight against infectious diseases. Provided below are the normal white blood cell counts for a healthy adult woman. Determine if the current white blood cell count is reasonable by the Grubbs test. All right, so for the Grubbs test we first need to calculate our G calculated. G calculated equals our questionable value, which in this case is today's white blood cell count, minus the mean or average, in absolute brackets, divided by the standard deviation, s.

For our mean or average, we add up each one of these seven values and divide by seven. When we do that, we get 5.2857 × 10^6. Our standard deviation equals, remember, the square root of the summation of each measurement minus the mean, squared, divided by n - 1. So here we just have to input each measurement minus the average, square it, and add them all up. It's pretty long and drawn out: plus (4.9 × 10^6 - 5.2857 × 10^6) squared, and so on. As you can see, it's a lot of writing for these numbers. Okay, almost done, just have to finish the other two. Remember, we're inputting all seven of the values, plus finally this last one, and then dividing by the number of measurements minus one, which is seven minus one. All of that gives us a standard deviation of 3.98 × 10^5.

We plug that in to find our G calculated. Our questionable value is 6.1 × 10^6, minus our average, divided by our standard deviation. That gives me a G calculated equal to 2.04595. Now let's compare it to our G table at 95% confidence. Go to the previous page and look at that value. When we're looking at seven measurements under 95% confidence, our G table value is 2.020. We see that our G calculated is larger. So what does that mean? It means that I have to disregard that value.

So this value here is too high, and we have to disregard it. The reason it could be high, since we're dealing with the immune system and white blood cells, is that she may have a cold and her body is increasing the number of white blood cells in order to combat whatever infectious disease she may have. Hopefully we all know how the immune system works: when we have an infection, our white blood cell count spikes in order to combat that infection. That would explain why her white blood cell count for that day would be a little higher than normal. The other values would be about what you'd expect her white blood cell count to be on a typical day. So by using our Grubbs test, we can see that her white blood cell count is higher than usual, and maybe she's fighting some type of infection.

So guys, hopefully you were able to follow along with the Grubbs test. Remember, we have both the Grubbs test and the Q test. Both look for the outlier within a given data set to see if it can be disregarded or retained in our calculations. In this example, because we have to disregard that value, you would have to calculate a new standard deviation as well as a new average for this data set. Here, we're not asked to figure that out. We're just asked to see if we have to ignore the 6.1 × 10^6, which we found that we do, because G calculated is larger than G table.
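The final comparison can be reproduced from the summary numbers quoted in the example. This is a sketch under stated assumptions: the seven individual counts are not listed in the transcript, so the code starts from the rounded mean and standard deviation given above (which is why the result differs slightly in the last digits from 2.04595), and the table value is taken as 2.020, the usual Grubbs critical value for seven measurements at 95% confidence.

```python
# Summary values quoted in the example (rounded; the raw counts are not listed).
questionable = 6.1e6   # today's white blood cell count
mean = 5.2857e6        # average of the seven counts
std_dev = 3.98e5       # sample standard deviation of the seven counts

# G calculated = |questionable value - mean| / s
g_calculated = abs(questionable - mean) / std_dev

# Assumption: table value for 7 measurements at 95% confidence.
g_table = 2.020

# Discard the questionable value when G calculated exceeds G table.
discard = g_calculated > g_table

print(round(g_calculated, 3), discard)  # prints: 2.046 True
```

With the raw counts in hand, the mean and sample standard deviation could be computed with `statistics.mean` and `statistics.stdev` instead of being typed in.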