15: Basic Statistics
15.3 Compare samples with t-tests and analysis of variance - Video Tutorials & Practice Problems
<v Voiceover>Statistics</v> is all about comparisons. When you have data, you want to see how something behaves. Does the average make sense? Is it significant? How do two groups compare to each other? To examine this we're going to look at the tips data that comes with the reshape2 package. To get it we say data(tips, package = "reshape2"). Let's look at the data: we have total_bill, tip, sex, smoker, day, time, and size. Let's look at the different values for the sex of the server: unique(tips$sex). We see it's a factor with two levels, Female and Male. Let's also see how many days of the week there are: unique(tips$day). It only has data for Friday, Saturday, Sunday, and Thursday. The first test we're going to look at is the one-sample t-test. The t-test was devised by a man named William Gosset, who was working at the Guinness brewery and needed a way to test samples of beer, so it has a long history. We'll use it to test whether or not the average tip is two dollars and fifty cents. So we say t.test, we put in the variable of interest, tips$tip, and we say the alternative is two-sided. That's because we're testing whether or not it's equal to 2.50; we're not testing whether it's greater than 2.50, we're seeing whether it's equal to 2.50. And mu, the average we're testing against, is 2.5. We run this and we get a lot of information. Remember, this is a hypothesis test of whether or not the average is equal to 2.5. That 2.5 is the null hypothesis: we assume it to be true unless proven otherwise. We get the t-statistic, which measures the ratio between the sample average's distance from 2.5 and the standard error of the average. The degrees of freedom is a complicated subject, but it has to do with your number of observations and the number of parameters. And the p-value is what tells you the story: it tells you whether or not you should reject the null hypothesis. Now, the problem with p-values is: what do you consider a significant p-value? R.
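The commands narrated above can be sketched as follows, assuming the reshape2 package is installed; the variable names are mine, not dictated by the video.

```r
# Load the tips data that ships with the reshape2 package
library(reshape2)
data(tips, package = "reshape2")

head(tips)         # columns: total_bill, tip, sex, smoker, day, time, size
unique(tips$sex)   # factor with two levels: Female, Male
unique(tips$day)   # only Fri, Sat, Sun, Thur appear

# One-sample t-test: is the average tip equal to $2.50?
t.test(tips$tip, alternative = "two.sided", mu = 2.5)
```

Running the test prints the t-statistic, degrees of freedom, p-value, and the confidence interval for the mean, all of which are discussed next.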
A. Fisher, nearly 100 years ago, proposed three p-values — .1, .05, and .01 — as the official cutoffs. A lot of modern-day statisticians don't like this. They think a p-value cutoff is arbitrary: why cut it off at .05? What happens if you get .0511 — is that suddenly not significant? They don't really like that. That said, our p-value is very close to zero. The scientific notation shows 5.08e-08, which is about .0000000508, so it's nearly zero. In this situation we reject the null hypothesis and conclude that the average tip is not $2.50. The test also returns the confidence interval for the mean. Notice that the confidence interval goes from roughly 2.82 to 3.17, and it does not include 2.5. That's something to keep in mind. The t-distribution was created as an estimate of the normal distribution for when there's not a lot of data. Our t-test statistic falls somewhere along this distribution, and to see where it falls we'll draw a graph. First let's load up ggplot2. Then we'll generate 30,000 observations from the random t-distribution: randT gets rt with 30,000 observations, and the degrees of freedom will be nrow of tips (length would also work on a vector) minus one. When you're using the t-distribution, your degrees of freedom is the number of rows minus one, because you're estimating one parameter — the mean, which feeds into the standard error. So we run this. Then we need to capture the test statistic, so we'll say tipTTest gets t.test — we're saving that t-test as an object. The variable we're judging is the tip column of tips, the alternative is two-sided, and mu is 2.5. We save that. Now we'll plot the distribution and see where our t-statistic fell. We say ggplot on a data.frame where x equals randT, plus geom_density, with an aes of x, filled with grey, and coloured grey as well.
Then we put in vertical lines. We'll put a vertical line at the value of the t-statistic, so the xintercept — where it crosses the x-axis, how far left or right of the origin — is tipTTest$statistic. That gets us the test statistic's position. Close that off and we'll put in two more vertical lines: geom_vline, where the xintercept gets the mean of randT — that's the center of the distribution — plus c(-2, 2) times the standard deviation of randT, with a linetype of 2. Running this gets us a nice distribution. The grey area is the distribution itself, and the two vertical dashed lines show roughly two standard deviations on either side of the mean. The black vertical line all the way on the right is our test statistic, and it's nowhere near the rest of the distribution. This means our result is highly unlikely if the true value of the average really were 2.5. That's something to remember about the p-value: it tells you how likely your results are if the null hypothesis is true. So a really small p-value means your data are not very likely under the null. For us, with a p-value of nearly zero, the probability of getting our data if the true mean were 2.5 is very low, meaning we should reject the null hypothesis. It's also possible to do a one-sided t-test — say, testing whether the true mean is greater than 2.5. To do this we once again call t.test, throw in the variable, and this time make the alternative "greater"; again mu is 2.5. So we're testing whether the mean is greater than 2.5, and here are our results. Another common use of the t-test is the two-sample t-test, which lets you compare the averages of two groups. To see this comparison, let's first clear the console, and then compare the tips of female servers to those of male servers. To do that we can very quickly aggregate.
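The simulation and plot described above can be sketched like this, assuming ggplot2 and reshape2 are installed; object names such as randT and tipTTest are my own.

```r
library(ggplot2)
library(reshape2)
data(tips, package = "reshape2")

# 30,000 draws from a t-distribution with n - 1 degrees of freedom
randT <- rt(30000, df = NROW(tips) - 1)

# Save the one-sample t-test as an object so we can extract its statistic
tipTTest <- t.test(tips$tip, alternative = "two.sided", mu = 2.5)

ggplot(data.frame(x = randT)) +
  geom_density(aes(x = x), fill = "grey", colour = "grey") +
  # solid line: our observed test statistic
  geom_vline(xintercept = tipTTest$statistic) +
  # dashed lines: roughly two standard deviations either side of the mean
  geom_vline(xintercept = mean(randT) + c(-2, 2) * sd(randT), linetype = 2)

# The one-sided version: is the true mean greater than 2.5?
t.test(tips$tip, alternative = "greater", mu = 2.5)
```

The solid line lands far to the right of the dashed lines, which is the visual version of the tiny p-value.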
We aggregate tip by sex, for the tips data, taking the mean. We see that males average about $3.09 while females average about $2.83. In order to run a standard two-sample t-test, you typically want the variances of the two groups to be the same, so we should check whether they are. We aggregate again by sex, this time computing the variance. They look different, but we should really test them. There are tests for this, and some of them require that the data be normally distributed, so first let's test for that. There's the Shapiro-Wilk test: feed it some data and it will tell you whether the data are normally distributed. From what we see — a very low p-value — the tips are not normally distributed. So let's check the individual groups; maybe as a whole they're not normally distributed, but individually they are. I'm just going to copy and paste this and subset with tips$sex == "Female". And it appears we made a little mistake, because we did not capitalize Female. There you go: still a low p-value, so the female tips aren't normally distributed. Just to be certain, we'll check the male tips as well. Still a low p-value. So we can't use a normality-based test of variances. To confirm this we can plot the two groups and look at them visually. Overlaid on each other, neither one looks normally distributed; in fact they're both skewed. So we can't use a parametric test for equality of variances. Fortunately there's the nonparametric Ansari-Bradley test, which we call with ansari.test, feeding it tip on sex from the tips data. We get a high p-value, meaning the variances are equal enough that we don't need to resort to a Welch t-test. So we run the standard test: t.test on tip by sex, data equals tips, with var.equal set to TRUE. If we set it to FALSE it would run a Welch t-test.
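The sequence of checks narrated above can be sketched as follows, assuming reshape2 and ggplot2 are installed; the overlaid histogram is one plausible way to do the visual check described.

```r
library(reshape2)
library(ggplot2)
data(tips, package = "reshape2")

# Group means and variances
aggregate(tip ~ sex, data = tips, FUN = mean)
aggregate(tip ~ sex, data = tips, FUN = var)

# Shapiro-Wilk normality tests: overall, then per group
shapiro.test(tips$tip)
shapiro.test(tips$tip[tips$sex == "Female"])
shapiro.test(tips$tip[tips$sex == "Male"])

# Visual check: overlaid histograms of tip by sex
ggplot(tips, aes(x = tip, fill = sex)) +
  geom_histogram(binwidth = 0.5, alpha = 0.5, position = "identity")

# Nonparametric test of equal variances
ansari.test(tip ~ sex, data = tips)

# Standard two-sample t-test; var.equal = FALSE would give the Welch test
t.test(tip ~ sex, data = tips, var.equal = TRUE)
```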
We run this and get a p-value above the usual cutoffs, indicating that tips for male and female servers are roughly the same. Remember, it's not enough to look at the averages, as we did before; you need to look at the averages together with some measure of variability. That's the only way you'll get the true story. As is often the case with me, I prefer visualizing things, so rather than just taking these numbers at their word, let's see how we could look at this. This will require loading the plyr package, and we'll clear the console so we don't get distracted. Let's build a table holding a summary of the tips. We call ddply — since we're going from a data.frame to a data.frame — on tips, and we break it up on the sex variable: one computation for the females, one for the males. And we use a special plyr function called summarize, which lets us build new columns computed from the data. So tip.mean gets the mean of tip, which is a column in the data frame, and tip.sd gets the sd of tip. We also say Lower gets tip.mean — which we just created, but the special function lets us use it right away — minus two times tip.sd divided by the square root of the number of tips. That last part is the standard error; it's very important, because we want to measure two standard errors in either direction. Upper is very similar, so we just copy it, and instead of minus two we do plus two. We run this and take a look: for each sex we now have the mean, the standard deviation, the lower bound, and the upper bound. Now we can go ahead and plot this: ggplot on tipSummary, with an aes where x gets tip.mean and y gets sex. We then add a geom_point, and then something called geom_errorbarh, which draws nice error bars.
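The summary table and error-bar plot being described can be sketched like this, assuming plyr, ggplot2, and reshape2 are installed; the names tipSummary, Lower, and Upper are mine.

```r
library(plyr)
library(ggplot2)
library(reshape2)
data(tips, package = "reshape2")

# Per-sex summary: mean, sd, and mean +/- two standard errors
tipSummary <- ddply(tips, "sex", summarize,
                    tip.mean = mean(tip),
                    tip.sd   = sd(tip),
                    Lower    = tip.mean - 2 * tip.sd / sqrt(NROW(tip)),
                    Upper    = tip.mean + 2 * tip.sd / sqrt(NROW(tip)))
tipSummary

# Dot-and-error-bar plot, one row per sex
ggplot(tipSummary, aes(x = tip.mean, y = sex)) +
  geom_point() +
  geom_errorbarh(aes(xmin = Lower, xmax = Upper), height = 0.2)
```

Within plyr's summarize, columns created earlier (like tip.mean) can be reused immediately in later arguments, which is what the narration relies on.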
So there's geom_errorbarh, and in here the aesthetics are xmin gets Lower — that means the farthest to the left on the x-axis — and xmax gets Upper, and we'll give the bars a height of .2. And it looks like we made a typo here; we want geom_errorbarh. We run this and get a nice plot showing the average for females and the spread of that data, and the average for males and the spread of that data. Typically on these displays, if you can draw a straight vertical line through the confidence intervals of both groups, that generally means the two groups are roughly equal. I personally prefer this graph to the numbers from the test. Another type of two-sample t-test is the paired t-test. This is for situations where you have two groups and the observations are paired: each observation in one group is matched to one in the other, such as one person paired with another person. It doesn't have to be people; it can be any paired observations. To do this we're going to load another package, UsingR, which has some data about fathers and sons. Let's look at father.son: we have the heights of fathers and the heights of sons. What we can do right here is a t-test where one variable is the fathers' heights from father.son, the other variable is the sons' heights, and we tell R that it is a paired t-test. As we can see, it's a low p-value, saying that from one father to his son there generally is a difference in height — in this data the sons tend to be somewhat taller than their fathers. Those are some good tests for when you have two samples. What happens when you have three samples or more — five, six, seven samples? A t-test doesn't work. You need to use something called the ANOVA. Now, the ANOVA is something that instills fear in many people who have learned statistics over the years, whether in business or psychology or any other program. Its formula looks like this.
So this is the ANOVA. Rather than making everyone upset by diving into it, suffice it to say you really don't need to learn this formula; you just need to be able to apply it, and thankfully R gives us that capability. We want to test whether tips vary by day. So we do this: tipAnova gets aov of tip on day minus one, calling it on the tips data. Now, that minus one is important. The formula interface automatically puts in an intercept — we learn all about this in the modeling section — but we want to run this ANOVA test without an intercept, and we'll show you why in a second. Build tipIntercept with the same exact formula, but this time don't take away the intercept, and let's compare them. Look at tipAnova's coefficients: we have values for Friday, Saturday, Sunday, and Thursday. If we look at tipIntercept's coefficients, we have an intercept and then values only for Saturday, Sunday, and Thursday. This happens because of matrix algebra: when you're dealing with a categorical variable like day of the week, which has four values, and your model has an intercept, you can only estimate coefficients for three of them; otherwise you get into a situation of multicollinearity, which we don't need to worry about here. If you don't have an intercept, you can put all four values of the variable in the model. So, back to our proper ANOVA without the intercept: we can take a summary of it, and we see a p-value close to zero, meaning at least one of the days is different from the others. It doesn't tell you which day is different — just that at least one is, and it could be more than one. So again, I like visualization, and I think we should take a look at this. We build out a summary of tips by day using ddply, splitting it up on day and using the summarize function.
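The two model fits and the comparison of their coefficients can be sketched as follows, assuming reshape2 is installed; tipAnova and tipIntercept follow the names used in the narration.

```r
library(reshape2)
data(tips, package = "reshape2")

# ANOVA without an intercept: one coefficient per day
tipAnova <- aov(tip ~ day - 1, data = tips)

# Same model with an intercept: one day is absorbed into the intercept
tipIntercept <- aov(tip ~ day, data = tips)

tipAnova$coefficients      # dayFri, daySat, daySun, dayThur
tipIntercept$coefficients  # (Intercept), daySat, daySun, dayThur

# F-test: is at least one day different?
summary(tipAnova)
```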
So again, tip.mean gets the mean of tip, and tip.sd gets the sd of tip. In addition to what we usually calculate, we say Length equals NROW of tip; since we're going to use it repeatedly, we might as well compute it once. Instead of multiplying by two as we did before for the t-test, we're going to find the exact value from the t-distribution that we should use: qt with p equals .90 — we're doing a 90% interval here — and degrees of freedom of Length minus one. Then we say Lower equals tip.mean minus this tfrac times the tip standard deviation divided by the square root of Length; that last part is the standard error. And then we copy and paste this to give us the positive side, the Upper bound. Run this, and now we can plot it. We do ggplot on tipsByDay, with aes x equals tip.mean and y equals day, plus geom_point, plus geom_errorbarh, where again we give the aesthetics xmin equals Lower and xmax equals Upper, with a height of .3. We run this and see the different spreads. What we can tell is that the confidence interval for Sunday doesn't overlap with the confidence interval for Thursday; that indicates the difference the ANOVA found. Now, some people say that when the result of an ANOVA is significant, you should go through and do a pairwise t-test between every group. This can become laborious, and then you run into issues with multiple comparisons: when you're doing the same test between different groups again and again and again, that can lead to statistical problems. Some people say you really have to worry about this; other people say you don't. It's a matter of debate in the community, and one that we're not going to resolve here. It's up to you to decide how you want to account for your multiple comparisons.
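The by-day summary and its error-bar plot can be sketched like this, assuming plyr, ggplot2, and reshape2 are installed; tipsByDay and the column names are mine.

```r
library(plyr)
library(ggplot2)
library(reshape2)
data(tips, package = "reshape2")

# Per-day summary with a 90% interval built from the t-distribution
tipsByDay <- ddply(tips, "day", summarize,
                   tip.mean = mean(tip),
                   tip.sd   = sd(tip),
                   Length   = NROW(tip),
                   tfrac    = qt(p = 0.90, df = Length - 1),
                   Lower    = tip.mean - tfrac * tip.sd / sqrt(Length),
                   Upper    = tip.mean + tfrac * tip.sd / sqrt(Length))
tipsByDay

# One dot and horizontal error bar per day
ggplot(tipsByDay, aes(x = tip.mean, y = day)) +
  geom_point() +
  geom_errorbarh(aes(xmin = Lower, xmax = Upper), height = 0.3)
```

Using qt rather than a flat multiplier of 2 gives the exact t-quantile for the chosen interval and degrees of freedom, which matters when a group is small.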
So, in this world of comparing groups, we have various types of t-tests for one-sample and two-sample comparisons, and we have the ANOVA for comparing multiple samples. Personally, I'm not the biggest fan of the ANOVA; I prefer testing groups using a regression. It's possible to recreate the idea of an ANOVA using a regression, because the two were developed along the same lines, but regression is far more powerful and can do much more. Still, when you have a small sample and just a few groups to compare, the ANOVA and the t-test will be just right for you.
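The claim that a regression can recreate the ANOVA can be checked directly: fitting the same no-intercept formula with lm gives exactly the same coefficients as aov. This is a sketch, assuming reshape2 is installed.

```r
library(reshape2)
data(tips, package = "reshape2")

# Fit the same model two ways
tipAnova <- aov(tip ~ day - 1, data = tips)
tipLM    <- lm(tip ~ day - 1, data = tips)

# The per-day coefficients are identical: ANOVA is a special case of regression
coef(tipAnova)
coef(tipLM)
```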