Statistics Exam Study Guidance: Hypothesis Testing, ANOVA, Nonparametric Tests, Correlation, and Regression
Study Guide - Smart Notes
Q1. Why do we have/use non-parametric tests?
Background
Topic: Non-parametric Tests
This question is about understanding the purpose and rationale for using non-parametric statistical tests instead of parametric ones.
Key Terms:
Parametric tests: Statistical tests that assume the data follows a certain distribution (usually normal).
Non-parametric tests: Tests that do not require the data to follow a specific distribution.
Step-by-Step Guidance
Consider situations where the assumptions of parametric tests (like normality or equal variances) may not be met.
Think about the types of data (ordinal, nominal, or non-normally distributed interval/ratio data) where non-parametric tests are more appropriate.
Reflect on the advantages of non-parametric tests, such as their flexibility and fewer assumptions about the population.
Try answering in your own words before checking the answer!
Q2a. What does “ANOVA” stand for?
Background
Topic: ANOVA (Analysis of Variance)
This question is testing your knowledge of the acronym ANOVA and what it represents in statistics.
Key Terms:
ANOVA: A statistical method used to compare means across multiple groups.
Step-by-Step Guidance
Recall what each letter in "ANOVA" stands for.
Think about the main purpose of this statistical method.
Try recalling the full form before checking the answer!
Q2b. What is ANOVA used for?
Background
Topic: ANOVA Applications
This question asks you to explain the main use of ANOVA in statistical analysis.
Key Terms:
Means comparison: Determining if there are significant differences between group means.
Step-by-Step Guidance
Think about situations where you have more than two groups and want to compare their means.
Recall why using multiple t-tests is not ideal and how ANOVA addresses this issue.
Try to explain the purpose of ANOVA in your own words!
Q3. What does a “Goodness of fit” test help us determine?
Background
Topic: Goodness of Fit Test (Chi-Square)
This question is about understanding what a goodness of fit test is used for in statistics.
Key Terms:
Goodness of fit: A statistical test to see how well observed data match expected data under a specific hypothesis.
Chi-square test: The most common goodness of fit test.
Step-by-Step Guidance
Consider what it means to compare observed frequencies to expected frequencies.
Think about the types of hypotheses you might test with a goodness of fit test (e.g., uniform distribution, specific proportions).
Try to describe the purpose of a goodness of fit test before checking the answer!
Q4. Test the claim that the likelihood of earning an A in my class is the same for all rows (using the provided frequency table).
Background
Topic: Chi-Square Test for Homogeneity or Goodness of Fit
This question asks you to test whether the probability of earning an A is the same for students sitting in different rows, using observed frequencies.
Key Terms and Formulas:
Chi-square test statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$O_i$ = observed frequency for row $i$
$E_i$ = expected frequency for row $i$ (if all rows are equally likely)
Step-by-Step Guidance
State the null hypothesis: The likelihood of earning an A is the same for all rows.
Calculate the total number of students who earned an A (sum the frequencies for all rows).
Find the expected frequency for each row by dividing the total number of A's by the number of rows.
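For each row, compute $(O_i - E_i)^2 / E_i$ (observed minus expected, squared, divided by expected) and sum these values to get the chi-square statistic.
Determine the degrees of freedom: $df = k - 1$, where $k$ is the number of rows.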
Try calculating the expected frequencies and setting up the chi-square formula before moving on!
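The steps above can be sketched in a few lines of Python. This is only an illustration: the counts are hypothetical placeholders (the problem's frequency table is not reproduced here), and the function name `chi_square_equal_expected` is made up for this sketch.

```python
def chi_square_equal_expected(observed):
    """Chi-square goodness-of-fit statistic when every row is equally likely."""
    k = len(observed)                 # number of rows (categories)
    total = sum(observed)             # total number of A's
    expected = total / k              # same expected count for every row
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    return chi2, df

# Hypothetical counts of A's per row (NOT the data from the problem).
observed_counts = [10, 8, 7, 5]
chi2, df = chi_square_equal_expected(observed_counts)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The resulting statistic would then be compared with the chi-square critical value for the chosen significance level and `df` degrees of freedom.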
Q5. Test whether the reduction of blood pressure is dependent upon taking the drug (using the provided 2x2 table, significance level 0.01).
Background
Topic: Chi-Square Test of Independence
This question asks you to determine if there is a significant association between taking the drug and reduction in blood pressure.
Key Terms and Formulas:
Contingency table: A table showing frequencies for two categorical variables.
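Chi-square test statistic: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$O_{ij}$ = observed frequency in cell $(i, j)$
$E_{ij}$ = expected frequency in cell $(i, j)$, calculated as $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$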
Step-by-Step Guidance
State the null hypothesis: There is no association between taking the drug and reduction in blood pressure.
Calculate the row totals, column totals, and grand total for the table.
Compute the expected frequency for each cell using the formula above.
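Calculate $(O_{ij} - E_{ij})^2 / E_{ij}$ (observed minus expected, squared, divided by expected) for each cell and sum to get the chi-square statistic.
Determine the degrees of freedom: $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns; for a 2x2 table, $df = 1$.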
Try setting up the expected frequencies and the chi-square formula before proceeding!
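As a rough sketch of the procedure above, the following Python computes expected counts from the row and column totals and accumulates the chi-square statistic. The 2x2 table is hypothetical (the problem's table is not shown here), and `chi_square_independence` is an illustrative name.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = (row total * column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts (NOT the problem's data):
# rows = drug / no drug, columns = reduced / not reduced.
table = [[30, 20],
         [10, 40]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

For a 2x2 table at the 0.01 level, the statistic would be compared with the chi-square critical value for 1 degree of freedom (6.635 in standard tables).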
Q6. Test the claim that the mean number of tickets given on the 4 streets is the same (using the provided data, significance level 0.10).
Background
Topic: One-Way ANOVA
This question asks you to use ANOVA to test if the mean number of tickets is equal across four different streets.
Key Terms and Formulas:
One-way ANOVA: Used to compare means of three or more groups.
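F-statistic: $F = \frac{MSB}{MSW}$
Sum of Squares Between (SSB): $SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$
Sum of Squares Within (SSW): $SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$
Here $k$ is the number of groups, $n_j$ and $\bar{x}_j$ are the size and mean of group $j$, and $\bar{x}$ is the overall mean.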
Step-by-Step Guidance
State the null hypothesis: The mean number of tickets is the same for all streets.
Calculate the mean for each street and the overall mean.
Compute the sum of squares between groups (SSB) and within groups (SSW).
Calculate the mean square between (MSB = SSB/df_between) and mean square within (MSW = SSW/df_within).
Set up the F-statistic formula using MSB and MSW.
Try calculating the group means and setting up the ANOVA table before moving on!
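One way to check a hand-built ANOVA table is a short script like the one below. It is a sketch under stated assumptions: the ticket counts are invented for illustration (the problem's data are not reproduced here), `one_way_anova` is a hypothetical helper name, and the degrees of freedom are taken as k - 1 between groups and N - k within groups.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the overall mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    msb = ssb / df_between
    msw = ssw / df_within
    return msb / msw, df_between, df_within

# Hypothetical ticket counts for 4 streets (NOT the problem's data).
streets = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
f_stat, df_b, df_w = one_way_anova(streets)
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```

The F value would then be compared with the critical value of the F distribution with `(df_b, df_w)` degrees of freedom at the 0.10 level.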
Q7. Use a sign test with a .10 significance level to test if ½ of college graduates work in a field related to their major (sample: 8 out of 24).
Background
Topic: Sign Test (Non-parametric Test for Median)
This question asks you to use the sign test to determine if the proportion of college graduates working in their field is different from 0.5.
Key Terms and Formulas:
Sign test: A non-parametric test for the median or proportion.
Binomial distribution: Used to calculate the probability of observing a certain number of successes under the null hypothesis.
Step-by-Step Guidance
State the null hypothesis: The probability that a graduate works in their field is 0.5.
Identify the number of "successes" (graduates working in their field) and "failures" (not working in their field).
Set up the binomial probability formula for the observed number of successes (or fewer/more, depending on the alternative hypothesis).
Determine the p-value using the binomial distribution.
Try setting up the binomial probability before moving on!
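The binomial setup can be sketched with the standard library alone. The sample (8 of 24) comes from the question; the helper name `binom_cdf` and the choice to double the lower tail for a two-sided p-value are assumptions of this sketch (doubling is one common convention, not the only one).

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, successes = 24, 8                    # sample from the question
p_lower = binom_cdf(successes, n)       # P(X <= 8) under H0: p = 0.5
p_two_sided = min(1.0, 2 * p_lower)     # assumed convention: double the tail
print(f"lower tail = {p_lower:.4f}, two-sided p-value = {p_two_sided:.4f}")
```

The two-sided p-value is then compared with the 0.10 significance level.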
Q8. Use the Sign Test with a .05 significance level to test the claim that the training reduced sexual harassment claims (before/after data for 10 countries).
Background
Topic: Paired Sign Test
This question asks you to use the sign test to determine if there was a reduction in claims after training, using paired before/after data.
Key Terms and Formulas:
Paired data: Each country has a before and after value.
Sign test: Count the number of positive and negative differences.
Binomial probability: Used to calculate the p-value for the observed number of positive/negative signs.
Step-by-Step Guidance
For each country, determine if the number of claims decreased, increased, or stayed the same after training.
Count the number of negative signs (reductions) and positive signs (increases).
State the null hypothesis: The probability of a reduction is 0.5.
Set up the binomial probability for the observed number of reductions.
Try counting the signs and setting up the binomial test before proceeding!
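Counting signs and computing the one-sided binomial p-value might look like the sketch below. The before/after values are hypothetical (the problem's data are not reproduced here), `paired_sign_test` is an illustrative name, and pairs with no change are dropped, which is an assumption of this sketch.

```python
from math import comb

def paired_sign_test(before, after):
    """Return (reductions, increases, one-sided p-value) for a claimed reduction."""
    diffs = [a - b for b, a in zip(before, after)]
    reductions = sum(d < 0 for d in diffs)   # negative signs
    increases = sum(d > 0 for d in diffs)    # positive signs
    n = reductions + increases               # ties (d == 0) are dropped
    # P(X <= increases) for X ~ Binomial(n, 0.5): few increases supports a reduction
    p_value = sum(comb(n, i) for i in range(increases + 1)) / 2**n
    return reductions, increases, p_value

# Hypothetical claim counts (NOT the problem's data).
before = [10, 8, 12, 7, 9, 11, 6, 10, 8, 9]
after = [7, 6, 12, 5, 10, 8, 4, 9, 6, 7]
reductions, increases, p_value = paired_sign_test(before, after)
print(reductions, increases, round(p_value, 4))
```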
Q9a. Use the Sign Test to test the claim that the median hourly rate charged by mechanic shops is greater than $59/hr (data provided).
Background
Topic: One-Sample Sign Test for Median
This question asks you to use the sign test to determine if the median hourly rate is greater than $59/hr.
Key Terms and Formulas:
Sign test: Compare each value to the hypothesized median and count the number of values above and below.
Binomial probability: Used to calculate the p-value for the observed number of values above/below the median.
Step-by-Step Guidance
For each hourly rate, determine if it is above, below, or equal to $59.
Count the number of rates above and below $59 (ignore ties).
State the null hypothesis: The median is $59/hr.
Set up the binomial probability for the observed number of rates above $59.
Try counting the signs and setting up the binomial test before moving on!
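A minimal sketch of the one-sample version follows, with invented hourly rates standing in for the problem's data and a hypothetical helper name `sign_test_greater`. Values equal to the hypothesized median are ignored, as in the steps above.

```python
from math import comb

def sign_test_greater(values, median0):
    """Return (above, below, p-value) for H1: median > median0."""
    above = sum(v > median0 for v in values)
    below = sum(v < median0 for v in values)
    n = above + below                        # values equal to median0 are ignored
    # P(X >= above) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, i) for i in range(above, n + 1)) / 2**n
    return above, below, p_value

# Hypothetical hourly rates in dollars (NOT the problem's data).
rates = [62, 55, 65, 59, 70, 61, 58, 64, 66, 60]
above, below, p_value = sign_test_greater(rates, 59)
print(above, below, round(p_value, 4))
```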
Q9b. Use the Wilcoxon Signed-Ranks Test to test the claim that the median hourly rate is greater than $59/hr (data provided).
Background
Topic: Wilcoxon Signed-Ranks Test
This question asks you to use the Wilcoxon Signed-Ranks Test, a non-parametric alternative to the one-sample (or paired) t-test, to test the median hourly rate.
Key Terms and Formulas:
Wilcoxon Signed-Ranks Test: Uses the ranks of the absolute differences from the hypothesized median.
Test statistic (W): Sum of the ranks for positive or negative differences.
Step-by-Step Guidance
For each hourly rate, calculate the difference from $59 and note the sign.
Rank the absolute values of the non-zero differences from smallest to largest; if two absolute differences are tied, give each the average of the ranks they would occupy.
Sum the ranks for positive and negative differences separately.
The test statistic is the smaller of the two sums of ranks.
Try ranking the differences and setting up the test statistic before proceeding!
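The ranking step is the easiest place to slip, so here is a sketch of it in Python. The rates are hypothetical (not the problem's data), `signed_rank_sums` is an illustrative name, and tied absolute differences are given average ranks, which is the usual convention but an assumption of this sketch.

```python
def signed_rank_sums(values, median0):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    diffs = [v - median0 for v in values if v != median0]   # drop zero differences
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of equal absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j + 1) / 2    # positions are 1-based
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical hourly rates in dollars (NOT the problem's data).
rates = [62, 55, 65, 59, 70, 61, 58, 64, 66, 60]
w_plus, w_minus = signed_rank_sums(rates, 59)
w_stat = min(w_plus, w_minus)                 # smaller of the two rank sums
print(w_plus, w_minus, w_stat)
```

A quick sanity check: the two sums should add up to n(n + 1)/2, where n is the number of non-zero differences.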
Q10a. Create a scatter plot of the data with x=GPA and y=ACT.
Background
Topic: Scatter Plot and Correlation
This question asks you to visually represent the relationship between two quantitative variables.
Key Terms:
Scatter plot: A graph showing the relationship between two variables.
Step-by-Step Guidance
Plot each student's GPA on the x-axis and their ACT score on the y-axis.
Label the axes appropriately and plot all seven data points.
Try sketching the scatter plot before moving on!
Q10b. Run a hypothesis test to determine if there is correlation between GPA and ACT.
Background
Topic: Hypothesis Test for Correlation (Pearson's r)
This question asks you to test if there is a statistically significant linear relationship between GPA and ACT scores.
Key Terms and Formulas:
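Pearson correlation coefficient (r): $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
t-test for correlation: $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$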
Step-by-Step Guidance
State the null hypothesis: There is no correlation ($\rho = 0$) between GPA and ACT.
Calculate the means of GPA and ACT.
Compute the Pearson correlation coefficient using the formula above.
Calculate the t-statistic for the correlation and determine the degrees of freedom ($df = n - 2$).
Try calculating the correlation coefficient before moving on!
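The correlation and its t-statistic can be computed as in the sketch below. The seven (GPA, ACT) pairs are hypothetical stand-ins for the problem's data, and the function names are illustrative; the formulas assumed are the deviation form of Pearson's r and t = r√(n − 2)/√(1 − r²).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def correlation_t(r, n):
    """t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

# Hypothetical data for seven students (NOT the problem's data).
gpa = [2.0, 2.4, 2.8, 3.0, 3.2, 3.6, 4.0]
act = [18, 20, 21, 24, 25, 28, 32]
r = pearson_r(gpa, act)
t = correlation_t(r, len(gpa))
print(f"r = {r:.4f}, t = {t:.3f}, df = {len(gpa) - 2}")
```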
Q10c. What is the r-value for this data and what does it indicate about the correlation?
Background
Topic: Interpretation of Correlation Coefficient
This question asks you to interpret the value of the correlation coefficient you calculated in part (b).
Key Terms:
r-value: Measures the strength and direction of a linear relationship between two variables.
Step-by-Step Guidance
Recall the value of r you calculated in the previous step.
Interpret the sign (positive/negative) and magnitude (close to 0, moderate, strong) of r.
Try interpreting the r-value before checking the answer!
Q10d. Interpret the value of r² in the context of this problem.
Background
Topic: Coefficient of Determination (r²)
This question asks you to explain what the square of the correlation coefficient means in the context of GPA and ACT scores.
Key Terms:
r² (coefficient of determination): The proportion of variance in the dependent variable explained by the independent variable.
Step-by-Step Guidance
Square the r-value to get r².
Interpret r² as the percentage of variation in ACT scores explained by GPA.
Try interpreting r² in context before moving on!
Q10e. Create a linear regression equation, where x=GPA and y=ACT.
Background
Topic: Simple Linear Regression
This question asks you to find the equation of the best-fit line relating GPA to ACT scores.
Key Terms and Formulas:
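Regression equation: $\hat{y} = a + bx$
Slope (b): $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Intercept (a): $a = \bar{y} - b\bar{x}$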
Step-by-Step Guidance
Calculate the means of x (GPA) and y (ACT).
Compute the slope (b) using the formula above.
Calculate the intercept (a) using the mean values and the slope.
Try setting up the regression equation before moving on!
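A least-squares fit following these steps might be sketched as below. The data are hypothetical (not the problem's), `fit_line` is an illustrative name, and the equation is assumed to have the form ŷ = a + bx with b computed from deviations about the means.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

# Hypothetical data for seven students (NOT the problem's data).
gpa = [2.0, 2.4, 2.8, 3.0, 3.2, 3.6, 4.0]
act = [18, 20, 21, 24, 25, 28, 32]
a, b = fit_line(gpa, act)
print(f"ACT_hat = {a:.2f} + {b:.2f} * GPA")
```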
Q10f. What is the slope of this equation? Explain what this slope means in the context of this problem.
Background
Topic: Interpretation of Regression Slope
This question asks you to interpret the meaning of the slope in the regression equation relating GPA to ACT.
Key Terms:
Slope (b): The change in y (ACT) for a one-unit increase in x (GPA).
Step-by-Step Guidance
Recall the value of the slope from the regression equation.
Interpret the slope in terms of how much the ACT score is expected to change for each one-point increase in GPA.
Try interpreting the slope in context before moving on!
Q10g. Use the equation above to predict what a student with a GPA of 3.5 will get on his/her ACT.
Background
Topic: Prediction Using Regression Equation
This question asks you to use the regression equation to predict the ACT score for a given GPA.
Key Terms and Formulas:
Regression equation: $\hat{y} = a + bx$
Step-by-Step Guidance
Plug x = 3.5 into your regression equation.
Calculate the predicted value of y (ACT score) using the equation.
Try plugging in the value and calculating the prediction before moving on!
Q10h. Create a 95% confidence interval for this prediction above, and interpret it.
Background
Topic: Confidence Interval for Regression Prediction
This question asks you to construct and interpret a confidence interval for the predicted ACT score at GPA = 3.5.
Key Terms and Formulas:
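Standard error of prediction: $SE_{\hat{y}} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, where $s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$ and $x_0 = 3.5$
Confidence interval: $\hat{y} \pm t^* \cdot SE_{\hat{y}}$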
Step-by-Step Guidance
Calculate the standard error of the prediction at x = 3.5 using the formula above.
Find the appropriate t* value for a 95% confidence interval with n-2 degrees of freedom.
Set up the confidence interval using the predicted value and the standard error.
Try setting up the confidence interval before moving on!
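The interval can be sketched as follows. Several things here are assumptions rather than facts from the problem: the data are hypothetical, the standard error uses the prediction-interval form (with the leading 1 under the square root, appropriate for a single student's score; an interval for the mean response would omit that 1), and `t_star = 2.571` is the table value for 95% confidence with 5 degrees of freedom, which fits a sample of seven points.

```python
from math import sqrt

def prediction_interval(x, y, x0, t_star):
    """Interval for an individual y at x0 from a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))
    # Standard error of a single predicted value at x0
    se_pred = s_e * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = a + b * x0
    margin = t_star * se_pred
    return y_hat - margin, y_hat + margin

# Hypothetical data for seven students (NOT the problem's data).
gpa = [2.0, 2.4, 2.8, 3.0, 3.2, 3.6, 4.0]
act = [18, 20, 21, 24, 25, 28, 32]
low, high = prediction_interval(gpa, act, x0=3.5, t_star=2.571)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```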
Q10i. Predict the GPA of a student that scored an 8 on the ACT. Is this reliable? Why?
Background
Topic: Inverse Regression and Extrapolation
This question asks you to use the regression equation to predict GPA from ACT, and to consider the reliability of such a prediction.
Key Terms:
Extrapolation: Making predictions outside the range of observed data.
Step-by-Step Guidance
Rearrange the regression equation to solve for x (GPA) given y (ACT = 8).
Plug in y = 8 and solve for x.
Consider whether 8 is within the range of observed ACT scores and discuss the reliability of this prediction.
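The rearrangement and the range check can be sketched together. The data are hypothetical, so the observed ACT range printed here is not the range in the problem; the point is only to show how solving ŷ = a + bx for x and checking for extrapolation might look. The name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data for seven students (NOT the problem's data).
gpa = [2.0, 2.4, 2.8, 3.0, 3.2, 3.6, 4.0]
act = [18, 20, 21, 24, 25, 28, 32]
a, b = fit_line(gpa, act)

target_act = 8
predicted_gpa = (target_act - a) / b               # solve y = a + b*x for x
in_range = min(act) <= target_act <= max(act)      # False signals extrapolation
print(f"predicted GPA = {predicted_gpa:.2f}, within observed ACT range: {in_range}")
```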