Statistics for Business: Exam 2 Review Study Notes
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Exam 2 Review: Key Topics in Statistics for Business
Announcements and Exam Structure
This section outlines important deadlines and exam policies for DS 101, including assignment due dates and procedures for excused absences. Understanding these logistics ensures proper exam preparation and adherence to course requirements.
Assignments: Chapters 21 and 22 assignments are due this week.
Exam Coverage: Exam 2 covers Chapters 18, 6, 19, 21, and 22.
Excused Absence Policy: If excused from an exam, the final exam's weight increases by the percentage of the missed exam.
Chapter 18: Inference for Counts (Excel)
Chi-Square Test for Independence
The Chi-Square Test for Independence is used to determine whether two categorical variables are independent. This test is commonly applied to contingency tables.
Degrees of Freedom (df): Calculated as df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
Expected Count: For any cell, E = (row total × column total) / grand total.
Chi-Square Test Statistic: χ² = Σ (O − E)² / E, summed over all cells, where O is the observed count and E is the expected count.
p-value: The probability of observing a test statistic as extreme as, or more extreme than, the value calculated, under the null hypothesis. Calculated using the chi-square distribution with the appropriate degrees of freedom.
Excel Formula: = CHISQ.DIST.RT(Chi-Square Test Statistic, degrees of freedom)
Example Table: Household Income vs. Area
| Area | Less than $25,000 | $25,000 to $49,999 | $50,000 to $74,999 | $75,000 to $99,999 | $100,000 or more | Total |
|---|---|---|---|---|---|---|
| Area 1 | 26 | 38 | 76 | 57 | 50 | 247 |
| Area 2 | 72 | 76 | 73 | 34 | 53 | 308 |
| Total | 98 | 114 | 149 | 91 | 103 | 555 |
Calculation Example: For the Area 1, "Less than $25,000" cell, the expected count is (247 × 98) / 555 ≈ 43.61, against an observed count of 26. Repeating this for every cell and summing (O − E)²/E gives the chi-square test statistic.
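The full calculation above can be sketched in plain Python (no statistics library needed for the statistic itself), using the counts from the income-by-area table:

```python
# Chi-square test for independence on the income-by-area counts above.
observed = [
    [26, 38, 76, 57, 50],   # Area 1
    [72, 76, 73, 34, 53],   # Area 2
]

row_totals = [sum(row) for row in observed]          # 247, 308
col_totals = [sum(col) for col in zip(*observed)]    # 98, 114, 149, 91, 103
grand_total = sum(row_totals)                        # 555

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (2-1)(5-1) = 4
print(f"df = {df}")
print(f"expected, Area 1 / <$25,000 = {expected[0][0]:.2f}")  # ~43.61
print(f"chi-square = {chi_square:.2f}")
```

In Excel, the corresponding p-value would come from =CHISQ.DIST.RT(chi_square, df).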
Test of Goodness of Fit
The Goodness of Fit Test evaluates whether observed categorical data match an expected distribution. It is often used to test if events occur with equal probability.
Degrees of Freedom: df = k − 1, where k is the number of categories.
Expected Count: If all categories are equally likely, E = n / k, where n is the total number of observations.
Chi-Square Test Statistic: χ² = Σ (O − E)² / E, summed over the k categories.
p-value: Calculated using the chi-square distribution.
Excel Formula: =CHISQ.DIST.RT(test statistic, degrees of freedom)
Example Table: Call Center Data
Day | Observed Calls |
|---|---|
Monday | 120 |
Tuesday | 95 |
Wednesday | 110 |
Thursday | 105 |
Friday | 130 |
Saturday | 75 |
Sunday | 65 |
Total | 700 |
Expected Calls per Day: If calls are equally likely on every day, E = 700 / 7 = 100 calls per day.
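The goodness-of-fit calculation for the call-center table can be sketched the same way, testing whether calls are equally likely on each day:

```python
# Chi-square goodness-of-fit test on the call-center data above.
observed = {"Mon": 120, "Tue": 95, "Wed": 110, "Thu": 105,
            "Fri": 130, "Sat": 75, "Sun": 65}

total = sum(observed.values())     # 700
k = len(observed)                  # 7 categories (days)
expected = total / k               # 700 / 7 = 100 calls per day

# Chi-square statistic: sum of (O - E)^2 / E over the k categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                         # 7 - 1 = 6

print(f"expected per day = {expected:.0f}")
print(f"df = {df}, chi-square = {chi_square:.2f}")
```

As before, =CHISQ.DIST.RT(chi_square, df) in Excel converts the statistic to a p-value.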
Chapter 6: Association Between Quantitative Variables
Interpreting Correlation from a Scatterplot
Correlation measures the strength and direction of a linear relationship between two quantitative variables. Scatterplots visually display this relationship.
Positive Correlation (r > 0): Points slope upward from left to right.
Negative Correlation (r < 0): Points slope downward from left to right.
No Correlation (r ≈ 0): Points show no clear pattern.
Strength: |r| close to 1 indicates strong correlation; |r| near 0 indicates weak or no correlation.
Important Note: Correlation does not imply causation.
Measuring Association: Covariance and Correlation
Covariance and correlation are statistical measures used to quantify the relationship between two quantitative variables.
Covariance: Measures the direction of the linear relationship. Depends on units, making interpretation difficult.
Formula: cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
Correlation: Standardized measure of linear association, ranging from -1 to +1.
Formula: r = cov(x, y) / (sₓ · s_y), where sₓ and s_y are the standard deviations of x and y.
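Both formulas can be worked by hand on a small dataset; the (x, y) values below are hypothetical, chosen only to show the mechanics:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical example data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of (x_i - x̄)(y_i - ȳ), divided by (n - 1)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: standardized covariance, always between -1 and +1
r = cov / (s_x * s_y)
print(f"cov = {cov:.2f}, r = {r:.3f}")
```

Note that cov depends on the units of x and y, while r is unit-free, which is why r is the preferred measure of association.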
Chapter 19 & 21: Linear Patterns and Simple Linear Regression
Simple Linear Regression Concepts
Simple linear regression models the relationship between a quantitative explanatory variable (x) and a quantitative response variable (y) using a straight line.
Scatterplot: x on the horizontal axis, y on the vertical axis.
Linear Trend: Points roughly form a straight line.
Regression Equation: ŷ = b₀ + b₁x, where b₀ is the intercept and b₁ is the slope.
Residuals: Differences between observed and predicted values; their variability is measured by the standard error of the regression.
Coefficient of Determination (R²): The square of the correlation r between x and y, representing the fraction of the variation in y explained by the regression.
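The least-squares fit and R² can be sketched directly from the formulas above (slope b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², intercept b₀ = ȳ − b₁x̄), again on hypothetical data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical example data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Fitted values and residuals (observed minus predicted)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# R^2 = 1 - SSE/SST, equal to the squared correlation between x and y
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst
print(f"y-hat = {b0:.2f} + {b1:.2f} x, R^2 = {r_squared:.3f}")
```

The residuals computed here are the same quantities whose variability the standard error of the regression summarizes.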
Simple Linear Regression: Inference and Confidence Intervals
Statistical inference in regression involves estimating parameters and testing hypotheses about the relationship between variables.
Conditions: Linearity, independence, normality, and equal variance of residuals.
95% Confidence Interval for Intercept and Slope: Provides a range of plausible values for the parameters.
Hypothesis Testing: Tests whether the slope or intercept is significantly different from zero.
Prediction Interval: Estimates the range for a new observation of y at a given x value.
Chapter 22: Regression Diagnostics
Regression Diagnostics and Residual Analysis
Regression diagnostics assess the validity of the regression model and identify potential issues such as outliers or autocorrelation.
Time Series Plot for Residuals: Used to detect patterns or autocorrelation in residuals.
Durbin-Watson Statistic: Measures autocorrelation in residuals; values near 2 indicate no autocorrelation.
Effect of Outliers: Outliers can distort regression results; consider whether to remove them based on their impact and context.
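The Durbin-Watson statistic can be computed directly from its definition (sum of squared successive residual differences divided by the sum of squared residuals); the residual series below is hypothetical:

```python
# Durbin-Watson statistic on a hypothetical residual series.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]

# Numerator: squared differences between consecutive residuals
numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                for t in range(1, len(residuals)))

# Denominator: sum of squared residuals
denominator = sum(e ** 2 for e in residuals)

dw = numerator / denominator
print(f"Durbin-Watson = {dw:.2f}")
```

Values near 2 suggest no autocorrelation; values well below 2 suggest positive autocorrelation, and values well above 2 (as with this alternating series) suggest negative autocorrelation.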