Statistics for Business: Exam 2 Review Study Notes
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Exam 2 Review: Key Topics in Statistics for Business
Announcements and Exam Structure
This section outlines important deadlines and exam policies for DS 101, including assignment due dates and procedures for excused absences. Understanding these logistics ensures proper exam preparation and adherence to course requirements.
Assignments: Chapters 21 and 22 assignments are due this week.
Exam Coverage: Exam 2 covers Chapters 18, 6, 19, 21, and 22.
Excused Absence Policy: If excused from an exam, the final exam's weight increases by the percentage of the missed exam.
Chapter 18: Inference for Counts (Excel)
Chi-Square Test for Independence
The Chi-Square Test for Independence is used to determine whether two categorical variables are independent. This test is commonly applied to contingency tables.
Degrees of Freedom (df): Calculated as df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
Expected Count: For any cell, E = (row total × column total) / grand total.
Chi-Square Test Statistic: χ² = Σ (O − E)² / E, summed over all cells, where O is the observed count and E is the expected count.
p-value: The probability of observing a test statistic as extreme as, or more extreme than, the value calculated, under the null hypothesis. Calculated using the chi-square distribution with the appropriate degrees of freedom.
Excel Formula: = CHISQ.DIST.RT(Chi-Square Test Statistic, degrees of freedom)
Example Table: Household Income vs. Area
| Area | Less than $25,000 | $25,000 to $49,999 | $50,000 to $74,999 | $75,000 to $99,999 | $100,000 or more | Total |
|---|---|---|---|---|---|---|
| Area 1 | 26 | 38 | 76 | 57 | 50 | 247 |
| Area 2 | 72 | 76 | 73 | 34 | 53 | 308 |
| Total | 98 | 114 | 149 | 91 | 103 | 555 |
Calculation Example: For the Area 1, "Less than $25,000" cell, the expected count is (247 × 98) / 555 ≈ 43.61, against an observed count of 26. Repeating this for every cell and summing (O − E)²/E gives the chi-square test statistic.
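The full calculation above can be sketched in plain Python (no statistics library needed for the statistic itself), using the counts from the income-by-area table:

```python
# Chi-square test for independence on the income-by-area counts above.
observed = [
    [26, 38, 76, 57, 50],   # Area 1
    [72, 76, 73, 34, 53],   # Area 2
]

row_totals = [sum(row) for row in observed]          # 247, 308
col_totals = [sum(col) for col in zip(*observed)]    # 98, 114, 149, 91, 103
grand_total = sum(row_totals)                        # 555

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (2-1)(5-1) = 4
print(f"df = {df}")
print(f"expected, Area 1 / <$25,000 = {expected[0][0]:.2f}")  # ~43.61
print(f"chi-square = {chi_square:.2f}")
```

In Excel, the corresponding p-value would come from =CHISQ.DIST.RT(chi_square, df).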
Test of Goodness of Fit
The Goodness of Fit Test evaluates whether observed categorical data match an expected distribution. It is often used to test if events occur with equal probability.
Degrees of Freedom: df = k − 1, where k is the number of categories.
Expected Count: If all categories are equally likely, E = n / k, where n is the total number of observations.
Chi-Square Test Statistic: χ² = Σ (O − E)² / E, summed over the k categories.
p-value: Calculated using the chi-square distribution.
Excel Formula: =CHISQ.DIST.RT(test statistic, degrees of freedom)
Example Table: Call Center Data
Day | Observed Calls |
|---|---|
Monday | 120 |
Tuesday | 95 |
Wednesday | 110 |
Thursday | 105 |
Friday | 130 |
Saturday | 75 |
Sunday | 65 |
Total | 700 |
Expected Calls per Day: If calls are equally likely on every day, E = 700 / 7 = 100 calls per day.
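The goodness-of-fit calculation for the call-center table can be sketched the same way, testing whether calls are equally likely on each day:

```python
# Chi-square goodness-of-fit test on the call-center data above.
observed = {"Mon": 120, "Tue": 95, "Wed": 110, "Thu": 105,
            "Fri": 130, "Sat": 75, "Sun": 65}

total = sum(observed.values())     # 700
k = len(observed)                  # 7 categories (days)
expected = total / k               # 700 / 7 = 100 calls per day

# Chi-square statistic: sum of (O - E)^2 / E over the k categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                         # 7 - 1 = 6

print(f"expected per day = {expected:.0f}")
print(f"df = {df}, chi-square = {chi_square:.2f}")
```

As before, =CHISQ.DIST.RT(chi_square, df) in Excel converts the statistic to a p-value.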
Chapter 6: Association Between Quantitative Variables
Interpreting Correlation from a Scatterplot
Correlation measures the strength and direction of a linear relationship between two quantitative variables. Scatterplots visually display this relationship.
Positive Correlation (r > 0): Points slope upward from left to right.
Negative Correlation (r < 0): Points slope downward from left to right.
No Correlation (r ≈ 0): Points show no clear pattern.
Strength: |r| close to 1 indicates strong correlation; |r| near 0 indicates weak or no correlation.
Important Note: Correlation does not imply causation.
Measuring Association: Covariance and Correlation
Covariance and correlation are statistical measures used to quantify the relationship between two quantitative variables.
Covariance: Measures the direction of the linear relationship. Depends on units, making interpretation difficult.
Formula: cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
Correlation: Standardized measure of linear association, ranging from -1 to +1.
Formula: r = cov(x, y) / (sₓ · s_y), where sₓ and s_y are the standard deviations of x and y.
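Both formulas can be worked by hand on a small dataset; the (x, y) values below are hypothetical, chosen only to show the mechanics:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical example data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of (x_i - x̄)(y_i - ȳ), divided by (n - 1)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: standardized covariance, always between -1 and +1
r = cov / (s_x * s_y)
print(f"cov = {cov:.2f}, r = {r:.3f}")
```

Note that cov depends on the units of x and y, while r is unit-free, which is why r is the preferred measure of association.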
Chapter 19 & 21: Linear Patterns and Simple Linear Regression
Simple Linear Regression Concepts
Simple linear regression models the relationship between a quantitative explanatory variable (x) and a quantitative response variable (y) using a straight line.
Scatterplot: x on the horizontal axis, y on the vertical axis.
Linear Trend: Points roughly form a straight line.
Regression Equation: ŷ = b₀ + b₁x, where b₀ is the intercept and b₁ is the slope.
Residuals: Differences between observed and predicted values; their variability is measured by the standard error of the regression.
Coefficient of Determination (R²): The square of the correlation r between x and y, representing the fraction of the variation in y explained by the regression.
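The least-squares fit and R² can be sketched directly from the formulas above (slope b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², intercept b₀ = ȳ − b₁x̄), again on hypothetical data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical example data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Fitted values and residuals (observed minus predicted)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# R^2 = 1 - SSE/SST, equal to the squared correlation between x and y
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst
print(f"y-hat = {b0:.2f} + {b1:.2f} x, R^2 = {r_squared:.3f}")
```

The residuals computed here are the same quantities whose variability the standard error of the regression summarizes.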
Simple Linear Regression: Inference and Confidence Intervals
Statistical inference in regression involves estimating parameters and testing hypotheses about the relationship between variables.
Conditions: Linearity, independence, normality, and equal variance of residuals.
95% Confidence Interval for Intercept and Slope: Provides a range of plausible values for the parameters.
Hypothesis Testing: Tests whether the slope or intercept is significantly different from zero.
Prediction Interval: Estimates the range for a new observation of y at a given x value.
Chapter 22: Regression Diagnostics
Regression Diagnostics and Residual Analysis
Regression diagnostics assess the validity of the regression model and identify potential issues such as outliers or autocorrelation.
Time Series Plot for Residuals: Used to detect patterns or autocorrelation in residuals.
Durbin-Watson Statistic: Measures autocorrelation in residuals; values near 2 indicate no autocorrelation.
Effect of Outliers: Outliers can distort regression results; consider whether to remove them based on their impact and context.
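The Durbin-Watson statistic can be computed directly from its definition (sum of squared successive residual differences divided by the sum of squared residuals); the residual series below is hypothetical:

```python
# Durbin-Watson statistic on a hypothetical residual series.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]

# Numerator: squared differences between consecutive residuals
numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                for t in range(1, len(residuals)))

# Denominator: sum of squared residuals
denominator = sum(e ** 2 for e in residuals)

dw = numerator / denominator
print(f"Durbin-Watson = {dw:.2f}")
```

Values near 2 suggest no autocorrelation; values well below 2 suggest positive autocorrelation, and values well above 2 (as with this alternating series) suggest negative autocorrelation.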