Chapter 10: Correlation and Regression – Study Notes
Correlation and Regression
Scatterplots & Correlation
Scatterplots are graphical representations of paired numerical data, where one variable is considered independent (x) and the other dependent (y). They are used to visually assess the relationship between two variables.
Linear Correlation: If the data points form a straight-line pattern, the variables are said to have a linear correlation.
Types of Correlation:
Positive Correlation: As x increases, y increases.
Negative Correlation: As x increases, y decreases.
No Correlation: No discernible pattern between x and y.
Correlation vs. Causation: Correlation does not imply causation; two variables may be related without one causing the other.
Example: Test scores vs. time spent studying often show a positive correlation, while test scores vs. number of siblings may show no correlation.
Creating Scatterplots with a Calculator
To create scatterplots using a graphing calculator (e.g., TI-84):
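Enter the data into lists with STAT > EDIT (L1 for x-values, L2 for y-values).
Press 2nd > STAT PLOT, turn Plot1 on, and select the scatterplot type with Xlist: L1 and Ylist: L2.
Press ZOOM > 9 (ZoomStat) to fit the window to the data range, or adjust the WINDOW settings manually.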

Correlation Coefficient
Definition and Interpretation
The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables.
Range: $-1 \le r \le 1$
Interpretation:
r close to 1: Strong positive linear correlation
r close to -1: Strong negative linear correlation
r close to 0: Weak or no linear correlation
The sign of r matches the direction (slope) of the trend.
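Formula: $r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$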
Example: If r = 0.96, there is a strong positive correlation; if r = -0.92, there is a strong negative correlation.
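As an optional check on the formula, the following Python sketch computes r directly from the sums n, Σx, Σy, Σxy, Σx², and Σy². The function name correlation_coefficient and the study-hours data are hypothetical, chosen only for illustration; for the same lists, the result should agree with the r that LinReg(ax+b) reports.

```python
import math

def correlation_coefficient(x, y):
    """Pearson r using the computational formula from the notes."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with n >= 2")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: hours studied (x) vs. test score (y)
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
r = correlation_coefficient(hours, scores)
print(round(r, 3))
```

For this data it prints 0.981, a strong positive linear correlation. Python 3.10 and later also provide statistics.correlation, which computes the same coefficient.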
Finding the Correlation Coefficient with a Calculator
Enter data in L1 and L2.
Use the LinReg(ax+b) function in the STAT > CALC menu. If r and r² are not displayed, turn on DiagnosticOn (found in the CATALOG).
Read the value of r from the output.

Hypothesis Test for Correlation Coefficient
Testing the Significance of Correlation
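A hypothesis test determines whether the sample correlation r is strong enough to conclude that a linear correlation exists in the population, where ρ (rho) denotes the population correlation coefficient.
Null Hypothesis (H0): $\rho = 0$ (no linear correlation)
Alternative Hypothesis (Ha): $\rho \ne 0$, $\rho > 0$, or $\rho < 0$ (depending on the research question)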
Use the LinRegTTest function on a calculator to obtain the test statistic and p-value.
If p-value < significance level (α), reject H0.
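Behind LinRegTTest is the test statistic $t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$ with $n - 2$ degrees of freedom. The Python sketch below computes it and a two-tailed p-value. Because the standard library has no t distribution, the sketch assumes it is acceptable to integrate the t density numerically with Simpson's rule; the function names and the sample values (r = 0.9815, n = 5) are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=10_000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                       # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        area += weight * t_pdf(i * h, df)
    area *= h / 3                           # area = P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)       # probability in both tails beyond |t|

def correlation_t_test(r, n):
    """Test statistic, degrees of freedom, and two-tailed p-value for H0: rho = 0."""
    if not (-1 < r < 1) or n < 3:
        raise ValueError("need |r| < 1 and n >= 3")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df, two_tailed_p_value(t, df)

# Hypothetical sample: r = 0.9815 from n = 5 pairs, tested at alpha = 0.05
t, df, p = correlation_t_test(0.9815, 5)
alpha = 0.05
print(f"t = {t:.3f}, df = {df}, p-value = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

Here t is about 8.88 with df = 3, and the p-value is well below 0.05, so H0 is rejected. For a one-tailed alternative ($\rho > 0$ or $\rho < 0$) whose direction matches the sign of t, the p-value is half the two-tailed value.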

Linear Regression Using the Least Squares Method
Least Squares Regression Line
The least squares regression line is the line that minimizes the sum of the squared vertical distances (residuals) between the observed values and the line.
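Equation: $\hat{y} = ax + b$, where $\hat{y}$ is the predicted value of y, $a = r\,\dfrac{s_y}{s_x}$ is the slope, and $b = \bar{y} - a\bar{x}$ is the y-intercept.
Residual: $e = y - \hat{y}$ (observed y minus predicted y)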
How to Find:
Enter data in L1 (x) and L2 (y).
Use LinReg(ax+b) in the STAT > CALC menu.
Write down the slope (a) and intercept (b).
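The same slope and intercept can be computed by hand. This Python sketch uses the deviation form $a = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, which is algebraically equal to $r\,s_y/s_x$, and then $b = \bar{y} - a\bar{x}$. The function name least_squares_line and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Slope a and intercept b of y-hat = a*x + b, the LinReg(ax+b) form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx            # slope; algebraically equal to r * s_y / s_x
    b = y_bar - a * x_bar    # intercept; the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: hours studied (x) vs. test score (y)
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = least_squares_line(hours, scores)
print(f"y-hat = {a}x + {b}")
```

For this data it prints y-hat = 8.5x + 44.5. Python 3.10 and later also offer statistics.linear_regression(x, y), which returns the slope and intercept.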

Predicting Values with the Regression Line
If correlation is strong and the x-value is within the data range, substitute x into the regression equation to predict y.
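If the correlation is weak (not statistically significant), the regression line is not useful for prediction; use the sample mean $\bar{y}$ as the best estimate. Avoid predicting for x-values far outside the data range (extrapolation), because the linear pattern may not continue there.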
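The prediction guidelines can be sketched as a small Python function. The name best_predicted_y, its significant flag (which would come from the hypothesis test for correlation), and the data are hypothetical. The sketch makes one conservative assumption: when x0 lies outside the observed x-range, it returns None to flag extrapolation rather than producing a number.

```python
# Hypothetical data; its least squares line is y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]

def best_predicted_y(x0, a, b, x_data, y_data, significant):
    """Return (prediction, note) for y at x0 on the line y-hat = a*x + b."""
    y_bar = sum(y_data) / len(y_data)
    if not significant:
        # No significant linear correlation: the sample mean is the best estimate
        return y_bar, "no significant correlation, using y-bar"
    if not (min(x_data) <= x0 <= max(x_data)):
        # Extrapolation: the linear pattern may not hold outside the data
        return None, "x0 outside the data range, no reliable prediction"
    return a * x0 + b, "substituted x0 into the regression equation"

print(best_predicted_y(3.5, 8.5, 44.5, hours, scores, significant=True))
print(best_predicted_y(10, 8.5, 44.5, hours, scores, significant=True))
print(best_predicted_y(3.5, 8.5, 44.5, hours, scores, significant=False))
```

With a significant correlation, x0 = 3.5 gives 74.25; without one, the estimate falls back to $\bar{y} = 70$.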
Residuals Analysis
Residual Plots
Residuals are used to assess the fit of a regression model. A residual plot displays the residuals on the vertical axis and the independent variable on the horizontal axis.
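If the residuals are scattered randomly around 0 with no visible pattern, the linear model is appropriate.
If the residuals show a pattern (for example, a curve, or a fan shape that widens as x increases), the linear model may not be appropriate.
Formula: residual $= y - \hat{y}$ (observed minus predicted)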
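A residual plot is just the points (x, y − ŷ). This Python sketch computes the residuals for hypothetical data whose least squares line is ŷ = 8.5x + 44.5 and prints a crude text version of the plot, with '+' bars for positive residuals and '-' bars for negative ones. For a least squares line with an intercept, the residuals always sum to 0.

```python
# Hypothetical data and its least squares line y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = 8.5, 44.5

# Residual = observed y minus predicted y
residuals = [y - (a * x + b) for x, y in zip(hours, scores)]

for x, e in zip(hours, residuals):
    # Text "plot": bar length is proportional to the size of the residual
    bar = ("+" if e > 0 else "-") * round(abs(e) * 2)
    print(f"x = {x}: residual = {e:5.1f} {bar}")

print("sum of residuals:", sum(residuals))
```

The residuals are 2.0, -1.5, 0.0, -3.5, and 3.0. With only five points, a pattern is hard to judge; real data sets need more points before a residual plot is informative.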
Variation and the Coefficient of Determination
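Coefficient of Determination (R²)
The coefficient of determination, $R^2$, measures the proportion of the variation in the dependent variable that is explained by the linear relationship with the independent variable. For simple linear regression, $R^2 = r^2$.
Formula: $R^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$
R² close to 1: Most variation is explained by the model.
R² close to 0: Little variation is explained by the model.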
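To make the formula concrete, this Python sketch splits the total variation of hypothetical data into explained and unexplained parts (total = explained + unexplained) and takes their ratio. The data and its line ŷ = 8.5x + 44.5 are illustrative assumptions.

```python
# Hypothetical data and its least squares line y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = 8.5, 44.5

y_bar = sum(scores) / len(scores)
predicted = [a * x + b for x in hours]

explained = sum((y_hat - y_bar) ** 2 for y_hat in predicted)                  # sum of (y-hat - y-bar)^2
unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))    # sum of squared residuals
total = sum((y - y_bar) ** 2 for y in scores)                                 # sum of (y - y-bar)^2

r_squared = explained / total
print(f"explained = {explained}, unexplained = {unexplained}, total = {total}")
print(f"R^2 = {r_squared:.4f}")
```

Here explained = 722.5, unexplained = 27.5, and total = 750.0, so R² is about 0.9633. That matches r² for the same data (r ≈ 0.9815): about 96% of the variation in scores is explained by hours studied.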

Inferences for Slope of Regression Line
Hypothesis Test for Slope
To test if the slope of the regression line is significantly different from zero:
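Null Hypothesis (H0): $\beta_1 = 0$, where $\beta_1$ is the population slope (no linear relationship)
Alternative Hypothesis (Ha): $\beta_1 \ne 0$, $\beta_1 > 0$, or $\beta_1 < 0$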
Use LinRegTTest on a calculator to obtain the test statistic and p-value.
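The slope test uses $t = \dfrac{a}{SE_a}$ with $n - 2$ degrees of freedom, where $SE_a = \dfrac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$ and $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$. The Python sketch below, on hypothetical data, computes this t and also the t from the correlation test, showing that in simple linear regression the two statistics are identical. That is why LinRegTTest reports a single t for both questions.

```python
import math

# Hypothetical data and its least squares line y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = 8.5, 44.5
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# t statistic for H0: beta_1 = 0
sse = sum((y - (a * x + b)) ** 2 for x, y in zip(hours, scores))
s_e = math.sqrt(sse / (n - 2))             # standard error of estimate
sxx = sum((x - x_bar) ** 2 for x in hours)
se_slope = s_e / math.sqrt(sxx)            # standard error of the slope
t_slope = a / se_slope

# t statistic for H0: rho = 0, computed from r
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
syy = sum((y - y_bar) ** 2 for y in scores)
r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt((n - 2) / (1 - r * r))

print(f"t (slope) = {t_slope:.3f}, t (from r) = {t_r:.3f}, df = {n - 2}")
```

Both values come out to about 8.878 with df = 3.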

Confidence Interval for Slope
A confidence interval for the slope provides a range of plausible values for the population slope.
Use LinRegTInt on a calculator to compute the interval.
If the interval does not include 0, there is evidence of a linear relationship.
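The interval has the form $a \pm t_{\alpha/2}\, SE_a$ with $n - 2$ degrees of freedom. In this Python sketch the critical value t_crit = 3.182 is an assumed t-table value for 95% confidence with df = 3, the value LinRegTInt would otherwise determine for you, and the data are hypothetical.

```python
import math

# Hypothetical data and its least squares line y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = 8.5, 44.5
t_crit = 3.182  # assumed t-table value: 95% confidence, df = n - 2 = 3

n = len(hours)
x_bar = sum(hours) / n
sse = sum((y - (a * x + b)) ** 2 for x, y in zip(hours, scores))
s_e = math.sqrt(sse / (n - 2))                                   # standard error of estimate
se_slope = s_e / math.sqrt(sum((x - x_bar) ** 2 for x in hours))  # standard error of the slope

margin = t_crit * se_slope
lower, upper = a - margin, a + margin
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
if lower <= 0 <= upper:
    print("0 is in the interval: not enough evidence of a linear relationship")
else:
    print("0 is not in the interval: evidence of a linear relationship")
```

For this data the interval is about (5.45, 11.55), which excludes 0.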

Prediction Intervals
Definition and Calculation
A prediction interval estimates a range in which a single new observation is likely to fall, given a specified value of x.
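Formula for Margin of Error (E): $E = t_{\alpha/2}\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$, where $x_0$ is the given x-value, $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ is the standard error of estimate, and $t_{\alpha/2}$ has $n - 2$ degrees of freedom.
Prediction interval: $\hat{y} - E < y < \hat{y} + E$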
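Here is the margin-of-error formula evaluated step by step in a Python sketch. The data, the new value x0 = 3.5, and the critical value t_crit = 3.182 (an assumed t-table value for 95% confidence with df = 3) are hypothetical.

```python
import math

# Hypothetical data and its least squares line y-hat = 8.5x + 44.5
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 75, 90]
a, b = 8.5, 44.5
t_crit = 3.182   # assumed t-table value: 95% confidence, df = n - 2 = 3
x0 = 3.5         # hypothetical new x-value inside the data range

n = len(hours)
sum_x = sum(hours)
sum_x2 = sum(x * x for x in hours)
x_bar = sum_x / n
# Standard error of estimate s_e
s_e = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(hours, scores)) / (n - 2))

y_hat = a * x0 + b
E = t_crit * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
print(f"y-hat = {y_hat}, E = {E:.2f}, interval: ({y_hat - E:.2f}, {y_hat + E:.2f})")
```

For x0 = 3.5 the point prediction is 74.25 and E is about 10.66, giving roughly 63.59 < y < 84.91. The interval widens as x0 moves away from $\bar{x}$ because the $(x_0 - \bar{x})^2$ term grows.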

Quadratic Regression
Quadratic Regression Model
When the data show a curved (nonlinear) pattern, a quadratic regression model may be more appropriate. The general form is: $\hat{y} = ax^2 + bx + c$
Use QuadReg on a calculator to fit a quadratic model.
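Compare the R² values of the linear and quadratic models to choose the better fit. Because the quadratic model includes the linear model as a special case (a = 0), its R² is never lower, so prefer it only when the gain is substantial and the linear model's residual plot shows a curved pattern.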
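To see where quadratic coefficients come from and why comparing R² needs care, this Python sketch fits $\hat{y} = ax^2 + bx + c$ by solving the least squares normal equations, then computes R² for both the quadratic and the linear model. The helper names and the data (generated from y = 2x² + 1) are hypothetical. Solving the 3×3 system by Gaussian elimination is a choice made for this sketch, not necessarily how QuadReg works internally.

```python
def solve_3x3(m, v):
    """Solve the 3x3 system m * sol = v by Gaussian elimination with partial pivoting."""
    aug = [list(map(float, row)) + [float(val)] for row, val in zip(m, v)]
    for col in range(3):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, 3):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= factor * aug[col][j]
    sol = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):  # back substitution
        rest = sum(aug[i][j] * sol[j] for j in range(i + 1, 3))
        sol[i] = (aug[i][3] - rest) / aug[i][i]
    return sol

def power_sum(x, k):
    return sum(xi ** k for xi in x)

def quadratic_fit(x, y):
    """Least squares a, b, c for y-hat = a*x^2 + b*x + c via the normal equations."""
    m = [[power_sum(x, 4), power_sum(x, 3), power_sum(x, 2)],
         [power_sum(x, 3), power_sum(x, 2), power_sum(x, 1)],
         [power_sum(x, 2), power_sum(x, 1), len(x)]]
    v = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in (2, 1, 0)]
    return solve_3x3(m, v)

def linear_fit(x, y):
    """Least squares slope and intercept for y-hat = a*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(y, y_hat):
    """R^2 = 1 - (unexplained variation) / (total variation)."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical curved data (generated from y = 2x^2 + 1)
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 9, 19, 33, 51]
qa, qb, qc = quadratic_fit(x, y)
la, lb = linear_fit(x, y)
r2_quad = r_squared(y, [qa * xi ** 2 + qb * xi + qc for xi in x])
r2_lin = r_squared(y, [la * xi + lb for xi in x])
print(f"Quadratic: a = {qa:.3f}, b = {qb:.3f}, c = {qc:.3f}, R^2 = {r2_quad:.4f}")
print(f"Linear:    slope = {la:.3f}, intercept = {lb:.3f}, R^2 = {r2_lin:.4f}")
```

The linear model still reaches an R² of about 0.92 even though the data are exactly quadratic, while the quadratic model reaches R² = 1 up to rounding. A residual plot of the linear fit would show the telltale U-shaped pattern.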

Applications
Population growth, cost analysis, and other phenomena may be better modeled with quadratic regression.