
Correlation and Linear Regression: Study Notes for Business Statistics

Study Guide - Smart Notes


Correlation and Linear Regression

Scatterplots & Correlation

Scatterplots are graphical representations of paired numerical data, where one variable is considered independent (x) and the other dependent (y). They are used to visually assess the relationship between two variables.

  • Correlation describes the degree to which two variables move together. If the data points form a straight-line pattern, the correlation is linear.

  • Positive Correlation: As x increases, y tends to increase. The best-fit line has a positive slope.

  • Negative Correlation: As x increases, y tends to decrease. The best-fit line has a negative slope.

  • No Correlation: No discernible pattern between x and y.

  • Nonlinear Correlation: The points follow a curved pattern rather than a straight line.

  • Correlation does not imply causation; two variables may appear related without one causing the other.

Example: Test scores vs. time spent studying often show a positive correlation, while test scores vs. number of siblings may show no correlation.

Correlation Coefficient (r)

The correlation coefficient (r) quantifies the direction and strength of a linear relationship between two variables.

  • Range: $-1 \le r \le 1$

  • r > 0: Positive correlation

  • r < 0: Negative correlation

  • r = 0: No linear correlation

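  • Strength: Values of r close to 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.

  • The steepness of the best-fit line does not determine the strength of the correlation. Only the signs are linked: r always has the same sign as the slope.

Example: If r is close to 1, the correlation is strong and positive; if r is close to -1, it is strong and negative; if r is close to 0, it is weak.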
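For reference, the r reported by the calculator is the sample (Pearson) correlation coefficient. One standard way to write it uses deviations from the means $\bar{x}$ and $\bar{y}$:

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2}\;\sqrt{\sum (y - \bar{y})^2}}
```

Swapping x and y leaves r unchanged, and so does rescaling either variable by a positive constant. That is why r measures the strength of the linear pattern rather than the steepness of the line.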

Calculating the Correlation Coefficient

To calculate r using a TI-84 calculator:

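  1. Turn diagnostics on (only needed once) so that r and r² appear in the output: press 2nd, 0 (CATALOG), select DiagnosticOn, and press ENTER twice.

  2. Press STAT, choose 1:Edit, and enter the data in L1 (x-values) and L2 (y-values).

  3. Press STAT, arrow over to CALC, and select 4:LinReg(ax+b).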

  4. Read the value of r from the output.
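To check a calculator result by hand, the following minimal Python sketch computes r directly from the deviation-form definition. The data (hours studied and test scores) are made up for illustration, and `pearson_r` is a hypothetical helper name, not a calculator or library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r for paired data x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length lists with at least 2 pairs")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations (sxx, syy) and of cross-products (sxy)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Raises ZeroDivisionError if all x-values (or all y-values) are identical
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied (L1) and test scores (L2)
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
print(f"r = {pearson_r(hours, scores):.4f}")
```

For these data the sketch prints r = 0.9899. Doubling every score changes the slope of the best-fit line but leaves r unchanged.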


Linear Regression Using the Least Squares Method

Linear regression models the relationship between two variables with a straight line, minimizing the sum of squared vertical distances (residuals) from the data points to the line.

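  • Regression Equation: $\hat{y} = ax + b$, where a is the slope and b is the y-intercept (the form used by LinReg(ax+b))

  • Residual: $y - \hat{y}$ (the difference between the observed and predicted values)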
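For the line $\hat{y} = ax + b$, the least-squares slope and intercept have closed forms: $a = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$. The Python sketch below applies them to made-up study-time data and lists the residuals; the function names are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def residuals(x, y, a, b):
    """Residual e = y - y_hat for each data point."""
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]


# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
a, b = least_squares_line(hours, scores)
res = residuals(hours, scores, a, b)
print(f"y_hat = {a:.2f}x + {b:.2f}")
print("residuals:", [round(e, 2) for e in res])
```

The residuals add up to zero (apart from floating-point rounding), which holds for every least-squares line that includes an intercept.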

How to Find the Regression Line on TI-84:

  1. Enter data in L1 (x) and L2 (y).

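  2. Press STAT, arrow over to CALC, and select 4:LinReg(ax+b).

  3. Write down the slope (a) and intercept (b).

  4. Plot the regression line: store the equation in Y1 (for example, enter LinReg(ax+b) L1, L2, Y1, taking Y1 from VARS > Y-VARS > Function), turn on a scatterplot under STAT PLOT, and press ZOOM, 9:ZoomStat.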


Predicting Values with the Regression Line

Use the regression equation to predict y-values for given x-values:

  • If correlation is strong and the x-value is within the data range, use the regression line for prediction.

  • If correlation is weak or the x-value is outside the data range, use the mean of y as the best estimate.

Example: Predict ice cream sales at a given temperature using the regression equation.
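A minimal Python sketch of the two prediction rules above, applied to the ice cream example. The temperature and sales figures are invented, and the `r_cutoff` threshold standing in for "strong correlation" is a hypothetical choice; a course would normally decide this with a significance test or a table of critical values of r.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope a, intercept b, and correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, y_bar - a * x_bar, sxy / sqrt(sxx * syy)


def predict(x, y, x0, r_cutoff=0.8):
    """Predicted y at x0: the regression line if r is strong and x0 is in range,
    otherwise the mean of y (r_cutoff is a hypothetical threshold)."""
    a, b, r = fit_line(x, y)
    in_range = min(x) <= x0 <= max(x)
    if abs(r) >= r_cutoff and in_range:
        return a * x0 + b
    return sum(y) / len(y)  # fall back to y-bar


# Hypothetical data: daily high temperature (deg F) and ice cream cones sold
temps = [70, 75, 80, 85, 90]
sales = [210, 240, 275, 300, 330]
print(predict(temps, sales, 82))   # inside the data range: uses the line
print(predict(temps, sales, 105))  # outside the range: falls back to y-bar
```

Here the fitted line is $\hat{y} = 6x - 209$ with r of about 0.999, so the first call returns 283.0 and the second returns the mean sales, 271.0.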

Residuals Analysis

Residuals help assess the fit of a regression model:

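  • Random residuals (scattered around 0 with no visible pattern): the linear model is a good fit.

  • Patterned residuals (for example, a curve or a fan shape): the linear model is not a good fit; consider a different model.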

  • Residual Plot: A graph of residuals versus x-values to check for randomness.
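To see what a patterned residual plot looks like, the Python sketch below fits a straight line to made-up data that actually follow the curve $y = x^2$ and prints a crude text version of the residual plot. The run of signs (positive, negative, negative, negative, positive) forms a U shape, the kind of pattern that signals a linear model is not a good fit. The helper name is hypothetical.

```python
def linear_residuals(x, y):
    """Residuals y - y_hat from the least-squares line y_hat = a*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]


# Hypothetical curved data: y = x squared
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]
res = linear_residuals(x, y)

# Crude text residual plot: one row per x, bar length ~ |residual|
for xi, e in zip(x, res):
    bar = ("+" if e > 0 else "-") * round(abs(e))
    print(f"x = {xi}: residual {e:+5.1f}  {bar}")
```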

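Variation and the Coefficient of Determination ($r^2$)

The coefficient of determination ($r^2$) measures the proportion of the variation in y that is explained by x through the regression model.

  • $r^2$ close to 1: Most of the variation is explained by the model.

  • $r^2$ close to 0: Little of the variation is explained by the model.

  • For simple linear regression, $r^2$ equals the square of the correlation coefficient r.

Formula: $r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$, where the total variation $\sum (y - \bar{y})^2$ is the sum of the explained variation $\sum (\hat{y} - \bar{y})^2$ and the unexplained variation $\sum (y - \hat{y})^2$.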
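The split of variation can be checked numerically. This Python sketch (hypothetical helper name, made-up study-time data) computes the total, explained, and unexplained variation for a least-squares line and the ratio explained/total.

```python
def variation_breakdown(x, y):
    """Return (total, explained, unexplained) variation for the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    y_hat = [a * xi + b for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, explained, unexplained


# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
total, explained, unexplained = variation_breakdown(hours, scores)
print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"r^2 = {explained / total:.4f}")
```

For these numbers the split is 320 = 313.6 + 6.4, giving $r^2 = 0.98$, the same value obtained by squaring the correlation coefficient of the same data.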


Inferences for Slope of Regression Line

Hypothesis Test for Slope

To test if there is a significant linear relationship between x and y:

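  • Null Hypothesis ($H_0$): $\beta_1 = 0$ (no linear relationship), where $\beta_1$ is the slope of the population regression line

  • Alternative Hypothesis ($H_1$): $\beta_1 \ne 0$, $\beta_1 > 0$, or $\beta_1 < 0$ (depending on the context)

  • Use the LinRegTTest function (STAT > TESTS on the TI-84) to perform the test.

  • Compare the p-value to the significance level ($\alpha$): if $p \le \alpha$, reject $H_0$ and conclude that there is a significant linear relationship; otherwise, fail to reject $H_0$.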
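The test statistic for the slope is the sample slope divided by its standard error, $t = a / SE_a$, where $SE_a = s_e / \sqrt{\sum (x - \bar{x})^2}$, $s_e = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$, and the test has $n - 2$ degrees of freedom. The Python sketch below computes t and the degrees of freedom for made-up data; it stops short of the p-value, which comes from the t distribution (calculator output, statistical software, or a t table). The helper name is hypothetical.

```python
from math import sqrt


def slope_t_test(x, y):
    """Return (slope a, standard error of a, t statistic, df) for H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    # Standard error of estimate s_e uses n - 2 degrees of freedom
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))
    se_a = s_e / sqrt(sxx)
    return a, se_a, a / se_a, n - 2


# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
a, se_a, t, df = slope_t_test(hours, scores)
print(f"a = {a:.3f}, SE = {se_a:.3f}, t = {t:.2f}, df = {df}")
```

Here t is about 12.1 with df = 3, far beyond the two-tailed critical value of 3.182 at $\alpha = 0.05$, so $H_0$ would be rejected. In simple linear regression the same t can also be written as $r\sqrt{n - 2}/\sqrt{1 - r^2}$.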


Confidence Interval for Slope

A confidence interval estimates the range of plausible values for the slope of the regression line.

  • Use the LinRegTInt function (STAT > TESTS on the TI-84).

  • If the interval does not include 0, there is evidence of a linear relationship.
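A confidence interval for the slope has the form $a \pm t_{\alpha/2}\, SE_a$, with $n - 2$ degrees of freedom. In the Python sketch below, the critical value `t_crit` is passed in rather than computed, since the Python standard library has no t distribution; 3.182 is the table value for 95% confidence with df = 3. The data and helper name are hypothetical.

```python
from math import sqrt


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) = a -/+ t_crit * SE_a for the population slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    # Standard error of estimate, then margin of error for the slope
    s_e = sqrt(sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    margin = t_crit * s_e / sqrt(sxx)
    return a - margin, a + margin


# Hypothetical data: hours studied and test scores (n = 5, so df = 3)
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
low, high = slope_confidence_interval(hours, scores, t_crit=3.182)
print(f"95% CI for slope: ({low:.2f}, {high:.2f})")
print("contains 0" if low <= 0 <= high else "excludes 0: evidence of a linear relationship")
```

For these data the interval is roughly (4.13, 7.07), which excludes 0.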


Prediction Intervals

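A prediction interval gives a range for a single predicted y-value at a specific x, accounting for both the uncertainty in the estimated regression line and the scatter of individual y-values around it.

  • Formula for margin of error (E): $E = t_{\alpha/2}\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x - \bar{x})^2}}$, where $x_0$ is the given x-value, $t_{\alpha/2}$ has $n - 2$ degrees of freedom, and $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ is the standard error of estimate.

  • Prediction interval: $\hat{y} - E < y < \hat{y} + E$, where $\hat{y}$ is the predicted value at $x_0$.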
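A minimal Python sketch of a prediction interval using the margin of error $E = t_{\alpha/2}\, s_e \sqrt{1 + 1/n + (x_0 - \bar{x})^2 / \sum (x - \bar{x})^2}$. The critical value `t_crit` (df = n - 2) is supplied by the user because the standard library has no t distribution; the data, the chosen $x_0$ values, and the helper name are hypothetical.

```python
from math import sqrt


def prediction_interval(x, y, x0, t_crit):
    """Return (y_hat, lower, upper) for a single new y at x = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    s_e = sqrt(sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a * x0 + b
    # Margin of error grows as x0 moves away from x_bar
    e = t_crit * s_e * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, y_hat - e, y_hat + e


# Hypothetical data: hours studied and test scores (df = 3, t_crit = 3.182 for 95%)
hours = [1, 2, 3, 4, 5]
scores = [62, 68, 71, 80, 84]
for x0 in (3, 5):
    y_hat, low, high = prediction_interval(hours, scores, x0, t_crit=3.182)
    print(f"x0 = {x0}: predicted {y_hat:.1f}, 95% PI ({low:.1f}, {high:.1f})")
```

The interval at $x_0 = 5$ is wider than at $x_0 = 3$ (the mean of x), reflecting the $(x_0 - \bar{x})^2$ term.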


Quadratic Regression

When data shows a curved (nonlinear) pattern, a quadratic regression model may be more appropriate:

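  • Quadratic Regression Equation: $\hat{y} = ax^2 + bx + c$

  • Use technology (e.g., TI-84's QuadReg function) to fit the model.

  • Compare the $R^2$ values for the linear and quadratic models to judge which fits better. Adding the $x^2$ term can never lower $R^2$, so check the residual plots as well rather than relying on $R^2$ alone.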
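A quadratic least-squares fit $\hat{y} = ax^2 + bx + c$ amounts to solving three linear "normal equations" for a, b, and c. The Python sketch below does this with plain Gaussian elimination and compares $R^2 = 1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y})^2$ for the linear and quadratic fits. It illustrates the least-squares idea and is not necessarily how QuadReg computes its result internally; the data (which lie exactly on $y = x^2 + 1$) and helper names are hypothetical.

```python
def solve_3x3(m, v):
    """Solve m @ [a, b, c] = v by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        known = sum(aug[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (aug[r][3] - known) / aug[r][r]
    return sol


def quad_fit(x, y):
    """Least-squares coefficients (a, b, c) for y_hat = a*x^2 + b*x + c."""
    def sx(p):
        return sum(xi ** p for xi in x)

    def sxy(p):
        return sum(xi ** p * yi for xi, yi in zip(x, y))

    m = [[sx(4), sx(3), sx(2)],
         [sx(3), sx(2), sx(1)],
         [sx(2), sx(1), len(x)]]
    return solve_3x3(m, [sxy(2), sxy(1), sxy(0)])


def r_squared(y, y_hat):
    """Proportion of variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


x = [0, 1, 2, 3, 4]
y = [1, 2, 5, 10, 17]

a, b, c = quad_fit(x, y)
quad_r2 = r_squared(y, [a * xi ** 2 + b * xi + c for xi in x])

# Straight-line fit for comparison
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
lin_r2 = r_squared(y, [slope * xi + (y_bar - slope * x_bar) for xi in x])

print(f"quadratic: a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, R^2 = {quad_r2:.4f}")
print(f"linear:    R^2 = {lin_r2:.4f}")
```

The quadratic model reaches $R^2 = 1$ here because the data were built on a parabola; the linear model's $R^2$ is still about 0.92, which is why a residual plot is worth checking alongside $R^2$.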


