Inference on the Least-Squares Regression Model: Hypothesis Testing, Confidence Intervals, and Model Assumptions

Inference on the Least-Squares Regression Model

Requirements of the Least-Squares Regression Model

The least-squares regression model is a foundational tool in statistics for modeling the linear relationship between an explanatory variable (x) and a response variable (y). For valid inference, several requirements must be met:

  • Linearity: For any value of x, the mean of the response variable y depends linearly on x. This is expressed as $\mu_{y|x} = \beta_0 + \beta_1 x$.

  • Normality and Constant Variance: The response variable y is normally distributed with a constant standard deviation for each value of x. The mean changes at a constant rate (the slope), while the standard deviation remains constant.

  • Independence: The error terms are independent and have mean zero and constant variance $\sigma^2$.

Graphical depiction of the regression model with normal distributions at different x values

Interpretation: A large value of $\sigma$ indicates the data are widely dispersed about the regression line, while a small $\sigma$ means the data are close to the line.

The Least-Squares Regression Model Equation

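  • The model is given by $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is a random error term with mean 0 and variance $\sigma^2$.

  • Parameters $\beta_0$ (intercept) and $\beta_1$ (slope) are estimated from sample data by $b_0$ and $b_1$.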

Standard Error of the Estimate

The standard error of the estimate ($s_e$) measures the typical distance that the observed values fall from the regression line. It is calculated as:

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

  • Where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the sample size.

Table of observed, predicted values, residuals, and squared residuals

Example: For the drilling data, and , so .

Note: Always divide by $n - 2$ when computing $s_e$ for regression, because two parameters ($\beta_0$ and $\beta_1$) are estimated from the data.
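
The calculation of the standard error of the estimate can be sketched in Python. This is a minimal illustration rather than the drilling example: the data values and the function name `standard_error_of_estimate` are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math

def standard_error_of_estimate(x, y):
    """Fit y-hat = b0 + b1*x by least squares and return s_e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # sample slope
    b0 = y_bar - b1 * x_bar     # sample intercept
    # Sum of squared residuals (observed minus predicted)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))   # divide by n - 2, not n

# Hypothetical data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(standard_error_of_estimate(x, y), 4))  # 0.8944
```

For these values the squared residuals sum to 2.4, so $s_e = \sqrt{2.4 / 3} \approx 0.8944$.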

Verifying Normality of Residuals

To ensure valid inference, the residuals (differences between observed and predicted values) should be approximately normally distributed. This can be checked using a normal probability plot.

Probability plot of residuals for normality check

Interpretation: If the residuals fall approximately along a straight line in the probability plot, the normality assumption is reasonable.
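
A numeric companion to reading the plot by eye is the correlation between the ordered residuals and their expected normal scores; a value close to 1 suggests the points lie near a straight line. The sketch below makes several assumptions: the plotting positions $(i - 0.375)/(n + 0.25)$ are one common convention (software packages differ), and the residuals and the function name `normal_scores_correlation` are hypothetical.

```python
import math
from statistics import NormalDist

def normal_scores_correlation(residuals):
    """Return (ordered residual, normal score) pairs and their correlation."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Expected z-score for the i-th smallest value (assumed plotting positions)
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    r_bar = sum(ordered) / n
    z_bar = sum(scores) / n
    num = sum((r - r_bar) * (z - z_bar) for r, z in zip(ordered, scores))
    den = math.sqrt(sum((r - r_bar) ** 2 for r in ordered)
                    * sum((z - z_bar) ** 2 for z in scores))
    return list(zip(ordered, scores)), num / den

# Hypothetical residuals (not from the source)
pairs, corr = normal_scores_correlation([-0.8, 0.6, 1.0, -0.6, -0.2])
for resid, z in pairs:
    print(f"{resid:6.2f}  {z:7.3f}")   # the points a probability plot would show
print("correlation:", round(corr, 3))
```

How close to 1 the correlation must be depends on the sample size, so the result should be compared against a critical-value table rather than a fixed cutoff.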

Hypothesis Testing for the Slope

To determine if there is a significant linear relationship between x and y, we test hypotheses about the slope $\beta_1$:

  • Null hypothesis ($H_0$): $\beta_1 = 0$ (no linear relationship)

  • Alternative hypothesis ($H_1$): $\beta_1 \neq 0$, $\beta_1 < 0$, or $\beta_1 > 0$ (depending on the research question)

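The test statistic is:

  • $t_0 = \dfrac{b_1 - \beta_1}{s_{b_1}} = \dfrac{b_1}{s_{b_1}}$ (since $\beta_1 = 0$ under $H_0$), where $b_1$ is the sample slope and $s_{b_1}$ is its standard error.

This statistic follows a Student's t-distribution with $n - 2$ degrees of freedom.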

Critical Value Approach

Compare the calculated $t_0$ to the critical value(s) from the t-distribution with $n - 2$ degrees of freedom:

Critical region for two-tailed t-test

Critical region for left-tailed t-test

Critical region for right-tailed t-test

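| Test Type | Decision Rule |
|---|---|
| Two-Tailed | If $t_0 < -t_{\alpha/2}$ or $t_0 > t_{\alpha/2}$, reject $H_0$. |
| Left-Tailed | If $t_0 < -t_{\alpha}$, reject $H_0$. |
| Right-Tailed | If $t_0 > t_{\alpha}$, reject $H_0$. |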

Table of rejection regions for t-tests
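
The three decision rules can be written as one small function. This is a sketch under stated assumptions: the function name `reject_h0` is hypothetical, and the caller supplies the positive critical value from a t-table ($t_{\alpha/2}$ for a two-tailed test, $t_{\alpha}$ for a one-tailed test).

```python
def reject_h0(t0, t_crit, tail="two"):
    """Apply the critical-value decision rule for a t-test on the slope.

    t0     -- computed test statistic
    t_crit -- positive critical value looked up from a t-table
    tail   -- "two", "left", or "right"
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be the positive table value")
    if tail == "two":
        return t0 < -t_crit or t0 > t_crit
    if tail == "left":
        return t0 < -t_crit
    if tail == "right":
        return t0 > t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")

# With 10 degrees of freedom and alpha = 0.05, the table values are
# t_{0.025} = 2.228 (two-tailed) and t_{0.05} = 1.812 (one-tailed).
print(reject_h0(2.5, 2.228, "two"))     # True
print(reject_h0(1.0, 1.812, "right"))   # False
```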

P-Value Approach

The P-value is the probability, under $H_0$, of observing a test statistic as extreme as, or more extreme than, the value computed from the sample.

  • For a two-tailed test, the P-value is the sum of the areas in both tails beyond $\pm|t_0|$.

  • For a left-tailed test, it is the area to the left of $t_0$.

  • For a right-tailed test, it is the area to the right of $t_0$.

P-value as sum of tail areas in two-tailed test

P-value as left tail area in left-tailed test

P-value as right tail area in right-tailed test

Decision Rule: If the P-value < $\alpha$ (the level of significance), reject $H_0$.
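
The tail areas can be approximated without statistical software by numerically integrating the t-density. The sketch below uses Simpson's rule from the Python standard library only; the function names and the example statistic are hypothetical, and in practice a statistics package or calculator would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_area(t_abs, df, steps=2000):
    """Area to the right of t_abs >= 0, via Simpson's rule on [0, t_abs]."""
    h = t_abs / steps                     # steps must be even
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3            # half the area lies right of 0

def t_p_value(t0, df, tail="two"):
    """P-value for a two-, left-, or right-tailed t-test."""
    tail_area = upper_tail_area(abs(t0), df)
    if tail == "two":
        return 2 * tail_area
    if tail == "left":
        return tail_area if t0 < 0 else 1 - tail_area
    if tail == "right":
        return tail_area if t0 > 0 else 1 - tail_area
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Hypothetical statistic: t0 = 2.1213 with n - 2 = 3 degrees of freedom
print(round(t_p_value(2.1213, 3, "two"), 3))   # 0.124
```

Because 0.124 exceeds a typical $\alpha = 0.05$, this hypothetical test would not reject $H_0$. Note that `math.gamma` overflows for very large degrees of freedom, so this sketch is only suitable for small to moderate samples.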

Checking Model Assumptions with Residual Plots

Before conducting inference, check that the residuals show no pattern when plotted against the explanatory variable. This supports the assumption of constant variance (homoscedasticity).

Residuals versus explanatory variable plot

Interpretation: If the residuals show no discernible pattern (no curvature and no funnel shape), the linearity and constant-variance assumptions are reasonable.

Calculating the Standard Error of the Slope

The standard error of the slope estimate is calculated as:

$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$$

Table of (x_i - x̄) and squared deviations

Example: For the drilling data, and , so .
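
The standard error of the slope and the resulting test statistic can be computed together. The sketch below is illustrative only: the data and the function name `slope_and_standard_error` are hypothetical and are not the drilling values.

```python
import math

def slope_and_standard_error(x, y):
    """Return the least-squares slope b1 and its standard error s_b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)          # sum of (x_i - x-bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))                    # standard error of the estimate
    return b1, s_e / math.sqrt(sxx)

# Hypothetical data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, s_b1 = slope_and_standard_error(x, y)
t0 = b1 / s_b1                                        # test statistic under H0: beta_1 = 0
print(round(b1, 4), round(s_b1, 4), round(t0, 4))     # 0.6 0.2828 2.1213
```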

Constructing a Confidence Interval for the Slope

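A $(1 - \alpha) \cdot 100\%$ confidence interval for the true slope $\beta_1$ is:

  • Lower bound: $b_1 - t_{\alpha/2} \cdot s_{b_1}$

  • Upper bound: $b_1 + t_{\alpha/2} \cdot s_{b_1}$

Here $t_{\alpha/2}$ is the critical value from the t-distribution with $n - 2$ degrees of freedom.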

Example: For , , (10 df):

  • Lower bound:

  • Upper bound:

We are 95% confident that the true mean increase in time to drill 5 feet for each additional foot of depth is between 0.005 and 0.018 minutes.
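
The interval is simple enough to compute directly once the slope, its standard error, and the critical value are known. In this sketch the function name `slope_confidence_interval` and the input values are hypothetical; the critical value 3.182 is the table value of $t_{0.025}$ for 3 degrees of freedom.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) bounds: b1 -/+ t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# Hypothetical inputs: b1 = 0.6, s_b1 = 0.2828, n = 5 (so 3 df), 95% confidence
lower, upper = slope_confidence_interval(0.6, 0.2828, 3.182)
print(round(lower, 2), round(upper, 2))   # -0.3 1.5
```

Because this hypothetical interval contains 0, it is consistent with failing to reject $H_0: \beta_1 = 0$ in a two-tailed test at the same $\alpha$.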

Confidence and Prediction Intervals for the Regression Model

These intervals provide measures of accuracy for predictions made using the regression model:

  • Confidence Interval for Mean Response: Estimates the mean value of y for all individuals at a specific x.

  • Prediction Interval for Individual Response: Estimates the value of y for a single individual at a specific x.

Formulas:

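  • Confidence interval for mean response at $x^*$: Lower: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$; Upper: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

  • Prediction interval for individual response at $x^*$: Lower: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$; Upper: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

In both formulas, $\hat{y} = b_0 + b_1 x^*$ is the predicted value at $x^*$ and $t_{\alpha/2}$ has $n - 2$ degrees of freedom. The extra 1 under the square root makes the prediction interval wider than the confidence interval at the same $x^*$.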

Example: For , , , , , , :

  • Confidence interval for mean response: (6.45, 7.15)

  • Prediction interval for individual response: (5.59, 8.01)

We are 95% confident that the mean time to drill 5 feet at a depth of 110 feet is between 6.45 and 7.15 minutes. We are also 95% confident that the time to drill 5 feet for a single drilling at that depth is between 5.59 and 8.01 minutes.
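
Both intervals share the same ingredients and differ only in the extra 1 under the square root, which a short function makes explicit. This is a sketch under stated assumptions: the data, the chosen $x^*$, and the function name `response_intervals` are hypothetical, and the critical value is supplied by the caller from a t-table.

```python
import math

def response_intervals(x, y, x_star, t_crit):
    """Return (confidence interval, prediction interval) at x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star                       # predicted value at x_star
    core = 1 / n + (x_star - x_bar) ** 2 / sxx     # term shared by both intervals
    ci_margin = t_crit * s_e * math.sqrt(core)     # mean response
    pi_margin = t_crit * s_e * math.sqrt(1 + core) # individual response
    return ((y_hat - ci_margin, y_hat + ci_margin),
            (y_hat - pi_margin, y_hat + pi_margin))

# Hypothetical data (not from the source); t_{0.025} = 3.182 for 3 df
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ci, pi = response_intervals(x, y, x_star=4, t_crit=3.182)
print([round(v, 2) for v in ci])   # [3.04, 6.16]
print([round(v, 2) for v in pi])   # [1.35, 7.85]
```

As in the drilling example, the prediction interval is wider than the confidence interval because a single observation varies about the mean in addition to the uncertainty in the estimated mean itself.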
