Inference on the Least-Squares Regression Model: Hypothesis Testing, Confidence Intervals, and Model Assumptions
Study Guide - Smart Notes
Inference on the Least-Squares Regression Model
Requirements of the Least-Squares Regression Model
The least-squares regression model is a foundational tool in statistics for modeling the linear relationship between an explanatory variable (x) and a response variable (y). For valid inference, several requirements must be met:
Linearity: For any value of x, the mean of the response variable y depends linearly on x. This is expressed as $\mu_{y|x} = \beta_0 + \beta_1 x$.
Normality and Constant Variance: For each value of x, the response variable y is normally distributed with the same standard deviation $\sigma$. The mean changes at a constant rate (the slope $\beta_1$) as x increases, while the standard deviation remains constant.
Independence: The error terms $\varepsilon_i$ are independent and have mean zero and constant variance $\sigma^2$.

Interpretation: A large value of $\sigma$, the standard deviation of the error terms, indicates the data are widely dispersed about the regression line, while a small $\sigma$ means the data are close to the line.
The Least-Squares Regression Model Equation
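The model is given by $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is a random error term with mean 0 and variance $\sigma^2$.
The parameters $\beta_0$ (intercept) and $\beta_1$ (slope) are estimated from sample data by $b_0$ and $b_1$.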
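To make the role of the random error term concrete, here is a minimal sketch that draws responses from the model equation. The parameter values, the x values, and the function name `simulate_responses` are hypothetical and chosen only for illustration.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from y = beta0 + beta1*x + error,
    where each error is an independent normal draw with mean 0
    and standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]

# Hypothetical values, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=0.3)
for x, y in zip(xs, ys):
    print(x, round(y, 3))
```

Raising `sigma` in this sketch scatters the simulated points more widely about the line, while setting it to 0 places every point exactly on the line.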
Standard Error of the Estimate
The standard error of the estimate ($s_e$) measures the typical distance that the observed values fall from the regression line. It is calculated as:
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the sample size.

Example: For the drilling data, and , so .
Note: Always divide by $n - 2$ when computing $s_e$ for regression, because two parameters (the intercept and the slope) are estimated from the data.
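The calculation of the standard error of the estimate can be sketched in a few lines of Python. The data below are hypothetical (they are not the drilling data), and the helper names `fit_line` and `standard_error_of_estimate` are assumptions made for this example.

```python
import math

def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error_of_estimate(x, y):
    """Square root of the sum of squared residuals divided by n - 2."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
s_e = standard_error_of_estimate(x, y)
print(round(b0, 4), round(b1, 4), round(s_e, 4))
```

For these five points the sum of squared residuals is 2.4, so the divisor is 5 - 2 = 3 rather than 5 or 4.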
Verifying Normality of Residuals
To ensure valid inference, the residuals (differences between observed and predicted values) should be approximately normally distributed. This can be checked using a normal probability plot.

Interpretation: If the residuals fall approximately along a straight line in the probability plot, the normality assumption is reasonable.
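A normal probability plot pairs the ordered residuals with the z-scores expected from a normal distribution. The sketch below computes those pairs numerically and summarizes their straightness with a correlation coefficient. The correlation summary is a swapped-in numeric stand-in for judging the plot by eye, and the plotting positions (i - 0.375)/(n + 0.25) are an assumption, since the notes do not say which convention the plot uses. The residuals are hypothetical.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """Expected z-scores for n ordered observations.
    Uses plotting positions (i - 0.375) / (n + 0.25), an assumed convention."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def probability_plot_correlation(residuals):
    """Correlation between the sorted residuals and their normal scores."""
    ordered = sorted(residuals)
    scores = normal_scores(len(ordered))
    r_mean = sum(ordered) / len(ordered)
    z_mean = sum(scores) / len(scores)
    s_rz = sum((r - r_mean) * (z - z_mean) for r, z in zip(ordered, scores))
    s_rr = sum((r - r_mean) ** 2 for r in ordered)
    s_zz = sum((z - z_mean) ** 2 for z in scores)
    return s_rz / math.sqrt(s_rr * s_zz)

# Hypothetical residuals for illustration only.
example_residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
r = probability_plot_correlation(example_residuals)
print(round(r, 3))
```

A correlation near 1 corresponds to points lying close to a straight line. How close is close enough depends on the sample size, and no cutoff is given in these notes.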
Hypothesis Testing for the Slope
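To determine if there is a significant linear relationship between x and y, we test hypotheses about the slope $\beta_1$:
Null hypothesis ($H_0$): $\beta_1 = 0$ (no linear relationship)
Alternative hypothesis ($H_1$): $\beta_1 \neq 0$, $\beta_1 < 0$, or $\beta_1 > 0$ (depending on the research question)
The test statistic is:
$$t_0 = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{b_1}{s_{b_1}}$$
(since $\beta_1 = 0$ under $H_0$)
Here $b_1$ is the estimated slope and $s_{b_1}$ is its standard error. This statistic follows a Student's t-distribution with $n - 2$ degrees of freedom.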
Critical Value Approach
Compare the calculated $t_0$ to the critical value(s) from the t-distribution with $n - 2$ degrees of freedom:



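| Test Type | Decision Rule |
|---|---|
| Two-Tailed | If $t_0 < -t_{\alpha/2}$ or $t_0 > t_{\alpha/2}$, reject $H_0$. |
| Left-Tailed | If $t_0 < -t_{\alpha}$, reject $H_0$. |
| Right-Tailed | If $t_0 > t_{\alpha}$, reject $H_0$. |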
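The decision rules in the table can be written as a small function. This is a sketch: the function name `reject_h0` and the test statistic value 3.0 are hypothetical, while 2.228 and 1.812 are the standard t-table critical values for 10 degrees of freedom at tail areas 0.025 and 0.05.

```python
def reject_h0(t0, t_crit, tail="two"):
    """Apply the critical-value decision rule.

    t_crit is the positive critical value from the t-table:
    the value cutting off area alpha/2 for a two-tailed test,
    or area alpha for a one-tailed test.
    """
    if tail == "two":
        return t0 < -t_crit or t0 > t_crit
    if tail == "left":
        return t0 < -t_crit
    if tail == "right":
        return t0 > t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Hypothetical test statistic with 10 degrees of freedom.
t0 = 3.0
print(reject_h0(t0, 2.228, tail="two"))    # two-tailed, alpha = 0.05
print(reject_h0(t0, 1.812, tail="right"))  # right-tailed, alpha = 0.05
print(reject_h0(t0, 1.812, tail="left"))   # left-tailed, alpha = 0.05
```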

P-Value Approach
The P-value is the probability, under $H_0$, of observing a test statistic as extreme as, or more extreme than, the value computed from the sample.
For a two-tailed test, the P-value is the sum of the areas in both tails beyond $\pm|t_0|$.
For a left-tailed test, it is the area to the left of $t_0$.
For a right-tailed test, it is the area to the right of $t_0$.



Decision Rule: If the P-value < $\alpha$, reject $H_0$.
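The tail areas described above can be approximated without a statistics library by numerically integrating the Student's t density. The sketch below uses Simpson's rule, which is a choice made for this example and not a method stated in the notes. The function names are hypothetical, and `math.gamma` limits it to moderate degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t0, df, tail="two"):
    """Tail area beyond t0 for the requested alternative hypothesis."""
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t0), df))
    if tail == "left":
        return t_cdf(t0, df)
    if tail == "right":
        return 1 - t_cdf(t0, df)
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Hypothetical test statistic with 10 degrees of freedom.
p = p_value(3.0, 10, tail="two")
print(round(p, 4))
```

As a sanity check on the sketch, a test statistic equal to the two-tailed critical value 2.228 with 10 degrees of freedom should give a P-value of about 0.05.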
Checking Model Assumptions with Residual Plots
Before conducting inference, check that the residuals show no pattern when plotted against the explanatory variable. This supports the assumption of constant variance (homoscedasticity).

Interpretation: No discernible pattern, such as a curve or a fan shape, indicates the model assumptions are reasonable.
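The points of a residual plot are the pairs (x, residual). A minimal sketch of how those pairs might be computed before plotting follows. The data are hypothetical and the function name `residual_pairs` is an assumption made for this example.

```python
def residual_pairs(x, y):
    """Fit the least-squares line, then return (x, observed - predicted) pairs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return [(xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
pairs = residual_pairs(x, y)
for xi, res in pairs:
    print(xi, round(res, 3))
```

Least-squares residuals always sum to zero, so the plot is centered on the horizontal axis. What matters for the assumptions is whether the spread or shape changes across x.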
Calculating the Standard Error of the Slope
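The standard error of the slope estimate $b_1$ is calculated as:
$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{s_e}{\sqrt{n - 1} \cdot s_x}$$
where $s_e$ is the standard error of the estimate and $s_x$ is the sample standard deviation of the x values.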

Example: For the drilling data, and , so .
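The standard error of the slope and the resulting test statistic can be computed together, as in the sketch below. The data are hypothetical (not the drilling data) and the function name `slope_inference` is an assumption made for this example.

```python
import math

def slope_inference(x, y):
    """Return slope, standard error of the estimate,
    standard error of the slope, and the t statistic for a zero-slope null."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    s_b1 = s_e / math.sqrt(s_xx)  # same as s_e / (sqrt(n - 1) * s_x)
    t0 = b1 / s_b1                # hypothesized slope is 0
    return b1, s_e, s_b1, t0

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, s_e, s_b1, t0 = slope_inference(x, y)
print(round(b1, 4), round(s_b1, 4), round(t0, 4))
```

The resulting statistic would be compared against a t-distribution with n - 2 = 3 degrees of freedom for this five-point example.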
Constructing a Confidence Interval for the Slope
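A $(1 - \alpha) \cdot 100\%$ confidence interval for the true slope $\beta_1$ is:
Lower bound: $b_1 - t_{\alpha/2} \cdot s_{b_1}$
Upper bound: $b_1 + t_{\alpha/2} \cdot s_{b_1}$
Here $t_{\alpha/2}$ is the critical value from the t-distribution with $n - 2$ degrees of freedom.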
Example: For , , (10 df):
Lower bound:
Upper bound:
We are 95% confident that the true mean increase in time to drill 5 feet for each additional foot of depth is between 0.005 and 0.018 minutes.
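The interval for the slope is the estimate plus or minus a margin of error, which can be sketched as follows. The slope and standard error used here are hypothetical values (not the drilling results), the function name is an assumption, and 3.182 is the standard t-table value cutting off an upper-tail area of 0.025 with 3 degrees of freedom.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) bounds: b1 -/+ t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# Hypothetical estimate from a sample of n = 5, so df = n - 2 = 3.
lower, upper = slope_confidence_interval(b1=0.6, s_b1=0.2828, t_crit=3.182)
print(round(lower, 3), round(upper, 3))
```

In this hypothetical case the interval contains 0, which matches a two-tailed test at the same significance level failing to reject a zero slope.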
Confidence and Prediction Intervals for the Regression Model
These intervals provide measures of accuracy for predictions made using the regression model:
Confidence Interval for Mean Response: Estimates the mean value of y for all individuals at a specific x.
Prediction Interval for Individual Response: Estimates the value of y for a single individual at a specific x.
Formulas:
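Confidence interval for mean response at $x^*$: Lower: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ Upper: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Prediction interval for individual response at $x^*$: Lower: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ Upper: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
In both formulas, $\hat{y} = b_0 + b_1 x^*$ is the predicted value at $x^*$, $s_e$ is the standard error of the estimate, and $t_{\alpha/2}$ is the critical value with $n - 2$ degrees of freedom.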
Example: For , , , , , , :
Confidence interval for mean response: (6.45, 7.15)
Prediction interval for individual response: (5.59, 8.01)
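We are 95% confident that the mean time to drill 5 feet at a depth of 110 feet is between 6.45 and 7.15 minutes, and 95% confident that the time for a single drilling at that depth is between 5.59 and 8.01 minutes. The prediction interval is wider because it must also account for the variability of an individual observation about the mean.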
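Both intervals share the same center and differ only by the extra 1 under the square root, which the following sketch makes explicit. The data are hypothetical (not the drilling data), the function name `response_intervals` is an assumption, and 3.182 is the standard t-table value cutting off an upper-tail area of 0.025 with 3 degrees of freedom.

```python
import math

def response_intervals(x, y, x_star, t_crit):
    """Return the predicted value at x_star, the confidence interval for the
    mean response, and the prediction interval for an individual response."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    h = 1 / n + (x_star - x_bar) ** 2 / s_xx  # term shared by both formulas
    ci_margin = t_crit * s_e * math.sqrt(h)
    pi_margin = t_crit * s_e * math.sqrt(1 + h)  # extra 1 for individual variation
    ci = (y_hat - ci_margin, y_hat + ci_margin)
    pi = (y_hat - pi_margin, y_hat + pi_margin)
    return y_hat, ci, pi

# Hypothetical data for illustration only; df = n - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci, pi = response_intervals(x, y, x_star=3, t_crit=3.182)
print(round(y_hat, 3), [round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

Both margins grow as `x_star` moves away from the mean of the x values, so the intervals are narrowest at the center of the data.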