Regression analysis allows us to predict values of a dependent variable y based on an independent variable x, but these predictions are estimates rather than exact values. To quantify the uncertainty around these predictions, we use prediction intervals, which are similar to confidence intervals but specifically apply to predicted y values from a regression model.
Consider a scenario where a public transportation association studies the relationship between temperature and the number of bus riders. The data show a strong linear correlation, with \(r^2 = 0.874\) and a standard error of the estimate \(s_e = 2.97\). The regression line equation is given, allowing us to predict the number of riders at a specific temperature.
To predict the number of riders when the temperature is 35 degrees, substitute \(x_0 = 35\) into the regression equation:
\[\hat{y}_0 = b_0 + b_1 x_0\]For example, if the regression equation is \(\hat{y} = 79.143 - 1.0459x\), then:
\[\hat{y}_0 = 79.143 - 1.0459 \times 35 = 42.54\]This means the best estimate for the number of riders at 35 degrees is approximately 42.54.
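As a quick arithmetic check, the point estimate can be reproduced in a few lines of Python, using the coefficients from the example equation above:

```python
# Point prediction y-hat_0 = b0 + b1 * x0, with the example's coefficients
b0, b1 = 79.143, -1.0459   # intercept and slope of the fitted line
x0 = 35                    # temperature in degrees

y_hat_0 = b0 + b1 * x0
print(y_hat_0)             # ≈ 42.54 riders
```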
Before constructing a 95% prediction interval for this estimate, two conditions must be verified: there must be a strong linear correlation between x and y, and the prediction point \(x_0\) must lie within the range of observed data to avoid unreliable extrapolation. In this case, both conditions are satisfied.
The prediction interval is calculated as:
\[\hat{y}_0 \pm t_{\alpha/2, n-2} \times s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}\]Here, \(t_{\alpha/2, n-2}\) is the critical value from the t-distribution with \(n-2\) degrees of freedom, corresponding to the desired confidence level (e.g., 95%), \(s_e\) is the standard error of the estimate, \(n\) is the number of data points, \(\bar{x}\) is the mean of the observed x values, and the denominator is the sum of squared deviations of x values from their mean.
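This formula can be wrapped in a small function. The following is a sketch rather than a library routine: the name `prediction_interval` and the use of `np.polyfit` to fit the line are illustrative choices, and `scipy.stats.t.ppf` supplies the critical value.

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, confidence=0.95):
    """Prediction interval for a new observation of y at x0,
    from a simple linear regression of y on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)                   # slope, intercept
    residuals = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(residuals**2) / (n - 2))  # standard error of the estimate
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 2)
    margin = t_crit * s_e * np.sqrt(
        1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    )
    y0_hat = b0 + b1 * x0
    return y0_hat - margin, y0_hat + margin
```

Given raw data, this computes every quantity in the formula directly; the worked example below instead uses summary statistics, since the individual observations are not listed here.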
To find the critical t-value for a 95% prediction interval with 13 data points, use the t-distribution with \(n - 2 = 11\) degrees of freedom:
\[t_{0.025, 11} \approx 2.201\]Using the data, the mean temperature is \(\bar{x} = 389/13 \approx 29.92\), with \(\sum x_i = 389\) and \(\sum x_i^2 = 12{,}255\). Multiplying the top and bottom of the last term under the square root by \(n\) (valid because \(\sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2 / n\)) gives a form that avoids computing each deviation individually; the numerator and denominator inside the square root are then:
\[\text{Numerator} = n (x_0 - \bar{x})^2 = 13 \times (35 - 29.92)^2 \approx 335.1\]\[\text{Denominator} = n \sum x_i^2 - \left(\sum x_i\right)^2 = 13 \times 12{,}255 - 389^2 = 7{,}994\]Here the numerator is evaluated with the unrounded mean \(\bar{x} = 389/13\); rounding to 29.92 first would give 335.5 instead. Substituting these values into the margin of error formula:
\[E = t_{\alpha/2, n-2} \times s_e \times \sqrt{1 + \frac{1}{n} + \frac{\text{Numerator}}{\text{Denominator}}} = 2.201 \times 2.97 \times \sqrt{1 + \frac{1}{13} + \frac{335.1}{7,994}} \approx 6.914\]Finally, the 95% prediction interval for the number of riders at 35 degrees is:
\[( \hat{y}_0 - E, \hat{y}_0 + E ) = (42.54 - 6.914, 42.54 + 6.914) \approx (35.62, 49.45)\]This interval means we are 95% confident that the actual number of bus riders when the temperature is 35 degrees will fall between approximately 35.62 and 49.45.
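The entire calculation can be reproduced from the summary statistics alone. This sketch uses scipy only for the critical t-value; the numbers are the ones given in the worked example.

```python
from scipy import stats

n = 13
sum_x, sum_x2 = 389, 12_255    # sum of x values and sum of their squares
s_e = 2.97                     # standard error of the estimate
b0, b1 = 79.143, -1.0459       # fitted intercept and slope
x0 = 35                        # temperature at which to predict

x_bar = sum_x / n                                   # ≈ 29.92
y0 = b0 + b1 * x0                                   # ≈ 42.54
t_crit = stats.t.ppf(0.975, df=n - 2)               # ≈ 2.201
num = n * (x0 - x_bar) ** 2                         # ≈ 335.1
den = n * sum_x2 - sum_x ** 2                       # 7,994
E = t_crit * s_e * (1 + 1 / n + num / den) ** 0.5   # ≈ 6.914
print(y0 - E, y0 + E)                               # ≈ (35.62, 49.45)
```

Because the code carries the unrounded mean throughout, it avoids the small rounding discrepancies that creep in when intermediate values are truncated by hand.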
Understanding prediction intervals enhances the interpretation of regression predictions by quantifying the expected variability around predicted values. Tools like Excel simplify the calculation process, especially for complex formulas involving sums and critical values, making it easier to apply these concepts in practical data analysis.