
Statistical Inference: Comparing Two Populations and Linear Regression


Overview

This unit focuses on the application of statistical inference methods to analyze single variables and the relationship between two variables. Students will learn to summarize data, perform hypothesis tests, construct confidence intervals, and interpret results, both manually and using statistical software (RStudio). The unit also covers linear regression modeling and the interpretation of regression results.

Comparing Two Populations

Types of Samples

  • Independent Samples: Two samples are independent if the selection of individuals in one sample does not influence the selection in the other.

  • Dependent (Paired) Samples: Samples are dependent if each observation in one sample is matched with a specific observation in the other sample (e.g., before-and-after measurements on the same subjects).

Hypothesis Testing for Means and Proportions

  • Single Mean of Differences (Paired Data): Used when comparing means from dependent samples.

  • Difference Between Two Independent Means: Used to compare means from two independent groups.

  • Difference Between Two Independent Proportions: Used to compare proportions from two independent groups.

Formulating Hypotheses

  • Null Hypothesis (H0): States that there is no difference (e.g., $\mu_1 - \mu_2 = 0$ or $p_1 - p_2 = 0$).

  • Alternative Hypothesis (HA): States that there is a difference (e.g., $\mu_1 - \mu_2 \neq 0$ or $p_1 - p_2 \neq 0$).

Test Statistics

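  • For Paired Means: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$ with $n - 1$ degrees of freedom, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the $n$ paired differences.

  • For Difference of Two Means (Independent): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

  • For Difference of Two Proportions: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$, where $\hat{p}$ is the pooled sample proportion (total successes divided by $n_1 + n_2$).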
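The three test statistics can be computed directly from the data. The following is a minimal sketch in Python (standard library only), not the course's own code: the function names are hypothetical, each null hypothesis is taken to be "no difference", and the two-mean version assumes the unpooled standard error, since the notes do not say whether variances are pooled.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for H0: mean of paired differences = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def two_sample_t(x1, x2):
    """Return the unpooled t statistic for H0: mu1 - mu2 = 0."""
    se = math.sqrt(stdev(x1) ** 2 / len(x1) + stdev(x2) ** 2 / len(x2))
    return (mean(x1) - mean(x2)) / se

def two_prop_z(successes1, n1, successes2, n2):
    """Return the pooled z statistic for H0: p1 - p2 = 0."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

Each function returns only the test statistic; turning it into a p-value requires the matching t or normal distribution.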

Confidence Intervals

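  • For Paired Means: $\bar{d} \pm t^* \dfrac{s_d}{\sqrt{n}}$, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the $n$ paired differences and $t^*$ is the critical value with $n - 1$ degrees of freedom.

  • For Difference of Two Means: $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2}$

  • For Difference of Two Proportions: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$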
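Each interval has the form "point estimate ± critical value × standard error". The sketch below illustrates that pattern in Python under some assumptions: the function names are hypothetical, the critical value `t_star` is assumed to be looked up by the reader (from a t table or software) because the standard library has no t distribution, and the proportion interval uses the unpooled standard error.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_mean_ci(diffs, t_star):
    """Interval for the mean difference; t_star uses df = n - 1."""
    margin = t_star * stdev(diffs) / math.sqrt(len(diffs))
    center = mean(diffs)
    return center - margin, center + margin

def two_mean_ci(x1, x2, t_star):
    """Interval for mu1 - mu2 with the unpooled standard error."""
    se = math.sqrt(stdev(x1) ** 2 / len(x1) + stdev(x2) ** 2 / len(x2))
    center = mean(x1) - mean(x2)
    return center - t_star * se, center + t_star * se

def two_prop_ci(successes1, n1, successes2, n2, level=0.95):
    """Interval for p1 - p2 using a normal critical value z*."""
    p1, p2 = successes1 / n1, successes2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    center = p1 - p2
    return center - z_star * se, center + z_star * se
```

If an interval for a difference contains 0, the data are consistent with no difference at the corresponding confidence level.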

Conclusion and Interpretation

  • Draw conclusions based on the p-value and significance level ($\alpha$): reject H0 when the p-value is at most $\alpha$; otherwise, fail to reject H0.

  • State results in context, avoiding unnecessary statistical jargon.
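As a small illustration of the decision step, the sketch below converts a z statistic into a two-sided p-value and compares it with the significance level. The function names are hypothetical, and a two-sided alternative is assumed; a one-sided test would use only one tail.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p = two_sided_p_from_z(1.42)
print(decide(p))
```

The printed decision still has to be restated in the context of the problem, for example "the data do not provide convincing evidence that the two proportions differ".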

Example

  • Suppose we want to test whether a new teaching method affects test scores. We collect scores before and after the method is applied to the same students (paired data). We calculate the mean and standard deviation of the differences, then use the paired t-test to check whether the mean difference differs significantly from zero.
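The arithmetic for this example can be sketched as follows. The scores are made up for illustration (they do not come from the course materials), and the critical value 2.776 is the two-sided 5% value from a t table with 4 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical scores for five students, before and after the new method
before = [70, 65, 80, 75, 60]
after = [74, 68, 83, 80, 65]

diffs = [a - b for a, b in zip(after, before)]  # [4, 3, 3, 5, 5]
n = len(diffs)
d_bar = mean(diffs)                   # mean difference
s_d = stdev(diffs)                    # sample standard deviation of differences
t_stat = d_bar / (s_d / math.sqrt(n))  # paired t statistic, df = n - 1

t_crit = 2.776  # two-sided critical value for alpha = 0.05, df = 4
print(f"mean diff = {d_bar:.2f}, sd = {s_d:.2f}, t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

With these made-up numbers the mean difference is 4 points and the t statistic is about 8.94, well beyond the critical value, so the conclusion in context would be that scores changed after the new method was introduced.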

Using RStudio for Inference

  • prop.test: Used for testing proportions.

  • t.test: Used for testing means (paired or independent samples).

  • RStudio can compute confidence intervals, test statistics, and exact p-values for the above scenarios.

Linear Regression Analysis

Least Squares Regression Line

Linear regression models the relationship between a numerical response variable and a single explanatory variable. The least squares regression line minimizes the sum of squared residuals.

  • Regression Equation: $\hat{y} = b_0 + b_1 x$

  • Slope ($b_1$): Represents the estimated change in the response variable for a one-unit increase in the explanatory variable.

  • Intercept ($b_0$): The estimated value of the response variable when the explanatory variable is zero.

Prediction and Residuals

  • Prediction: Use the regression equation to estimate the response for a given value of the explanatory variable.

  • Residual: The difference between the observed value and the predicted value: $e = y - \hat{y}$
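A least squares fit, a prediction, and the residuals can be sketched in a few lines. The function names and the study-hours data below are hypothetical, chosen only to illustrate the arithmetic; the slope is computed with the standard closed-form least squares formula.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus predicted value for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

intercept, slope = fit_line(hours, scores)
predicted_at_6 = intercept + slope * 6  # prediction for 6 hours of study
res = residuals(hours, scores, intercept, slope)
```

For this made-up data the fitted slope is 5.6 points per additional hour, and the residuals sum to zero up to rounding, as they do for any least squares line with an intercept.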

Coefficient of Determination ($R^2$)

  • Definition: $R^2$ measures the proportion of variability in the response variable explained by the explanatory variable.

  • Formula: $R^2 = 1 - \dfrac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$

  • Interpretation: An $R^2$ value close to 1 indicates a strong linear relationship.
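The coefficient of determination compares the residual variation with the total variation in the response. The sketch below is one way to compute it; the function name is hypothetical, and the observed and fitted values are made-up numbers standing in for the output of some least squares fit.

```python
from statistics import mean

def r_squared(y, y_hat):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical observed responses and fitted values from a least squares line
observed = [52, 58, 61, 70, 74]
fitted = [51.8, 57.4, 63.0, 68.6, 74.2]

r2 = r_squared(observed, fitted)
```

Here the result is about 0.98, meaning roughly 98% of the variability in the made-up responses is explained by the fitted line.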

Example

  • Given data on students' study hours and exam scores, fit a regression line, calculate residuals, and interpret the slope, intercept, and $R^2$ value.

Summary Table: Key Inference Procedures

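| Procedure | Sample Type | Parameter | Test Statistic | Software Function |
|---|---|---|---|---|
| Paired t-test | Dependent (paired) | Mean of differences ($\mu_d$) | $t = \bar{d} / (s_d / \sqrt{n})$ | t.test(..., paired=TRUE) |
| Two-sample t-test | Independent | Difference of means ($\mu_1 - \mu_2$) | $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ | t.test(...) |
| Two-proportion z-test | Independent | Difference of proportions ($p_1 - p_2$) | $z = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}$ | prop.test(...) |
| Linear Regression | Numerical variables | Slope ($b_1$), Intercept ($b_0$) | Least squares estimation | lm(...) |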

Critical Thinking and Communication

  • Interpret statistical results in context, avoiding unnecessary jargon.

  • Critique data-based claims and evaluate the validity of data-based decisions.

Additional info: This unit aligns with topics from Ch. 9 (Estimation), Ch. 10 (Hypothesis Testing), Ch. 11 (Inference on Two Population Parameters), and Ch. 14 (Regression Analysis) of a typical college statistics course.
