Statistics Study Guide: Correlation, Regression, Probability, and Data Interpretation

Correlation and Scatterplots

Understanding Scatterplots

Scatterplots are graphical representations that show the relationship between two quantitative variables. Each point on the plot represents an observation with values for both variables.

  • Interpretation: The direction, form, and strength of the relationship can be visually assessed.

  • Positive Association: As one variable increases, the other tends to increase.

  • Negative Association: As one variable increases, the other tends to decrease.

  • No Association: No discernible pattern between the variables.

Example: A scatterplot showing car value (in thousands of dollars) versus car age. The value of cars tends to decrease as age increases, indicating a negative association.

Correlation Coefficient

The correlation coefficient (denoted as r) measures the strength and direction of a linear relationship between two variables.

  • Range: -1 ≤ r ≤ 1

  • Interpretation:

    • r = 1: Perfect positive linear relationship

    • r = -1: Perfect negative linear relationship

    • r = 0: No linear relationship

  • Strength: The closer |r| is to 1, the stronger the linear relationship.

  • Caution: Correlation does not imply causation. A strong r can arise from a lurking variable that influences both variables.

Example: A correlation coefficient of -0.98 between hours of exercise per week and resting heart rate indicates a strong negative linear relationship.
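To make the definition of r concrete, the sketch below computes the Pearson correlation coefficient from paired data. The function name pearson_r and the data values are hypothetical and chosen only for illustration; they are not the data behind the -0.98 in the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / sqrt(sxx * syy)

# Hypothetical data (not from the guide): weekly exercise hours
# and resting heart rate in beats per minute
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [80, 76, 71, 68, 63, 60]

r = pearson_r(hours, heart_rate)
print(f"r = {r:.3f}")  # negative and close to -1: strong negative linear relationship
```

Because the made-up heart rates fall almost exactly on a downward line, r comes out very close to -1.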

Regression Analysis

Simple Linear Regression

Simple linear regression models the relationship between a dependent variable (response) and an independent variable (predictor) using a straight line.

  • Regression Equation: y = a + bx

  • y: Dependent variable (response)

  • x: Independent variable (predictor)

  • a: Intercept (value of y when x = 0)

  • b: Slope (change in y for a one-unit increase in x)

Example: Resting Heart Rate = 79.54 - 4.08 × (Hours of Exercise per Week)

  • The slope (-4.08) indicates that for each additional hour of exercise per week, resting heart rate decreases by 4.08 beats per minute, on average.
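As a quick illustration of how the fitted equation is used, the sketch below plugs a few values into the example line. The intercept and slope come from the example above; the function name and the chosen input values are hypothetical.

```python
INTERCEPT = 79.54  # a: predicted resting heart rate at 0 hours of exercise
SLOPE = -4.08      # b: change in predicted heart rate per additional hour

def predict_resting_heart_rate(hours_per_week):
    """Predicted resting heart rate (beats per minute) from y = a + b*x."""
    return INTERCEPT + SLOPE * hours_per_week

for hours in (0, 1, 5):
    print(hours, round(predict_resting_heart_rate(hours), 2))
# 0 hours -> 79.54, 1 hour -> 75.46, 5 hours -> 59.14
```

Each extra hour lowers the prediction by 4.08 beats per minute, which is exactly what the slope says. Predictions for hours far outside the range of the original data should be treated with caution.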

Interpreting Regression Output

  • Coefficient of Determination (r²): Proportion of the variance in the dependent variable explained by the independent variable. In simple linear regression it is the square of the correlation coefficient r.

  • Standard Error: Measures the typical distance between the observed values and the regression line, in the units of the response variable.

Example Table:

Statistic                            Value
Correlation Coefficient (r)          0.88
Coefficient of Determination (r²)    0.765
Standard Error                       5.49

Additional info: An r² of 0.765 means 76.5% of the variation in BMI is explained by its linear relationship with the number of hours spent playing games.
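The statistics in a regression output can be reproduced by hand for a small data set. The sketch below fits a least-squares line and reports the coefficient of determination and the standard error, using one common definition of the latter, sqrt(SSE / (n - 2)). The function name fit_line and the five data points are hypothetical; they are not the BMI data summarized above.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x.

    Returns (a, b, r_squared, std_error), where std_error = sqrt(SSE / (n - 2)).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    # SSE: variation left over after fitting; SST: total variation in y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - sse / sst
    std_error = sqrt(sse / (n - 2))
    return a, b, r_squared, std_error

# Hypothetical data, chosen so the arithmetic is easy to check by hand
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r_squared, std_error = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, r^2 = {r_squared:.3f}, standard error = {std_error:.3f}")
# y = 2.20 + 0.60x, r^2 = 0.600, standard error = 0.894
```

Here 60% of the variation in y is explained by x, and the observed points sit roughly 0.9 units from the line.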

Probability Concepts

Types of Probability

  • Theoretical Probability: Based on reasoning or a mathematical model (e.g., probability of rolling a 3 on a fair die is 1/6).

  • Empirical Probability: Based on observed data from experiments or trials.

Sample Space and Events

  • Sample Space (S): The set of all possible outcomes of an experiment.

  • Event: A subset of the sample space.

  • Mutually Exclusive Events: Events that cannot occur at the same time.

  • Independent Events: The occurrence of one event does not affect the probability of the other.

Example: Flipping a coin twice: S = {HH, HT, TH, TT}
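A sample space like this one can be listed mechanically. The sketch below enumerates the outcomes of two coin flips with Python's itertools.product and then picks out one event; the event "at least one head" is an illustrative choice that does not appear in the guide.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two flips: H or T on each flip
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# An event is a subset of the sample space
at_least_one_head = [outcome for outcome in sample_space if "H" in outcome]
print(at_least_one_head)  # ['HH', 'HT', 'TH']

# With equally likely outcomes, probability = favorable / total
p = Fraction(len(at_least_one_head), len(sample_space))
print(p)  # 3/4
```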

Calculating Probability

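  • Probability of an event A (equally likely outcomes): P(A) = (number of outcomes in A) / (total number of outcomes in S)

  • Complementary Events: P(not A) = 1 - P(A)

  • Union of Events (A or B): P(A or B) = P(A) + P(B) - P(A and B)

  • Intersection of Independent Events: P(A and B) = P(A) × P(B)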
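The standard rules for these four cases are P(A) = favorable outcomes / total outcomes, P(not A) = 1 - P(A), P(A or B) = P(A) + P(B) - P(A and B), and, for independent events, P(A and B) = P(A) × P(B). The sketch below checks each rule on a fair six-sided die. The events "even" and "greater than four" and the helper prob are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

die = set(range(1, 7))  # sample space for one roll of a fair die

def prob(event, space):
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes)."""
    return Fraction(len(event & space), len(space))

even = {2, 4, 6}
greater_than_four = {5, 6}

p_even = prob(even, die)                      # 1/2
p_not_even = 1 - p_even                       # complement rule: 1/2
p_both = prob(even & greater_than_four, die)  # only the outcome 6: 1/6
p_either = p_even + prob(greater_than_four, die) - p_both  # 1/2 + 1/3 - 1/6 = 2/3

# Independent events: first roll is even AND second roll is greater than four
two_rolls = set(product(die, repeat=2))       # 36 ordered pairs
first_even_second_high = {
    (first, second)
    for first, second in two_rolls
    if first in even and second in greater_than_four
}
p_joint = prob(first_even_second_high, two_rolls)   # 6/36 = 1/6
p_product = p_even * prob(greater_than_four, die)   # 1/2 * 1/3 = 1/6

print(p_even, p_not_even, p_either, p_joint, p_product)
```

The union computed by the addition rule matches a direct count of {2, 4, 5, 6}, and the joint probability counted over all 36 pairs matches the product of the two separate probabilities.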

Contingency Tables and Probability

Interpreting Contingency Tables

Contingency tables display the frequency distribution of variables and are useful for calculating probabilities involving two categorical variables.

Occupation           50-55   56-60   61-65   Over 65   TOTAL
Attorney                80      88      74        37     279
College Professor       70      62      81        49     262
Receptionist            13      19      30        30      92
Store Clerk             25      14      70        31     140
TOTAL                  188     183     255       147     773

Example: Probability that a randomly chosen adult was an attorney: P(Attorney) = 279/773 ≈ 0.361
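The same calculation can be done programmatically from the table. The sketch below stores the counts from the contingency table in a nested dictionary (the variable names are hypothetical) and divides a row total by the grand total.

```python
from fractions import Fraction

# Counts copied from the contingency table: occupation -> age group -> frequency
counts = {
    "Attorney":          {"50-55": 80, "56-60": 88, "61-65": 74, "Over 65": 37},
    "College Professor": {"50-55": 70, "56-60": 62, "61-65": 81, "Over 65": 49},
    "Receptionist":      {"50-55": 13, "56-60": 19, "61-65": 30, "Over 65": 30},
    "Store Clerk":       {"50-55": 25, "56-60": 14, "61-65": 70, "Over 65": 31},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # 773
attorney_total = sum(counts["Attorney"].values())                # 279

# P(Attorney) = row total / grand total
p_attorney = Fraction(attorney_total, grand_total)
print(p_attorney, round(float(p_attorney), 3))  # 279/773 0.361

# P(Over 65) = column total / grand total
over_65_total = sum(row["Over 65"] for row in counts.values())   # 147
p_over_65 = Fraction(over_65_total, grand_total)
print(p_over_65, round(float(p_over_65), 3))    # 147/773 0.19
```

Row totals give probabilities for an occupation, column totals give probabilities for an age group, and a single cell divided by the grand total gives the probability of both at once.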

Law of Large Numbers (LLN)

Definition and Implications

The Law of Large Numbers states that as the number of trials in a probability experiment increases, the empirical probability approaches the theoretical probability.

  • Implication: With more repetitions, observed frequencies stabilize around expected probabilities.

  • Application: Used to justify the reliability of empirical probabilities in large samples.
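A short simulation shows the law in action. The sketch below rolls a simulated fair die and tracks how often a 3 comes up as the number of rolls grows. The function name, the seed, and the roll counts are arbitrary choices for illustration, and the exact numbers printed depend on the seed.

```python
import random

THEORETICAL = 1 / 6  # probability of rolling a 3 on a fair die

def empirical_probability_of_three(num_rolls, rng):
    """Fraction of num_rolls simulated die rolls that come up 3."""
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 3)
    return hits / num_rolls

rng = random.Random(42)  # fixed seed so the run is repeatable
for num_rolls in (10, 100, 1_000, 10_000, 100_000):
    estimate = empirical_probability_of_three(num_rolls, rng)
    print(f"{num_rolls:>7} rolls: empirical = {estimate:.4f}, "
          f"difference from 1/6 = {abs(estimate - THEORETICAL):.4f}")
```

With only 10 rolls the estimate can be far from 1/6, while with 100,000 rolls it is typically within a few thousandths. The law describes this long-run tendency; it does not guarantee that every larger sample is closer than every smaller one.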

Key Terms and Concepts

  • Regression Line: The best-fitting straight line through a scatterplot of data points.

  • Intercept: The value of the response variable when the predictor is zero.

  • Slope: The change in the response variable for a one-unit increase in the predictor.

  • Empirical Probability: Probability based on observed data.

  • Theoretical Probability: Probability based on mathematical reasoning.

  • Mutually Exclusive Events: Events that cannot happen at the same time.

  • Independent Events: The occurrence of one event does not affect the probability of the other.

Summary Table: Types of Probability

Type          Definition                        Example
Theoretical   Based on mathematical reasoning   Probability of rolling a 3 on a fair die: 1/6
Empirical     Based on observed data            Probability of heads in 100 coin tosses: number of heads/100

Additional info: This guide covers key concepts in introductory statistics, including data interpretation, correlation, regression, probability, and the law of large numbers, as reflected in the provided questions.
