Comprehensive Study Notes: Core Concepts in Introductory Statistics

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Descriptive Statistics and Correlation

Scatterplots and Correlation Coefficient

Scatterplots are graphical representations used to visualize the relationship between two quantitative variables. The correlation coefficient quantifies the strength and direction of a linear relationship between these variables.

  • Scatterplot: A graph in which each point represents one paired observation (x, y), used to show a possible relationship between two quantitative variables.

  • Correlation Coefficient (r): A numerical measure ranging from -1 to 1 that indicates the strength and direction of a linear relationship.

  • Interpretation:

    • r > 0: Positive linear relationship

    • r < 0: Negative linear relationship

    • r = 0: No linear relationship

  • Critical Values: Used to determine if the observed correlation is statistically significant. Compare the absolute value of r to the critical value for the sample size.

Example: If r = 0.85 for a sample size of 10 and the critical value is 0.632, then because |r| = 0.85 > 0.632, the correlation is significant.
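The coefficient r can be computed from the sums of squared deviations and cross-products, and then compared with a critical value as described above. The sketch below uses only the standard library. The function names `pearson_r` and `is_significant` are illustrative, not taken from any particular package.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, critical_value):
    # Compare the absolute value of r with the critical value for the sample size
    return abs(r) > critical_value

print(is_significant(0.85, 0.632))  # True, matching the example above
```

The absolute value is used so that strong negative correlations are judged the same way as strong positive ones.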

Regression Analysis

Least Squares Regression Line

The least squares regression line is the best-fitting straight line through a set of points in a scatterplot, minimizing the sum of the squared vertical distances from the points to the line.

  • Equation: $\hat{y} = b_0 + b_1 x$

  • Interpretation: $b_1$ is the slope (the predicted change in y per unit change in x), and $b_0$ is the y-intercept.

  • Prediction: The regression line can be used to predict the value of y for a given x.

Example: Given a fitted line $\hat{y} = b_0 + b_1 x$, the predicted value at x = 10 is $\hat{y} = b_0 + 10\,b_1$.

Probability Concepts

Basic Probability and Sample Spaces

Probability quantifies the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).

  • Sample Space (S): The set of all possible outcomes.

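  • Probability of an Event (A): For equally likely outcomes, $P(A) = \dfrac{\text{number of outcomes in } A}{\text{number of outcomes in } S}$

  • Complement: $P(\bar{A}) = 1 - P(A)$, where $\bar{A}$ is the event that A does not occur.

  • Mutually Exclusive Events: Events that cannot occur together, so $P(A \text{ and } B) = 0$ and $P(A \text{ or } B) = P(A) + P(B)$.

  • Independent Events: The occurrence of one event does not change the probability of the other, so $P(A \text{ and } B) = P(A) \cdot P(B)$.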

Example: For a fair die, $S = \{1, 2, 3, 4, 5, 6\}$, so each outcome has probability $1/6$.
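The rules above can be checked with exact fractions for a fair die. This is a small sketch using Python's `fractions` module; the events chosen (rolling an even number, rolling a 1 or a 2, rolling two sixes) are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die

def prob(event, space=S):
    """Classical probability |A| / |S|, assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))

even = {2, 4, 6}
p_even = prob(even)                  # 3/6 = 1/2
p_not_even = 1 - p_even              # complement rule
# Mutually exclusive: {1} and {2} cannot both occur on one roll, so probabilities add
p_1_or_2 = prob({1}) + prob({2})     # 1/3
# Independent: two separate rolls both landing on 6, so probabilities multiply
p_two_sixes = prob({6}) * prob({6})  # 1/36
print(p_even, p_not_even, p_1_or_2, p_two_sixes)  # 1/2 1/2 1/3 1/36
```

Using `Fraction` instead of floats keeps results such as 1/36 exact, which makes them easy to compare with hand calculations.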

Contingency Tables

Contingency tables (two-way tables) display the frequencies for each combination of two categorical variables and are used to compute joint and marginal probabilities.

Age Group    More Likely    Less Likely    Total
15-24        254            246            500
25-44        279            221            500
45-64        276            224            500
65+          191            309            500

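Example: The table contains 4 × 500 = 2000 people, so the probability that a randomly selected person is aged 25-44 and more likely is $P(\text{25-44 and more likely}) = \dfrac{279}{2000} = 0.1395$.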
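Joint, marginal, and conditional probabilities all come from dividing a count by the appropriate total. The sketch below stores the counts from the table above in a dictionary; the helper names are illustrative.

```python
from fractions import Fraction

# Counts from the contingency table: age group -> (more likely, less likely)
table = {
    "15-24": (254, 246),
    "25-44": (279, 221),
    "45-64": (276, 224),
    "65+":   (191, 309),
}
grand_total = sum(m + l for m, l in table.values())  # 2000

def joint(age, column):
    """P(age group and response); column 0 = more likely, 1 = less likely."""
    return Fraction(table[age][column], grand_total)

def marginal_age(age):
    """P(age group): row total over grand total."""
    return Fraction(sum(table[age]), grand_total)

def marginal_response(column):
    """P(response): column total over grand total."""
    return Fraction(sum(row[column] for row in table.values()), grand_total)

def conditional_response_given_age(column, age):
    """P(response | age group): cell count over that row's total."""
    return Fraction(table[age][column], sum(table[age]))

print(joint("25-44", 0), float(joint("25-44", 0)))  # 279/2000 0.1395
```

A conditional probability divides by a row total rather than the grand total, which is why P(more likely | 65+) is 191/500 rather than 191/2000.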

Random Variables and Distributions

Discrete and Continuous Random Variables

A random variable assigns a numerical value to each outcome in a sample space.

  • Discrete Random Variable: Takes on countable values (e.g., number of heads in coin tosses).

  • Continuous Random Variable: Takes on any value in an interval (e.g., height, weight).

Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

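  • Probability Mass Function: $P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$, for $x = 0, 1, \ldots, n$

  • Mean: $\mu = np$

  • Standard Deviation: $\sigma = \sqrt{np(1 - p)}$

Example: For n = 10, p = 0.5, the probability of 6 successes is $P(X = 6) = \binom{10}{6}(0.5)^{6}(0.5)^{4} = \dfrac{210}{1024} \approx 0.2051$.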
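The binomial formulas translate directly into code. This sketch uses `math.comb` (available in Python 3.8 and later) for the binomial coefficient and reproduces the n = 10, p = 0.5 example.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mean(n, p):
    return n * p

def binomial_sd(n, p):
    return math.sqrt(n * p * (1 - p))

print(binomial_pmf(6, 10, 0.5))  # 0.205078125
print(binomial_mean(10, 0.5), binomial_sd(10, 0.5))  # 5.0 and about 1.581
```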

Normal Distribution and the Empirical Rule

The normal distribution is a continuous, symmetric, bell-shaped distribution characterized by its mean ($\mu$) and standard deviation ($\sigma$).

  • Standard Normal Distribution: Mean 0, standard deviation 1.

  • Empirical Rule:

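    • About 68% of data within one standard deviation of the mean: $\mu \pm \sigma$

    • About 95% within two standard deviations: $\mu \pm 2\sigma$

    • About 99.7% within three standard deviations: $\mu \pm 3\sigma$

  • Z-score: $z = \dfrac{x - \mu}{\sigma}$, the number of standard deviations an observation x lies above (positive z) or below (negative z) the mean.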

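Example: An observation with $z = 2$ lies two standard deviations above the mean; by the Empirical Rule, about 95% of values lie between $z = -2$ and $z = 2$.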
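The z-score formula and the Empirical Rule percentages can be checked with `statistics.NormalDist` from the standard library. The sketch computes the exact area within k standard deviations of the mean; the z-score inputs (x = 130, μ = 100, σ = 15) are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Exact area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(z_score(130, 100, 15))  # 2.0 (hypothetical values)
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # about 0.6827, 0.9545, 0.9973
```

The exact areas explain the rounded 68%, 95%, and 99.7% figures in the Empirical Rule.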

Statistical Inference

Critical Values and Hypothesis Testing

Critical values are used to determine the threshold for statistical significance in hypothesis testing.

  • Test Statistic: A value calculated from sample data to test a hypothesis.

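  • Decision Rule: If the test statistic falls in the critical region, reject the null hypothesis. For a two-tailed test, such as the test for correlation using the table below, this means rejecting when the absolute value of the test statistic exceeds the critical value.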

Sample Size n    Critical Value of r (α = 0.05)
3                0.997
4                0.950
5                0.878
6                0.811
7                0.754
8                0.707
9                0.666
10               0.632
11               0.602
12               0.576
13               0.553
14               0.532
15               0.514

Example: For n = 8, the critical value is 0.707.
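The table and the decision rule can be combined into a single lookup. In this sketch the dictionary holds the critical values of r copied from the table above, and the null hypothesis is taken to be "no linear correlation". The function name is illustrative, and sample sizes outside the table raise an error instead of being guessed.

```python
# Critical values of r from the table above, keyed by sample size n
CRITICAL_R = {
    3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707,
    9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576, 13: 0.553, 14: 0.532, 15: 0.514,
}

def significant_correlation(r, n):
    """Return True (reject H0: no linear correlation) if |r| exceeds the critical value."""
    if n not in CRITICAL_R:
        raise KeyError(f"no critical value tabulated for n = {n}")
    return abs(r) > CRITICAL_R[n]

print(significant_correlation(0.85, 10))  # True
print(significant_correlation(0.60, 8))   # False, since 0.60 < 0.707
```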

Applications of Probability and Statistics

Real-World Scenarios

Statistical methods are applied to a variety of real-world problems, such as predicting home prices, analyzing birth weights, and evaluating probabilities in games of chance.

  • Regression in Real Estate: Predicting home prices based on square footage using regression analysis.

  • Normal Distribution in Health: Assessing birth weights of infants using the normal distribution.

  • Probability in Sampling: Calculating the likelihood of certain outcomes in random samples.

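Example: If the mean birth weight is 3300 g with standard deviation $\sigma$ (in grams), the probability that a baby weighs more than 4320 g is found by computing $z = \dfrac{4320 - 3300}{\sigma} = \dfrac{1020}{\sigma}$ and then looking up $P(Z > z)$ in the standard normal table.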
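An upper-tail probability is one minus the standard normal CDF at the z-score. In this sketch the mean (3300 g) and cutoff (4320 g) come from the example, while the standard deviation is left as a parameter because its value is not given here. The function names are illustrative.

```python
from statistics import NormalDist

def prob_greater_than(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma), via the z-score and the standard normal CDF."""
    z = (x - mu) / sigma
    return 1 - NormalDist().cdf(z)

def birth_weight_tail(sigma):
    """P(weight > 4320 g) when the mean is 3300 g; sigma (grams) must be supplied."""
    return prob_greater_than(4320, 3300, sigma)
```

Standardizing to z and using the standard normal CDF gives the same result as evaluating a normal distribution with the original mean and standard deviation directly, which is why a single standard normal table works for every normal distribution.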

Summary Table: Key Probability Distributions

Distribution    Type          Parameters       Mean                 Variance
Binomial        Discrete      n, p             $np$                 $np(1 - p)$
Normal          Continuous    $\mu, \sigma$    $\mu$                $\sigma^2$
Uniform         Continuous    a, b             $\frac{a + b}{2}$    $\frac{(b - a)^2}{12}$
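The binomial mean and variance formulas in the summary table can be checked by summing directly over the probability mass function, using $E[X] = \sum x\,P(x)$ and $\mathrm{Var}(X) = \sum (x - \mu)^2 P(x)$. The sketch also evaluates the uniform formulas; the parameter values are illustrative.

```python
import math

def binomial_moments_by_summation(n, p):
    """E[X] and Var(X) computed directly from the binomial PMF."""
    pmf = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var

def uniform_moments(a, b):
    """Mean and variance of the continuous uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) ** 2 / 12

n, p = 10, 0.5
mean, var = binomial_moments_by_summation(n, p)
print(mean, n * p)           # both about 5.0
print(var, n * p * (1 - p))  # both about 2.5
```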

Additional info:

  • Some context and explanations have been expanded for clarity and completeness.

  • Tables have been reconstructed and summarized for study purposes.
