Statistics Final Exam Study Notes: Probability, Random Variables, and Regression

Probability Rules and Concepts

Basic Probability Rules

Probability is the mathematical study of randomness and uncertainty. The following rules are fundamental for calculating probabilities in various scenarios:

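Total Probability Rule: The probability of the sample space S is always 1: $P(S) = 1$.
Complement Rule: The probability of an event A is 1 minus the probability of its complement: $P(A) = 1 - P(A^C)$.
Addition Rule for Disjoint Events: If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
General Addition Rule: For any two events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Conditional Probability: The probability of B given A is $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, provided $P(A) > 0$.
General Multiplication Rule: $P(A \cap B) = P(A) \, P(B \mid A)$.
Multiplication Rule for Independent Events: If A and B are independent, $P(A \cap B) = P(A) \, P(B)$.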
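
As a quick check of these rules, here is a minimal sketch that counts outcomes when drawing one card from a standard 52-card deck. The deck, the events (hearts, face cards), and the helper `prob` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

# Hypothetical sample space: one card drawn from a standard 52-card deck.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

def prob(event):
    """Probability of an event, given as a set of equally likely outcomes."""
    return Fraction(len(event), len(deck))

hearts = {card for card in deck if card[1] == "hearts"}
face = {card for card in deck if card[0] in {"J", "Q", "K"}}

# Complement rule: P(not heart) = 1 - P(heart)
p_not_heart = 1 - prob(hearts)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_heart_or_face = prob(hearts) + prob(face) - prob(hearts & face)

# Conditional probability: P(face | heart) = P(heart and face) / P(heart)
p_face_given_heart = prob(hearts & face) / prob(hearts)

# Independence check: is P(face | heart) equal to P(face)?
independent = p_face_given_heart == prob(face)

print(p_not_heart, p_heart_or_face, p_face_given_heart, independent)
```

In this example suit and rank turn out to be independent, so the multiplication rule for independent events gives the same answer as the general multiplication rule.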

Random Variables and Their Properties

Expected Value and Variance

A random variable is a numerical outcome of a random process. The expected value (mean), variance, and standard deviation are key characteristics:

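Expected Value (Mean): $\mu = E(X) = \sum x \, P(X = x)$ for a discrete random variable X
Variance: $\sigma^2 = Var(X) = \sum (x - \mu)^2 \, P(X = x)$
Standard Deviation: $\sigma = SD(X) = \sqrt{Var(X)}$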
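
A short sketch of these three calculations for a discrete random variable follows. The probability model (values 0 to 3 with the listed probabilities) is made up for illustration.

```python
from math import sqrt

# Hypothetical probability model: value -> probability (must sum to 1).
model = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# Expected value: weight each value by its probability.
mu = sum(x * p for x, p in model.items())

# Variance: probability-weighted squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in model.items())

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(round(mu, 4), round(variance, 4), round(sd, 4))  # 1.7 0.81 0.9
```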

Combining Random Variables

Adding/Subtracting Constants: Adding a constant to a random variable shifts the mean but does not affect the variance or standard deviation.
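Multiplying by Constants: Multiplying a random variable by a constant a multiplies the mean by a and the standard deviation by |a|: $E(aX) = a \, E(X)$ and $SD(aX) = |a| \, SD(X)$, so $Var(aX) = a^2 \, Var(X)$.
If X and Y are independent: $Var(X \pm Y) = Var(X) + Var(Y)$, so variances add even when the variables are subtracted. The means combine as $E(X \pm Y) = E(X) \pm E(Y)$ whether or not X and Y are independent.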
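
These rules can be verified exactly by enumeration. The sketch below assumes X and Y are two independent fair six-sided dice, an example chosen here for illustration; `mean` and `variance` are small helper functions, not library calls.

```python
from fractions import Fraction
from itertools import product

def mean(outcomes):
    """Expected value of a list of (value, probability) pairs."""
    return sum(x * p for x, p in outcomes)

def variance(outcomes):
    """Variance of a list of (value, probability) pairs."""
    m = mean(outcomes)
    return sum((x - m) ** 2 * p for x, p in outcomes)

# Hypothetical random variable: one fair six-sided die.
die = [(face, Fraction(1, 6)) for face in range(1, 7)]

shifted = [(x + 5, p) for x, p in die]    # X + 5
scaled = [(-2 * x, p) for x, p in die]    # -2X

# For independent dice, each joint probability is the product of the two.
total = [(x + y, px * py) for (x, px), (y, py) in product(die, die)]  # X + Y
diff = [(x - y, px * py) for (x, px), (y, py) in product(die, die)]   # X - Y

print(mean(die), variance(die))          # 7/2 35/12
print(mean(shifted), variance(shifted))  # mean shifts by 5, variance unchanged
print(mean(scaled), variance(scaled))    # mean times -2, variance times 4
print(variance(total), variance(diff))   # both equal Var(X) + Var(Y)
```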

Binomial Model

Binomial Probability

The binomial model describes the probability of k successes in n independent trials, each with probability p of success:

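Binomial Coefficient: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
Probability of k successes: $P(X = k) = \binom{n}{k} p^k q^{n-k}$, where $q = 1 - p$
Mean (Expected Number of Successes): $\mu = np$
Standard Deviation: $\sigma = \sqrt{npq}$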

n = number of trials, k = number of successes, p = probability of success, q = probability of failure.
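
The binomial formulas can be sketched in a few lines with the standard library's `math.comb`. The values n = 10 and p = 0.3 are assumed here purely as an example, and `binomial_pmf` is a helper defined for this sketch.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)

# Hypothetical setting: 10 trials, each with success probability 0.3.
n, p = 10, 0.3
q = 1 - p

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = n * p            # expected number of successes
sd = sqrt(n * p * q)    # standard deviation of the number of successes

# The mean computed directly from the distribution agrees with np.
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))

print(round(pmf[3], 4))  # P(X = 3), about 0.2668
print(round(mean, 4), round(sd, 4))
```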

Correlation and Regression

Correlation Coefficient

The correlation coefficient (r) measures the strength and direction of a linear relationship between two quantitative variables:

Formula: $r = \frac{Cov(X, Y)}{s_x s_y}$
Where $Cov(X, Y)$ is the covariance, and $s_x$ and $s_y$ are the standard deviations of X and Y, respectively.
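
As a minimal sketch of this calculation, the code below computes r from a small made-up data set, assuming the sample versions of the covariance and standard deviations (dividing by n - 1).

```python
from statistics import mean, stdev

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)

# Sample covariance: average product of deviations, dividing by n - 1.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations (statistics.stdev also divides by n - 1).
s_x, s_y = stdev(x), stdev(y)

r = cov_xy / (s_x * s_y)
print(round(r, 4))  # 0.7746
```

A value near +1 or -1 indicates a strong linear relationship; here r is positive and moderately strong.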

Regression Analysis

Regression is used to model the relationship between a dependent variable Y and an independent variable X.

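Residual: $e = y - \hat{y}$ (observed value minus predicted value)
Line of Best Fit: $\hat{y} = b_0 + b_1 x$
Slope: $b_1 = r \frac{s_y}{s_x}$
Intercept: $b_0 = \bar{y} - b_1 \bar{x}$
Regression to the Mean: Measured in standard deviations, the predicted value of Y is closer to the mean of Y than X is to the mean of X, unless r = ±1: $\hat{z}_y = r z_x$.
Residual Standard Deviation: Measures the typical size of the residuals (errors) from the regression line: $s_e = \sqrt{\frac{\sum e^2}{n - 2}}$.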
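
The regression quantities above can be sketched as follows. The data set is made up for illustration, and the sample standard deviations (dividing by n - 1) are assumed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (s_x * s_y)

# Slope and intercept of the least-squares line.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# Residual standard deviation, dividing by n - 2.
s_e = sqrt(sum(e**2 for e in residuals) / (n - 2))

# Regression to the mean: the z-score of the prediction is r times z_x.
z_x = (x[-1] - x_bar) / s_x
z_y_hat = (predicted[-1] - y_bar) / s_y

print(round(b1, 4), round(b0, 4), round(s_e, 4))
print(round(z_x, 4), round(z_y_hat, 4), round(r * z_x, 4))
```

For the last data point, x is about 1.26 standard deviations above its mean, while the prediction is only about 0.98 standard deviations above the mean of y.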

The Normal Model and Z-Scores

Standard Deviation as a Ruler and the Normal Model

The normal model is a symmetric, bell-shaped distribution characterized by its mean (μ) and standard deviation (σ). Z-scores standardize values for comparison:

Z-Score: $z = \frac{x - \mu}{\sigma}$
Z-scores indicate how many standard deviations a value is from the mean.
Normal tables (Z-tables) are used to find probabilities and percentiles for standard normal distributions.
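
To illustrate standardizing for comparison, here is a small sketch using Python's `statistics.NormalDist` in place of a printed table. The two exams and their means and standard deviations are invented for the example, and `z_score` is a helper defined here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical scores on two exams with different scales.
z_exam_a = z_score(1300, mu=1050, sigma=200)  # 1.25
z_exam_b = z_score(28, mu=21, sigma=5)        # about 1.4

# Area to the left of each z-score under the standard normal model.
standard_normal = NormalDist(mu=0, sigma=1)
pct_a = standard_normal.cdf(z_exam_a)
pct_b = standard_normal.cdf(z_exam_b)

print(round(pct_a, 4), round(pct_b, 4))
```

Under these assumed values, the exam B score is the more unusual one because it lies more standard deviations above its mean.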

[Tables: standard normal table for negative z-values; standard normal table for positive z-values]

Using the Standard Normal Table

The standard normal table provides the area (probability) to the left of a given z-score in the standard normal distribution. To use the table:

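Find the row for the z-score through its first decimal place (for z = 1.23, the row labeled 1.2).
Find the column for the second decimal place (for z = 1.23, the column labeled 0.03).
The intersection gives the cumulative probability, the area to the left of z.
For negative z-scores, use the table of negative z-values; for positive z-scores, use the table of positive z-values.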
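
The lookup steps above can be mimicked in code. This is only a sketch: `table_lookup` is a hypothetical helper that rounds z to two decimals, reports the row and column a printed table would use, and gets the area from `statistics.NormalDist` rounded to four decimals, the precision most printed tables show.

```python
from statistics import NormalDist

def table_lookup(z):
    """Return (row label, column label, area to the left of z)."""
    sign = "-" if z < 0 else ""
    hundredths = round(abs(z) * 100)            # z rounded to two decimals
    row = f"{sign}{hundredths // 10 / 10:.1f}"  # through the first decimal
    column = f"{hundredths % 10 / 100:.2f}"     # second decimal place
    z_rounded = hundredths / 100 if z >= 0 else -hundredths / 100
    area = round(NormalDist().cdf(z_rounded), 4)
    return row, column, area

print(table_lookup(1.23))   # ('1.2', '0.03', 0.8907)
print(table_lookup(-0.52))  # ('-0.5', '0.02', 0.3015)

# Reading the table in reverse (percentile to z-score) corresponds to inv_cdf.
z_975 = NormalDist().inv_cdf(0.975)
print(round(z_975, 2))      # 1.96
```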

Summary Table: Key Formulas and Concepts

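Concept	Formula	Description
Expected Value	$E(X) = \sum x \, P(X = x)$	Mean of a random variable
Variance	$Var(X) = \sum (x - \mu)^2 \, P(X = x)$	Spread of a random variable
Standard Deviation	$SD(X) = \sqrt{Var(X)}$	Typical deviation from the mean
Binomial Probability	$P(X = k) = \binom{n}{k} p^k q^{n-k}$	Probability of k successes in n trials
Correlation Coefficient	$r = \frac{Cov(X, Y)}{s_x s_y}$	Strength and direction of linear relationship
Regression Line	$\hat{y} = b_0 + b_1 x$	Best fit line for prediction
Z-Score	$z = \frac{x - \mu}{\sigma}$	Standardized value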