Business Statistics: Random Variables, Probability Models, Normal Distributions, and Inference
Study Guide - Smart Notes
Chapter 6: Random Variables & Probability Models
Probability Basics
Probability quantifies uncertainty and describes the likelihood of events in the long run. Understanding probability is foundational for statistical inference and modeling random phenomena.
Probability is defined as the long-run relative frequency of an event occurring.
Each probability value must be between 0 and 1, inclusive.
The sum of all probabilities for all possible outcomes in a sample space must equal 1.
To determine if a probability assignment is valid, check that all probabilities are between 0 and 1 and that their sum is 1.
Key Concept: Law of Large Numbers states that as the number of trials increases, the observed proportion of an event approaches its theoretical probability.
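The Law of Large Numbers can be illustrated with a short simulation. The sketch below flips a fair coin (the true probability 0.5 is assumed, not estimated) and shows that the observed proportion of heads settles toward 0.5 as the number of flips grows:

```python
import random

random.seed(42)  # seeded so the illustration is reproducible

def running_proportion(n_flips):
    """Simulate n_flips fair-coin flips and return the proportion of heads."""
    heads = 0
    for _ in range(n_flips):
        heads += random.random() < 0.5  # True counts as 1
    return heads / n_flips

# The observed proportion drifts toward the theoretical probability 0.5
for n in (10, 1_000, 100_000):
    print(n, running_proportion(n))
```

With only 10 flips the proportion can stray far from 0.5; with 100,000 it is typically within a few tenths of a percent.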
Discrete Random Variables
A discrete random variable takes on a countable number of possible values, each with an associated probability.
Probabilities are assigned to each possible outcome.
The probability distribution lists all possible values and their probabilities.
Expected Value & Standard Deviation
The expected value (mean) and standard deviation summarize the center and spread of a random variable's distribution.
Expected Value (Mean): The long-run average outcome over many repetitions of the random process.
Standard Deviation: Measures the variability or spread of the distribution.
Excel Calculation: Mean can be calculated as SUMPRODUCT(outcomes, probabilities).
Formulas:
Expected Value: E(X) = μ = Σ [x · P(x)]
Variance: Var(X) = σ² = Σ [(x − μ)² · P(x)]
Standard Deviation: σ = √Var(X)
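These formulas translate directly into code. The sketch below uses a hypothetical distribution (the outcomes and probabilities are invented for illustration) and mirrors Excel's SUMPRODUCT approach mentioned above:

```python
import math

# Hypothetical distribution: number of service calls in an hour
outcomes      = [0, 1, 2, 3]
probabilities = [0.1, 0.3, 0.4, 0.2]

# Valid probability model: each p in [0, 1], probabilities sum to 1
assert abs(sum(probabilities) - 1) < 1e-9

# E(X) = sum of x * P(x) -- the Python analogue of Excel's SUMPRODUCT
mean = sum(x * p for x, p in zip(outcomes, probabilities))

# Var(X) = sum of (x - mean)^2 * P(x); the SD is its square root
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probabilities))
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 1.7, 0.81, 0.9 for this distribution
```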
Operations with Random Variables
Random variables can be combined, and their expected values and variances behave predictably under certain conditions.
Sum of Expected Values: E(X + Y) = E(X) + E(Y)
Product of Expected Values (if independent): E(XY) = E(X) · E(Y)
Variance of Sum (if independent): Var(X + Y) = Var(X) + Var(Y)
Transformations
Shifting and scaling a random variable affects its mean and standard deviation:
Adding a constant shifts the mean but does not affect the standard deviation.
Multiplying by a constant c scales the mean by c and the standard deviation by |c|.
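The shift-and-scale rules for Y = aX + b can be checked with a quick calculation. The mean and SD values below are hypothetical, chosen for illustration:

```python
# Rules for a linear transformation Y = a*X + b:
#   E(Y) = a*E(X) + b        (the shift b moves the mean)
#   SD(Y) = |a| * SD(X)      (the shift does not affect spread)
mean_x, sd_x = 1.7, 0.9   # hypothetical mean and SD of X
a, b = -2, 5              # scale and shift

mean_y = a * mean_x + b   # shifted and scaled mean
sd_y = abs(a) * sd_x      # note the absolute value: SD cannot be negative

print(mean_y, sd_y)
```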
Chapter 7: Normal & Continuous Distributions
Recognizing Normal Distributions
The normal distribution is a continuous, symmetric, bell-shaped distribution that is fundamental in statistics.
Histograms of normal data are unimodal, symmetric, and bell-shaped.
Normal probability plots (Q-Q plots) should be roughly a straight line if data are approximately normal.
Using the Normal Model
Before applying the normal model, verify that the data are approximately normal. Standardization allows comparison across different normal distributions.
Standardize values using z-scores: z = (x − μ) / σ
Use the standard normal distribution (mean 0, standard deviation 1) for probability calculations.
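Standardization and the standard normal give the same probability, which can be verified with Python's built-in `statistics.NormalDist`. The sales figures here are hypothetical:

```python
from statistics import NormalDist

# Hypothetical model: monthly sales ~ Normal(mean=500, sd=50)
sales = NormalDist(mu=500, sigma=50)

x = 560
z = (x - sales.mean) / sales.stdev   # z-score: (560 - 500) / 50 = 1.2

# P(X < 560) computed directly equals P(Z < 1.2) on the standard normal
p_direct = sales.cdf(x)
p_standard = NormalDist().cdf(z)     # NormalDist() defaults to mean 0, sd 1

print(z, p_direct, p_standard)
```

Both probabilities agree (about 0.885), confirming that standardizing does not change the answer.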
68–95–99.7 Rule
This empirical rule describes the spread of data in a normal distribution:
Approximately 68% of values fall within 1 standard deviation of the mean.
Approximately 95% within 2 standard deviations.
Approximately 99.7% within 3 standard deviations.
Values beyond 3 standard deviations are considered extreme or outliers.
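The 68–95–99.7 percentages are rounded values of exact standard-normal probabilities, which can be confirmed with `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

# Probability of falling within k standard deviations of the mean
for k in (1, 2, 3):
    within = Z.cdf(k) - Z.cdf(-k)
    print(k, round(within, 4))  # ~0.6827, ~0.9545, ~0.9973
```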
Probabilities and Calculations
Use normal tables or technology (e.g., statistical software) to find probabilities for normal distributions.
Pay attention to the direction of inequalities (e.g., <, >).
Combining Normal Random Variables
The sum or difference of independent normal random variables is also normally distributed.
Means add or subtract; variances always add (for independent variables): E(X ± Y) = E(X) ± E(Y), Var(X ± Y) = Var(X) + Var(Y)
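A sketch of combining two independent normal variables, using hypothetical costs. Note that even for a difference, the variances add:

```python
import math
from statistics import NormalDist

# Hypothetical independent normal variables
x = NormalDist(mu=100, sigma=12)
y = NormalDist(mu=80, sigma=5)

# Difference D = X - Y: means subtract, but variances ADD
mu_d = x.mean - y.mean                         # 100 - 80 = 20
sigma_d = math.sqrt(x.stdev**2 + y.stdev**2)   # sqrt(144 + 25) = 13

d = NormalDist(mu_d, sigma_d)
print(mu_d, sigma_d, d.cdf(0))  # d.cdf(0) is P(D < 0), i.e. P(X < Y)
```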
Other Continuous Models
Not all continuous data are best modeled by the normal distribution; recognize when other models (e.g., uniform, exponential) are more appropriate.
Chapter 10: Sampling Distributions & Confidence Intervals (Proportions)
Sampling Distributions
The sampling distribution of a statistic describes the distribution of that statistic over all possible samples from the population.
Standard deviation of the sampling distribution (standard error) quantifies how much the statistic varies from sample to sample.
Confidence Intervals for Proportions
A confidence interval estimates a population proportion with an associated margin of error.
General form: p̂ ± ME
Margin of Error (ME): ME = z* · √( p̂(1 − p̂) / n )
Interpretation: "We are __% confident the true proportion is in this interval."
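The interval can be computed with the standard library alone. The survey counts below are hypothetical; `inv_cdf` recovers the critical value z* from the confidence level:

```python
import math
from statistics import NormalDist

# Hypothetical survey: 420 successes out of n = 1000 respondents
successes, n = 420, 1000
p_hat = successes / n

conf_level = 0.95
# Two-sided critical value: z* = Phi^{-1}((1 + C) / 2), about 1.96 for 95%
z_star = NormalDist().inv_cdf((1 + conf_level) / 2)

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
me = z_star * se                          # margin of error

print(f"We are 95% confident the true proportion is in "
      f"({p_hat - me:.4f}, {p_hat + me:.4f})")
```

Note that the success/failure condition holds here (420 successes, 580 failures, both at least 10).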
Conditions for Validity
Randomization/independence: Data must be from a random sample or randomized experiment.
10% condition: Sample size should be less than 10% of the population if sampling without replacement.
Success/failure condition: At least 10 expected successes and 10 expected failures (np̂ ≥ 10 and n(1 − p̂) ≥ 10).
Watch for violations such as biased sampling, lack of independence, or survey bias.
Chapter 11: Confidence Intervals for Means
Sampling Distribution of the Mean
The sampling distribution of the sample mean describes how the mean varies from sample to sample. The Central Limit Theorem states that, for large samples, the sampling distribution of the mean is approximately normal, regardless of the population's distribution.
Standard Error
When the population standard deviation is unknown, estimate the standard error using the sample standard deviation: SE(x̄) = s / √n
t-Distribution
Use the t-distribution instead of the normal (z) distribution when estimating means and the population standard deviation is unknown.
The t-distribution has heavier tails (more spread) than the normal distribution, especially for small sample sizes; as n grows it approaches the normal.
Confidence Interval for the Mean
General form: x̄ ± t* · SE(x̄)
Where SE(x̄) = s / √n and t* is the critical value from the t-distribution with n − 1 degrees of freedom.
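A sketch of the interval calculation. The sample values are hypothetical, and since Python's standard library has no t-distribution, the critical value t* is taken from a t-table rather than computed:

```python
import math
from statistics import mean, stdev

# Hypothetical sample of delivery times (minutes), n = 8
sample = [12.1, 14.3, 13.0, 15.2, 12.8, 14.9, 13.5, 14.0]
n = len(sample)

x_bar = mean(sample)
s = stdev(sample)          # sample SD (divides by n - 1)
se = s / math.sqrt(n)      # standard error of the mean

# t* for 95% confidence with n - 1 = 7 degrees of freedom, from a t-table
t_star = 2.365
me = t_star * se

print(f"95% CI for the mean: ({x_bar - me:.3f}, {x_bar + me:.3f})")
```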
Hypothesis Testing for the Mean
Null hypothesis: H₀: μ = μ₀
Test statistic: t = (x̄ − μ₀) / (s / √n), with n − 1 degrees of freedom
Chapter 12: Hypothesis Testing (General)
Hypotheses
Null hypothesis (H₀): States that a population parameter equals a specific value.
Alternative hypothesis (Hₐ): Represents what you are testing for (e.g., not equal, greater than, or less than the null value).
Steps in Hypothesis Testing
State the null and alternative hypotheses.
Check assumptions and conditions for the test.
Compute the test statistic (using the standard error).
Use the t-distribution when the population standard deviation is unknown.
Draw a conclusion based on the p-value or critical value approach.
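The steps above can be sketched end to end. The data and hypothesized mean are hypothetical, and because the standard library has no t-distribution, the p-value here uses a standard-normal approximation (for small samples a true t-based p-value is more accurate):

```python
import math
from statistics import mean, stdev, NormalDist

# Step 1: H0: mu = 50 vs Ha: mu != 50, at significance level alpha = 0.05
sample = [52.1, 49.8, 51.5, 53.0, 50.2, 52.7, 51.9, 50.8,
          52.4, 51.1, 49.5, 52.9, 51.6, 50.9, 52.2, 51.3]
mu_0, alpha = 50, 0.05

# Step 2 (assumed here): random sample, roughly normal population

# Step 3: test statistic using the standard error s / sqrt(n)
n = len(sample)
x_bar, s = mean(sample), stdev(sample)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Step 4-5: two-sided p-value (normal approximation) and conclusion
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(t_stat, p_value, decision)
```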
Chapter 13: Errors in Hypothesis Testing
Types of Errors
Type I Error: Rejecting a true null hypothesis (false positive).
Type II Error: Failing to reject a false null hypothesis (false negative).
Example: In a medical test, a Type I error means diagnosing a healthy person as sick, while a Type II error means failing to diagnose a sick person.