MA 113: Midterm 1 & 2 Study Guide – Regression, Probability, Discrete & Continuous Distributions, Sampling, and Confidence Intervals

Regression

Introduction to Regression and the Coefficient of Determination

Regression analysis is a statistical method used to examine the relationship between two or more variables. The coefficient of determination, denoted $R^2$, quantifies the proportion of variance in the dependent variable that is predictable from the independent variable(s).

  • Coefficient of Determination ($R^2$): Measures the fraction of variance explained by the regression model. For simple linear least-squares regression, $R^2 = r^2$, where $r$ is the correlation coefficient.

  • Residual: The difference between an observed value and its predicted value: $e = y - \hat{y}$.

  • Residual Plot: A graphical tool to assess the fit of a regression model and check for non-linearity or non-constant error variance.

  • Outlier: An observation with a large residual.

  • Influential Observation: A data point that significantly affects the regression line.

Formulas:

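  • Residual (Unexplained deviation): $y - \hat{y}$

  • Explained deviation: $\hat{y} - \bar{y}$

  • Total deviation: $y - \bar{y}$

  • Unexplained variation: $\sum (y - \hat{y})^2$

  • Explained variation: $\sum (\hat{y} - \bar{y})^2$

  • Total variation: $\sum (y - \bar{y})^2$

Example: If a regression model explains 80% of the variance in exam scores based on study hours, then $R^2 = 0.80$.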
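The decomposition of total variation into explained and unexplained parts can be checked numerically. The sketch below is only an illustration: the data set (hours studied and exam scores) is hypothetical and not taken from the course materials.

```python
# Hypothetical data: hours studied (x) and exam score (y).
x = [1, 2, 3, 4, 5]
y = [52, 60, 61, 70, 77]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the line y_hat = intercept + slope * x.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Residuals and the three sums of squares.
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
unexplained = sum(e ** 2 for e in residuals)
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
total = sum((yi - y_bar) ** 2 for yi in y)

# R^2 is the explained share of total variation; r is the correlation coefficient.
r_squared = explained / total
r = sxy / (sxx * total) ** 0.5

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

For this data the explained and unexplained variation add up to the total variation, and $R^2$ agrees with $r^2$, as expected for a simple linear least-squares fit.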

Probability

Rules of Probability

Probability quantifies the likelihood of events occurring in a random experiment. There are several approaches and rules for calculating probabilities.

  • Sample Space (S): The set of all possible outcomes.

  • Event: A subset of the sample space.

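  • Empirical Probability: Based on observed data; $P(E) \approx \frac{\text{frequency of } E}{\text{number of trials}}$.

  • Classical Probability: For equally likely outcomes; $P(E) = \frac{N(E)}{N(S)}$, the number of outcomes in $E$ divided by the number of outcomes in $S$.

  • Probability Model: Assigns probabilities to all outcomes in the sample space such that $0 \le P(x) \le 1$ for every outcome $x$ and $\sum P(x) = 1$.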

  • Unusual Event: Probability less than 0.05.

Formulas:

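  • Empirical: $P(E) \approx \frac{\text{frequency of } E}{\text{number of trials}}$

  • Classical: $P(E) = \frac{N(E)}{N(S)}$

Example: If 3 out of 10 students prefer online classes, the empirical probability is $3/10 = 0.3$.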

Addition Rules and Venn Diagrams

The addition rules help calculate the probability of the union of events. Venn diagrams visually represent these relationships.

  • Disjoint (Mutually Exclusive) Events: Events that cannot occur together.

  • General Addition Rule: For any events E and F, $P(E \text{ or } F) = P(E) + P(F) - P(E \text{ and } F)$.

  • Complement Rule: $P(E^c) = 1 - P(E)$

Formulas:

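  • If E and F are disjoint: $P(E \text{ or } F) = P(E) + P(F)$

  • General: $P(E \text{ or } F) = P(E) + P(F) - P(E \text{ and } F)$

  • Complement: $P(E^c) = 1 - P(E)$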

Example: If $P(E)$, $P(F)$, and $P(E \text{ and } F)$ are known, then $P(E \text{ or } F) = P(E) + P(F) - P(E \text{ and } F)$.
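As a rough illustration of the general addition rule and the complement rule, the following sketch uses a single roll of a fair die. The events E (even number) and F (number greater than 3) are hypothetical choices made for this example.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}  # hypothetical event: roll is even
F = {4, 5, 6}  # hypothetical event: roll is greater than 3


def prob(event):
    """Classical probability: outcomes in the event / outcomes in S."""
    return Fraction(len(event), len(sample_space))


# General addition rule: P(E or F) = P(E) + P(F) - P(E and F).
p_union_rule = prob(E) + prob(F) - prob(E & F)

# Direct count of the union, for comparison.
p_union_direct = prob(E | F)

# Complement rule: P(not E) = 1 - P(E).
p_not_E = 1 - prob(E)

print(p_union_rule, p_union_direct, p_not_E)
```

Because E and F share the outcomes 4 and 6, simply adding $P(E)$ and $P(F)$ would count those outcomes twice; subtracting $P(E \text{ and } F)$ corrects for that.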

Independence

Two events are independent if the occurrence of one does not affect the probability of the other.

  • Independent Events: $P(E \text{ and } F) = P(E) \cdot P(F)$

  • Dependent Events: Events that are not independent.

Example: Tossing two coins: the result of one does not affect the other.
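The coin example can be verified by listing the sample space. This is a minimal sketch assuming two fair coins, where E is "first coin shows heads" and F is "second coin shows heads"; the event names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def prob(predicate):
    """Fraction of outcomes for which the predicate is true."""
    favorable = sum(1 for o in outcomes if predicate(o))
    return Fraction(favorable, len(outcomes))


p_e = prob(lambda o: o[0] == "H")                    # first coin is heads
p_f = prob(lambda o: o[1] == "H")                    # second coin is heads
p_both = prob(lambda o: o[0] == "H" and o[1] == "H")  # both are heads

# Independence check: P(E and F) equals P(E) * P(F).
independent = p_both == p_e * p_f
print(p_e, p_f, p_both, independent)
```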

Discrete Probability Distributions

Measures of Central Tendency

A discrete random variable takes on countable values, each with an associated probability. The mean (expected value) and standard deviation summarize its distribution.

  • Random Variable: A variable whose value is a numerical outcome of a random phenomenon.

  • Discrete Random Variable: Takes on countable values.

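  • Probability Mass Function (pmf): $P(x) = P(X = x)$, with $0 \le P(x) \le 1$ and $\sum P(x) = 1$

  • Expected Value (Mean): $\mu_X = \sum x \cdot P(x)$

  • Standard Deviation: $\sigma_X = \sqrt{\sum (x - \mu_X)^2 \cdot P(x)}$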

  • Law of Large Numbers: As sample size increases, the sample mean and standard deviation approach the population values.

Example: Rolling a fair die: $X$ can be 1-6, each with $P(x) = 1/6$.
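For the fair die, the mean and standard deviation follow directly from the definitions $\mu_X = \sum x \cdot P(x)$ and $\sigma_X = \sqrt{\sum (x - \mu_X)^2 \cdot P(x)}$. A short sketch, using exact fractions so the intermediate values are easy to check by hand:

```python
from fractions import Fraction
import math

# pmf of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Expected value: sum of x * P(x).
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mean)^2 * P(x); the standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

print(mean, variance, round(sd, 4))
```

The mean is 7/2 = 3.5 and the variance is 35/12, which gives a standard deviation of about 1.708.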

Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

  • Binomial Experiment: Fixed number of independent trials, each with two outcomes (success/failure).

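  • Binomial Random Variable: Counts the number of successes $x$ in $n$ trials.

  • Combination Notation: ${}_nC_x = \frac{n!}{x!(n-x)!}$

  • Binomial Probability: $P(x) = {}_nC_x \, p^x (1-p)^{n-x}$

  • Mean: $\mu_X = np$

  • Standard Deviation: $\sigma_X = \sqrt{np(1-p)}$

Example: Flipping a fair coin 5 times, the probability of exactly 3 heads: $P(3) = {}_5C_3 (0.5)^3 (0.5)^2 = 10/32 = 0.3125$.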
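The coin-flip example can be reproduced with a few lines of code. This sketch assumes a fair coin ($p = 0.5$); the helper name `binomial_pmf` is made up for the illustration.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.5  # five flips of a fair coin

p_three_heads = binomial_pmf(3, n, p)  # 10 * 0.5**5 = 0.3125
mean = n * p                           # np = 2.5
sd = math.sqrt(n * p * (1 - p))        # sqrt(1.25)

# The probabilities over x = 0, ..., n should sum to 1.
total_probability = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(p_three_heads, mean, round(sd, 4), total_probability)
```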

Continuous Random Variables

Continuous Distributions and the Normal Curve

Continuous random variables can take any value in an interval. Their probabilities are described by a probability density function (pdf), and the area under the curve represents probability.

  • Probability Density Function (pdf): A function $f(x)$ such that the area under its graph between $a$ and $b$ equals $P(a \le X \le b)$.

  • Uniform Distribution: All intervals of the same length are equally probable.

  • Normal Distribution: Bell-shaped, symmetric about the mean $\mu$, with standard deviation $\sigma$.

  • Inflection Points: Points where the curve changes concavity, located at $x = \mu \pm \sigma$.

Properties of a pdf:

  • $f(x) \ge 0$ for all $x$

  • The total area under the graph of $f(x)$ equals 1

Example: For a uniform distribution on [0,1], $f(x) = 1$ for $0 \le x \le 1$.

Calculating Probabilities with the Normal Distribution

The standard normal distribution has mean 0 and standard deviation 1. Z-scores standardize normal variables for probability calculations using tables.

  • Z-score: $z = \frac{x - \mu}{\sigma}$

  • Standard Normal Distribution: $Z \sim N(0, 1)$

  • Percentiles: The value below which a given percentage of observations fall.

Formulas:

  • Z-score: $z = \frac{x - \mu}{\sigma}$

  • Value from a Z-score: $x = \mu + z\sigma$

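Example: If $X$ is normal with mean $\mu$ and standard deviation $\sigma$, the Z-score for a value $x$ is $z = \frac{x - \mu}{\sigma}$.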
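In place of a printed Z-table, the standard normal curve can be evaluated in code. The sketch below uses Python's `statistics.NormalDist`; the values $\mu = 100$, $\sigma = 15$, and $x = 130$ are hypothetical and chosen only to make the arithmetic easy.

```python
from statistics import NormalDist

# Hypothetical normal population (values chosen for illustration only).
mu, sigma = 100, 15
x = 130

# Standardize: z = (x - mu) / sigma.
z = (x - mu) / sigma

# Area to the left of z under the standard normal curve, i.e. P(X < x).
standard_normal = NormalDist(mu=0, sigma=1)
p_below = standard_normal.cdf(z)

# 90th percentile: find the z with area 0.90 to its left, then unstandardize.
z_90 = standard_normal.inv_cdf(0.90)
x_90 = mu + z_90 * sigma

print(z, round(p_below, 4), round(x_90, 2))
```

Here $z = 2$, so about 97.7% of observations fall below 130, and the 90th percentile lies a little above 119.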

Sampling Distributions

Distribution of the Sample Mean

The sampling distribution of the sample mean describes the distribution of means from all possible samples of a given size from a population. The Central Limit Theorem states that, for large samples, this distribution is approximately normal.

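  • Sample Mean ($\bar{x}$): The average of a sample.

  • Population Mean ($\mu$): The average of the population.

  • Standard Error ($\sigma_{\bar{x}}$): The standard deviation of the sample mean.

  • Central Limit Theorem: For $n \ge 30$, the sampling distribution of $\bar{x}$ is approximately normal, regardless of the population's distribution.

Formulas:

  • Mean of the sample mean: $\mu_{\bar{x}} = \mu$

  • Standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$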

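Example: If a population has mean $\mu$ and standard deviation $\sigma$, and samples of size $n$ are drawn, then $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.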
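A small simulation can make the standard error formula concrete. This is a sketch under assumed settings: the population is uniform on [0, 1] (so $\mu = 0.5$ and $\sigma = \sqrt{1/12}$), the sample size is 36, and 5000 samples are drawn; none of these values come from the course materials.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 36       # size of each sample (assumed)
reps = 5000  # number of samples drawn (assumed)

# Uniform(0, 1) population: mean 0.5, standard deviation sqrt(1/12).
sigma = math.sqrt(1 / 12)
theoretical_se = sigma / math.sqrt(n)

# Draw many samples and record each sample mean.
sample_means = []
for _ in range(reps):
    sample = [random.random() for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The spread of the sample means should be close to sigma / sqrt(n),
# and their average should be close to the population mean.
observed_se = statistics.stdev(sample_means)
mean_of_means = statistics.fmean(sample_means)

print(round(theoretical_se, 4), round(observed_se, 4), round(mean_of_means, 4))
```

Even though the uniform population is not bell-shaped, a histogram of `sample_means` would look approximately normal, which is what the Central Limit Theorem describes.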

Distribution of the Sample Proportion

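The sample proportion $\hat{p}$ estimates the population proportion $p$. For large samples, its distribution is approximately normal.

  • Sample Proportion ($\hat{p}$): $\hat{p} = \frac{x}{n}$, where $x$ is the number of successes in $n$ trials.

  • Mean: $\mu_{\hat{p}} = p$

  • Standard Deviation: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

  • Z-score: $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$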

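Example: If the population proportion is $p$ and the sample size is $n$, then $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.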
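The sampling distribution of the sample proportion can be used to judge how surprising an observed $\hat{p}$ is. The sketch below assumes hypothetical values $p = 0.4$, $n = 100$, and an observed $\hat{p} = 0.46$, and relies on the normal approximation.

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen for illustration only.
p = 0.4       # population proportion
n = 100       # sample size
p_hat = 0.46  # observed sample proportion

# Standard deviation of the sample proportion: sqrt(p(1 - p) / n).
sd_p_hat = math.sqrt(p * (1 - p) / n)

# Z-score of the observed sample proportion.
z = (p_hat - p) / sd_p_hat

# Approximate probability of seeing a sample proportion at least this large.
p_at_least = 1 - NormalDist().cdf(z)

print(round(sd_p_hat, 4), round(z, 4), round(p_at_least, 4))
```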

Confidence Intervals

Estimating Population Proportions

A confidence interval provides a range of plausible values for a population parameter, based on sample data and a specified confidence level.

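  • Point Estimate: The sample statistic used to estimate a population parameter (e.g., $\hat{p}$ for $p$).

  • Confidence Interval: $\hat{p} \pm E$, where $E$ is the margin of error.

  • Margin of Error: $E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

  • Critical Value ($z_{\alpha/2}$): The Z-score corresponding to the desired confidence level.

  • Sample Size Calculation: $n = \hat{p}(1-\hat{p}) \left( \frac{z_{\alpha/2}}{E} \right)^2$ (if a prior estimate $\hat{p}$ is known), or $n = 0.25 \left( \frac{z_{\alpha/2}}{E} \right)^2$ (if $p$ is unknown); round $n$ up to the next whole number.

Formulas:

  • Margin of Error: $E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

  • Confidence Interval: $\hat{p} - E$ to $\hat{p} + E$

  • Sample Size: $n = \hat{p}(1-\hat{p}) \left( \frac{z_{\alpha/2}}{E} \right)^2$ or $n = 0.25 \left( \frac{z_{\alpha/2}}{E} \right)^2$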

Example: For $\hat{p} = 0.5$, $n = 100$, $z_{\alpha/2} = 1.96$, $E = 0.098$; CI is (0.402, 0.598).
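The interval (0.402, 0.598) is what a 95% confidence interval gives for a sample proportion of 0.5 with $n = 100$. The sketch below works through that case and adds a sample size calculation; the target margin of error of 0.03 is a hypothetical value chosen for the illustration.

```python
import math
from statistics import NormalDist

p_hat = 0.5        # sample proportion
n = 100            # sample size
confidence = 0.95  # confidence level

# Critical value z_{alpha/2}: the z with area alpha/2 in the upper tail (about 1.96).
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval endpoints.
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

# Sample size for a hypothetical target margin of 0.03 with no prior estimate of p:
# n = 0.25 * (z / E)^2, rounded up to the next whole number.
target_margin = 0.03
n_needed = math.ceil(0.25 * (z_crit / target_margin) ** 2)

print(round(lower, 3), round(upper, 3), n_needed)
```

Rounding the sample size up rather than to the nearest integer ensures the margin of error does not exceed the target.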

Estimating Population Means

When estimating a population mean, the Student's t-distribution is used if the population standard deviation is unknown and the sample size is small.

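  • Point Estimate: The sample mean $\bar{x}$.

  • t-Distribution: Used instead of the normal distribution when $\sigma$ is unknown.

  • Degrees of Freedom: $df = n - 1$

  • Margin of Error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

  • Confidence Interval: $\bar{x} \pm E$

Formulas:

  • Margin of Error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

  • Confidence Interval: $\bar{x} - E$ to $\bar{x} + E$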

Example: For $\bar{x} = 20$, $s = 4$, $n = 16$, $df = 15$, $t_{\alpha/2} = 2.131$; CI is (17.869, 22.131).

Additional info: The t-distribution is more robust for small samples, especially when the population is normal or nearly normal. For large samples ($n \ge 30$), the t-distribution approximates the normal distribution.
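The interval (17.869, 22.131) can be reproduced with $\bar{x} = 20$, $s = 4$, and $n = 16$ at 95% confidence. In the sketch below the critical value 2.131 is entered by hand, as it would be read from a t-table for 15 degrees of freedom, because Python's standard library does not include the t-distribution.

```python
import math

x_bar = 20  # sample mean
s = 4       # sample standard deviation
n = 16      # sample size

# Degrees of freedom for the t-distribution.
df = n - 1

# Critical value t_{alpha/2} for 95% confidence and df = 15,
# taken from a t-table (assumption: entered manually, not computed).
t_crit = 2.131

# Margin of error and interval endpoints.
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(df, round(margin, 3), round(lower, 3), round(upper, 3))
```

For comparison, the normal critical value at 95% confidence is about 1.96, so the t-based interval is somewhat wider; the gap shrinks as the degrees of freedom grow.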
