Least-Squares Regression and The Normal Probability Distribution

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Least-Squares Regression

Finding the Least-Squares Regression Line

The least-squares regression line is a statistical method used to model the relationship between two quantitative variables. It minimizes the sum of the squared differences between observed values and the values predicted by the line.

  • Definition: The least-squares regression line is the line that best fits the data points on a scatterplot, minimizing the sum of the squared vertical distances (residuals) from each point to the line.

  • Equation: The general form is $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

  • Example: For the data in Table 2:

    x    y
    1    (value not shown)
    3    (value not shown)
    3    (value not shown)
    6    (value not shown)
    7    (value not shown)

    The least-squares regression line is:

  • Interpretation: The slope ($b_1$) indicates the average change in $y$ for each unit increase in $x$. The intercept ($b_0$) is the predicted value of $y$ when $x = 0$.

Additional info: The actual $y$-values are not shown in the table, but the regression equation is provided. The process involves calculating the means, variances, and covariance of $x$ and $y$.
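As a minimal sketch of that calculation, the slope can be computed as the sum of cross-deviations divided by the sum of squared $x$-deviations, and the intercept as $\bar{y} - b_1 \bar{x}$. The $x$-values below come from the table; the $y$-values are hypothetical placeholders, since the source does not show them, so the printed line is not the guide's regression equation. The function name `least_squares_line` is an assumption made for this illustration.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations (proportional to the covariance of x and y).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared x-deviations (proportional to the variance of x).
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


xs = [1, 3, 3, 6, 7]   # x-values from the table
ys = [2, 4, 5, 7, 8]   # HYPOTHETICAL y-values, for illustration only
b0, b1 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```

Because the fitted line always passes through the point $(\bar{x}, \bar{y})$, the intercept follows directly once the slope is known.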

The Normal Probability Distribution

Introduction to the Normal Distribution

The normal distribution is a fundamental probability distribution in statistics, often used to model real-world phenomena that cluster around a mean.

  • Definition: A normal distribution is a continuous probability distribution that is symmetric about its mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

  • Normal Curve: The graphical representation of a normal distribution is called the normal curve, which is bell-shaped and symmetric.

  • Continuous Random Variable: A continuous random variable can take any value within a given range. It is said to be normally distributed if its relative frequency histogram approximates the shape of a normal curve.

Properties of the Normal Curve

The normal curve has several important properties that make it useful for statistical analysis.

  • Symmetry: The curve is symmetric about its mean $\mu$.

  • Mean, Median, Mode: All are equal and located at the center of the distribution.

  • Inflection Points: The points at $x = \mu - \sigma$ and $x = \mu + \sigma$ are where the curvature changes direction.

  • Total Area: The area under the curve is 1, representing the total probability.

  • Asymptotic: The curve approaches, but never touches, the horizontal axis as $x$ moves away from the mean in either direction.

Effect of Mean and Standard Deviation

Changing the mean ($\mu$) shifts the curve left or right, while changing the standard deviation ($\sigma$) affects the spread and height of the curve.

  • Increasing $\mu$: Shifts the curve to the right without changing its shape.

  • Increasing $\sigma$: Makes the curve flatter and more spread out, because the total area under the curve must remain 1.
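The effect of the two parameters can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to evaluate the curve's height at its mean, which equals $1 / (\sigma\sqrt{2\pi})$. The helper name `peak_height` and the parameter values are assumptions chosen for illustration.

```python
from statistics import NormalDist


def peak_height(mu, sigma):
    """Height of the normal curve at its mean, 1 / (sigma * sqrt(2 * pi))."""
    return NormalDist(mu, sigma).pdf(mu)


# Illustrative parameter pairs: a baseline, a shifted mean, a larger sigma.
for mu, sigma in [(0, 1), (3, 1), (0, 2)]:
    print(f"mu={mu}, sigma={sigma}: peak at x={mu}, height={peak_height(mu, sigma):.4f}")
```

Changing the mean moves the location of the peak but leaves its height unchanged, while doubling the standard deviation halves the peak height.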

The Empirical Rule (68-95-99.7 Rule)

The empirical rule describes the proportion of data within certain standard deviations of the mean in a normal distribution.

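  • Approximately 68% of the data falls within one standard deviation of the mean, $\mu \pm \sigma$.

  • Approximately 95% falls within two standard deviations, $\mu \pm 2\sigma$.

  • Approximately 99.7% falls within three standard deviations, $\mu \pm 3\sigma$.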
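These percentages can be reproduced with the cumulative distribution function. The following is a small sketch using the standard library's `statistics.NormalDist`; the helper name `area_within` is an assumption made for this example.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to every normal distribution.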

Role of Area in the Normal Density Function

The area under the normal curve for a given interval represents the probability or proportion of observations within that interval.

  • Probability Density Function (pdf): The normal pdf is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

  • Interpretation: The area under the curve between two values $a$ and $b$ gives $P(a \le X \le b)$, the probability that $X$ falls in that interval.

  • Example: If the area to the right of $x = 200$ is 0.2903, then 29.03% of the population has values greater than 200, and the probability that a randomly selected individual exceeds 200 is 0.2903.
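To make the link between area and probability concrete, the sketch below approximates the area under the density with a midpoint-rule sum and compares it with the probability given by the cumulative distribution function. The distribution parameters, the interval, and the helper name `area_under_pdf` are illustrative assumptions, not values from the source.

```python
from statistics import NormalDist


def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width


dist = NormalDist(mu=0, sigma=1)   # illustrative parameters
a, b = -1.0, 2.0                   # illustrative interval

print(area_under_pdf(dist, a, b))  # numerical area under the curve
print(dist.cdf(b) - dist.cdf(a))   # P(a <= X <= b) from the CDF
```

The two printed values should agree to several decimal places, since the CDF is the accumulated area under the pdf.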

Standardizing and the Standard Normal Distribution

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution with mean 0 and standard deviation 1.

  • Z-score: The standardized value (z-score) is calculated as $z = \frac{x - \mu}{\sigma}$. It represents the number of standard deviations $x$ is from the mean.

  • Use: Z-scores allow comparison across different normal distributions and facilitate finding probabilities using standard normal tables.

  • Example: An IQ score of 120 with $\mu = 100$ and $\sigma = 15$ has a z-score of $z = \frac{120 - 100}{15} \approx 1.33$. This means the score is 1.33 standard deviations above the mean.
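The standardization step is a one-line calculation. The sketch below assumes the conventional IQ scale with mean 100 and standard deviation 15, which is consistent with the stated result of 1.33 standard deviations; the function name `z_score` is an assumption for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Assumed IQ parameters: mean 100, standard deviation 15.
print(round(z_score(120, 100, 15), 2))
```

A positive result means the value lies above the mean, and a negative result means it lies below.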

Finding Areas Under the Normal Curve

Areas under the normal curve correspond to probabilities and can be found using z-tables or technology.

  • To the Left of z: Use the standard normal table to find $P(Z < z)$.

  • To the Right of z: Use the complement rule: $P(Z > z) = 1 - P(Z < z)$.

  • Between Two Values: $P(a < Z < b) = P(Z < b) - P(Z < a)$.

  • Example: For , , so .
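With technology, the three cases reduce to calls to the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are assumptions, and $z = 1.33$ is borrowed from the IQ example purely for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_left(z):
    """P(Z < z): the value a standard normal table reports."""
    return Z.cdf(z)


def area_right(z):
    """P(Z > z), by the complement rule."""
    return 1 - Z.cdf(z)


def area_between(z1, z2):
    """P(z1 < Z < z2), as a difference of two left-tail areas."""
    return Z.cdf(z2) - Z.cdf(z1)


z = 1.33  # illustrative z-score
print(f"left:  {area_left(z):.4f}")
print(f"right: {area_right(z):.4f}")
print(f"between -{z} and {z}: {area_between(-z, z):.4f}")
```

The left and right areas for the same z always sum to 1, which is a quick sanity check on any table lookup.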

Applications: Heights Example

Suppose the heights of three-year-old females are normally distributed with mean 38.72 inches and standard deviation 3.17 inches.

  • Proportion less than 35 inches:

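    • Calculate z-score: $z = \frac{35 - 38.72}{3.17} \approx -1.17$

    • Look up in the table: $P(Z < -1.17) = 0.1210$

    • Interpretation: 12.1% of three-year-old females are less than 35 inches tall.

  • Proportion between 35 and 40 inches:

    • Find z-scores: $z \approx -1.17$ for 35 inches, $z = \frac{40 - 38.72}{3.17} \approx 0.40$ for 40 inches

    • Find areas: $P(Z < -1.17) = 0.1210$, $P(Z < 0.40) = 0.6554$

    • Area between: $0.6554 - 0.1210 = 0.5344$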

    • Interpretation: About 53.4% of three-year-old females are between 35 and 40 inches tall.
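The same two proportions can be computed directly from the stated mean and standard deviation. This sketch uses `statistics.NormalDist` and does not round the z-scores to two decimals, so its results may differ slightly in the third decimal place from the table-based figures; the variable names are assumptions for illustration.

```python
from statistics import NormalDist

# Heights of three-year-old females, using the parameters stated in the example.
heights = NormalDist(mu=38.72, sigma=3.17)

p_below_35 = heights.cdf(35)                     # P(X < 35)
p_35_to_40 = heights.cdf(40) - heights.cdf(35)   # P(35 < X < 40)

print(f"P(X < 35)      = {p_below_35:.4f}")
print(f"P(35 < X < 40) = {p_35_to_40:.4f}")
```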

Finding Values Corresponding to Given Areas (Percentiles)

To find the value of a normal random variable corresponding to a given percentile:

  1. Find the z-score for the desired area (percentile) using the standard normal table.

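  2. Convert the z-score to the original scale: $x = \mu + z\sigma$

  3. Example: 20th percentile for heights ($\mu = 38.72$, $\sigma = 3.17$):

    • z-score for 0.20 area: $z \approx -0.84$

    • $x = 38.72 + (-0.84)(3.17) \approx 36.1$ inches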
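The two steps above can be sketched with the inverse cumulative distribution function, which plays the role of reading the table backwards. The helper name `percentile_value` is an assumption; the mean and standard deviation are the ones stated for the heights example.

```python
from statistics import NormalDist


def percentile_value(p, mu, sigma):
    """Value x of a normal variable with area p to its left."""
    z = NormalDist().inv_cdf(p)   # step 1: z-score for the given area
    return mu + z * sigma         # step 2: convert back to the original scale


x20 = percentile_value(0.20, mu=38.72, sigma=3.17)
print(f"20th percentile of heights: {x20:.2f} inches")
```

Because the z-score is not rounded to two decimals here, the result may differ slightly from a table-based answer.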

Percentiles for SAT Scores Example

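  • SAT math scores: $\mu = 516$, $\sigma = 116$

  • 5th percentile: $z = -1.645$, $x = 516 + (-1.645)(116) \approx 325$

  • 95th percentile: $z = 1.645$, $x = 516 + (1.645)(116) \approx 707$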

  • Interpretation: Scores of 325 and 707 separate the bottom and top 5% from the middle 90% of test takers.

Probability of a Single Value

For any continuous random variable, the probability of observing any single exact value is zero: $P(X = a) = 0$.

  • Probabilities are only meaningful for intervals, not for exact values.
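One way to see this is to shrink an interval around a value and watch its probability fall to zero. The sketch below assumes a standard normal distribution and the point $a = 1$ purely for illustration; the helper name `prob_within` is also an assumption.

```python
from statistics import NormalDist


def prob_within(dist, a, half_width):
    """P(a - half_width <= X <= a + half_width) for the given distribution."""
    return dist.cdf(a + half_width) - dist.cdf(a - half_width)


dist = NormalDist()  # illustrative choice: standard normal
for half_width in (0.1, 0.01, 0.001, 0.0):
    print(half_width, prob_within(dist, 1.0, half_width))
```

As the interval narrows, the area above it narrows too, and an interval of zero width has zero area.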
