
Statistics Exam Study Guide: Descriptive Statistics, Probability, and Regression

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Descriptive Statistics

Contingency Tables and Conditional Distributions

Contingency tables summarize the relationship between two categorical variables by displaying the frequency counts for each combination of categories.

  • Conditional Distribution: The distribution of one variable for a specific value of another variable. For example, the percentage of respondents from the West who prefer Classical music.

  • Row and Column Percentages: To find the percentage, divide the count in the cell by the total for the relevant row or column, then multiply by 100.

Genre | Northeast | West | South | Total
Classical | 45 | 31 | 27 | 103
Country | 35 | 70 | 93 | 198
Pop Rock | 80 | 67 | 52 | 199
Total | 160 | 168 | 172 | 500

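  • Example: To find the percentage of West respondents who prefer Classical music, divide the Classical/West cell by the West column total: $\frac{31}{168} \times 100 \approx 18.45\%$. Conditioning on genre instead gives a different answer. The percentage of Classical listeners who are from the West is $\frac{31}{103} \times 100 \approx 30.10\%$.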
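The sketch below computes both conditional distributions from the counts in the table. It shows that "given region" divides by a column total and "given genre" divides by a row total. The dictionary layout and the function names `genre_given_region` and `region_given_genre` are illustrative choices, not part of the original material.

```python
# Counts from the contingency table: genre -> {region: count}
COUNTS = {
    "Classical": {"Northeast": 45, "West": 31, "South": 27},
    "Country": {"Northeast": 35, "West": 70, "South": 93},
    "Pop Rock": {"Northeast": 80, "West": 67, "South": 52},
}

def genre_given_region(counts, region):
    """Conditional distribution of genre within one region (divide by the column total)."""
    column_total = sum(row[region] for row in counts.values())
    return {genre: 100 * row[region] / column_total for genre, row in counts.items()}

def region_given_genre(counts, genre):
    """Conditional distribution of region within one genre (divide by the row total)."""
    row = counts[genre]
    row_total = sum(row.values())
    return {region: 100 * n / row_total for region, n in row.items()}

west = genre_given_region(COUNTS, "West")
classical = region_given_genre(COUNTS, "Classical")
print(f"Classical among West respondents: {west['Classical']:.2f}%")
print(f"West among Classical listeners: {classical['West']:.2f}%")
```

Each conditional distribution sums to 100%, which is a quick check that the correct total was used.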

Five-Number Summary and Boxplots

The five-number summary provides a quick overview of the distribution of a dataset:

  • Minimum (Min): The smallest value

  • First Quartile (Q1): 25th percentile

  • Median: 50th percentile

  • Third Quartile (Q3): 75th percentile

  • Maximum (Max): The largest value

Statistic | Value
Min | 32
Q1 | 80
Median | 95
Q3 | 110
Max | 153

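  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$.

  • Outliers: Values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are considered outliers (the 1.5 × IQR rule).

  • Example: For the summary above, $\text{IQR} = 110 - 80 = 30$, so the fences are $80 - 1.5(30) = 35$ and $110 + 1.5(30) = 155$. The minimum, 32, falls below 35, so there is at least one low outlier. The maximum, 153, is below 155, so there are no high outliers.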
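This is a minimal sketch of the outlier check, assuming the usual 1.5 × IQR rule. It works from the five-number summary in the table above. The function name `outlier_fences` and its `k` parameter are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule (k = 1.5 is the usual choice)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary from the table
minimum, q1, median, q3, maximum = 32, 80, 95, 110, 153
lower, upper = outlier_fences(q1, q3)
print(f"IQR = {q3 - q1}, fences = ({lower}, {upper})")
print("Min is a low outlier:", minimum < lower)
print("Max is a high outlier:", maximum > upper)
```

Only the minimum and maximum are known here. Other values may also lie beyond a fence, so the summary can show that outliers exist but not how many there are.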

Histograms and Distribution Shape

Histograms display the frequency of data within specified intervals (bins).

  • Shape: Can be symmetric, skewed left (tail on left), or skewed right (tail on right).

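  • Center: The median or mean can describe the center. The median is usually preferred for skewed data, and the mean for roughly symmetric data.

  • Spread: The range, IQR, or standard deviation describes variability. The IQR pairs naturally with the median, and the standard deviation with the mean.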

  • Outliers: Unusually high or low values that do not fit the general pattern.
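To make "frequency within bins" concrete, the sketch below counts values in equal-width bins. It assumes left-closed bins such as [10, 20) and that every value is at least `start`. The data and the helper name `histogram_counts` are hypothetical. Software may place boundary values differently.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count values in bins [start, start+width), [start+width, start+2*width), ...
    Assumes a non-empty list with every value >= start."""
    counts = Counter(int((v - start) // width) for v in values)
    n_bins = max(counts) + 1
    return [counts[i] for i in range(n_bins)]  # Counter returns 0 for empty bins

# Hypothetical data, bins of width 10 starting at 10
data = [12, 15, 21, 22, 23, 29, 31, 45]
print(histogram_counts(data, start=10, width=10))
```

The counts are the bar heights. Here most values fall in the second bin, with a longer tail toward larger values.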

Boxplots

Boxplots visually summarize the five-number summary and help compare distributions.

  • Comparisons: Side-by-side boxplots let you compare medians and IQRs across groups and spot outliers in each group (e.g., children's vs. adult cereals).

Measures of Center and Spread

Mean and Median

  • Mean ($\bar{x}$): The arithmetic average, $\bar{x} = \frac{1}{n}\sum x_i$. It is pulled toward outliers and toward the long tail of skewed data.

  • Median: The middle value when data are ordered. Resistant to outliers and skewed data.

  • Example: If the median sugar content is 18.4%, the mean will typically be higher when the distribution is skewed right and lower when it is skewed left, because the mean is pulled toward the tail.
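The short sketch below uses a hypothetical right-skewed dataset. It shows the mean being pulled toward the tail while the median barely moves when the extreme value is removed.

```python
import statistics

# Hypothetical right-skewed data: one large value stretches the right tail
values = [3, 5, 6, 7, 8, 9, 40]

mean = sum(values) / len(values)     # x-bar = (1/n) * sum of x_i
median = statistics.median(values)   # middle value of the sorted data
print(f"mean = {mean:.2f}, median = {median}")

# Dropping the extreme value moves the mean a lot but the median only slightly
trimmed = values[:-1]
print(f"without 40: mean = {sum(trimmed) / len(trimmed):.2f}, "
      f"median = {statistics.median(trimmed)}")
```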

Standard Deviation and IQR

  • Standard Deviation ($s$): Measures the typical distance of data points from the mean: $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$.

  • Interquartile Range (IQR): The range of the middle 50% of the data.
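The sketch below follows the standard deviation formula step by step: deviations, squares, divide by $n - 1$, square root. It then compares the result with Python's `statistics.stdev`, which also uses the $n - 1$ divisor. The data and the name `sample_std` are hypothetical. Because the deviations are squared before averaging, $s$ is a "typical" distance rather than a literal average distance.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: square root of the summed squared deviations over n - 1."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = [(x - xbar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
s = sample_std(data)
print(f"s = {s:.3f}")
print(math.isclose(s, statistics.stdev(data)))
```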

Normal Distribution and Z-Scores

Standard Normal Distribution

The standard normal distribution is a normal distribution with mean 0 and standard deviation 1.

  • Z-score: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.

  • Interpretation: The z-score tells how many standard deviations a value is from the mean.

  • Normal Table: Used to find probabilities and percentiles for normal distributions.
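The sketch below computes a z-score and the area to its left under the standard normal curve, which is the quantity a normal table lists. It uses the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The input values (130, mean 100, SD 15) and the function names are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(130, mu=100, sigma=15)  # hypothetical value, mean, and SD
print(z, round(standard_normal_cdf(z), 4))
```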

Applications

  • Comparing Unusual Values: The value with the largest absolute z-score is the most unusual.

  • Percentiles: The area under the normal curve to the left of a z-score gives the percentile.

  • Example: If a bag of carrots has a mean weight of 16.3 oz and standard deviation 0.3 oz, the percentage above 16 oz can be found using the z-score and normal table.
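Here is one way to carry out the carrot example in Python, using the standard library's `statistics.NormalDist` in place of a printed table. The mean (16.3 oz) and standard deviation (0.3 oz) come from the example. The two bag weights in the comparison step are hypothetical.

```python
from statistics import NormalDist

weights = NormalDist(mu=16.3, sigma=0.3)  # carrot-bag weights from the example (oz)

z = (16 - 16.3) / 0.3                     # standardize 16 oz
p_above_16 = 1 - weights.cdf(16)          # area to the right of 16 oz
print(f"z = {z:.2f}, P(weight > 16 oz) = {p_above_16:.4f}")

# Comparing unusual values: the largest |z| is the most unusual (hypothetical bags)
bags = [15.7, 16.8]
most_unusual = max(bags, key=lambda w: abs((w - 16.3) / 0.3))
print("Most unusual bag:", most_unusual)
```

The z-score is about −1.00. The table gives about 0.1587 to its left, so roughly 84% of bags weigh more than 16 oz.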

Scatterplots and Correlation

Scatterplots

Scatterplots show the relationship between two quantitative variables.

  • Direction: Positive (as one increases, so does the other) or negative (as one increases, the other decreases).

  • Form: Linear or nonlinear.

  • Strength: How closely the points follow a clear form.

Correlation Coefficient ($r$)

  • Definition: Measures the strength and direction of a linear relationship between two variables.

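  • Range: $-1 \le r \le 1$.

  • Interpretation: $r > 0$ indicates a positive association; $r < 0$ indicates a negative association; $r = 0$ indicates no linear association. Values closer to $\pm 1$ indicate a stronger linear relationship.

  • Calculation: $r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$

  • Example Table: Organize the calculation as a table with one column for each intermediate quantity (such as $x$, $y$, the deviations from the means, the z-scores, and the products of z-scores), then sum the final column.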
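As a sketch of the table method, the function below computes $r$ as the sum of products of z-scores divided by $n - 1$, using sample standard deviations. Each term in the sum corresponds to one row of the calculation table. The paired data and the name `correlation` are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Pearson r: sum of (z_x * z_y) over n - 1, using sample means and SDs."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    products = [((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))
```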

Simple Linear Regression

Regression Line

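Regression analysis estimates the relationship between a response variable ($y$) and a predictor variable ($x$).

  • Equation: $\hat{y} = b_0 + b_1 x$

  • Slope ($b_1$): $b_1 = r\frac{s_y}{s_x}$, where $r$ is the correlation, $s_y$ is the standard deviation of $y$, and $s_x$ is the standard deviation of $x$.

  • Interpretation: The slope represents the predicted change in $y$ for a one-unit increase in $x$.

  • Intercept ($b_0$): $b_0 = \bar{y} - b_1\bar{x}$, so the line passes through $(\bar{x}, \bar{y})$.

  • Coefficient of Determination ($R^2$): $R^2 = r^2$; the proportion of the variation in $y$ explained by the linear relationship with $x$.

  • Residual: The difference between the observed value and the predicted value: $e = y - \hat{y}$.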

Statistic | Friends (x) | Time (min) (y)
Mean | 21.61 | 105.13
Standard Deviation | 5.72 | 39.91
Correlation (r) | 0.385 |

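  • Example: Slope: $b_1 = r\frac{s_y}{s_x} = 0.385 \times \frac{39.91}{5.72} \approx 2.69$. Each additional friend is associated with about 2.69 more minutes, on average.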
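A minimal sketch that builds the full regression line from the summary statistics in the friends/time table, using $b_1 = r\,s_y/s_x$, $b_0 = \bar{y} - b_1\bar{x}$, and $R^2 = r^2$. The student with 30 friends and 150 minutes, used to illustrate a residual, is hypothetical.

```python
# Summary statistics from the friends/time table
x_bar, y_bar = 21.61, 105.13   # mean friends, mean time (min)
s_x, s_y = 5.72, 39.91         # standard deviations
r = 0.385                      # correlation

b1 = r * s_y / s_x             # slope: predicted minutes per additional friend
b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
r_squared = r ** 2             # share of variation in time explained by friends

def predict(x):
    """Predicted time (min) from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * x

print(f"y-hat = {b0:.2f} + {b1:.3f}x, R^2 = {r_squared:.3f}")
# Residual = observed - predicted, for a hypothetical student with 30 friends and 150 min
print(f"residual = {150 - predict(30):.2f}")
```

With $R^2 \approx 0.148$, the number of friends accounts for only about 15% of the variation in time, so predictions from this line are rough.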

Probability and the Normal Distribution

Calculating Probabilities

  • Standardization: Convert values to z-scores to use the standard normal table.

  • Finding Probabilities: Use the normal table to find the area to the left of a z-score.

  • Percentiles: The value below which a given percentage of observations fall.
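The percentile idea also works in reverse. To find the value at a given percentile, look up the z-score with that area to its left, then convert back with $x = \mu + z\sigma$. The sketch below reuses the carrot-bag parameters (mean 16.3 oz, SD 0.3 oz). The choice of the 90th percentile is a hypothetical question.

```python
from statistics import NormalDist

weights = NormalDist(mu=16.3, sigma=0.3)  # carrot-bag example (oz)

# Step 1: z-score with 90% of the standard normal area to its left
z90 = NormalDist().inv_cdf(0.90)
# Step 2: un-standardize, x = mu + z * sigma
x90 = 16.3 + z90 * 0.3
print(f"z = {z90:.2f}, 90th percentile = {x90:.2f} oz")
print(abs(x90 - weights.inv_cdf(0.90)) < 1e-9)  # same answer computed directly
```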

Summary Table: Key Formulas

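Concept | Formula
Mean | $\bar{x} = \frac{1}{n}\sum x_i$
Standard Deviation | $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
Z-score | $z = \frac{x - \mu}{\sigma}$
Regression Slope | $b_1 = r\frac{s_y}{s_x}$
Regression Intercept | $b_0 = \bar{y} - b_1\bar{x}$
Correlation | $r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
Coefficient of Determination | $R^2 = r^2$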

Using the Standard Normal Table

The standard normal table (z-table) provides the area under the standard normal curve to the left of a given z-score. This area represents the probability that a standard normal variable is less than or equal to the z-score.

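  • How to Use: Find the row for the z-score's ones digit and first decimal place (e.g., 1.2) and the column for its second decimal place (e.g., 0.03). The entry at the intersection is the cumulative probability.

  • Example: For $z = 1.23$, find row 1.2 and column 0.03. The entry is 0.8907, so about 89.07% of the distribution lies below $z = 1.23$.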
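The row-plus-column lookup can be mimicked in a few lines, which is handy for checking table readings. This sketch assumes a left-tail table with non-negative z only. Tables for negative z label rows such as −1.2 differently and are not handled here. The function name `z_table_entry` is hypothetical.

```python
from statistics import NormalDist

def z_table_entry(row, column):
    """Mimic a left-tail z-table for z >= 0: row is z to one decimal, column adds the hundredths."""
    z = round(row + column, 2)
    return round(NormalDist().cdf(z), 4)  # four decimals, as printed tables usually show

print(z_table_entry(1.2, 0.03))
```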

Additional info:

  • Some questions require interpretation of graphs and tables, calculation of summary statistics, and application of the normal distribution to real-world problems.

  • Understanding the context of each problem is essential for correct interpretation and calculation.
