Statistics Exam Study Guide: Descriptive Statistics, Probability, and Regression

Descriptive Statistics

Contingency Tables and Conditional Distributions

Contingency tables summarize the relationship between two categorical variables by displaying the frequency counts for each combination of categories.

Conditional Distribution: The distribution of one variable for a specific value of another variable. For example, the percentage of respondents from the West who prefer Classical music.
Row and Column Percentages: To find the percentage, divide the count in the cell by the total for the relevant row or column, then multiply by 100.

Genre	Northeast	West	South	Total
Classical	45	31	27	103
Country	35	70	93	198
Pop Rock	80	67	52	199
Total	160	168	172	500

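Example: To find the percentage of Classical listeners who are from the West, divide the West count in the Classical row by the Classical row total: $\frac{31}{103} \times 100 \approx 30.1\%$.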
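
A minimal Python sketch of how row and column percentages could be computed from the table above. The dictionary layout and the helper names `row_percent` and `column_percent` are illustrative assumptions, not part of the original notes.

```python
# Counts from the genre-by-region table above (totals are recomputed, not typed in).
counts = {
    "Classical": {"Northeast": 45, "West": 31, "South": 27},
    "Country":   {"Northeast": 35, "West": 70, "South": 93},
    "Pop Rock":  {"Northeast": 80, "West": 67, "South": 52},
}

def row_percent(genre, region):
    """Percent of a genre's listeners who live in the given region."""
    row_total = sum(counts[genre].values())
    return 100 * counts[genre][region] / row_total

def column_percent(genre, region):
    """Percent of a region's respondents who prefer the given genre."""
    column_total = sum(row[region] for row in counts.values())
    return 100 * counts[genre][region] / column_total

# Of Classical listeners, the share from the West (conditions on the row).
classical_from_west = row_percent("Classical", "West")
# Of West respondents, the share preferring Classical (conditions on the column).
west_prefers_classical = column_percent("Classical", "West")

print(round(classical_from_west, 1), round(west_prefers_classical, 1))
```

The two results differ because they condition on different totals, so it is worth checking which group a question treats as the whole before dividing.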

Five-Number Summary and Boxplots

The five-number summary provides a quick overview of the distribution of a dataset:

Minimum (Min): The smallest value
First Quartile (Q1): 25th percentile
Median: 50th percentile
Third Quartile (Q3): 75th percentile
Maximum (Max): The largest value

	Value
Min	32
Q1	80
Median	95
Q3	110
Max	153

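Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Outliers: Values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are considered outliers.
Example: For the summary above, $\text{IQR} = 110 - 80 = 30$, so the fences are $80 - 1.5(30) = 35$ and $110 + 1.5(30) = 155$. The minimum of 32 falls below 35 and is an outlier; the maximum of 153 does not exceed 155 and is not.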
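
The usual boxplot convention flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the five-number summary above; the function name `is_outlier` is a hypothetical helper introduced for illustration.

```python
# Five-number summary from the table above.
q1, q3 = 80, 110
minimum, maximum = 32, 153

iqr = q3 - q1                      # spread of the middle 50% of the data
lower_fence = q1 - 1.5 * iqr       # values below this are flagged
upper_fence = q3 + 1.5 * iqr       # values above this are flagged

def is_outlier(value):
    """Apply the 1.5 * IQR rule to a single value."""
    return value < lower_fence or value > upper_fence

print(iqr, lower_fence, upper_fence)
print(is_outlier(minimum), is_outlier(maximum))
```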

Histograms and Distribution Shape

Histograms display the frequency of data within specified intervals (bins).

Shape: Can be symmetric, skewed left (tail on left), or skewed right (tail on right).
Center: Median or mean can be used to describe the center.
Spread: Range, IQR, or standard deviation describe variability.
Outliers: Unusually high or low values that do not fit the general pattern.

Boxplots

Boxplots visually summarize the five-number summary and help compare distributions.

Comparisons: Side-by-side boxplots make it easy to compare medians and IQRs across groups and to spot outliers within each group (e.g., children's vs. adult cereals).

Measures of Center and Spread

Mean and Median

Mean ($\bar{x}$): The arithmetic average. Sensitive to outliers and skewed data.
Median: The middle value when data are ordered. Resistant to outliers and skewed data.
Example: If the median sugar content is 18.4%, the mean will tend to be higher in a right-skewed distribution and lower in a left-skewed one, because the mean is pulled toward the tail.

Standard Deviation and IQR

Standard Deviation ($s$): Measures the typical distance of data points from the mean.
Interquartile Range (IQR): The range of the middle 50% of the data.
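
To illustrate why the median is called resistant, the sketch below compares the summaries of a small data set before and after one extreme value is added. The sugar values are invented for illustration only, and `summarize` is a hypothetical helper.

```python
import statistics

# Hypothetical sugar percentages, invented for illustration only.
sugar = [10.0, 14.5, 17.0, 18.4, 19.0, 21.5, 23.0]
with_outlier = sugar + [60.0]   # add one unusually high value

def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),   # divides by n - 1
    )

mean_a, median_a, sd_a = summarize(sugar)
mean_b, median_b, sd_b = summarize(with_outlier)

# The mean and standard deviation shift far more than the median does.
print(f"mean:   {mean_a:.2f} -> {mean_b:.2f}")
print(f"median: {median_a:.2f} -> {median_b:.2f}")
print(f"sd:     {sd_a:.2f} -> {sd_b:.2f}")
```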

Normal Distribution and Z-Scores

Standard Normal Distribution

The standard normal distribution is a normal distribution with mean 0 and standard deviation 1.

Z-score: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Interpretation: The z-score tells how many standard deviations a value is from the mean.
Normal Table: Used to find probabilities and percentiles for normal distributions.

Applications

Comparing Unusual Values: The value with the largest absolute z-score is the most unusual.
Percentiles: The area under the normal curve to the left of a z-score gives the percentile.
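Example: If bags of carrots have a mean weight of 16.3 oz and standard deviation 0.3 oz, the percentage above 16 oz can be found using the z-score and normal table: $z = \frac{16 - 16.3}{0.3} = -1$, the area to the left of $z = -1$ is 0.1587, so about $1 - 0.1587 = 0.8413$, or 84%, of bags weigh more than 16 oz.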
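
The carrot example can be checked numerically. This is a sketch that assumes the weights follow a normal model and uses `statistics.NormalDist` from the Python standard library in place of a printed table; `z_score` is a hypothetical helper.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Carrot bags: mean 16.3 oz, standard deviation 0.3 oz (from the example above).
z = z_score(16, 16.3, 0.3)

# Area to the left of z under the standard normal curve, as a z-table would give.
area_left = NormalDist().cdf(z)
percent_above = 100 * (1 - area_left)

print(round(z, 2), round(percent_above, 1))
```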

Scatterplots and Correlation

Scatterplots

Scatterplots show the relationship between two quantitative variables.

Direction: Positive (as one increases, so does the other) or negative (as one increases, the other decreases).
Form: Linear or nonlinear.
Strength: How closely the points follow a clear form.

Correlation Coefficient ($r$)

Definition: Measures the strength and direction of a linear relationship between two variables.
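Range: $-1 \le r \le 1$
Interpretation: $r > 0$ indicates a positive association; $r < 0$ indicates a negative association; $r = 0$ indicates no linear association.
Calculation: $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$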
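Example Table: Organize calculations using one column per intermediate quantity, such as $x$, $y$, $x - \bar{x}$, $y - \bar{y}$, $(x - \bar{x})^2$, $(y - \bar{y})^2$, and $(x - \bar{x})(y - \bar{y})$.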
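
A column-by-column calculation of the correlation might look like the sketch below. The paired data are invented for illustration, and the column names are assumptions about how such a table would typically be laid out.

```python
import math

# Hypothetical paired data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One list per column of the calculation table.
dx = [xi - x_bar for xi in x]                  # x - x_bar
dy = [yi - y_bar for yi in y]                  # y - y_bar
dx2 = [d * d for d in dx]                      # (x - x_bar)^2
dy2 = [d * d for d in dy]                      # (y - y_bar)^2
dxdy = [a * b for a, b in zip(dx, dy)]         # (x - x_bar)(y - y_bar)

s_x = math.sqrt(sum(dx2) / (n - 1))            # sample standard deviations
s_y = math.sqrt(sum(dy2) / (n - 1))
r = sum(dxdy) / ((n - 1) * s_x * s_y)

print(round(r, 3))
```

Summing each column first and combining the totals at the end mirrors how the calculation is usually done by hand.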

Simple Linear Regression

Regression Line

Regression analysis estimates the relationship between a response variable ($y$) and a predictor variable ($x$).

Equation: $\hat{y} = b_0 + b_1 x$
Slope ($b_1$): $b_1 = r \frac{s_y}{s_x}$, where $r$ is the correlation, $s_y$ is the standard deviation of $y$, and $s_x$ is the standard deviation of $x$.
Interpretation: The slope represents the expected change in $y$ for a one-unit increase in $x$.
Intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$
Coefficient of Determination ($R^2$): $R^2 = r^2$; the proportion of variance in $y$ explained by $x$.
Residual: The difference between the observed value and the predicted value: $e = y - \hat{y}$

	Friends (x)	Time (min) (y)
Mean	21.61	105.13
Standard Deviation	5.72	39.91
Correlation	0.385

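Example: Slope: $b_1 = r \frac{s_y}{s_x} = 0.385 \times \frac{39.91}{5.72} \approx 2.69$, so each additional friend is associated with about 2.69 more minutes, on average.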
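
The regression line can be built directly from the summary statistics in the table above. In this sketch the `predict` helper and the single observed point used for the residual are hypothetical; only the means, standard deviations, and correlation come from the notes.

```python
# Summary statistics from the table above.
x_bar, y_bar = 21.61, 105.13     # mean friends, mean time (min)
s_x, s_y = 5.72, 39.91           # standard deviations
r = 0.385                        # correlation

b1 = r * s_y / s_x               # slope
b0 = y_bar - b1 * x_bar          # intercept
r_squared = r ** 2               # proportion of variance in y explained by x

def predict(x):
    """Predicted time (min) for a given number of friends."""
    return b0 + b1 * x

# Hypothetical observation, invented to illustrate a residual.
observed_x, observed_y = 30, 150
residual = observed_y - predict(observed_x)

print(round(b1, 2), round(b0, 2), round(r_squared, 3), round(residual, 2))
```

One useful check is that the line passes through the point of means: `predict(x_bar)` should return `y_bar`.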

Probability and the Normal Distribution

Calculating Probabilities

Standardization: Convert values to z-scores to use the standard normal table.
Finding Probabilities: Use the normal table to find the area to the left of a z-score.
Percentiles: The value below which a given percentage of observations fall.
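
Going from a percentile back to a data value reverses the standardization step. The sketch below reuses the carrot-bag mean and standard deviation from earlier in these notes as an assumed normal model; the helper names are hypothetical.

```python
from statistics import NormalDist

# Assumed normal model: mean 16.3 oz, standard deviation 0.3 oz.
mean, sd = 16.3, 0.3

def value_at_percentile(p):
    """Value below which a fraction p of observations fall."""
    z = NormalDist().inv_cdf(p)      # z-score with area p to its left
    return mean + z * sd             # undo the standardization

def percentile_of(x):
    """Percent of observations at or below x."""
    z = (x - mean) / sd
    return 100 * NormalDist().cdf(z)

tenth = value_at_percentile(0.10)
print(round(tenth, 2), round(percentile_of(tenth), 1))
```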

Summary Table: Key Formulas

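Concept	Formula
Mean	$\bar{x} = \frac{1}{n} \sum x_i$
Standard Deviation	$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Z-score	$z = \frac{x - \mu}{\sigma}$
Regression Slope	$b_1 = r \frac{s_y}{s_x}$
Regression Intercept	$b_0 = \bar{y} - b_1 \bar{x}$
Correlation	$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$
Coefficient of Determination	$R^2 = r^2$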

Using the Standard Normal Table

The standard normal table (z-table) provides the area under the standard normal curve to the left of a given z-score. This area represents the probability that a standard normal variable is less than or equal to the z-score.

How to Use: Find the row for the z-score through its first decimal place and the column for its second decimal place. The intersection gives the cumulative probability.
Example: For $z = 1.23$, find row 1.2 and column 0.03; the entry is 0.8907.
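
A table lookup can be reproduced in code as a check on hand work. This sketch assumes a table of left-tail areas printed to four decimals, and `table_lookup` is a hypothetical helper that handles non-negative z-scores only.

```python
from statistics import NormalDist

def table_lookup(row, column):
    """Mimic a z-table for z >= 0: row is z to one decimal, column is the second decimal."""
    z = round(row + column, 2)
    return round(NormalDist().cdf(z), 4)   # printed tables show four decimals

# Row 1.2, column 0.03 corresponds to z = 1.23.
area = table_lookup(1.2, 0.03)

# By symmetry, the area left of -1.23 is 1 minus the area left of 1.23.
area_negative = round(1 - area, 4)

print(area, area_negative)
```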

Additional info:

Some questions require interpretation of graphs and tables, calculation of summary statistics, and application of the normal distribution to real-world problems.
Understanding the context of each problem is essential for correct interpretation and calculation.