Statistics Exam Study Guide: Descriptive Statistics, Probability, and Regression
Descriptive Statistics
Contingency Tables and Conditional Distributions
Contingency tables summarize the relationship between two categorical variables by displaying the frequency counts for each combination of categories.
Conditional Distribution: The distribution of one variable for a specific value of another variable. For example, the percentage of respondents from the West who prefer Classical music.
Row and Column Percentages: To find the percentage, divide the count in the cell by the total for the relevant row or column, then multiply by 100.
| Genre | Northeast | West | South | Total |
|---|---|---|---|---|
| Classical | 45 | 31 | 27 | 103 |
| Country | 35 | 70 | 93 | 198 |
| Pop Rock | 80 | 67 | 52 | 199 |
| Total | 160 | 168 | 172 | 500 |
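Example: To find the percentage of Classical listeners who are from the West, divide the West cell by the Classical row total: $\frac{31}{103} \times 100 \approx 30.1\%$. By contrast, the percentage of West respondents who prefer Classical uses the West column total: $\frac{31}{168} \times 100 \approx 18.5\%$.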
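The row and column percentages described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names `row_percent` and `column_percent` are hypothetical, and the counts are copied from the table.

```python
# Counts copied from the contingency table (rows: genre, columns: region).
counts = {
    "Classical": {"Northeast": 45, "West": 31, "South": 27},
    "Country": {"Northeast": 35, "West": 70, "South": 93},
    "Pop Rock": {"Northeast": 80, "West": 67, "South": 52},
}

def row_percent(table, row, col):
    """Percentage of the row total that falls in column `col`."""
    row_total = sum(table[row].values())
    return 100 * table[row][col] / row_total

def column_percent(table, row, col):
    """Percentage of the column total that falls in row `row`."""
    col_total = sum(r[col] for r in table.values())
    return 100 * table[row][col] / col_total

# Of Classical listeners, what share is from the West? (31 / 103)
classical_from_west = row_percent(counts, "Classical", "West")
# Of West respondents, what share prefers Classical? (31 / 168)
west_prefers_classical = column_percent(counts, "Classical", "West")

print(round(classical_from_west, 1), round(west_prefers_classical, 1))
```

The two results differ because each conditions on a different total, which is why the wording of a question ("of Classical listeners" versus "of West respondents") decides which denominator to use.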
Five-Number Summary and Boxplots
The five-number summary provides a quick overview of the distribution of a dataset:
Minimum (Min): The smallest value
First Quartile (Q1): 25th percentile
Median: 50th percentile
Third Quartile (Q3): 75th percentile
Maximum (Max): The largest value
| Statistic | Value |
|---|---|
| Min | 32 |
| Q1 | 80 |
| Median | 95 |
| Q3 | 110 |
| Max | 153 |
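Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Outliers: Values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are considered outliers.
Example: For the summary above, $\text{IQR} = 110 - 80 = 30$, so the fences are $80 - 1.5(30) = 35$ and $110 + 1.5(30) = 155$. The minimum (32) falls below 35 and is an outlier; the maximum (153) is below 155 and is not.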
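As a rough check on the outlier rule, the sketch below applies the common 1.5 × IQR convention to the five-number summary in the table. The helper name `outlier_fences` and the dictionary layout are assumptions made for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule (k = 1.5 by convention)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary copied from the table.
summary = {"min": 32, "q1": 80, "median": 95, "q3": 110, "max": 153}

lower, upper = outlier_fences(summary["q1"], summary["q3"])

# With only the summary available, we can at least test the two extremes.
min_is_outlier = summary["min"] < lower
max_is_outlier = summary["max"] > upper

print(lower, upper, min_is_outlier, max_is_outlier)
```

A five-number summary only reveals whether the extremes are outliers; identifying every outlier would require the full dataset.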
Histograms and Distribution Shape
Histograms display the frequency of data within specified intervals (bins).
Shape: Can be symmetric, skewed left (tail on left), or skewed right (tail on right).
Center: Median or mean can be used to describe the center.
Spread: Range, IQR, or standard deviation describe variability.
Outliers: Unusually high or low values that do not fit the general pattern.
Boxplots
Boxplots visually summarize the five-number summary and help compare distributions.
Comparisons: Boxplots can compare medians, IQRs, and detect outliers between groups (e.g., children's vs. adult cereals).
Measures of Center and Spread
Mean and Median
Mean ($\bar{x}$): The arithmetic average. Sensitive to outliers and skewed data.
Median: The middle value when data are ordered. Resistant to outliers and skewed data.
Example: If the median sugar content is 18.4%, the mean is likely higher when the distribution is skewed right and lower when it is skewed left, because the mean is pulled toward the tail.
Standard Deviation and IQR
Standard Deviation ($s$): Measures the typical distance of data points from the mean. Like the mean, it is sensitive to outliers.
Interquartile Range (IQR): The range of the middle 50% of the data.
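To see why the median and IQR are called resistant while the mean and standard deviation are not, here is a small sketch using Python's standard `statistics` module. The data values are made up for illustration. `statistics.quantiles` uses its default "exclusive" method, so the quartiles it reports may differ slightly from those produced by a textbook or calculator.

```python
import statistics

# Hypothetical data: a small sample, then the same sample plus one extreme value.
data = [10, 12, 13, 15, 16, 18, 20]
with_outlier = data + [90]

def describe(values):
    """Return mean, median, sample standard deviation, and IQR for a list."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by n - 1)
        "iqr": q3 - q1,
    }

before = describe(data)
after = describe(with_outlier)

for name in ("mean", "median", "stdev", "iqr"):
    print(name, round(before[name], 2), "->", round(after[name], 2))
```

In this sample, adding the single value 90 shifts the mean and standard deviation far more than the median, which is the behavior the definitions above describe.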
Normal Distribution and Z-Scores
Standard Normal Distribution
The standard normal distribution is a normal distribution with mean 0 and standard deviation 1.
Z-score: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Interpretation: The z-score tells how many standard deviations a value is from the mean.
Normal Table: Used to find probabilities and percentiles for normal distributions.
Applications
Comparing Unusual Values: The value with the largest absolute z-score is the most unusual.
Percentiles: The area under the normal curve to the left of a z-score gives the percentile.
Example: If a bag of carrots has a mean weight of 16.3 oz and standard deviation 0.3 oz, the percentage above 16 oz can be found using the z-score and normal table.
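The carrot example can be worked through in code instead of a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the normal table; the variable names are chosen for illustration only.

```python
from statistics import NormalDist

# Values taken from the carrot example.
mean_weight = 16.3   # oz
sd_weight = 0.3      # oz
threshold = 16.0     # oz

# Standardize: how many standard deviations is 16 oz from the mean?
z = (threshold - mean_weight) / sd_weight

# Area to the left of z under the standard normal curve (what the z-table gives).
area_left = NormalDist().cdf(z)

# The question asks for the share ABOVE 16 oz, so take the complement.
percent_above = 100 * (1 - area_left)

print(round(z, 2), round(area_left, 4), round(percent_above, 2))
```

The z-score comes out to about -1, so roughly 84% of bags weigh more than 16 oz under the normal model.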
Scatterplots and Correlation
Scatterplots
Scatterplots show the relationship between two quantitative variables.
Direction: Positive (as one increases, so does the other) or negative (as one increases, the other decreases).
Form: Linear or nonlinear.
Strength: How closely the points follow a clear form.
Correlation Coefficient ($r$)
Definition: Measures the strength and direction of a linear relationship between two variables.
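Range: $-1 \le r \le 1$
Interpretation: $r > 0$ indicates a positive association; $r < 0$ indicates a negative association; $r = 0$ indicates no linear association.
Calculation: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$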
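Example Table: Organize calculations using columns such as $x$, $y$, $x - \bar{x}$, $y - \bar{y}$, $(x - \bar{x})^2$, $(y - \bar{y})^2$, and $(x - \bar{x})(y - \bar{y})$.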
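The column-by-column calculation can be mirrored in code. This sketch assumes the usual sample formula, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$, and uses a small made-up dataset; none of the numbers come from the original notes.

```python
from math import sqrt

# Hypothetical paired data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Build the working columns: deviations, squared deviations, cross-products.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]
dx_sq = [d ** 2 for d in dx]
dy_sq = [d ** 2 for d in dy]
cross = [a * b for a, b in zip(dx, dy)]

# Sample standard deviations (divide by n - 1).
s_x = sqrt(sum(dx_sq) / (n - 1))
s_y = sqrt(sum(dy_sq) / (n - 1))

# Correlation coefficient from the column totals.
r = sum(cross) / ((n - 1) * s_x * s_y)

print(round(r, 4))
```

Each list corresponds to one column of a hand-calculation table, so the column totals can be checked against a worked solution one at a time.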
Simple Linear Regression
Regression Line
Regression analysis estimates the relationship between a response variable ($y$) and a predictor variable ($x$).
Equation: $\hat{y} = b_0 + b_1 x$
Slope ($b_1$): $b_1 = r \frac{s_y}{s_x}$, where $r$ is the correlation, $s_y$ is the standard deviation of $y$, and $s_x$ is the standard deviation of $x$.
Interpretation: The slope represents the expected change in $\hat{y}$ for a one-unit increase in $x$.
Intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$
Coefficient of Determination ($R^2$): $R^2 = r^2$; the proportion of variance in $y$ explained by $x$.
Residual: The difference between the observed value and the predicted value: $e = y - \hat{y}$
| | Friends (x) | Time (min) (y) |
|---|---|---|
| Mean | 21.61 | 105.13 |
| Standard Deviation | 5.72 | 39.91 |
| Correlation | 0.385 | |
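Example: Slope: $b_1 = r \frac{s_y}{s_x} = 0.385 \times \frac{39.91}{5.72} \approx 2.69$, so each additional friend is associated with about 2.69 more minutes on average.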
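The regression line can be built from the summary statistics alone. The sketch below assumes the standard relations $b_1 = r \, s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$ and uses the values in the table. The `predict` helper and the example observation (25 friends, 120 minutes) are hypothetical and included only to illustrate a residual.

```python
# Summary statistics copied from the table (x = friends, y = time in minutes).
x_mean, x_sd = 21.61, 5.72
y_mean, y_sd = 105.13, 39.91
r = 0.385

slope = r * y_sd / x_sd                # b1: change in predicted y per unit of x
intercept = y_mean - slope * x_mean    # b0: the line passes through (x_mean, y_mean)
r_squared = r ** 2                     # share of variance in y explained by x

def predict(x):
    """Predicted time (minutes) for a given number of friends."""
    return intercept + slope * x

# Hypothetical observation, NOT from the notes: 25 friends, 120 minutes observed.
observed_y = 120
residual = observed_y - predict(25)    # observed minus predicted

print(round(slope, 2), round(intercept, 2), round(r_squared, 3), round(residual, 2))
```

With a correlation of 0.385, only about 15% of the variation in time is explained by the number of friends, so individual predictions from this line can be far off.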
Probability and the Normal Distribution
Calculating Probabilities
Standardization: Convert values to z-scores to use the standard normal table.
Finding Probabilities: Use the normal table to find the area to the left of a z-score.
Percentiles: The value below which a given percentage of observations fall.
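Going from a percentile back to a value reverses the standardization step: find the z-score with the required area to its left, then unstandardize with $x = \mu + z\sigma$. The sketch below does this with `statistics.NormalDist.inv_cdf`, reusing the carrot-bag mean and standard deviation from earlier in these notes. The choice of the 90th percentile is an assumption made for illustration.

```python
from statistics import NormalDist

# Parameters reused from the carrot example (mean 16.3 oz, SD 0.3 oz).
mu, sigma = 16.3, 0.3
target = 0.90  # illustrative choice: the 90th percentile

# Step 1: z-score with 90% of the area to its left (reverse table lookup).
z_90 = NormalDist().inv_cdf(target)

# Step 2: unstandardize back to the original units.
weight_90 = mu + z_90 * sigma

# Cross-check: the area to the left of that weight should be the target.
check = NormalDist(mu, sigma).cdf(weight_90)

print(round(z_90, 2), round(weight_90, 2), round(check, 2))
```

With a printed table, step 1 means scanning the body of the table for the area closest to 0.90 and reading off the row and column.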
Summary Table: Key Formulas
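| Concept | Formula |
|---|---|
| Mean | $\bar{x} = \frac{\sum x}{n}$ |
| Standard Deviation | $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ |
| Z-score | $z = \frac{x - \mu}{\sigma}$ |
| Regression Slope | $b_1 = r \frac{s_y}{s_x}$ |
| Regression Intercept | $b_0 = \bar{y} - b_1 \bar{x}$ |
| Correlation | $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$ |
| Coefficient of Determination | $R^2 = r^2$ |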
Using the Standard Normal Table
The standard normal table (z-table) provides the area under the standard normal curve to the left of a given z-score. This area represents the probability that a standard normal variable is less than or equal to the z-score.
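How to Use: Find the row for the z-score through its first decimal place and the column for the second decimal place. The intersection gives the cumulative probability.
Example: For $z = 1.23$, find row 1.2 and column 0.03. The entry at the intersection is 0.8907, so about 89.07% of the area lies to the left of $z = 1.23$.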
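The table lookup can be mimicked in code as a way to check hand work. In this sketch the helper `table_position` is a hypothetical name. It assumes a nonnegative z-score rounded to two decimals, and `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

def table_position(z):
    """Split a nonnegative z-score into its z-table row and column labels."""
    hundredths = round(z * 100)       # work in whole hundredths to avoid float drift
    row = (hundredths // 10) / 10     # ones and tenths digits, e.g. 1.2
    col = (hundredths % 10) / 100     # hundredths digit, e.g. 0.03
    return row, col

z = 1.23
row, col = table_position(z)

# Cumulative area to the left of z, rounded to four places like a printed table.
area_left = round(NormalDist().cdf(z), 4)

print(row, col, area_left)
```

For an area to the right of z, subtract the table value from 1; for an area between two z-scores, subtract the smaller cumulative area from the larger.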
Additional info:
Some questions require interpretation of graphs and tables, calculation of summary statistics, and application of the normal distribution to real-world problems.
Understanding the context of each problem is essential for correct interpretation and calculation.