MAT124 Midterm Study Guide: Key Topics in Introductory Statistics
Chapter 1: Data Collection
Sources of Bias in Sampling
Understanding bias is essential for collecting reliable data. Bias occurs when a sample does not accurately represent the population.
Sampling Bias: When some members of the population are less likely to be included in the sample than others.
Nonresponse Bias: When individuals selected for the sample do not respond, and their nonresponse is related to the variable of interest.
Response Bias: When respondents give inaccurate answers due to question wording, interviewer influence, or social desirability.
Selection Bias: When the method of selecting the sample causes it to differ from the population.
Example: Surveying only daytime shoppers at a mall may exclude working individuals, introducing sampling bias.
Chapter 2: Organizing and Summarizing Data
Reading and Understanding Histograms
Histograms are graphical representations of the distribution of numerical data.
Each bar represents the frequency of data within a specific interval (bin).
The height of the bar indicates the number of observations in that interval.
Identifying the Shape of a Distribution
Symmetric: Both sides of the histogram are approximately mirror images.
Skewed Right (Positive Skew): The right tail is longer; the mean is typically greater than the median.
Skewed Left (Negative Skew): The left tail is longer; the mean is typically less than the median.
Uniform: All intervals have roughly the same frequency.
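The link between skew and the mean/median comparison can be checked on small data sets. The sketch below uses two made-up lists (they are illustrative only, not course data) and Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data: one large value stretches the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data: one small value stretches the left tail.
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
# right mean = 4.75 median = 3.0
# left mean = 16.25 median = 18.0
```

The extreme value pulls the mean toward the long tail, while the median barely moves.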
Misleading Graphs and How to Fix Them
Graphs can mislead by using inappropriate scales, omitting baselines, or distorting axes.
To fix: Use consistent scales, start the vertical axis of a bar graph at zero, and avoid 3D effects that obscure the data.
Example: A bar graph with a truncated y-axis exaggerates differences between groups.
Chapter 3: Numerically Summarizing Data
Comparing and Contrasting Normal Curves
Normal curves are bell-shaped and symmetric. They are defined by their mean (center) and standard deviation (spread).
Larger Mean: Shifts the curve to the right without changing its shape.
Larger Standard Deviation: Makes the curve wider and flatter.
Empirical Rule (68-95-99.7 Rule)
The empirical rule describes the spread of data in a normal distribution:
About 68% of data falls within 1 standard deviation of the mean.
About 95% within 2 standard deviations.
About 99.7% within 3 standard deviations.
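Formula: For a normal distribution with mean $\mu$ and standard deviation $\sigma$, $P(\mu - k\sigma \le X \le \mu + k\sigma) \approx 0.68,\ 0.95,\ 0.997$ for $k = 1, 2, 3$.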
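The 68-95-99.7 percentages are rounded areas under the normal curve. As a rough check, the sketch below computes them with `NormalDist` from Python's standard `statistics` module; the helper name `proportion_within` is made up for this illustration.

```python
from statistics import NormalDist

def proportion_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because the result depends only on the number of standard deviations, the same percentages apply to any normal distribution, whatever its mean and standard deviation.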
Five Number Summary
The five number summary consists of:
Minimum
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Maximum
To determine it by hand, sort the data in ascending order, find the median, then find Q1 and Q3 as the medians of the lower and upper halves.
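The by-hand procedure can be sketched in a few lines. This version assumes the common textbook convention of leaving the median out of both halves when the number of observations is odd; software packages often use other quartile rules and may give slightly different answers. The function name and the sample data are made up for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Assumption: when n is odd, the middle value belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical data
print(five_number_summary(sample))
# (7, 36, 40.5, 43, 49)
```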
Calculating Outliers Using the 1.5 IQR Rule
Outliers are values that fall outside the typical range of the data.
Calculate the interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$
Lower bound: $Q_1 - 1.5 \times \text{IQR}$
Upper bound: $Q_3 + 1.5 \times \text{IQR}$
Any value outside these bounds is considered an outlier.
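A minimal sketch of the 1.5 IQR rule follows. It takes Q1 and Q3 as inputs, so it works with whichever quartile method the course uses; the function names and the sample data are made up for illustration.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) bounds of the 1.5 IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical data
q1, q3 = 36, 43  # quartiles of the sample, found by hand

print(iqr_fences(q1, q3))             # (25.5, 53.5)
print(find_outliers(sample, q1, q3))  # [7, 15]
```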
Interpreting Percentiles and Quartiles
Percentile: The value below which a given percentage of observations falls.
Quartiles: Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile).
Comparing and Contrasting Boxplots
Boxplots visually display the five number summary.
They help compare distributions, spot outliers, and assess symmetry or skewness.
Location of Median, Q1, and Q3
Median divides the data into two equal halves.
Q1 is the median of the lower half; Q3 is the median of the upper half.
Chapter 4: Describing the Relation Between Two Variables
Correlation Coefficient (r)
The correlation coefficient measures the strength and direction of a linear relationship between two variables.
Values range from -1 (perfect negative) to +1 (perfect positive).
r ≈ 0 indicates little or no linear relationship (a nonlinear relationship may still exist).
Estimating r: From a scatter plot, assess the direction and tightness of the points around a line.
Coefficient of Determination (R²)
R² indicates the proportion of variance in the dependent variable explained by the independent variable.
Ranges from 0 to 1.
Higher values indicate a stronger linear relationship.
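To make r and R² concrete, the sketch below computes r from its definition and then squares it, which gives R² for a least-squares line with one explanatory variable. The data set and the function name `pearson_r` are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]    # hypothetical explanatory values
scores = [2, 4, 5, 4, 5]   # hypothetical response values

r = pearson_r(hours, scores)
print(round(r, 4))       # 0.7746
print(round(r ** 2, 4))  # 0.6
```

Here about 60% of the variation in the made-up scores is explained by the linear relationship with hours.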
Interpreting Slope and Y-Intercept
Slope (b): The change in the response variable for a one-unit increase in the explanatory variable.
Y-intercept (a): The predicted value when the explanatory variable is zero.
Example: In $\hat{y} = 5 + 2x$, the slope is 2, and the y-intercept is 5.
Calculating Predicted and Residual Values
Predicted Value: Substitute x into the regression equation.
Residual: The observed value minus the predicted value, $y - \hat{y}$.
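As a quick worked example, the sketch below reuses the slope of 2 and y-intercept of 5 from the example above. The observed point (3, 12) and the function names are made up for illustration.

```python
def predict(x, intercept=5.0, slope=2.0):
    """Predicted value from the regression line: intercept + slope * x."""
    return intercept + slope * x

def residual(x, observed_y, intercept=5.0, slope=2.0):
    """Residual = observed value minus predicted value."""
    return observed_y - predict(x, intercept, slope)

x_obs, y_obs = 3, 12  # hypothetical observation

print(predict(x_obs))          # 11.0
print(residual(x_obs, y_obs))  # 1.0
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.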
Extrapolation
Predicting values outside the range of observed data.
Can be unreliable as the relationship may not hold beyond the data range.
Contingency Tables and Probability Calculations
Contingency tables display the joint frequency distribution of two categorical variables.
| | Category A | Category B | Total |
|---|---|---|---|
| Group 1 | n11 | n12 | n1. |
| Group 2 | n21 | n22 | n2. |
| Total | n.1 | n.2 | n |
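Marginal Probability: Probability of a single event, found in the margins (totals). For example, P(Group 1) = n1. / n, the Group 1 row total divided by the grand total.
Conditional Probability: Probability of one event given another has occurred. For example, P(Category A | Group 1) = n11 / n1. , the cell count divided by the Group 1 row total.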
Chapter 5: Probability
General Addition Rule
Used to find the probability that at least one of two events occurs: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.
General Multiplication Rule
Used to find the probability that both events occur: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$.
Conditional Probability
The probability of event A given that event B has occurred: $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, provided $P(B) > 0$.
Sample Space
The set of all possible outcomes in a probability experiment.
Example: Flipping two coins: {HH, HT, TH, TT}
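The two-coin sample space is small enough to enumerate, which makes it a handy check on the probability rules in this chapter. The sketch below assumes fair coins, so all four outcomes are equally likely; the event definitions and function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space for two coin flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    """Probability of an event, assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

def first_is_heads(outcome):   # event A
    return outcome[0] == "H"

def second_is_heads(outcome):  # event B
    return outcome[1] == "H"

p_a = prob(first_is_heads)                                            # 1/2
p_b = prob(second_is_heads)                                           # 1/2
p_a_and_b = prob(lambda o: first_is_heads(o) and second_is_heads(o))  # 1/4
p_a_or_b = prob(lambda o: first_is_heads(o) or second_is_heads(o))    # 3/4
p_a_given_b = p_a_and_b / p_b                                         # 1/2

# General addition rule and general multiplication rule both hold.
print(p_a_or_b == p_a + p_b - p_a_and_b)  # True
print(p_a_and_b == p_b * p_a_given_b)     # True
```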
Calculating Probabilities from a Contingency Table
Use the counts in the table to find probabilities of events, intersections, and unions.
Marginal probabilities use row or column totals; conditional probabilities use appropriate cell and marginal totals.
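The calculations above can be sketched with a small table of counts. The numbers below are made up for illustration (they are not course data), and the labels follow the generic Group/Category layout used earlier in this guide.

```python
from fractions import Fraction

# Hypothetical counts for a 2x2 contingency table.
counts = {
    ("Group 1", "Category A"): 30,
    ("Group 1", "Category B"): 20,
    ("Group 2", "Category A"): 10,
    ("Group 2", "Category B"): 40,
}
total = sum(counts.values())  # grand total: 100

def row_total(group):
    return sum(c for (g, _), c in counts.items() if g == group)

def col_total(category):
    return sum(c for (_, k), c in counts.items() if k == category)

# Marginal probabilities use row or column totals.
p_group1 = Fraction(row_total("Group 1"), total)        # 1/2
p_cat_a = Fraction(col_total("Category A"), total)      # 2/5

# Intersection: a single cell divided by the grand total.
p_both = Fraction(counts[("Group 1", "Category A")], total)  # 3/10

# Conditional: the cell divided by the total of the given row.
p_a_given_group1 = Fraction(counts[("Group 1", "Category A")],
                            row_total("Group 1"))       # 3/5

# Union, via the general addition rule.
p_either = p_group1 + p_cat_a - p_both                  # 3/5

print(p_group1, p_cat_a, p_both, p_a_given_group1, p_either)
# 1/2 2/5 3/10 3/5 3/5
```

Using `Fraction` keeps the answers exact, which makes them easy to compare with hand calculations.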