MAT124 Midterm Study Guide: Key Topics in Introductory Statistics

Chapter 1: Data Collection

Sources of Bias in Sampling

Understanding bias is essential for collecting reliable data. Bias occurs when a sample does not accurately represent the population.

Sampling Bias: When some members of the population are less likely to be included in the sample than others.
Nonresponse Bias: When individuals selected for the sample do not respond, and their nonresponse is related to the variable of interest.
Response Bias: When respondents give inaccurate answers due to question wording, interviewer influence, or social desirability.
Selection Bias: When the method of selecting the sample causes it to differ from the population.

Example: Surveying only daytime shoppers at a mall may exclude working individuals, introducing sampling bias.

Chapter 2: Organizing and Summarizing Data

Reading and Understanding Histograms

Histograms are graphical representations of the distribution of numerical data.

Each bar represents the frequency of data within a specific interval (bin).
The height of the bar indicates the number of observations in that interval.

Identifying the Shape of a Distribution

Symmetric: Both sides of the histogram are approximately mirror images.
Skewed Right (Positive Skew): The right tail is longer; the mean is typically greater than the median.
Skewed Left (Negative Skew): The left tail is longer; the mean is typically less than the median.
Uniform: All intervals have roughly the same frequency.
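
The link between skew and the mean/median comparison can be checked numerically. The sketch below uses small made-up data sets (not taken from the course materials) in which one extreme value creates a long tail:

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data, which produces a long left tail instead.
left_skewed = [21 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # 4.75 3.0
print(mean(left_skewed), median(left_skewed))    # 16.25 18.0
```

The extreme value pulls the mean toward the tail, while the median barely moves.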

Misleading Graphs and How to Fix Them

Graphs can mislead by using inappropriate scales, omitting baselines, or distorting axes.
To fix: Use consistent scales, start the vertical axis of bar graphs at zero, and avoid 3D effects that obscure data.

Example: A bar graph with a truncated y-axis exaggerates differences between groups.

Chapter 3: Numerically Summarizing Data

Comparing and Contrasting Normal Curves

Normal curves are bell-shaped and symmetric. They are defined by their mean (center) and standard deviation (spread).

Larger Mean: Shifts the curve to the right without changing its shape.
Larger Standard Deviation: Makes the curve wider and flatter.

Empirical Rule (68-95-99.7 Rule)

The empirical rule describes the spread of data in a normal distribution:

About 68% of data falls within 1 standard deviation of the mean.
About 95% within 2 standard deviations.
About 99.7% within 3 standard deviations.

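Formula: For a normal distribution with mean $\mu$ and standard deviation $\sigma$, $P(\mu - \sigma < X < \mu + \sigma) \approx 0.68$, $P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$, and $P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$.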
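
The 68-95-99.7 percentages are rounded values of exact normal-curve areas. As a quick check, the sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `within` is made up for this illustration:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))  # 0.6827, 0.9545, 0.9973
```

The same proportions hold for any mean and standard deviation, because the areas depend only on the number of standard deviations from the mean.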

Five Number Summary

The five number summary consists of:

Minimum
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Maximum

To determine by hand, order the data and find the quartiles and median.
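
The by-hand procedure can be sketched in code. This version assumes the median-of-halves convention for quartiles (the middle value is left out of both halves when the number of observations is odd); calculators and software may use other conventions and give slightly different quartiles. The data values are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    # Lower half: everything below the median position.
    lower = values[:half]
    # Upper half: everything above the median position
    # (n % 2 skips the middle value when n is odd).
    upper = values[half + n % 2:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

scores = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical data
print(five_number_summary(scores))  # (7, 36, 40.5, 43, 49)
```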

Calculating Outliers Using the 1.5 IQR Rule

Outliers are values that fall outside the typical range of the data.

Calculate the interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$
Lower bound: $Q_1 - 1.5 \times \text{IQR}$
Upper bound: $Q_3 + 1.5 \times \text{IQR}$
Any value outside these bounds is considered an outlier.
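
A minimal sketch of the 1.5 IQR rule, assuming the quartiles have already been found. The data set and its quartiles (Q1 = 36, Q3 = 43) are hypothetical, and the function name `iqr_fences` is made up for this example:

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) bounds of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with Q1 = 36 and Q3 = 43 (median-of-halves quartiles).
data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
lower, upper = iqr_fences(36, 43)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)  # 25.5 53.5 [7, 15]
```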

Interpreting Percentiles and Quartiles

Percentile: The value below which a given percentage of observations falls.
Quartiles: Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile).

Comparing and Contrasting Boxplots

Boxplots visually display the five number summary.
They help compare distributions, spot outliers, and assess symmetry or skewness.

Location of Median, Q1, and Q3

Median divides the data into two equal halves.
Q1 is the median of the lower half; Q3 is the median of the upper half.

Chapter 4: Describing the Relation Between Two Variables

Correlation Coefficient (r)

The correlation coefficient measures the strength and direction of a linear relationship between two variables.

Values range from -1 (perfect negative) to +1 (perfect positive).
r ≈ 0 indicates little or no linear relationship; a nonlinear relationship may still be present.

Estimating r: From a scatter plot, assess the direction and tightness of the points around a line.
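
When the data values are available, r can be computed instead of estimated by eye. The sketch below applies the standard Pearson formula (the sum of products of deviations divided by the square root of the product of the sums of squared deviations); the variable names and data are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # hypothetical explanatory values
scores = [52, 55, 61, 64, 70]    # hypothetical response values
print(round(correlation(hours, scores), 3))
```

These points lie close to an upward-sloping line, so r comes out close to +1.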

Coefficient of Determination ($R^2$)

$R^2$ indicates the proportion of the variation in the response variable that is explained by the explanatory variable through the regression line.

Ranges from 0 to 1.
Higher values indicate a stronger linear relationship.
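
For least-squares regression with a single explanatory variable, the coefficient of determination equals the square of the correlation coefficient. A worked instance with a hypothetical r = -0.9:

```latex
R^2 = r^2, \qquad r = -0.9 \;\Rightarrow\; R^2 = (-0.9)^2 = 0.81
```

In this hypothetical case, about 81% of the variation in the response variable is explained by the regression line, even though the relationship is negative.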

Interpreting Slope and Y-Intercept

Slope (b): The change in the response variable for a one-unit increase in the explanatory variable.
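Y-intercept (a): The predicted value of the response variable when the explanatory variable is zero. Interpret it only when zero is a reasonable value of the explanatory variable and lies within or near the observed data.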

Example: In $\hat{y} = 2x + 5$, the slope is 2, and the y-intercept is 5.

Calculating Predicted and Residual Values

Predicted Value: Substitute x into the regression equation.
Residual: Observed value minus predicted value, $y - \hat{y}$.
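
Both calculations can be sketched with a regression line that has slope 2 and y-intercept 5. The observation (x = 3, observed y = 12) is hypothetical, and the residual is taken as observed minus predicted:

```python
def predict(x, intercept=5, slope=2):
    """Predicted value: y-hat = intercept + slope * x."""
    return intercept + slope * x

def residual(x, observed_y):
    """Residual = observed y minus predicted y."""
    return observed_y - predict(x)

# Hypothetical observation: x = 3 with observed y = 12.
print(predict(3))        # 11
print(residual(3, 12))   # 1
```

A positive residual means the observed value lies above the regression line; a negative residual means it lies below.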

Extrapolation

Predicting values outside the range of observed data.
Can be unreliable as the relationship may not hold beyond the data range.

Contingency Tables and Probability Calculations

Contingency tables (two-way tables) display the joint frequency distribution of two categorical variables.

	Category A	Category B	Total
Group 1	n11	n12	n1.
Group 2	n21	n22	n2.
Total	n.1	n.2	n

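Marginal Probability: Probability of a single event, found in the margins (totals). For example, $P(\text{Group 1}) = n_{1\cdot}/n$, the Group 1 row total divided by the grand total.

Conditional Probability: Probability of one event given that another has occurred. For example, $P(\text{Category A} \mid \text{Group 1}) = n_{11}/n_{1\cdot}$, the cell count divided by the Group 1 row total.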
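
A short sketch of both calculations, using made-up counts arranged like the table in this section (the numbers are hypothetical and chosen only to keep the arithmetic simple):

```python
# Hypothetical counts: rows are groups, columns are categories.
table = {
    "Group 1": {"Category A": 30, "Category B": 20},
    "Group 2": {"Category A": 10, "Category B": 40},
}
n = sum(sum(row.values()) for row in table.values())  # grand total: 100

# Marginal probability: P(Group 1) = row total / grand total
group1_total = sum(table["Group 1"].values())
p_group1 = group1_total / n

# Marginal probability: P(Category A) = column total / grand total
cat_a_total = sum(row["Category A"] for row in table.values())
p_cat_a = cat_a_total / n

# Conditional probability: P(Category A | Group 1) = cell count / row total
p_a_given_g1 = table["Group 1"]["Category A"] / group1_total

print(p_group1, p_cat_a, p_a_given_g1)  # 0.5 0.4 0.6
```

The conditional probability divides by the row total rather than the grand total because the condition restricts attention to Group 1.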

Chapter 5: Probability

General Addition Rule

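Used to find the probability that at least one of two events occurs: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$. Subtracting $P(A \text{ and } B)$ avoids counting the overlap twice.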
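
As an illustration of the rule in its usual form, P(A or B) = P(A) + P(B) - P(A and B), the sketch below uses a hypothetical experiment: drawing one card from a standard 52-card deck, with A = "the card is a king" and B = "the card is a heart". Exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical experiment: draw one card from a standard 52-card deck.
p_king = Fraction(4, 52)             # P(A): the card is a king
p_heart = Fraction(13, 52)           # P(B): the card is a heart
p_king_and_heart = Fraction(1, 52)   # P(A and B): the king of hearts

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_king_or_heart = p_king + p_heart - p_king_and_heart
print(p_king_or_heart)  # 4/13
```

Without the subtraction, the king of hearts would be counted twice.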

General Multiplication Rule

Used to find the probability that both events occur: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$.
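
A worked instance of the rule in its usual form, P(A and B) = P(A) · P(B | A), for a hypothetical experiment: two cards are drawn without replacement from a standard 52-card deck, with A = "the first card is an ace" and B = "the second card is an ace".

```latex
P(A \text{ and } B) = P(A) \cdot P(B \mid A)
                    = \frac{4}{52} \cdot \frac{3}{51}
                    = \frac{12}{2652}
                    = \frac{1}{221} \approx 0.0045
```

The second factor is 3/51 because one ace and one card have already been removed from the deck.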

Conditional Probability

The probability of event A given that event B has occurred: $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, provided $P(B) > 0$.
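
A worked instance using the usual definition, P(A | B) = P(A and B) / P(B), for a hypothetical roll of one fair six-sided die, with A = "the roll is a 2" and B = "the roll is even":

```latex
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}
            = \frac{1/6}{3/6}
            = \frac{1}{3}
```

Knowing the roll is even shrinks the sample space to {2, 4, 6}, and one of those three outcomes is a 2.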

Sample Space

The set of all possible outcomes in a probability experiment.

Example: Flipping two coins: {HH, HT, TH, TT}
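
The two-coin sample space can also be listed systematically. The sketch below uses `itertools.product` from the Python standard library and then computes one probability under the assumption that all four outcomes are equally likely (fair coins):

```python
from itertools import product

# Each coin lands heads (H) or tails (T); two flips give ordered pairs.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# With equally likely outcomes, P(event) = favorable outcomes / total outcomes.
at_least_one_head = [o for o in sample_space if "H" in o]
p_at_least_one_head = len(at_least_one_head) / len(sample_space)
print(p_at_least_one_head)  # 0.75
```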

Calculating Probabilities from a Contingency Table

Use the counts in the table to find probabilities of events, intersections, and unions.
Marginal probabilities use row or column totals; conditional probabilities use appropriate cell and marginal totals.
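
Intersections and unions can be read off the counts in the same way. The sketch below uses hypothetical counts for a 2 × 2 table; the union is computed by adding the row total and column total and subtracting the cell they share, so no observation is counted twice.

```python
# Hypothetical counts: rows are groups, columns are categories.
counts = {
    ("Group 1", "Category A"): 30,
    ("Group 1", "Category B"): 20,
    ("Group 2", "Category A"): 10,
    ("Group 2", "Category B"): 40,
}
n = sum(counts.values())  # grand total: 100

group1_total = sum(c for (g, _), c in counts.items() if g == "Group 1")
cat_a_total = sum(c for (_, k), c in counts.items() if k == "Category A")
both = counts[("Group 1", "Category A")]

# Intersection: P(Group 1 and Category A) = cell count / grand total
p_and = both / n
# Union: P(Group 1 or Category A), subtracting the overlapping cell once
p_or = (group1_total + cat_a_total - both) / n
print(p_and, p_or)  # 0.3 0.6
```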