MTH161: Review of Statistics I – Key Concepts, Distributions, and Proportion Inference

Study Guide - Smart Notes


Statistics and Critical Thinking

Sampling Variability and Systematic Bias

Understanding the sources of error and bias in data collection is fundamental to statistical reasoning. Sampling variability refers to the natural differences that arise when different samples are drawn from the same population. Systematic bias occurs when a sample is collected in a way that consistently misrepresents the population.

Incorrect Conclusions: Drawing conclusions from biased samples can lead to invalid results.
Natural Bias: Bias inherent in the sampling process or population.
Nonresponse Bias: Occurs when certain groups do not respond to surveys, skewing results.
Interviewer Bias: The interviewer’s behavior or questions influence responses.
Volunteer Bias: Individuals who volunteer for studies may differ from the general population.

Example:

High school students reporting on alcohol consumption may underreport due to social desirability.
Surveying only one group (e.g., only women or only men) when the population includes both can introduce bias in results.
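
The difference between sampling variability and systematic bias can be illustrated with a small simulation. This is a minimal sketch using a made-up population (the integers 1 to 1000) and hypothetical variable names; it is not data from any real survey. Random samples give means that scatter around the population mean, while a sample drawn from only one part of the population is off in a consistent direction.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1..1000 (an assumption for illustration).
population = list(range(1, 1001))
population_mean = mean(population)  # the population mean, 500.5

# Sampling variability: five simple random samples of size 50.
# Each sample mean differs a little from the others and from 500.5.
random_sample_means = [mean(rng.sample(population, 50)) for _ in range(5)]

# Systematic bias: sampling only from the upper half of the population
# always produces a mean above the population mean.
upper_half = [x for x in population if x > 500]
biased_sample_mean = mean(rng.sample(upper_half, 50))

print("population mean:", population_mean)
print("random sample means:", random_sample_means)
print("biased sample mean:", biased_sample_mean)
```

Taking a larger random sample shrinks the scatter in the first list, but no sample size fixes the biased design.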

Parameters and Statistics

Definitions and Distinctions

A parameter is a numerical summary of a population, while a statistic is a numerical summary of a sample.

Parameter: The mean salary of all teachers in a district.
Statistic: The mean salary of a sample of teachers.

Example:

"72% of all seniors at a high school are taking a math course" is a parameter if it refers to the entire population.
"58% of students sampled think MCC should have one break" is a statistic.

The Standard Normal Distribution

Key Concepts and Properties

The standard normal distribution is the normal distribution with mean 0 and standard deviation 1. Normal distributions model many natural phenomena, and any normal distribution can be converted to the standard normal by standardizing, which makes it foundational for inferential statistics.

The graph is symmetric and bell-shaped.
Mean ($\mu$) = 0, Standard deviation ($\sigma$) = 1.
Area under the curve = 1.
There is a direct correspondence between area and probability.
Approximately 68% of data falls within 1 SD, 95% within 2 SD, and 99.7% within 3 SD of the mean.
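
The 68-95-99.7 figures can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is a hypothetical name chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```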

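Equation (standardizing a value $x$ from a normal distribution with mean $\mu$ and standard deviation $\sigma$):

$$z = \frac{x - \mu}{\sigma}$$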

Finding Probabilities Using z-Scores

Probabilities for normal distributions are found using z-scores and standard normal tables.

To find $P(Z < a)$, use the cumulative area from the left in the table.
To find $P(Z > a)$, subtract the cumulative area from 1.

Example Table:

z	Area from Left
-1.00	0.1587
0.00	0.5000
1.00	0.8413

Example:

Find $P(Z < 1.92)$: Look up 1.92 in the table to get the area, 0.9726.
Find $P(Z > -0.56)$: Find the area for -0.56 (0.2877) and subtract from 1, giving 0.7123.
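
The same table lookups can be reproduced in software. This is a minimal sketch using `NormalDist.cdf` from Python's standard library, which returns the cumulative area from the left; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# P(Z < 1.92): cumulative area from the left.
p_left = Z.cdf(1.92)

# P(Z > -0.56): one minus the cumulative area from the left.
p_right = 1 - Z.cdf(-0.56)

print(round(p_left, 4))   # 0.9726
print(round(p_right, 4))  # 0.7123
```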

Estimating a Population Proportion

Binomial Probability of Success

Estimating proportions is common in statistics, especially when outcomes are binary (success/failure).

Sample observations must be random and independent.
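If $np$ and $n(1 - p)$ are both sufficiently large (a common requirement is $np \geq 5$ and $n(1 - p) \geq 5$; some textbooks use 10), the binomial distribution can be approximated by a normal distribution.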

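Equation (mean and standard deviation used for the normal approximation):

$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}$$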
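
To see how close the normal approximation can be, the sketch below compares an exact binomial probability with its normal approximation. The values $n = 100$, $p = 0.3$, and the cutoff $k = 35$ are hypothetical, chosen only for illustration, and the half-unit continuity correction is a standard refinement that these notes do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical values for illustration only.
n, p = 100, 0.3
k = 35

# Exact binomial probability P(X <= k).
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Normal approximation with mean np and standard deviation sqrt(np(1-p)),
# using a continuity correction of 0.5.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma).cdf(k + 0.5)

print("exact: ", round(exact, 4))
print("approx:", round(approx, 4))
```

Here $np = 30$ and $n(1 - p) = 70$, so the usual size requirement is comfortably met.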

Point Estimate and Confidence Interval

The point estimate for a population proportion $p$ is the sample proportion ($\hat{p}$). A confidence interval provides a range of plausible values for the population proportion.

Margin of Error Formula:

$$E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Procedure:

Verify requirements (random sample, binary outcome).
Find the critical value ($z_{\alpha/2}$).
Calculate point estimate and margin of error.
Construct confidence interval: $\hat{p} - E < p < \hat{p} + E$.

Example:

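In a survey of 300 students, 210 play summer soccer. Find a 95% confidence interval for the population proportion. Here $\hat{p} = 210/300 = 0.70$ and $E = 1.96\sqrt{(0.70)(0.30)/300} \approx 0.052$, so the interval is approximately $0.648 < p < 0.752$.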
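
The procedure can be sketched in a few lines of Python for the soccer survey. The function name `proportion_confidence_interval` is a hypothetical helper, not part of any library; the critical value comes from `NormalDist.inv_cdf` in the standard library.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (p_hat, margin_of_error, lower, upper) for a proportion."""
    p_hat = successes / n
    # Critical value z_{alpha/2}: area of (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

p_hat, margin, lower, upper = proportion_confidence_interval(210, 300)
print(round(p_hat, 3), round(margin, 3))  # 0.7 0.052
print(round(lower, 3), round(upper, 3))   # 0.648 0.752
```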

Determining Sample Size

Sample Size for Proportion Estimates

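To achieve a desired margin of error $E$ at a given confidence level, use:

$$n = \frac{z_{\alpha/2}^2 \, \hat{p}(1 - \hat{p})}{E^2}$$

If no prior estimate $\hat{p}$ is available, use $\hat{p} = 0.5$, which gives $n = \dfrac{0.25 \, z_{\alpha/2}^2}{E^2}$. Always round $n$ up to the next whole number.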

Example:

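Estimate the sample size needed for 95% confidence and a margin of error of 3%. With no prior estimate of the proportion, $n = (1.96)^2(0.25)/(0.03)^2 \approx 1067.1$, which rounds up to 1068.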
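
The sample size calculation can be sketched as follows. The function name `sample_size_for_proportion` is hypothetical, and the default `p_hat=0.5` reflects the conservative choice used when no prior estimate of the proportion is available.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_hat=0.5):
    """Smallest n that achieves the given margin of error for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    # Round up: rounding down would give a margin slightly larger than requested.
    return ceil(z**2 * p_hat * (1 - p_hat) / margin**2)

n_needed = sample_size_for_proportion(0.03)
print(n_needed)  # 1068
```

Supplying a prior estimate other than 0.5 always gives a smaller required sample, because $\hat{p}(1 - \hat{p})$ is largest at 0.5.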

Hypothesis Testing

Basics of Hypothesis Testing

Hypothesis testing is used to make inferences about population parameters.

Null Hypothesis ($H_0$): The default assumption (e.g., $p = p_0$).
Alternative Hypothesis ($H_1$): The claim to be tested (e.g., $p > p_0$).

Test Statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

Decision Rule:

If $p$-value < significance level ($\alpha$), reject $H_0$.

Example:

Test if the proportion of Americans using Spotify exceeds 61%. Here $H_0: p = 0.61$ and $H_1: p > 0.61$, a right-tailed test.
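
A right-tailed one-proportion z-test for this kind of claim can be sketched as below. The notes give no sample data for the Spotify example, so the counts used here (325 users in a sample of 500) are entirely hypothetical, as is the function name `one_proportion_z_test`.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    # The standard error uses p0 because the test assumes H0 is true.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

# Hypothetical sample: 325 of 500 respondents use the service.
z_stat, p_value = one_proportion_z_test(325, 500, 0.61)
alpha = 0.05
reject_h0 = p_value < alpha

print(round(z_stat, 2), round(p_value, 3), reject_h0)
```

With these made-up counts the p-value falls below 0.05, so $H_0$ would be rejected; different data could easily lead to the opposite decision.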

Calculating Beta and Power

Type I and Type II Errors

In hypothesis testing, two types of errors can occur:

Type I Error ($\alpha$): Rejecting $H_0$ when it is true.
Type II Error ($\beta$): Failing to reject $H_0$ when it is false.

Power of a Test:

Power = $1 - \beta$, the probability of correctly rejecting $H_0$ when it is false.

Example Table:

Outcome	Decision	Reality
Type I Error	Reject $H_0$	$H_0$ is true
Type II Error	Fail to reject $H_0$	$H_0$ is false
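
To make $\beta$ and power concrete, the sketch below computes them for a right-tailed one-proportion z-test. All numbers are hypothetical: a null value of 0.61, an assumed true proportion of 0.65, a sample size of 500, and $\alpha = 0.05$. The function name `beta_and_power` is also an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def beta_and_power(p0, p1, n, alpha=0.05):
    """Beta and power for a right-tailed test of H0: p = p0 when the true p is p1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Beta: probability that p_hat falls below the cutoff when the true p is p1.
    beta = std_normal.cdf((cutoff - p1) / sqrt(p1 * (1 - p1) / n))
    return beta, 1 - beta

# Hypothetical inputs for illustration only.
beta, power = beta_and_power(p0=0.61, p1=0.65, n=500)
print(round(beta, 3), round(power, 3))
```

Increasing `n` or moving `p1` farther from `p0` lowers $\beta$ and raises power.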

Sample Size for Hypothesis Tests

Calculating Sample Size for Desired Power

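To ensure a hypothesis test has sufficient power, calculate the required sample size. For a one-tailed test of $H_0: p = p_0$ against a specific alternative value $p_1$, a standard formula is:

$$n = \left( \frac{z_{\alpha}\sqrt{p_0(1 - p_0)} + z_{\beta}\sqrt{p_1(1 - p_1)}}{p_1 - p_0} \right)^2$$

For a two-tailed test, replace $z_{\alpha}$ with $z_{\alpha/2}$. Round $n$ up to the next whole number.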

Example:

Determine the sample size needed to detect a difference in proportions with specified $\alpha$ and $\beta$.
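
One common way to carry out this calculation for a one-tailed, one-proportion z-test is $n = \left( \frac{z_{\alpha}\sqrt{p_0(1 - p_0)} + z_{\beta}\sqrt{p_1(1 - p_1)}}{p_1 - p_0} \right)^2$, rounded up. The sketch below applies it with hypothetical inputs ($p_0 = 0.61$, $p_1 = 0.65$, $\alpha = 0.05$, $\beta = 0.20$); the function name is illustrative, and other courses or software may use a slightly different formula.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_power(p0, p1, alpha=0.05, beta=0.20):
    """Sample size for a one-tailed one-proportion z-test with power 1 - beta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)

# Hypothetical inputs for illustration only.
n_for_power = sample_size_for_power(p0=0.61, p1=0.65)
print(n_for_power)  # 906
```

Asking for a smaller $\beta$ (higher power) or a smaller detectable difference increases the required sample size.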

Using StatCrunch for Statistical Analysis

Performing Proportion Tests and Sample Size Calculations

StatCrunch is a statistical software tool used for hypothesis testing and confidence interval calculations.

Enter summarized data and select the appropriate test.
Choose the correct alternative hypothesis.
Review output for point estimate, test statistic, and p-value.

Example:

Test the claim that 65% of students prefer two breaks using StatCrunch.

Additional info: These notes cover foundational topics in introductory statistics, including sampling, estimation, hypothesis testing, and the use of statistical software. All formulas are presented in LaTeX format for clarity and academic rigor.