ST231W26 Midterm 2 Study Guide: Plots, Sampling Distributions, Hypothesis Testing, and Confidence Intervals

Plots and Numbers

Population Parameters vs. Sample Statistics

Understanding the distinction between population parameters and sample statistics is fundamental in statistics. Population parameters describe characteristics of the entire group under study, while sample statistics are estimates computed from a subset of that group.

Population Parameter: A fixed value that describes a characteristic of the whole population (e.g., population mean \( \mu \), population standard deviation \( \sigma \)).
Sample Statistic: A value calculated from sample data (e.g., sample mean \( \bar{x} \), sample standard deviation \( s \)). It varies from sample to sample.
Example: If you measure the heights of all students in a university, the average height is a population parameter. If you measure only a random sample, the average is a sample statistic.

Types of Data

Data can be classified into several types, each requiring different statistical treatment and visualization methods.

Continuous: Numeric values that can take any value within a range (e.g., height, weight).
Discrete: Numeric values that are countable (e.g., number of students).
Unordered Categorical: Categories without inherent order (e.g., colors).
Ordered Categorical: Categories with a logical order (e.g., rating scales).
Binary: Only two possible values (e.g., yes/no, success/failure).
Example: Survey responses (agree/disagree) are binary; age is continuous.

Shapes of Distributions

The shape of a distribution provides insight into the data's characteristics and guides the choice of appropriate plots.

Skewness: Indicates asymmetry. Right-skewed (long tail to the right), left-skewed (long tail to the left).
Modality: Number of peaks (unimodal, bimodal, multimodal).
Center: Measures of central tendency (mean, median).
Spread: Measures of variability (standard deviation, IQR).
Appropriate Plots:
- Bar plots: Categorical data
- Histograms: Continuous or discrete numeric data
Difference between Bar Plots and Histograms: Bar plots display frequencies of categories; histograms show frequencies of numeric intervals.
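
The bar plot vs. histogram distinction comes down to what is being counted. The sketch below uses made-up data (the `colors` and `heights_cm` lists and the bin width of 10 are assumptions for illustration) and prints text bars rather than drawing real plots.

```python
from collections import Counter

# Hypothetical categorical data: bar heights are counts per category.
colors = ["red", "blue", "red", "green", "blue", "red"]
bar_heights = Counter(colors)

# Hypothetical numeric data: histogram heights are counts per interval (bin).
heights_cm = [151, 158, 162, 164, 167, 169, 171, 175, 178, 186]
bin_width = 10  # assumed bin width
# Label each value by the lower edge of its bin, e.g. 164 falls in [160, 170).
hist_heights = Counter((h // bin_width) * bin_width for h in heights_cm)

print("Bar plot (categories):")
for category, count in bar_heights.most_common():
    print(f"  {category:>6}: {'#' * count}")

print("Histogram (numeric intervals):")
for lower in sorted(hist_heights):
    print(f"  [{lower}, {lower + bin_width}): {'#' * hist_heights[lower]}")
```

Changing `bin_width` changes the histogram's shape, which has no counterpart in a bar plot: categories are fixed, while bins are a choice.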

Mean and Median

The mean and median are measures of central tendency, but they respond differently to outliers and skewness.

Mean: Arithmetic average; sensitive to outliers.
Median: Middle value; robust to outliers.
Skewness and Mean vs. Median: In a right-skewed distribution, the mean is typically greater than the median; in a left-skewed distribution, the mean is typically less than the median.
Example: Incomes in a population: mean is pulled up by a few very high incomes, median is not.
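
A small sketch of the income example, using invented numbers (the `incomes` list is an assumption, in thousands), shows how one extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; the last value is one very high earner.
incomes = [32, 35, 38, 40, 41, 45, 48, 52, 60, 400]

income_mean = mean(incomes)      # pulled up by the 400
income_median = median(incomes)  # middle of the sorted values

# Drop the extreme value and recompute both summaries.
trimmed = incomes[:-1]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(f"with outlier:    mean={income_mean:.1f}, median={income_median:.1f}")
print(f"without outlier: mean={trimmed_mean:.1f}, median={trimmed_median:.1f}")
```

Removing the single high income shifts the mean by tens of thousands but the median by only a couple, which is what "robust to outliers" means in practice.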

Standard Deviation and Interquartile Range (IQR)

Standard deviation and IQR are measures of spread, quantifying variability in data.

Standard Deviation (SD): Measures the typical distance of observations from the mean (the square root of the average squared deviation); sensitive to outliers.
Interquartile Range (IQR): Difference between the 75th and 25th percentiles.
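Definition of Outliers (IQR): Outliers are values below \( Q_1 - 1.5 \times IQR \) or above \( Q_3 + 1.5 \times IQR \), where \( Q_1 \) and \( Q_3 \) are the 25th and 75th percentiles.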
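
The 1.5 × IQR rule can be sketched as follows. The data are invented, and the quartiles come from Python's `statistics.quantiles` with its default method; other software (and hand calculations in class) may use a slightly different quartile convention, so the fences can differ a little.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences from the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"outliers: {outliers}")
```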

Sampling Distributions

Definition and Properties

A sampling distribution is the probability distribution of a statistic (e.g., mean) calculated from repeated samples of the same size from a population.

Sampling Distribution: Distribution of a statistic over many samples.
"Skinnier" than Population Distribution: The standard deviation of the sampling distribution (standard error) is smaller than the population standard deviation.
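Standard Error: \( SE = \sigma / \sqrt{n} \), where \( \sigma \) is the population standard deviation and \( n \) is the sample size.
Example: If population SD is 10 and sample size is 25, SE is \( 10 / \sqrt{25} = 2 \).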
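
As a rough check on the standard error formula \( \sigma / \sqrt{n} \), the sketch below computes the example value and compares it with a simulation. The normal population with mean 50, the seed, and the number of repetitions are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import stdev

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

se = standard_error(10, 25)  # the example above: 10 / sqrt(25)

# Simulate many samples of size n and look at the spread of their means.
random.seed(231)  # arbitrary seed so the run is repeatable
pop_mean, pop_sd, n, reps = 50.0, 10.0, 25, 5000
sample_means = [
    sum(random.gauss(pop_mean, pop_sd) for _ in range(n)) / n
    for _ in range(reps)
]
simulated_se = stdev(sample_means)

print(f"formula SE:   {se:.3f}")
print(f"simulated SE: {simulated_se:.3f}  (population SD is {pop_sd})")
```

The simulated spread of the sample means should land close to the formula value and well below the population SD, which is the sense in which the sampling distribution is "skinnier".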

Central Limit Theorem (CLT)

The CLT states that, under certain conditions, the sampling distribution of the sample mean approaches a normal distribution as sample size increases.

Conditions: Large sample size (usually \( n \geq 30 \)), independent samples, population not extremely skewed.
Exact Distribution: For normal populations, the sample mean is exactly normal for any \( n \); for non-normal populations, the CLT approximation improves as \( n \) increases.
Why Use \( \sigma / \sqrt{n} \): The variability of the sample mean decreases as sample size increases.
Probability Calculations: Use normal distribution to calculate probabilities for sample means.
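
A probability calculation for a sample mean can be sketched like this. The population here is an assumption: a right-skewed exponential distribution with mean 1 and SD 1, sampled with \( n = 40 \). The code compares the CLT-based normal approximation with a simulation; the two should be close but not identical, since the CLT is an approximation.

```python
import random
from math import sqrt
from statistics import NormalDist

# Hypothetical right-skewed population: exponential with mean 1 and SD 1.
mu, sigma, n = 1.0, 1.0, 40
threshold = 1.2

# CLT approximation: sample mean is roughly Normal(mu, sigma / sqrt(n)).
clt = NormalDist(mu, sigma / sqrt(n))
p_normal = 1 - clt.cdf(threshold)  # approximate P(sample mean > 1.2)

# Simulation check: draw many samples and count how often the mean exceeds 1.2.
random.seed(0)  # arbitrary seed so the run is repeatable
reps = 10000
exceed = 0
for _ in range(reps):
    sample_mean = sum(random.expovariate(1.0) for _ in range(n)) / n
    if sample_mean > threshold:
        exceed += 1
p_simulated = exceed / reps

print(f"normal approximation: {p_normal:.4f}")
print(f"simulation:           {p_simulated:.4f}")
```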

Hypothesis Tests and Confidence Intervals

Hypothesis Tests

Hypothesis testing is a formal procedure for comparing observed data to a claim about a population.

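Null Hypothesis (\( H_0 \)): The default claim (e.g., \( \mu = \mu_0 \)).
Alternative Hypothesis (\( H_a \)): The competing claim (e.g., \( \mu \neq \mu_0 \)).
Standard Error: Used to calculate test statistics.
Test Statistic (\( z \)): \( z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \), the number of standard errors between the sample mean and the hypothesized mean.
p-value: Probability of observing data at least as extreme as the sample, assuming \( H_0 \) is true.
Critical Value: Threshold for significance (e.g., \( z^* = 1.96 \) for a two-sided test at \( \alpha = 0.05 \)).
Conclusion: State hypotheses, results, and interpret in context.
Example: Testing if average test score differs from 70: \( H_0: \mu = 70 \), \( H_a: \mu \neq 70 \).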
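
The test-score example can be worked through as a two-sided z-test. All numbers below except the hypothesized mean of 70 are assumptions for illustration (sample mean 73, known population SD 12, \( n = 36 \)), and the helper `z_test_two_sided` is a hypothetical name. When the population SD is unknown and estimated by \( s \), a t-test would normally be used instead; this sketch sticks to the z version because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(x_bar, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 vs. Ha: mu != mu0 with known sigma."""
    se = sigma / sqrt(n)                  # standard error of the sample mean
    z = (x_bar - mu0) / se                # standard errors away from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value

# Hypothetical sample summary.
z, p_value = z_test_two_sided(x_bar=73, mu0=70, sigma=12, n=36)

alpha = 0.05
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

With these numbers the p-value is larger than 0.05, so the conclusion in context would be that the data do not provide convincing evidence that the average score differs from 70.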

Confidence Intervals

A confidence interval estimates a population parameter with a range of values, reflecting uncertainty due to sampling.

Definition: Interval likely to contain the true parameter with specified confidence (e.g., 95%).
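Calculation: \( \bar{x} \pm z^* \cdot SE \), with \( SE = \sigma / \sqrt{n} \) and \( z^* \) the critical value for the chosen confidence level.
Margin of Error: \( ME = z^* \cdot \sigma / \sqrt{n} \).
Connection to p-value: If the hypothesized value is outside the \( (1 - \alpha) \) CI, the two-sided p-value is less than \( \alpha \).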
Interpretation: "We are 95% confident the true mean lies in this interval."
Reducing Margin of Error: Increase sample size or decrease confidence level.
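
A minimal sketch of a z-based confidence interval for a mean, using assumed numbers (sample mean 73, known population SD 12, \( n = 36 \)). The helper `z_interval` is a hypothetical name. It also illustrates the two ways to shrink the margin of error listed above.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based CI with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sigma / sqrt(n)                        # margin of error
    return x_bar - margin, x_bar + margin, margin

lower, upper, margin = z_interval(73, 12, 36)

# Two ways to reduce the margin of error:
_, _, margin_4n = z_interval(73, 12, 144)                  # 4x the sample size
_, _, margin_90 = z_interval(73, 12, 36, confidence=0.90)  # lower confidence

print(f"95% CI: ({lower:.2f}, {upper:.2f}), margin = {margin:.2f}")
print(f"margin with n=144:          {margin_4n:.2f}")
print(f"margin with 90% confidence: {margin_90:.2f}")
```

Because the margin depends on \( \sqrt{n} \), quadrupling the sample size only halves it. In this example 70 lies inside the 95% interval, so a two-sided test of \( \mu = 70 \) at the 0.05 level would not reject.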

Inference Topics

Statistical inference involves understanding errors, power, sample size, and significance levels.

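Type 1 Error: Rejecting \( H_0 \) when it is true; probability is \( \alpha \).
Type 2 Error: Failing to reject \( H_0 \) when \( H_a \) is true; probability is \( \beta \); related to power.
Power: Probability of correctly rejecting \( H_0 \) when \( H_a \) is true; equals \( 1 - \beta \), i.e., one minus the probability of a Type 2 error.
Sample Size Calculation: Use formulas to determine the required \( n \) for a desired margin of error.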
Rounding Sample Size: Always round up to ensure desired precision.
Multiple Comparisons: Increases risk of Type 1 error; adjust accordingly.
Choosing \( \alpha \): Common values are 0.05 or 0.01; depends on context.
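
Several items in this list reduce to short calculations. The sketch below assumes z-based formulas with a known population SD: sample size from \( n = (z^* \sigma / ME)^2 \) rounded up, power of a two-sided z-test at a specific alternative mean, and the chance of at least one Type 1 error across \( m \) independent tests. The function names and all input numbers are hypothetical, and the Bonferroni adjustment shown is just one common correction, not necessarily the one used in the course.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_n(sigma, margin, confidence=0.95):
    """Smallest n whose z-based margin of error is at most `margin`."""
    z_star = Z.inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)  # always round up

def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) when the true mean is mu1."""
    z_star = Z.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # true effect in SE units
    return (1 - Z.cdf(z_star - shift)) + Z.cdf(-z_star - shift)

def familywise_error(alpha, m):
    """P(at least one Type 1 error) over m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

n_needed = required_n(sigma=12, margin=2)
power = power_two_sided_z(mu0=70, mu1=75, sigma=12, n=36)
type2_prob = 1 - power
fwer = familywise_error(alpha=0.05, m=10)
bonferroni_alpha = 0.05 / 10  # per-test level under a Bonferroni adjustment

print(f"required n: {n_needed}")
print(f"power: {power:.3f}, P(Type 2 error): {type2_prob:.3f}")
print(f"family-wise error over 10 tests: {fwer:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha}")
```

Rounding down in `required_n` would give a margin of error slightly larger than requested, which is why the result is always rounded up.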

Oddballs

Special Considerations

Some statistical concepts require careful interpretation or have unique properties.

Outliers: Can affect test statistics and confidence intervals, especially mean-based methods.
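Sampling Distribution for Bounds: Any quantity computed from the sample varies from sample to sample and therefore has a sampling distribution; this includes the lower bound of a CI, the p-value, the sample median, and the sample IQR. The significance level and the sample size are fixed in advance, and population values (such as the population median or IQR) are fixed parameters, so none of these are random variables.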
Plots for Sampling Distribution: Use histograms or density plots to visualize sampling distributions.
Mean vs. Median: If the sampling distribution is normal, its mean and median are equal due to symmetry.