Comprehensive Study Notes for Introductory Statistics (Chapters 1–8, 10)

Chapter 1: Introduction to Statistics

Populations, Samples, Parameters, and Statistics

Statistics is the science of collecting, analyzing, and interpreting data. Understanding the foundational terms is essential for all subsequent topics.

Population: The entire group of individuals or items under study.
Sample: A subset of the population, selected for analysis.
Parameter: A numerical summary describing a characteristic of a population (e.g., population mean μ).
Statistic: A numerical summary describing a characteristic of a sample (e.g., sample mean \bar{x}).
Quantitative Data: Data that are numerical and can be measured (e.g., height, weight).
Categorical Data: Data that represent categories or labels (e.g., gender, color).

Example: If a study measures the ages of 30 randomly selected residents, the group of all residents is the population, the 30 selected are the sample, the average age of all residents is a parameter, and the average age of the sample is a statistic.

Observational Studies vs. Experiments

Observational Study: Researchers observe subjects without manipulating variables.
Experiment: Researchers apply treatments and observe effects on subjects.

Example: Measuring blood pressure before and after administering a drug is an experiment; recording blood pressure without intervention is an observational study.

Chapter 2: Exploring Data with Tables and Graphs

Correlation and Regression (Preview)

Correlation and regression analyze relationships between two quantitative variables. This topic is revisited in detail in Chapter 10.

Correlation: Measures the strength and direction of a linear relationship between two variables.
Regression: Models the relationship to predict one variable based on another.

Chapter 3: Describing, Exploring, and Comparing Data

Measures of Central Tendency

Central tendency describes the center of a data set.

Mean (\bar{x}): The arithmetic average of data values.
Median: The middle value when data are ordered.

Example: For data 2, 4, 6, 8, 10: Mean = (2+4+6+8+10)/5 = 6; Median = 6.
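
The calculation above can be checked with a short sketch using Python's standard `statistics` module. The second data set (with an even number of values) is a hypothetical addition, included only to show how the median is handled in that case.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]

# Mean: sum of the values divided by the number of values.
data_mean = mean(data)

# Median: middle value of the ordered data.
data_median = median(data)

# Hypothetical even-sized data set: the median is the average
# of the two middle values (4 and 6).
even_data = [2, 4, 6, 8]
even_median = median(even_data)

print(data_mean, data_median, even_median)
```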

Statistical Symbols and Their Classification

n: Sample size (statistic)
N: Population size (parameter)
\bar{x}: Sample mean (statistic)
μ: Population mean (parameter)
s: Sample standard deviation (statistic)
σ: Population standard deviation (parameter)
s^2: Sample variance (statistic)
σ^2: Population variance (parameter)

Example: "From a sample of 30 residents, the mean age was 61" — 30 is n, 61 is \bar{x}.

Chapter 4: Probability

General Properties of Probability

Probability quantifies the likelihood of events.

Probabilities are always between 0 and 1.
The sum of probabilities for all possible outcomes equals 1.
Simple Event: An event with a single outcome.
Compound Event: An event with two or more outcomes.

Sample Space and Probability Calculation

Sample Space (S): The set of all possible outcomes of an experiment.
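Probability of an Event (A): P(A) = (number of outcomes in A) / (number of outcomes in S), when all outcomes in S are equally likely.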

Example: Rolling a die: S = {1,2,3,4,5,6}; Probability of rolling an even number = 3/6 = 0.5.
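
As a minimal sketch, the die example can be reproduced by counting outcomes, assuming every outcome in the sample space is equally likely. The helper name `probability` is illustrative, not part of any library.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(A) = (outcomes in A) / (outcomes in S), assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
even = {2, 4, 6}                    # event A: rolling an even number

p_even = probability(even, sample_space)
print(p_even, float(p_even))
```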

Chapter 5: Discrete Probability Distributions

Random Variables

Random Variable (X): A variable whose value is determined by the outcome of a random experiment.
Discrete Random Variable: Takes countable values (e.g., number of heads in 3 coin tosses).
Continuous Random Variable: Takes any value in an interval (e.g., height, weight).

Probability Distribution Table

For a discrete random variable, the probability distribution lists all possible values and their probabilities. The sum of all probabilities must be 1. If a value is missing, it can be found by subtracting the sum of known probabilities from 1.
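
The subtraction rule can be sketched as follows. The table is a hypothetical one (number of heads in 3 fair coin tosses) with one entry deliberately left blank.

```python
from fractions import Fraction

# Hypothetical distribution table: X = number of heads in 3 fair coin tosses.
# P(X = 2) is unknown and marked with None.
distribution = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: None,
    3: Fraction(1, 8),
}

# The probabilities must sum to 1, so the missing entry is 1 minus the rest.
known_total = sum(p for p in distribution.values() if p is not None)
missing = 1 - known_total

completed = {x: (missing if p is None else p) for x, p in distribution.items()}
print(missing)
```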

Chapter 6: Normal Probability Distributions

Normal and Uniform Distributions

Normal Distribution: Symmetric, bell-shaped curve; characterized by mean μ and standard deviation σ.
Standard Normal Distribution: Special case with μ = 0 and σ = 1.
Uniform Distribution: All outcomes equally likely within an interval [a, b].

Uniform Distribution Calculations

Height of Uniform Distribution: f(x) = 1/(b - a) for a ≤ x ≤ b
Probability: P(c ≤ X ≤ d) = (d - c)/(b - a) for a ≤ c ≤ d ≤ b
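
A minimal sketch of these calculations, assuming the usual definitions for a uniform distribution on [a, b]: the density has constant height 1/(b - a), and a probability is the area of a rectangle, (d - c)/(b - a). The function names and the interval [0, 10] are hypothetical.

```python
def uniform_height(a, b):
    """Height of the uniform density on [a, b]."""
    return 1 / (b - a)

def uniform_probability(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]: width times height."""
    # Clip [c, d] to [a, b] so that values outside the interval add no area.
    lo, hi = max(a, c), min(b, d)
    if hi <= lo:
        return 0.0
    return (hi - lo) * uniform_height(a, b)

# Hypothetical example: X is uniform on [0, 10].
height = uniform_height(0, 10)
p_2_to_5 = uniform_probability(2, 5, 0, 10)
print(height, p_2_to_5)
```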

Normal Distribution Calculations

Z-score: z = (x - μ)/σ
Probability Calculations: Use z-tables or calculators to find P(z < a), P(z > a), P(a < z < b).
Z-critical value (z_\alpha): The z-score with area α to its right under the standard normal curve.
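
In place of a z-table, these quantities can be sketched with `statistics.NormalDist` from the Python standard library. The values μ = 100, σ = 15, and x = 130 are hypothetical, and the z-score is computed as (x - μ)/σ.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

# Hypothetical values: mu = 100, sigma = 15, x = 130.
z = z_score(130, 100, 15)

p_below = std_normal.cdf(z)                             # area to the left of z
p_above = 1 - p_below                                   # area to the right of z
p_between = std_normal.cdf(1) - std_normal.cdf(-1)      # area between -1 and 1

# z_alpha: the z-score with area alpha to its right.
alpha = 0.05
z_alpha = std_normal.inv_cdf(1 - alpha)

print(z, round(p_below, 4), round(p_above, 4), round(p_between, 4), round(z_alpha, 3))
```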

Unbiased and Biased Estimators

An unbiased estimator is a statistic whose expected value equals the parameter it estimates. Unbiased estimators are preferred for inference.

Statistic	Estimator Type	Parameter Estimated
\bar{x}	Unbiased	μ
\hat{p}	Unbiased	p
s^2	Unbiased	σ^2
s	Biased	σ
Sample Median	Biased	Population Median
Sample Range	Biased	Population Range
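
One way to see what "unbiased" means is to list every possible sample and average the statistic over all of them. The sketch below assumes a tiny hypothetical population {2, 3, 10} and samples of size 2 drawn with replacement, so that all 9 samples are equally likely.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [2, 3, 10]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sample_variance(sample):
    """Sample variance s^2 with divisor n - 1."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

mean_of_xbar = sum(Fraction(sum(s), 2) for s in samples) / len(samples)
mean_of_s2 = sum(sample_variance(s) for s in samples) / len(samples)
mean_of_s = sum(sqrt(sample_variance(s)) for s in samples) / len(samples)

print(mean_of_xbar == mu)           # sample mean targets mu
print(mean_of_s2 == sigma2)         # s^2 targets sigma^2
print(mean_of_s, sqrt(sigma2))      # s falls short of sigma on average
```

In this setup the averages of \bar{x} and s^2 match μ and σ^2 exactly, while the average of s is smaller than σ, which is what the table's "Biased" label for s refers to.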

Central Limit Theorem (CLT)

The CLT states that for a sufficiently large sample size (a common rule of thumb is n > 30), the sampling distribution of the sample mean is approximately normal, regardless of the population's distribution.

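Mean of sampling distribution: μ_{\bar{x}} = μ
Standard deviation (standard error): σ_{\bar{x}} = σ/\sqrt{n}
The sampling distribution of \bar{x} is normal for any n if the population is normal, and approximately normal if n > 30.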
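
A short sketch of how the sampling distribution is used in practice, assuming the standard results that the sample mean has mean μ and standard error σ/\sqrt{n}. The values μ = 100, σ = 15, and n = 36 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population values and sample size (n > 30).
mu, sigma, n = 100, 15, 36

mean_xbar = mu                     # mean of the sampling distribution
standard_error = sigma / sqrt(n)   # standard deviation of the sample mean

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mean_xbar, standard_error)

# Probability that a sample mean exceeds 105.
p_above_105 = 1 - sampling_dist.cdf(105)

print(standard_error, round(p_above_105, 4))
```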

Chapter 7: Estimating Parameters and Determining Sample Sizes

Confidence Intervals (CI)

A confidence interval estimates a population parameter with an associated confidence level (e.g., 95%). It consists of a point estimate plus or minus a margin of error (E).

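Confidence Level: The long-run proportion of intervals, built this way from repeated samples, that contain the true parameter value. It is not the probability that one particular computed interval contains the parameter.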
Common Confidence Levels: 90%, 95%, 99%.

Confidence Interval Formulas

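For Population Proportion (p): \hat{p} ± E, where E = z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n} and \hat{q} = 1 - \hat{p}
For Population Mean (μ), σ known: \bar{x} ± E, where E = z_{\alpha/2} σ/\sqrt{n}
For Population Mean (μ), σ unknown: \bar{x} ± E, where E = t_{\alpha/2} s/\sqrt{n} with n - 1 degrees of freedom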
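
The three intervals can be sketched as below, assuming the standard forms (point estimate ± margin of error E). All sample values are hypothetical, and the function names are illustrative. The Python standard library has no t-distribution, so the t-critical value is passed in as read from a t-table (2.045 for 29 degrees of freedom at 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z-score with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def proportion_ci(x, n, confidence=0.95):
    """CI for a population proportion from x successes in n trials."""
    p_hat = x / n
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """CI for a population mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def mean_ci_sigma_unknown(xbar, s, n, t_crit):
    """CI for a population mean when sigma is unknown; t_crit comes from a t-table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical data.
prop_interval = proportion_ci(540, 1000)
known_interval = mean_ci_sigma_known(61, 12, 30)
unknown_interval = mean_ci_sigma_unknown(61, 12, 30, t_crit=2.045)

print(prop_interval, known_interval, unknown_interval)
```

With the same spread and sample size, the t-based interval comes out slightly wider than the z-based one, which reflects the extra uncertainty from estimating σ with s.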

Z-critical Values for Common Confidence Levels

Confidence Level	z_{\alpha/2}
90%	1.645
95%	1.96
99%	2.575
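
As a quick check, the critical values in this table can be recomputed with `statistics.NormalDist`. This is only a sketch; note that the computed 99% value is closer to 2.576, while 2.575 is a common table-based rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()

# For confidence level c, alpha = 1 - c and the critical value
# leaves area alpha/2 in the right tail.
z_values = {
    level: std_normal.inv_cdf(1 - (1 - level) / 2)
    for level in (0.90, 0.95, 0.99)
}

for level, z in z_values.items():
    print(f"{level:.0%}: {z:.3f}")
```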

t-distribution

Used for inference about μ when σ is unknown and the population is normally distributed or n > 30.
Symmetric and bell-shaped for each n.
As n increases, the t-distribution approaches the standard normal distribution.
t-critical values are found using degrees of freedom (n-1) and the desired tail area.

Chapter 8: Hypothesis Testing

General Procedure

State the null hypothesis (H0) and alternative hypothesis (Ha).
Identify and compute the appropriate test statistic.
Use the P-value method: reject H0 if the P-value ≤ α (the significance level); otherwise, fail to reject H0.
Contextualize the decision in terms of the original problem.
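
The steps above can be sketched for one specific case: a two-tailed z-test for a population proportion. The data (540 successes in 1000 trials), the claimed value p = 0.5, and α = 0.05 are all hypothetical, and the choice of test is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses. H0: p = 0.5, Ha: p != 0.5 (two-tailed).
p0 = 0.5
alpha = 0.05

# Hypothetical sample data.
x, n = 540, 1000
p_hat = x / n

# Step 2: test statistic for a proportion, using p0 in the standard error.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Step 3: two-tailed P-value, then compare with alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value <= alpha

# Step 4: state the decision in context.
if reject_h0:
    conclusion = "There is sufficient evidence that the proportion differs from 0.5."
else:
    conclusion = "There is not sufficient evidence that the proportion differs from 0.5."

print(round(z, 2), round(p_value, 4), conclusion)
```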

Type I and Type II Errors

Type I Error: Rejecting H0 when it is true (false positive); its probability is α.
Type II Error: Failing to reject H0 when it is false (false negative); its probability is β.

Example: In a criminal trial, a Type I error is convicting an innocent person; a Type II error is acquitting a guilty person.

Critical Region and Test Statistic

If the test statistic falls in the critical region, reject H0; otherwise, do not reject H0.
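
A minimal sketch of this decision rule for a z test statistic, assuming the critical region is defined by standard normal critical values. The function name `in_critical_region` and the `tail` labels are hypothetical.

```python
from statistics import NormalDist

def in_critical_region(z, alpha, tail="two"):
    """Return True if the z test statistic falls in the critical region."""
    std_normal = NormalDist()
    if tail == "two":
        # Reject in either tail: |z| at or beyond z_{alpha/2}.
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Hypothetical test statistics at alpha = 0.05.
print(in_critical_region(2.53, 0.05))            # beyond 1.96: reject H0
print(in_critical_region(1.50, 0.05))            # inside (-1.96, 1.96): do not reject H0
print(in_critical_region(1.70, 0.05, "right"))   # beyond 1.645: reject H0
```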

Chapter 10: Correlation and Regression

Correlation

Correlation: Measures the strength and direction of a linear relationship between two variables.
Linear Correlation Coefficient (r): Ranges from -1 (perfect negative) to +1 (perfect positive); 0 indicates no linear correlation.

Example: A scatterplot with points rising from left to right shows positive correlation; falling shows negative correlation.
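
As a sketch, r can be computed directly from paired data using the standard deviation-based formula. The data sets below are hypothetical and chosen only to show a rising pattern, a falling pattern, and a perfect line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]      # points generally rise from left to right
falling = [5, 4, 5, 4, 2]     # points generally fall from left to right
perfect = [3, 5, 7, 9, 11]    # points lie exactly on a rising line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_perfect = pearson_r(xs, perfect)

print(round(r_rising, 3), round(r_falling, 3), round(r_perfect, 3))
```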

Scatterplots

Graphical representation of paired data (X, Y).
Used to visually assess the relationship between variables.

Regression Line

Sample Regression Line: \hat{y} = b_0 + b_1 x, where b_0 is the y-intercept and b_1 is the slope.
Used to predict the value of y for a given x.

Example: If , then for x = 4, predicted y = 14.
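
A regression line of the form \hat{y} = b_0 + b_1 x can be fitted by least squares, as in the sketch below. The paired data are hypothetical and unrelated to the example above; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(x, b0, b1):
    """Predicted y for a given x."""
    return b0 + b1 * x

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
y_hat_at_4 = predict(4, b0, b1)

print(round(b0, 2), round(b1, 2), round(y_hat_at_4, 2))
```

Predictions are most trustworthy for x values within the range of the data used to fit the line.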

Properties of r

r is unitless.
r is sensitive to outliers.
r only measures linear relationships.

Additional info: For more advanced inference topics (e.g., two-sample inference, ANOVA, chi-square tests), refer to later chapters not covered in this review.