Comprehensive Study Notes for Introductory Statistics (Chapters 1–8, 10)

Chapter 1: Introduction to Statistics

Populations, Samples, Parameters, and Statistics

Statistics is the science of collecting, analyzing, and interpreting data. Understanding the foundational terms is essential for all subsequent topics.

  • Population: The entire group of individuals or items under study.

  • Sample: A subset of the population, selected for analysis.

  • Parameter: A numerical summary describing a characteristic of a population (e.g., population mean μ).

  • Statistic: A numerical summary describing a characteristic of a sample (e.g., sample mean \bar{x}).

  • Quantitative Data: Data that are numerical and can be measured (e.g., height, weight).

  • Categorical Data: Data that represent categories or labels (e.g., gender, color).

Example: If a study measures the ages of 30 randomly selected residents, the group of all residents is the population, the 30 selected are the sample, the average age of all residents is a parameter, and the average age of the sample is a statistic.
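
The distinction can be made concrete with a small simulation. The sketch below is illustrative only: the population of 1,000 ages is randomly generated (hypothetical data, not from any study), and the variable names are assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 1,000 residents (made-up data)
population = [random.randint(18, 90) for _ in range(1000)]

# Sample: 30 residents chosen at random, without replacement
sample = random.sample(population, 30)

mu = statistics.mean(population)  # parameter: describes the whole population
x_bar = statistics.mean(sample)   # statistic: describes only the sample

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

The two values will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter.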

Observational Studies vs. Experiments

  • Observational Study: Researchers observe subjects without manipulating variables.

  • Experiment: Researchers apply treatments and observe effects on subjects.

Example: Measuring blood pressure before and after administering a drug is an experiment; recording blood pressure without intervention is an observational study.

Chapter 2: Exploring Data with Tables and Graphs

Correlation and Regression (Preview)

Correlation and regression analyze relationships between two quantitative variables. This topic is revisited in detail in Chapter 10.

  • Correlation: Measures the strength and direction of a linear relationship between two variables.

  • Regression: Models the relationship to predict one variable based on another.

Chapter 3: Describing, Exploring, and Comparing Data

Measures of Central Tendency

Central tendency describes the center of a data set.

  • Mean (\bar{x}): The arithmetic average of data values.

  • Median: The middle value when data are ordered; for an even number of values, the mean of the two middle values.

Example: For data 2, 4, 6, 8, 10: Mean = (2+4+6+8+10)/5 = 6; Median = 6.
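
The same calculation can be checked with Python's standard `statistics` module. This is a minimal sketch; the second data set (`even_data`) is an added, hypothetical example showing that the two middle values are averaged when the count is even.

```python
import statistics

data = [2, 4, 6, 8, 10]
mean = statistics.mean(data)      # (2 + 4 + 6 + 8 + 10) / 5
median = statistics.median(data)  # middle value of the ordered data

# Hypothetical even-sized data set: the median averages 4 and 6
even_data = [2, 4, 6, 8]
even_median = statistics.median(even_data)

print(mean, median, even_median)
```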

Statistical Symbols and Their Classification

  • n: Sample size (statistic)

  • N: Population size (parameter)

  • \bar{x}: Sample mean (statistic)

  • μ: Population mean (parameter)

  • s: Sample standard deviation (statistic)

  • σ: Population standard deviation (parameter)

  • s^2: Sample variance (statistic)

  • σ^2: Population variance (parameter)

Example: "From a sample of 30 residents, the mean age was 61" — 30 is n, 61 is \bar{x}.
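
The difference between the sample and population versions of variance and standard deviation is the divisor (n - 1 versus N). The sketch below reuses the values 2, 4, 6, 8, 10 from the earlier example and treats them first as a sample and then as a complete population; that dual treatment is an assumption made purely for illustration.

```python
import statistics

data = [2, 4, 6, 8, 10]

n = len(data)                        # sample size
x_bar = statistics.mean(data)        # sample mean

s2 = statistics.variance(data)       # sample variance: divides by n - 1
s = statistics.stdev(data)           # sample standard deviation

sigma2 = statistics.pvariance(data)  # population variance: divides by N
sigma = statistics.pstdev(data)      # population standard deviation

print(n, x_bar, s2, sigma2)
```

The sum of squared deviations is 40 in both cases, so s^2 = 40/4 = 10 while σ^2 = 40/5 = 8.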

Chapter 4: Probability

General Properties of Probability

Probability quantifies the likelihood of events.

  • Probabilities are always between 0 and 1, inclusive.

  • The sum of probabilities for all possible outcomes equals 1.

  • Simple Event: An event with a single outcome.

  • Compound Event: An event with two or more outcomes.

Sample Space and Probability Calculation

  • Sample Space (S): The set of all possible outcomes of an experiment.

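  • Probability of an Event (A): P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}, assuming all outcomes in S are equally likely.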

Example: Rolling a die: S = {1,2,3,4,5,6}; Probability of rolling an even number = 3/6 = 0.5.
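
The die example can be written as a short sketch that counts favorable outcomes. The function name `probability` is a hypothetical helper, and the calculation assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event, space):
    """P(A) = (outcomes in A) / (outcomes in S), for equally likely outcomes."""
    favorable = event & space  # ignore anything outside the sample space
    return Fraction(len(favorable), len(space))

even = {2, 4, 6}
p_even = probability(even, sample_space)
print(p_even, float(p_even))
```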

Chapter 5: Discrete Probability Distributions

Random Variables

  • Random Variable (X): A variable whose value is determined by the outcome of a random experiment.

  • Discrete Random Variable: Takes countable values (e.g., number of heads in 3 coin tosses).

  • Continuous Random Variable: Takes any value in an interval (e.g., height, weight).

Probability Distribution Table

For a discrete random variable, the probability distribution lists all possible values and their probabilities. The sum of all probabilities must be 1. If a value is missing, it can be found by subtracting the sum of known probabilities from 1.
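
A minimal sketch of the missing-value rule, using the number of heads in 3 fair coin tosses with P(X = 2) deliberately left blank. The helper name `fill_missing` and the use of `None` to mark the gap are assumptions for illustration.

```python
from fractions import Fraction

# X = number of heads in 3 fair coin tosses; P(X = 2) is left unknown
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: None, 3: Fraction(1, 8)}

def fill_missing(table):
    """Return a copy of the table with the single missing probability filled in."""
    missing = [x for x, p in table.items() if p is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing probability")
    known_total = sum(p for p in table.values() if p is not None)
    value = 1 - known_total  # probabilities must sum to 1
    if not 0 <= value <= 1:
        raise ValueError("known probabilities do not leave a valid remainder")
    filled = dict(table)
    filled[missing[0]] = value
    return filled

complete = fill_missing(dist)
print(complete[2])
```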

Chapter 6: Normal Probability Distributions

Normal and Uniform Distributions

  • Normal Distribution: Symmetric, bell-shaped curve; characterized by mean μ and standard deviation σ.

  • Standard Normal Distribution: Special case with μ = 0 and σ = 1.

  • Uniform Distribution: All outcomes equally likely within an interval [a, b].

Uniform Distribution Calculations

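  • Height of Uniform Distribution: f(x) = \frac{1}{b - a} for a \le x \le b

  • Probability: P(c \le X \le d) = \frac{d - c}{b - a} for a \le c \le d \le b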
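
For a uniform distribution on [a, b], the density is a rectangle of constant height 1/(b - a), so the probability of a sub-interval is its width times that height. The sketch below assumes that standard form; the function names and the 0-to-10-minute waiting-time example are hypothetical.

```python
def uniform_height(a, b):
    """Height of the uniform density on [a, b]."""
    if b <= a:
        raise ValueError("need a < b")
    return 1 / (b - a)

def uniform_prob(a, b, c, d):
    """P(c <= X <= d) for X uniform on [a, b]; [c, d] is clipped to [a, b]."""
    lo, hi = max(a, c), min(b, d)
    width = max(0.0, hi - lo)
    return width * uniform_height(a, b)

# Hypothetical example: waiting time uniform between 0 and 10 minutes
p = uniform_prob(0, 10, 2, 5)  # width 3 times height 0.1
print(round(p, 4))
```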

Normal Distribution Calculations

  • Z-score: z = \frac{x - μ}{σ}

  • Probability Calculations: Use z-tables or calculators to find P(Z < a), P(Z > a), and P(a < Z < b).

  • Z-critical value (z_\alpha): The z-score with area α to its right under the standard normal curve.
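
As an alternative to a z-table, these quantities can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch under the usual definition z = (x - μ)/σ; the score of 130 with μ = 100 and σ = 15 is a hypothetical example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: x = 130 from a population with mu = 100, sigma = 15
z = z_score(130, 100, 15)

p_left = std_normal.cdf(z)                          # P(Z < z)
p_right = 1 - std_normal.cdf(z)                     # P(Z > z)
p_between = std_normal.cdf(1) - std_normal.cdf(-1)  # P(-1 < Z < 1)

def z_critical(alpha):
    """z_alpha: the z-score with area alpha to its right."""
    return std_normal.inv_cdf(1 - alpha)

print(z, round(p_left, 4), round(z_critical(0.05), 3))
```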

Unbiased and Biased Estimators

An unbiased estimator is a statistic whose expected value equals the parameter it estimates. Unbiased estimators are preferred for inference.

Statistic | Estimator Type | Parameter Estimated
\bar{x} | Unbiased | μ
\hat{p} | Unbiased | p
s^2 | Unbiased | σ^2
s | Biased | σ
Sample Median | Biased | Population Median
Sample Range | Biased | Population Range
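
Unbiasedness can be checked exactly on a tiny population by listing every possible sample. The sketch below assumes a hypothetical population {1, 2, 5} and samples of size 2 drawn with replacement; it averages s^2 and s over all 9 equally likely samples and compares them with σ^2 and σ.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [1, 2, 5]  # hypothetical tiny population
n = 2                   # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values, ddof):
    """Sum of squared deviations divided by (len(values) - ddof)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

sigma2 = variance(population, ddof=0)  # population variance (divide by N)

# All 9 equally likely samples of size 2, drawn with replacement
samples = list(product(population, repeat=n))

# Expected value of each estimator = its average over all possible samples
expected_s2 = mean([variance(s, ddof=1) for s in samples])
expected_s = sum(sqrt(variance(s, ddof=1)) for s in samples) / len(samples)

print(expected_s2 == sigma2)                          # s^2 targets sigma^2
print(round(expected_s, 4), round(sqrt(sigma2), 4))   # s falls short of sigma
```

In this example the average of s^2 equals σ^2 exactly (both are 26/9), while the average of s is smaller than σ, which is the sense in which s is biased.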

Central Limit Theorem (CLT)

The CLT states that for a sufficiently large sample size (commonly taken as n > 30), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population's distribution.

  • Mean of sampling distribution: μ_{\bar{x}} = μ

  • Standard deviation (standard error): σ_{\bar{x}} = \frac{σ}{\sqrt{n}}

  • The sampling distribution of \bar{x} can be treated as normal if the population itself is normal (for any n) or if n > 30 (by the CLT).
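
A simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential with rate 1, so μ = 1 and σ = 1), draws 5,000 samples of size 36, and compares the mean and standard deviation of the sample means with μ and σ/√n. The population choice, sample size, and number of trials are all illustrative assumptions.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0  # exponential population with rate 1: mean 1, SD 1
n = 36                # size of each sample
trials = 5000         # number of samples drawn

sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)  # should be near mu
sd_of_means = statistics.stdev(sample_means)    # should be near sigma / sqrt(n)
standard_error = sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(standard_error, 3))
```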

Chapter 7: Estimating Parameters and Determining Sample Sizes

Confidence Intervals (CI)

A confidence interval estimates a population parameter with an associated confidence level (e.g., 95%). It consists of a point estimate plus or minus a margin of error (E).

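  • Confidence Level: The long-run proportion of intervals, constructed by the same method from repeated samples, that contain the true parameter value. It is not the probability that one particular computed interval contains the parameter.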

  • Common Confidence Levels: 90%, 95%, 99%.

Confidence Interval Formulas

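  • For Population Proportion (p): \hat{p} \pm E, where E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}

  • For Population Mean (μ), σ known: \bar{x} \pm E, where E = z_{\alpha/2} \cdot \frac{σ}{\sqrt{n}}

  • For Population Mean (μ), σ unknown: \bar{x} \pm E, where E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}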
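
The three intervals share the same shape, point estimate plus or minus a margin of error E, and differ only in how E is computed. The sketch below assumes the standard textbook margins of error (z-based for a proportion and for a mean with σ known, t-based for a mean with σ unknown). All function names and input numbers are hypothetical. Because the Python standard library has no t-distribution, the t critical value is passed in by the caller; 2.064 is the table value for 24 degrees of freedom at 95% confidence.

```python
from math import sqrt
from statistics import NormalDist

def z_crit(confidence):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def proportion_ci(x, n, confidence=0.95):
    """Interval for p from x successes in n trials."""
    p_hat = x / n
    E = z_crit(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """Interval for mu when the population SD sigma is known."""
    E = z_crit(confidence) * sigma / sqrt(n)
    return x_bar - E, x_bar + E

def mean_ci_sigma_unknown(x_bar, s, n, t_crit):
    """Interval for mu when sigma is unknown; t_crit uses n - 1 degrees of freedom."""
    E = t_crit * s / sqrt(n)
    return x_bar - E, x_bar + E

# Hypothetical inputs
p_ci = proportion_ci(40, 100)
z_ci = mean_ci_sigma_known(61, 10, 25)
t_ci = mean_ci_sigma_unknown(61, 10, 25, t_crit=2.064)

print([round(v, 3) for v in p_ci + z_ci + t_ci])
```

With the same data, the t-based interval is slightly wider than the z-based one, reflecting the extra uncertainty from estimating σ with s.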

Z-critical Values for Common Confidence Levels

Confidence Level | z_{\alpha/2}
90% | 1.645
95% | 1.96
99% | 2.575
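
These critical values can be reproduced from the inverse of the standard normal CDF. The sketch below is a cross-check rather than a replacement for the table; the function name is an assumption. The 99% value computes to about 2.5758, which rounds to 2.576, while some printed tables list 2.575.

```python
from statistics import NormalDist

def z_alpha_over_2(confidence):
    """z-score that leaves area alpha/2 in the right tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

critical_values = {
    level: round(z_alpha_over_2(level), 3) for level in (0.90, 0.95, 0.99)
}
print(critical_values)
```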

t-distribution

  • Used when σ is unknown and the sample is from a normal population.

  • Symmetric and bell-shaped for each n.

  • As n increases, the t-distribution approaches the standard normal distribution.

  • t-critical values are found using degrees of freedom (n-1) and the desired tail area.

Chapter 8: Hypothesis Testing

General Procedure

  1. State the null hypothesis (H0) and alternative hypothesis (Ha).

  2. Identify and compute the appropriate test statistic.

  3. Use the P-value method: compare the P-value to α (significance level) to decide whether to reject H0.

  4. Contextualize the decision in terms of the original problem.
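
The four steps can be sketched for one common case, a two-sided z-test for a population proportion. The test choice, the function name, and the numbers (60 successes in 100 trials, H0: p = 0.5) are hypothetical; other parameters call for other test statistics.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0 against Ha: p != p0 (P-value method)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # step 2: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # step 3: two-tailed P-value
    reject = p_value <= alpha                     # step 3: compare with alpha
    return z, p_value, reject

# Step 1: H0: p = 0.5, Ha: p != 0.5 (hypothetical data: 60 successes in 100)
z, p_value, reject = one_proportion_z_test(x=60, n=100, p0=0.5)

# Step 4: state the decision in context
conclusion = (
    "sufficient evidence that p differs from 0.5"
    if reject
    else "not sufficient evidence that p differs from 0.5"
)
print(round(z, 2), round(p_value, 4), conclusion)
```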

Type I and Type II Errors

  • Type I Error: Rejecting H0 when it is true (false positive). Its probability is α.

  • Type II Error: Failing to reject H0 when it is false (false negative). Its probability is β.

Example: In a criminal trial, a Type I error is convicting an innocent person; a Type II error is acquitting a guilty person.
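
The link between α and the Type I error rate can be illustrated by simulation: when H0 is true, a test at significance level α should reject in roughly a fraction α of repeated samples. The sketch below assumes a two-sided z-test for a mean with σ known; the values of `mu0`, `sigma`, `n`, and the number of trials are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

mu0, sigma, n = 50.0, 10.0, 25  # H0: mu = mu0, with sigma known
alpha = 0.05
trials = 2000
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

def rejects_h0(sample):
    """Two-sided z-test: reject when |z| exceeds the critical value."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    return abs(z) > z_critical

# H0 is TRUE in every trial: each sample really comes from mean mu0,
# so every rejection counted here is a Type I error.
rejections = sum(
    rejects_h0([random.gauss(mu0, sigma) for _ in range(n)])
    for _ in range(trials)
)
type_i_rate = rejections / trials
print(type_i_rate)
```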

Critical Region and Test Statistic

If the test statistic falls in the critical region, reject H0; otherwise, do not reject H0.

Chapter 10: Correlation and Regression

Correlation

  • Correlation: Measures the strength and direction of a linear relationship between two variables.

  • Linear Correlation Coefficient (r): Ranges from -1 (perfect negative) to +1 (perfect positive); 0 indicates no linear correlation.

Example: A scatterplot with points rising from left to right shows positive correlation; falling shows negative correlation.
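
A minimal sketch of computing r directly from its definition, the sum of cross-deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and both small data sets are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]   # points tend to rise from left to right
falling = [5, 4, 5, 4, 2]  # points tend to fall from left to right

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
print(round(r_rising, 4), round(r_falling, 4))
```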

Scatterplots

  • Graphical representation of paired data (X, Y).

  • Used to visually assess the relationship between variables.

Regression Line

  • Sample Regression Line: \hat{y} = b_0 + b_1 x, where b_0 is the y-intercept and b_1 is the slope.

  • Used to predict the value of y for a given x.

Example: Substituting x = 4 into the regression equation gives the predicted value y = 14.
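
A sketch of fitting and using a regression line by least squares, assuming the usual form \hat{y} = b_0 + b_1 x with slope b_1 = S_xy / S_xx and intercept b_0 = \bar{y} - b_1 \bar{x}. The function name and the data are hypothetical and are unrelated to the example above.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
predicted = b0 + b1 * 4  # predicted y for x = 4
print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

For this data the fitted line is approximately \hat{y} = 2.2 + 0.6x, so the prediction at x = 4 is about 4.6.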

Properties of r

  • r is unitless.

  • r is sensitive to outliers.

  • r only measures linear relationships.
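
Each of these three properties can be checked numerically. The sketch below uses hypothetical data and a hand-written `pearson_r` helper: rescaling x leaves r unchanged, adding one extreme point changes r sharply, and a perfect but nonlinear relationship (y = x^2 on symmetric x values) gives r = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Unitless: changing units of x (for example meters to centimeters) keeps r
r_rescaled = pearson_r([100 * v for v in x], y)

# Sensitive to outliers: one extreme point (20, 0) flips the sign of r
r_outlier = pearson_r(x + [20], y + [0])

# Linear only: y = x^2 is an exact relationship, yet r is 0
xs = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(xs, [v ** 2 for v in xs])

print(round(r, 3), round(r_rescaled, 3), round(r_outlier, 3), r_parabola)
```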

Additional info: For more advanced inference topics (e.g., two-sample inference, ANOVA, chi-square tests), refer to later chapters not covered in this review.
