Essential Study Notes for Introductory Statistics: Data Collection, Description, and Probability

Collecting Data in Statistics

Statistics: Understanding Variability

Statistics is the science of collecting, analyzing, and drawing conclusions from data. Its central theme is variability: the differences observed among data values.

  • Population: The entire set of subjects or objects of interest.

  • Sample: A subset of the population selected for study.

  • Observational Study: The researcher observes characteristics of a sample from one or more populations without intervention.

  • Experiment: The researcher manipulates conditions to observe effects on a response variable.

Types of Statistical Studies

  • Observational Study: Used to learn about populations by observing samples. Requires representative sampling.

  • Experiment: Used to investigate effects of treatments or conditions. Requires comparable experimental groups.

Sampling Methods

  • Simple Random Sample: Every possible sample of size n has an equal chance of being selected.

  • Sampling with Replacement: Selected individuals are returned to the population before the next selection.

  • Sampling without Replacement: Selected individuals are not returned, ensuring distinct selections.
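The difference between the two sampling schemes can be sketched with Python's standard `random` module. The population of ID numbers and the seed are invented for illustration; `sample` draws without replacement and `choices` draws with replacement.

```python
import random

# Hypothetical population: ID numbers 1 through 20 (invented for illustration).
population = list(range(1, 21))
rng = random.Random(42)  # seeded only so that a rerun gives the same draws

# Simple random sample without replacement: every subset of size 5 is
# equally likely, and no individual can be selected twice.
without_replacement = rng.sample(population, k=5)

# Sampling with replacement: each draw is made from the full population,
# so the same individual may be selected more than once.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```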

Describing Data with Tables and Graphs

Types of Data and Variables

  • Univariate: One variable per observation.

  • Bivariate: Two variables per observation.

  • Multivariate: More than two variables per observation.

  • Categorical Variables: Qualitative, such as color or gender.

  • Numerical Variables: Quantitative, either discrete (counted) or continuous (measured).

Graphical Displays for Categorical Data

  • Bar Chart: Visualizes frequency or relative frequency of categories.

  • Pie Chart: Represents proportions of categories as slices of a circle.

Example: Survey on which fictional character most needs life insurance, visualized as a pie chart.

Figure: Pie chart of fictional character life insurance needs.

Graphical Displays for Numerical Data

  • Stem-and-Leaf Display: Compact summary for small to moderate data sets.

  • Histogram: Graph of frequency distribution for discrete or continuous numerical data.

Numerical Methods for Describing Data Distributions

Measures of Center

Measures of center describe the typical value in a data set.

  • Mean: Arithmetic average.

  • Median: Middle value when data are ordered.

  • Mode: Most frequently occurring value.
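As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The data values are made up for this sketch.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up values, already in order

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # even count: average of the middle two, (3 + 5) / 2 = 4
mode = statistics.mode(data)      # 3 is the only value that occurs twice

print(mean, median, mode)
```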

Measures of Spread

Measures of spread describe variability in the data.

  • Range: Difference between largest and smallest values.

  • Variance: Average squared deviation from the mean. For sample data, the sum of squared deviations is divided by n − 1 rather than n.

  • Standard Deviation: Square root of variance.

Example: Three data sets with the same mean but different variability.

Figure: Dot plots of three data sets with equal means but different spreads.
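The idea of equal means with different spreads can be sketched numerically. The three data sets below are invented for illustration and are not the ones in the figure. Note that `statistics.variance` and `statistics.stdev` use the sample formulas, which divide by n − 1.

```python
import statistics

# Three made-up data sets, each with mean 10.
datasets = {
    "A": [10, 10, 10, 10, 10],
    "B": [8, 9, 10, 11, 12],
    "C": [0, 5, 10, 15, 20],
}

summary = {}
for name, values in datasets.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "sd": statistics.stdev(values),           # square root of the sample variance
    }

for name, stats in summary.items():
    print(name, stats)
```

The means agree, while the range, variance, and standard deviation all grow from A to C.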

Quartiles and Interquartile Range (IQR)

  • Lower Quartile (Q1): 25th percentile

  • Median (Q2): 50th percentile

  • Upper Quartile (Q3): 75th percentile

  • Interquartile Range (IQR): IQR = Q3 − Q1

Figure: Quartiles and median on a distribution.
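Textbooks and software use slightly different conventions for quartiles. The sketch below assumes one common introductory convention: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half, with the middle value left out of both halves when n is odd. The helper name `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above it (middle value skipped if n is odd)
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

data = [1, 3, 4, 6, 7, 9, 12, 15]  # made-up values
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print(q1, q2, q3, iqr)
```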

Mean, Median, and Skewness

The relationship between mean and median suggests the direction of skewness:

  • Symmetric Distribution: Mean ≈ Median

  • Positively Skewed (long right tail): Mean > Median

  • Negatively Skewed (long left tail): Mean < Median

Figure: Mean and median in symmetric and skewed distributions.
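A small made-up example shows why the mean moves and the median does not: a single extreme value pulls the mean toward the long tail while the middle value stays put.

```python
import statistics

# Made-up data sets that share the same median (3).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 20]   # one large value stretches the upper tail
left_skewed = [-14, 2, 3, 4, 5]   # one small value stretches the lower tail

for label, values in [("symmetric", symmetric),
                      ("right-skewed", right_skewed),
                      ("left-skewed", left_skewed)]:
    print(label, "mean =", statistics.mean(values),
          "median =", statistics.median(values))
```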

Measures of Relative Standing: z-scores

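  • z-score: Number of standard deviations a value is from the mean, computed as z = (value − mean) / standard deviation. A positive z-score lies above the mean and a negative one below it.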
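As a worked example, assume exam scores with mean 70 and standard deviation 10 (numbers invented for illustration). The helper `z_score` is a hypothetical name for the usual formula, (value − mean) / standard deviation.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

z_high = z_score(85, mean=70, sd=10)  # 15 points above the mean
z_low = z_score(55, mean=70, sd=10)   # 15 points below the mean

print(z_high, z_low)
```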

The Empirical Rule

For mound-shaped, symmetric (approximately normal) distributions, approximately:

  • 68% of values within 1 standard deviation

  • 95% within 2 standard deviations

  • 99.7% within 3 standard deviations

Figure: Empirical rule for a normal distribution.
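The percentages in the empirical rule are rounded areas under a normal curve. A minimal check, using `NormalDist` from Python's standard `statistics` module, computes the area within k standard deviations of the mean for k = 1, 2, 3.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area between -k and +k standard deviations for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
```

Real data that are only roughly mound-shaped will match these figures only approximately.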

Describing Bivariate Numerical Data

Scatterplots

Scatterplots display relationships between two numerical variables. Patterns may be linear or nonlinear, positive or negative, or show no relationship.

Correlation

  • Pearson's Correlation Coefficient (r): Measures strength and direction of linear relationship.

  • r ranges from -1 (perfect negative) to +1 (perfect positive).

Figure: Correlation strength scale.
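One standard way to compute r is to divide the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The sketch below implements that formula directly; the function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points fall exactly on a rising line
r_perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points fall exactly on a falling line
r_moderate = pearson_r(x, [2, 1, 4, 3, 5])           # rising trend with scatter

print(r_perfect_positive, r_perfect_negative, r_moderate)
```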

Regression: Fitting a Line to Bivariate Data

Linear Regression Model

  • Regression Equation: ŷ = a + bx

  • Slope (b): Predicted change in y for a one-unit increase in x.

  • Intercept (a): Predicted value of y when x = 0.

  • Residual: Difference between observed and predicted y (observed y − predicted ŷ).

Figure: Population regression line with deviations.
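A minimal sketch of fitting a line by least squares, assuming the line is written ŷ = a + bx with slope b = Sxy / Sxx and intercept a = ȳ − b·x̄ (the standard least-squares formulas). The function name `fit_line` and the data are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

print(f"y-hat = {a:.1f} + {b:.1f}x")
print([round(res, 2) for res in residuals])
```

For a least-squares line the residuals sum to zero (up to rounding), which is a handy check on the arithmetic.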

Coefficient of Determination (R²)

  • Proportion of variation in y explained by the model.
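One common way to compute this proportion is 1 − SSResid / SSTo, where SSResid is the sum of squared residuals about the fitted line and SSTo is the sum of squared deviations of y about its mean. The sketch below assumes a least-squares line ŷ = a + bx and uses made-up data.

```python
x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
ss_total = sum((yi - mean_y) ** 2 for yi in y)                    # total variation in y
r_squared = 1 - ss_resid / ss_total

print(round(r_squared, 3))
```

Here about 60% of the variation in y is accounted for by the line.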

Probability

Chance Experiments and Sample Spaces

  • Chance Experiment: Activity with uncertain outcome.

  • Sample Space: Set of all possible outcomes.

  • Event: Any collection of outcomes.

  • Simple Event: Event with exactly one outcome.

Basic Probability Rules

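  • Probability of an event (when all outcomes are equally likely): P(E) = (number of outcomes in E) / (number of outcomes in the sample space)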

  • 0 ≤ P(E) ≤ 1

  • P(Sample Space) = 1

  • If events E and F are disjoint: P(E ∪ F) = P(E) + P(F)

  • P(E) + P(Eᶜ) = 1, where Eᶜ is the complement of E (the event that E does not occur)
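These rules can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die, so every outcome is equally likely and a probability is a simple ratio of counts. The helper `prob` is a hypothetical name, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
one = {1}

p_even = prob(even)                      # 3/6 = 1/2
p_not_even = prob(sample_space - even)   # complement of "even"
p_even_or_one = prob(even | one)         # union of two disjoint events

print(p_even, p_not_even, p_even_or_one)
```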

Conditional Probability

  • Probability of E given F: P(E | F) = P(E ∩ F) / P(F), provided P(F) > 0
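As a worked example, take the usual definition P(E | F) = P(E ∩ F) / P(F) and one roll of a fair die (an assumed setting). Knowing that the roll is greater than 3 shrinks the possibilities to {4, 5, 6}, of which two are even. The helper names are hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(e, f):
    """P(E | F) = P(E and F) / P(F); requires P(F) > 0."""
    return prob(e & f) / prob(f)

even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

p_even_given_gt3 = conditional(even, greater_than_3)  # (2/6) / (3/6)

print(p_even_given_gt3)
```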

Independence

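  • Events A and B are independent if P(A | B) = P(A), that is, knowing that B occurred does not change the probability of A.

  • Multiplication rule for independent events: P(A ∩ B) = P(A) · P(B)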
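Independence can be tested by checking whether P(A ∩ B) equals P(A) · P(B). The sketch below again assumes one roll of a fair die; the events and helper names are invented for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_least_4 = {4, 5, 6}

# P(even and at most 4) = 2/6, and (1/2) * (2/3) = 1/3: independent.
even_vs_at_most_4 = independent(even, at_most_4)
# P(even and at least 4) = 2/6, but (1/2) * (1/2) = 1/4: not independent.
even_vs_at_least_4 = independent(even, at_least_4)

print(even_vs_at_most_4, even_vs_at_least_4)
```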

Venn Diagrams and Probability

Venn diagrams are used to visualize relationships between events, such as union, intersection, and disjointness.

Figures: Venn diagrams for probability events; Venn diagram for three events.

General Addition Rule

  • For any two events E and F: P(E ∪ F) = P(E) + P(F) − P(E ∩ F)

Figure: Venn diagram for the addition rule.
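The general addition rule, P(E ∪ F) = P(E) + P(F) − P(E ∩ F), subtracts the overlap so that outcomes in both events are not counted twice. A small check on one roll of a fair die (an assumed setting, with hypothetical helper names):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
at_least_4 = {4, 5, 6}  # overlaps with `even` in {4, 6}

p_union_direct = prob(even | at_least_4)  # count outcomes in {2, 4, 5, 6}
p_union_by_rule = prob(even) + prob(at_least_4) - prob(even & at_least_4)

print(p_union_direct, p_union_by_rule)
```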

General Multiplication Rule

  • For any two events E and F: P(E ∩ F) = P(E | F) · P(F)

Figure: Venn diagram for an intersection.
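The general multiplication rule, P(E ∩ F) = P(E | F) · P(F), is useful when draws are made without replacement, because the second draw depends on the first. The sketch below assumes a standard 52-card deck with 4 aces and computes the probability that two cards drawn in succession are both aces, then confirms it by direct counting.

```python
from fractions import Fraction
from math import comb

# F: first card is an ace; E: second card is an ace.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # 3 aces remain among 51 cards

# Multiplication rule: P(E and F) = P(E | F) * P(F).
p_both_aces = p_second_ace_given_first * p_first_ace

# Cross-check by counting: pairs of aces out of all pairs of cards.
p_by_counting = Fraction(comb(4, 2), comb(52, 2))

print(p_both_aces, p_by_counting)
```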

Additional info: These notes cover the foundational concepts in statistics, including data collection, graphical and numerical data description, bivariate analysis, and probability. All images included are directly relevant to the explanations provided.
