
STAT 2040 Study Notes: Chapters 1-4 (Sampling, Descriptive Statistics, Probability, Discrete Random Variables)

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Introduction to Statistics

Overview of Statistics

Statistics is the science of collecting and analyzing data to learn about unknown truths. It is used to answer questions about populations, relationships, and effects by applying statistical inference techniques.

  • Key Questions:

    • Can post-menopausal women lower their heart attack risk by undergoing hormone replacement therapy?

    • Is there a relationship between income and happiness?

    • What variables impact length of stay in hospital after a Caesarean section birth?

  • Statistical Inference: Uses data to answer questions and draw conclusions about populations.

Descriptive Statistics

Summarizing Data

Descriptive statistics provide ways to summarize and describe the main features of a data set, using both visual and numerical methods.

  • Plots: Visual summaries of data (e.g., histograms, boxplots).

  • Numerical Summaries: Include measures such as mean, median, and standard deviation.

Inferential Statistics

Statistical Inference Methods

Inferential statistics allow us to make conclusions about populations based on sample data.

  • Confidence Intervals: Estimate population parameters with a range of plausible values.

  • Hypothesis Tests: Assess evidence for or against a specific claim about a population.

  • One-way ANOVA: Compare means across multiple groups.

  • Probability: Fundamental to all inferential statistics.

Populations and Samples, Parameters and Statistics

Key Definitions

Understanding the distinction between populations, samples, parameters, and statistics is essential for statistical inference.

  • Individuals / Units / Cases: Objects on which measurements are made.

  • Population: The complete set of units of interest to an investigator. Can be finite or infinite.

  • Parameter: A numerical characteristic of a population (e.g., population mean).

  • Sample: A subset of units selected from the population.

  • Statistic: A numerical characteristic of a sample (e.g., sample mean).

  • Sample Statistics: Used to draw inferences about population parameters.

Sampling Methods

Principles of Sampling

Sampling methods affect the quality and reliability of statistical conclusions.

  • Voluntary Response Sample: Individuals choose whether to respond; often strongly biased.

  • Random Sampling: Reduces bias and increases representativeness.

Simple Random Sampling (SRS)

Each possible sample of size $n$ has an equal chance of being selected.

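  • Definition varies depending on whether the population is finite or infinite: for a finite population of $N$ units, each of the $\binom{N}{n}$ possible samples has probability $1/\binom{N}{n}$; for an infinite population, the $n$ observations are independent and come from the same distribution.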
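A minimal sketch of drawing an SRS from a finite population, assuming Python's standard `random` module. `Random.sample` draws without replacement, so every subset of size $n$ is equally likely. The population of 20 labelled units and the helper name `simple_random_sample` are hypothetical.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw an SRS of size n without replacement."""
    rng = random.Random(seed)
    # Sampling without replacement: every size-n subset has the same chance
    return rng.sample(population, n)

population = list(range(1, 21))  # hypothetical population of N = 20 unit labels
sample = simple_random_sample(population, 5, seed=1)
print(sample)

# Number of possible samples of size 5 from N = 20; each has probability 1 / this count
print(math.comb(20, 5))  # 15504
```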

Stratified Random Sampling

The population is divided into subgroups (strata), and random samples are drawn from each stratum.

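  • Estimates from the individual strata are combined (for example, weighted by each stratum's share of the population) to estimate the population parameter.

  • Can produce estimators with lower variability than an SRS of the same size, especially when units within each stratum are similar.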
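A sketch of stratified sampling, assuming a separate SRS is drawn from each stratum and the stratum sample means are combined by weighting each with its share of the population ($N_h / N$). That weighting is one common choice, not necessarily the course's exact estimator. The strata, values, and helper names are hypothetical.

```python
import random
from statistics import mean

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate SRS from each stratum (dict: stratum name -> list of values)."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()}

def stratified_mean(strata, samples):
    """Weight each stratum's sample mean by that stratum's share of the population."""
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[name]) / N * mean(samples[name]) for name in strata)

# Hypothetical population split into two strata of different sizes
strata = {"urban": [10, 12, 11, 13, 12, 14, 11, 13], "rural": [20, 22, 21, 23]}
samples = stratified_sample(strata, {"urban": 4, "rural": 2}, seed=0)
print(stratified_mean(strata, samples))
```

If every unit is "sampled" (passing `strata` as `samples`), the weighted combination reproduces the population mean exactly, which is a quick sanity check on the weights.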

Cluster Sampling

Used when the population is naturally divided into clusters (groups of units) and it is easier to sample whole groups than individual units.

  • Researchers may measure all individuals in selected clusters.

  • Sampling design can be complex, involving multiple stages.
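A sketch of one-stage cluster sampling only: clusters are chosen by SRS and every unit in a chosen cluster is measured. Multi-stage designs are not shown. The classroom labels, student IDs, and helper name are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical clusters: classrooms containing student IDs
clusters = {
    "room_A": [1, 2, 3],
    "room_B": [4, 5, 6, 7],
    "room_C": [8, 9],
    "room_D": [10, 11, 12],
}
sampled = one_stage_cluster_sample(clusters, 2, seed=42)
print(sampled)
```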

Systematic Sampling

Units are ordered, and every $k$th unit is selected after a random starting point.

  • If the ordering has a periodic pattern (e.g., one that repeats every $k$ units), the sample may be biased.

  • If the units are in random order, systematic sampling behaves like SRS.
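A sketch of systematic sampling, assuming the random start is chosen uniformly among the first $k$ positions. The list of 100 ordered units and the helper name are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting position in 0..k-1
    return units[start::k]

units = list(range(100))  # hypothetical ordered list of 100 units
sample = systematic_sample(units, 10, seed=3)
print(sample)
```

Because the selected positions are exactly $k$ apart, any pattern in the ordering that repeats every $k$ units would be hit at the same phase each time, which is the source of the bias described above.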

Experiments and Observational Studies

Types of Studies

Distinguishing between experiments and observational studies is crucial for interpreting results.

  • Response Variable: The variable of interest.

  • Explanatory Variable: Explains or possibly causes changes in the response variable.

  • Observational Study: Researchers observe and measure variables without imposing conditions.

  • Experiment: Researchers impose conditions and investigate differences in response variables.

  • Confounding: Occurs when the effects of two variables on the response variable cannot be separated.

Drawing Causal Conclusions

  • Observational studies rarely provide strong evidence of causality.

  • Large sample sizes help estimate parameters more precisely but do not correct for bias.

  • Sampling can be difficult, expensive, or destructive; small samples can still be informative.

  • When comparing groups, sample sizes do not need to be equal.

Types of Variables and Data Summaries

Plots for Categorical Variables

  • Categorical (Qualitative) Variable: Falls into categories (e.g., blood type, province, species).

  • Frequency: Number of observations in a category.

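  • Relative Frequency: Proportion of observations in a category, calculated as $\text{relative frequency} = \frac{\text{frequency}}{n}$, where $n$ is the total number of observations.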
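Frequencies and relative frequencies for a categorical variable can be tabulated with `collections.Counter`. The blood-type observations below are hypothetical.

```python
from collections import Counter

blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]  # hypothetical observations
counts = Counter(blood_types)            # frequency of each category
n = len(blood_types)
rel_freq = {category: count / n for category, count in counts.items()}
print(counts)
print(rel_freq)  # relative frequency = frequency / n; values sum to 1
```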

Plots for Quantitative Variables

  • Quantitative Variable: Represents measurable quantities (e.g., height, length of stay, weight).

Numerical Measures

Measures of Central Tendency

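  • Summation Notation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$ means add up the values $x_1$ through $x_n$.

  • Sample Mean: Average of all observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

  • Median: Middle value when the data are ordered. If $n$ is odd, the median is the middle value; if $n$ is even, the median is the average of the two middle values.

  • Mode: Most frequently occurring observation.

  • Geometric Mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$, defined for positive observations.

  • Harmonic Mean: Reciprocal of the arithmetic mean of the reciprocals, $\frac{n}{\sum_{i=1}^{n} 1/x_i}$.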

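  • Weighted Mean: Some observations are given more weight than others, $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$, where $w_i$ is the weight of observation $x_i$.

  • Trimmed Mean: The mean calculated after removing a certain percentage of the most extreme values from each end of the ordered data.

  • Midrange: Midpoint between the largest and smallest observations, $\frac{x_{\max} + x_{\min}}{2}$.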
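The measures of central tendency above can be computed with Python's `statistics` module (Python 3.8+ for `geometric_mean`) plus a few lines of arithmetic. The sample and the weights are hypothetical, and the trimmed mean assumes the convention of dropping $\lfloor p \cdot n \rfloor$ values from each end; some texts define the trimming proportion differently.

```python
import statistics

x = [2, 4, 4, 5, 7, 14]  # hypothetical sample, n = 6
n = len(x)

mean = sum(x) / n                        # (1/n) * sum of x_i
median = statistics.median(x)            # n even: average of the two middle values
mode = statistics.mode(x)                # most frequent value
geo_mean = statistics.geometric_mean(x)  # (product of x_i) ** (1/n), positive data only
harm_mean = statistics.harmonic_mean(x)  # n / sum(1 / x_i)

w = [1, 1, 1, 1, 2, 2]                   # hypothetical weights
weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def trimmed_mean(data, prop):
    """Drop floor(prop * n) values from each end of the sorted data, then average."""
    s = sorted(data)
    cut = int(prop * len(s))
    kept = s[cut:len(s) - cut]
    return sum(kept) / len(kept)

midrange = (max(x) + min(x)) / 2

print(mean, median, mode, midrange)  # 6.0 4.5 4 8.0
print(trimmed_mean(x, 0.2))          # 5.0
```

The large value 14 pulls the mean (6.0) and midrange (8.0) upward, while the median and trimmed mean are less affected.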

Measures of Variability

  • Range: Maximum - Minimum

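  • Mean Absolute Deviation (MAD): Average distance from the mean, $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$.

  • Sample Variance ($s^2$): Roughly the average squared distance from the mean, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

  • Standard Deviation (SD): Square root of the sample variance, $s = \sqrt{s^2}$.

  • Variance and SD are never negative, and they equal zero only if all values are equal.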

  • Larger variance or SD indicates more variable data.
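A short sketch of the variability measures on a hypothetical sample. The sample variance uses the $n - 1$ divisor, which is also what `statistics.variance` and `statistics.stdev` use.

```python
import statistics

x = [2, 4, 4, 5, 7, 14]  # hypothetical sample
n = len(x)
xbar = sum(x) / n

data_range = max(x) - min(x)
mad = sum(abs(xi - xbar) for xi in x) / n         # mean absolute deviation from the mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance, divisor n - 1
s = s2 ** 0.5                                     # sample standard deviation

print(data_range, mad, s2)  # 12 3.0 18.0
# Cross-check against the standard library's sample variance and SD
print(statistics.variance(x), statistics.stdev(x))
```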

Percentiles and Boxplots

Percentiles and Quartiles

  • Percentile: Value below which a given percentage of observations fall.

  • Quartiles: Divide data into four equal parts.

  • First quartile ($Q_1$): 25th percentile

  • Second quartile ($Q_2$): 50th percentile (median)

  • Third quartile ($Q_3$): 75th percentile

  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$

  • Five-number summary: Minimum, $Q_1$, median ($Q_2$), $Q_3$, maximum

Boxplots

  • Useful for comparing groups and identifying outliers.

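  • Box extends from $Q_1$ to $Q_3$, with a line at the median.

  • Whiskers extend to the smallest and largest observations within $1.5 \times \text{IQR}$ of the quartiles, that is, no lower than $Q_1 - 1.5 \times \text{IQR}$ and no higher than $Q_3 + 1.5 \times \text{IQR}$.

  • Observations beyond the whiskers are flagged as potential outliers.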
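A sketch of the boxplot calculations, assuming quartiles from `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between ordered observations. Quartile conventions differ between textbooks and software, so a course rule (for example, medians of the lower and upper halves) may give slightly different values. The sample and the helper name are hypothetical.

```python
import statistics

def boxplot_summary(data):
    """Five-number summary, IQR, 1.5 * IQR whisker ends, and flagged outliers."""
    xs = sorted(data)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in xs if low_fence <= v <= high_fence]
    return {
        "five_number": (xs[0], q1, q2, q3, xs[-1]),
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # most extreme points within the fences
        "outliers": [v for v in xs if v < low_fence or v > high_fence],
    }

summary = boxplot_summary([2, 4, 4, 5, 7, 14, 30])  # hypothetical sample
print(summary)
```

Here $Q_1 = 4$, $Q_3 = 10.5$ and $\text{IQR} = 6.5$, so the upper fence is $10.5 + 9.75 = 20.25$: the upper whisker stops at 14 and 30 is flagged.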

Basics of Probability

Probability Concepts

Probability quantifies the likelihood of events under certain assumptions.

  • Sample Space ($S$): Set of all possible outcomes.

  • Event: Subset of the sample space.

  • Mutually Exclusive Events: Cannot occur simultaneously; $A \cap B = \emptyset$, so $P(A \cap B) = 0$.

  • Complement of an Event $A$ ($A^c$): The event that $A$ does not occur; $P(A^c) = 1 - P(A)$.

Rules of Probability

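  • Addition Rule: For mutually exclusive events, $P(A \cup B) = P(A) + P(B)$.

  • General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

  • Conditional Probability: The probability of $A$ given $B$ is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.

  • Law of Total Probability: If $B_1, B_2, \ldots, B_k$ partition the sample space, $P(A) = \sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)$.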
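The probability rules can be checked on a small example, assuming one roll of a fair die so that all six outcomes are equally likely and $P(E) = |E| / |S|$. Events are Python sets (`|` is union, `&` is intersection) and `Fraction` keeps the arithmetic exact. The events and helper names are hypothetical illustrations.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space for one roll of a fair die

def P(event):
    """Equally likely outcomes: (outcomes in the event) / (outcomes in S)."""
    return Fraction(len(event & S), len(S))

def P_given(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return P(a & b) / P(b)

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is at least 4
C = {1}        # roll is 1; A and C are mutually exclusive

assert P(S - A) == 1 - P(A)                # complement rule
assert P(A | C) == P(A) + P(C)             # addition rule (mutually exclusive)
assert P(A | B) == P(A) + P(B) - P(A & B)  # general addition rule

p_A_given_B = P_given(A, B)
print(p_A_given_B)  # 2/3

# Law of total probability with the partition B1 = {1, 2, 3}, B2 = {4, 5, 6}
partition = [{1, 2, 3}, {4, 5, 6}]
total = sum(P_given(A, Bi) * P(Bi) for Bi in partition)
assert total == P(A)
```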

Counting Rules

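  • Permutation: Ordered arrangement of items. The number of ways to arrange $k$ of $n$ distinct items is ${}_{n}P_{k} = \frac{n!}{(n-k)!}$.

  • Combination: Selection of items without regard to order. The number of ways to choose $k$ of $n$ distinct items is $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.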
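The counting formulas can be compared with brute-force enumeration, using `math.perm` and `math.comb` (Python 3.8+) and `itertools`. The five items are hypothetical.

```python
import math
from itertools import combinations, permutations

items = ["a", "b", "c", "d", "e"]  # hypothetical n = 5 items; choose k = 2

n_perm = math.perm(5, 2)  # 5! / (5 - 2)! ordered arrangements
n_comb = math.comb(5, 2)  # 5! / (2! * 3!) unordered selections

# Listing every arrangement and selection confirms the formulas
print(n_perm, len(list(permutations(items, 2))))  # 20 20
print(n_comb, len(list(combinations(items, 2))))  # 10 10
```

Each unordered pair appears in $2! = 2$ orders, which is why the permutation count is twice the combination count here.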

Discrete Random Variables

Definition and Properties

A discrete random variable takes on a countable number of distinct values, each with an associated probability.

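  • Probability Distribution: Lists all possible values $x$ and their probabilities $p(x) = P(X = x)$, where $0 \le p(x) \le 1$ and $\sum_x p(x) = 1$.

  • Expected Value (Mean): $\mu = E(X) = \sum_x x\,p(x)$

  • Variance: $\sigma^2 = \text{Var}(X) = \sum_x (x - \mu)^2\,p(x)$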
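A sketch that applies the expected-value and variance formulas to a probability distribution stored as a dictionary mapping each value $x$ to $p(x)$. The fair-die example and the helper names are hypothetical; `Fraction` keeps the results exact.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over x of x * p(x); pmf maps each value x to P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of (x - mu)^2 * p(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# X = result of one roll of a fair six-sided die
die = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(die.values()) == 1  # a valid distribution sums to 1
print(expected_value(die))     # 7/2
print(variance(die))           # 35/12
```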

Table: Comparison of Sampling Methods

| Sampling Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Simple Random Sampling (SRS) | Each sample has an equal chance of selection | Unbiased, easy to analyze | May not capture subgroups well |
| Stratified Sampling | Population divided into strata; sample drawn from each | Lower variability, ensures subgroup representation | Requires knowledge of strata |
| Cluster Sampling | Population divided into clusters; clusters are sampled | Efficient for large populations | Higher variability, complex analysis |
| Systematic Sampling | Select every $k$th unit after a random start | Simple, quick | Biased if periodic effects exist |
| Voluntary Response | Individuals choose to respond | Easy to collect | Strongly biased |

Additional info:

  • Some formulas and definitions were expanded for clarity and completeness.

  • Examples and applications were inferred from context and standard statistics curriculum.
