STAT 2040 Study Notes: Chapters 1-4 (Sampling, Descriptive Statistics, Probability, Discrete Random Variables)
Introduction to Statistics
Overview of Statistics
Statistics is the science of collecting and analyzing data to learn about unknown truths. It is used to answer questions about populations, relationships, and effects by applying statistical inference techniques.
Key Questions:
Can post-menopausal women lower their heart attack risk by undergoing hormone replacement therapy?
Is there a relationship between income and happiness?
What variables impact length of stay in hospital after a Caesarean section birth?
Statistical Inference: Uses data to answer questions and draw conclusions about populations.
Descriptive Statistics
Summarizing Data
Descriptive statistics provide ways to summarize and describe the main features of a data set, using both visual and numerical methods.
Plots: Visual summaries of data (e.g., histograms, boxplots).
Numerical Summaries: Include measures such as mean, median, and standard deviation.
Inferential Statistics
Statistical Inference Methods
Inferential statistics allow us to make conclusions about populations based on sample data.
Confidence Intervals: Estimate population parameters with a range of plausible values.
Hypothesis Tests: Assess evidence for or against a specific claim about a population.
One-way ANOVA: Compare means across multiple groups.
Probability: Fundamental to all inferential statistics.
Populations and Samples, Parameters and Statistics
Key Definitions
Understanding the distinction between populations, samples, parameters, and statistics is essential for statistical inference.
Individuals / Units / Cases: Objects on which measurements are made.
Population: The complete set of units of interest to an investigator. Can be finite or infinite.
Parameter: A numerical characteristic of a population (e.g., population mean).
Sample: A subset of units selected from the population.
Statistic: A numerical characteristic of a sample (e.g., sample mean).
Sample Statistics: Used to draw inferences about population parameters.
Sampling Methods
Principles of Sampling
Sampling methods affect the quality and reliability of statistical conclusions.
Voluntary Response Sample: Individuals choose whether to respond; often strongly biased.
Random Sampling: Reduces bias and increases representativeness.
Simple Random Sampling (SRS)
Each possible sample of size $n$ has an equal chance of being selected.
Definition varies depending on whether the population is finite or infinite.
Stratified Random Sampling
The population is divided into subgroups (strata), and random samples are drawn from each stratum.
Combines information from different strata using mathematical techniques.
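Typically produces estimators with lower variability than an SRS of the same total size, provided units within each stratum are similar to one another.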
Cluster Sampling
Used when the population is divided into clusters, making it easier to sample groups rather than individuals.
Researchers may measure all individuals in selected clusters.
Sampling design can be complex, involving multiple stages.
Systematic Sampling
Units are ordered, and every $k$-th unit is selected after a random starting point.
If there is a periodic effect, the sample may be biased.
If units are distributed randomly, systematic sampling behaves like SRS.
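The four random sampling designs above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the population of labels 1 to 100, the two strata, the ten clusters, and all function names are hypothetical choices made for the example.

```python
import random

def simple_random_sample(units, n, rng):
    # Every subset of size n is equally likely to be drawn.
    return rng.sample(units, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Draw an independent SRS of the same size from each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then measure every unit in them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, k, rng):
    # Random start among the first k units, then every k-th unit after it.
    start = rng.randrange(k)
    return units[start::k]

rng = random.Random(2040)          # fixed seed so the run is reproducible
population = list(range(1, 101))   # hypothetical unit labels 1..100
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"block{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)
```

Note that the stratified sample is guaranteed to contain units from both strata, while the SRS is not; that guarantee is the subgroup representation stratification is used for.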
Experiments and Observational Studies
Types of Studies
Distinguishing between experiments and observational studies is crucial for interpreting results.
Response Variable: The variable of interest.
Explanatory Variable: Explains or possibly causes changes in the response variable.
Observational Study: Researchers observe and measure variables without imposing conditions.
Experiment: Researchers impose conditions and investigate differences in response variables.
Confounding: Occurs when the effects of two variables on the response cannot be separated from each other.
Drawing Causal Conclusions
Observational studies rarely provide strong evidence of causality.
Large sample sizes help estimate parameters more precisely but do not correct for bias.
Sampling can be difficult, expensive, or destructive; small samples can still be informative.
When comparing groups, sample sizes do not need to be equal.
Types of Variables and Data Summaries
Plots for Categorical Variables
Categorical (Qualitative) Variable: Falls into categories (e.g., blood type, province, species).
Frequency: Number of observations in a category.
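Relative Frequency: Proportion of observations in a category, calculated as: $\text{relative frequency} = \frac{\text{frequency}}{n}$, where $n$ is the total number of observations.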
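As a small illustration, frequencies and relative frequencies for a categorical variable can be tabulated as follows. The blood type values are made-up data, not taken from the course materials.

```python
from collections import Counter

# Made-up sample of blood types for illustration only.
blood_types = ["O", "A", "A", "B", "O", "O", "AB", "A", "O", "B"]

freq = Counter(blood_types)   # frequency: count of observations per category
n = len(blood_types)          # total number of observations
rel_freq = {category: count / n for category, count in freq.items()}
```

The relative frequencies always sum to 1, which is a quick check on any frequency table.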
Plots for Quantitative Variables
Quantitative Variable: Represents measurable quantities (e.g., height, length of stay, weight).
Numerical Measures
Measures of Central Tendency
Summation Notation: $\sum_{i=1}^{n} x_i$ means add up the values $x_1$ through $x_n$.
Sample Mean: Average of all observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Median: Middle value when data are ordered. If $n$ is odd, the median is the middle value; if $n$ is even, the median is the average of the two middle values.
Mode: Most frequently occurring observation.
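Geometric Mean: The $n$th root of the product of the observations (positive values only): $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$
Harmonic Mean: Reciprocal of the arithmetic mean of reciprocals: $\frac{n}{\sum_{i=1}^{n} 1/x_i}$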
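Weighted Mean: Some observations are given more weight than others in the calculation: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
Trimmed Mean: Mean calculated after removing a fixed percentage of the smallest and largest observations.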
Midrange: Midpoint between largest and smallest observations.
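The measures of central tendency above can be compared on one small data set. This sketch uses Python's standard `statistics` module; the data values are made up, and the `weighted_mean` and `trimmed_mean` helpers are hypothetical names written for this example. The trimmed mean here removes the stated proportion from each end, which is one common convention and an assumption on my part.

```python
import statistics

data = [2, 4, 4, 8, 16]  # made-up observations

mean = statistics.fmean(data)                 # (2 + 4 + 4 + 8 + 16) / 5 = 6.8
median = statistics.median(data)              # middle of the ordered values: 4
mode = statistics.mode(data)                  # most frequent value: 4
geometric = statistics.geometric_mean(data)   # fifth root of the product
harmonic = statistics.harmonic_mean(data)     # n / (sum of reciprocals)
midrange = (min(data) + max(data)) / 2        # (2 + 16) / 2 = 9.0

def weighted_mean(values, weights):
    # Sum of weight * value, divided by the total weight.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, proportion):
    # Assumption: drop `proportion` of the observations from EACH end.
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut]
    return statistics.fmean(kept)
```

The large value 16 pulls the mean (6.8) and midrange (9.0) well above the median (4), which is why the median and trimmed mean are preferred for skewed data.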
Measures of Variability
Range: Maximum - Minimum
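Mean Absolute Deviation (MAD): Average distance from the mean, $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$.
Sample Variance: Roughly the average squared distance from the mean, with divisor $n - 1$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Standard Deviation (SD): Square root of the sample variance, $s = \sqrt{s^2}$.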
Variance and SD are zero only if all values are equal.
Larger variance or SD indicates more variable data.
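A short sketch of these measures of variability, again on made-up data. `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor, so they compute the sample variance and sample standard deviation; the MAD is computed by hand with divisor $n$.

```python
import statistics

data = [2, 4, 4, 8, 16]  # made-up observations

x_bar = statistics.fmean(data)                      # sample mean, 6.8
data_range = max(data) - min(data)                  # 16 - 2 = 14
mad = sum(abs(x - x_bar) for x in data) / len(data) # about 4.16
sample_var = statistics.variance(data)              # sum of squares / (n - 1), about 31.2
sample_sd = statistics.stdev(data)                  # square root of the sample variance

# When every value is identical, the variance is zero.
constant_var = statistics.variance([5, 5, 5])
```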
Percentiles and Boxplots
Percentiles and Quartiles
Percentile: Value below which a given percentage of observations fall.
Quartiles: Divide data into four equal parts.
First quartile ($Q_1$): 25th percentile
Second quartile ($Q_2$): 50th percentile (median)
Third quartile ($Q_3$): 75th percentile
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Five-number summary: Minimum, $Q_1$, median ($Q_2$), $Q_3$, maximum
Boxplots
Useful for comparing groups and identifying outliers.
Box extends from $Q_1$ to $Q_3$, with a line at the median.
Whiskers extend to the smallest and largest observations within $1.5 \times \text{IQR}$ of the quartiles.
Observations beyond whiskers are considered outliers.
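The quantities behind a boxplot can be computed directly. This is a sketch on made-up data; note that textbooks and software use several slightly different rules for computing quartiles, so the values from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation that uses another convention.

```python
import statistics

data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30])  # made-up observations

# Quartiles under the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Fences: observations beyond these are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number = (min(data), q1, q2, q3, max(data))
whiskers = (min(inside), max(inside))  # where the whiskers end
```

Here the whiskers stop at 1 and 10, and the value 30 is plotted individually as an outlier.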
Basics of Probability
Probability Concepts
Probability quantifies the likelihood of events under certain assumptions.
Sample Space ($S$): Set of all possible outcomes.
Event: Subset of the sample space.
Mutually Exclusive Events: Cannot occur simultaneously; $P(A \cap B) = 0$
Complement of Event $A$ ($A^c$): Event that $A$ does not occur.
Rules of Probability
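Addition Rule: For mutually exclusive events, $P(A \cup B) = P(A) + P(B)$
General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional Probability: Probability of $A$ given $B$ is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$
Law of Total Probability: If $B_1, \dots, B_k$ partition the sample space, $P(A) = \sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)$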
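These rules can be checked on a small example with equally likely outcomes. The sketch below assumes one roll of a fair six-sided die; the events and the helper names `prob` and `cond_prob` are hypothetical choices for illustration. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    # Equally likely outcomes: P(event) = (outcomes in event) / (all outcomes).
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    # P(a | b) = P(a and b) / P(b); requires P(b) > 0.
    return prob(a & b) / prob(b)

A = {2, 4, 6}                 # roll is even
B = {4, 5, 6}                 # roll is greater than 3
C = {1}                       # mutually exclusive with A
B_complement = sample_space - B

# In Python, `|` on sets is union and `&` is intersection.
addition_rule_holds = prob(A | C) == prob(A) + prob(C)
general_rule_holds = prob(A | B) == prob(A) + prob(B) - prob(A & B)
p_a_given_b = cond_prob(A, B)

# Law of total probability with the partition {B, B_complement}.
total = cond_prob(A, B) * prob(B) + cond_prob(A, B_complement) * prob(B_complement)
```

Knowing the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3, since two of the three outcomes in `B` are even.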
Counting Rules
Permutation: Ordered arrangement of items.
Combination: Selection of items without regard to order.
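For choosing $r$ items from $n$ distinct items, the standard counts are $\frac{n!}{(n-r)!}$ permutations and $\frac{n!}{r!\,(n-r)!}$ combinations. A brief sketch using Python's `math` and `itertools` modules, with $n = 5$ and $r = 3$ chosen arbitrarily for the example:

```python
import itertools
import math

n, r = 5, 3  # arbitrary illustrative values

n_permutations = math.perm(n, r)   # ordered arrangements: 5 * 4 * 3 = 60
n_combinations = math.comb(n, r)   # unordered selections: 60 / 3! = 10

# Cross-check by listing every arrangement and selection explicitly.
items = range(n)
listed_permutations = len(list(itertools.permutations(items, r)))
listed_combinations = len(list(itertools.combinations(items, r)))
```

Each combination of 3 items can be ordered in $3! = 6$ ways, which is why there are six times as many permutations as combinations.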
Discrete Random Variables
Definition and Properties
A discrete random variable takes on a countable number of distinct values, each with an associated probability.
Probability Distribution: Lists all possible values and their probabilities.
Expected Value (Mean): $E(X) = \mu = \sum_{x} x\,p(x)$
Variance: $\sigma^2 = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2\,p(x)$
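As a worked illustration of the expected value and variance formulas, take $X$ to be the number of heads in two tosses of a fair coin. The example is my own, not from the course materials, and uses exact fractions.

```python
from fractions import Fraction

# Probability distribution of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# A valid distribution has probabilities that sum to 1.
total_probability = sum(pmf.values())

# E(X): sum of each value times its probability.
mu = sum(x * p for x, p in pmf.items())

# Var(X): sum of squared deviations from the mean, weighted by probability.
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())
```

Here $E(X) = 0 \cdot \tfrac14 + 1 \cdot \tfrac12 + 2 \cdot \tfrac14 = 1$ and $\sigma^2 = 1 \cdot \tfrac14 + 0 \cdot \tfrac12 + 1 \cdot \tfrac14 = \tfrac12$.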
Table: Comparison of Sampling Methods
| Sampling Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Simple Random Sampling (SRS) | Each sample has equal chance of selection | Unbiased, easy to analyze | May not capture subgroups well |
| Stratified Sampling | Population divided into strata, sample from each | Lower variability, ensures subgroup representation | Requires knowledge of strata |
| Cluster Sampling | Population divided into clusters, sample clusters | Efficient for large populations | Higher variability, complex analysis |
| Systematic Sampling | Select every k-th unit after random start | Simple, quick | Biased if periodic effects exist |
| Voluntary Response | Individuals choose to respond | Easy to collect | Strongly biased |