Basic Concepts and Methods in Statistics: Week 1 Study Guide

Basic Concepts of Statistics

What is Statistics?

Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data. The "information" that is collected, organized, and analyzed in statistics is referred to as data.

Data can be numerical (quantitative) or non-numerical (qualitative).
Statistics helps us make sense of data and draw conclusions about populations based on samples.

Statistical Significance vs Practical Significance

Statistical significance means that an observed result or relationship would be unlikely to occur by random chance alone. Practical significance considers whether the result is large enough to be meaningful in the real world.

Statistical significance is often determined by p-values and hypothesis testing.
Practical significance asks whether the effect size is large enough to matter.
Beware of "p-hacking": running many analyses or selectively reporting results until one appears significant.
Summary statistics (like averages) can be misleading without context.
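
As a small illustration of how an average can mislead without context, the sketch below compares the mean and the median of a short list of salaries in which one value is an outlier. The salary figures and variable names are made up for this example and do not come from the course materials.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; the last value is an outlier.
salaries = [50, 52, 55, 58, 60, 1000]

average = mean(salaries)   # pulled upward by the single outlier
middle = median(salaries)  # barely affected by the outlier

print(f"mean = {average:.1f}, median = {middle:.1f}")
# prints: mean = 212.5, median = 56.5
```

Five of the six salaries are at or below 60, yet the mean is 212.5, so reporting the mean alone would give a distorted picture of a typical salary.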

Examples of Statistical Statements

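NFL salary statistics: A few very high salaries (outliers) pull the average above what a typical player earns, so comparing the average with the median highlights the disparity.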
Health statistics: Descriptive statistics summarize large populations, such as average salaries or educational attainment among nurses.

Statistical and Critical Thinking

The Who, What, Why, and How

Critical thinking in statistics involves understanding the context of data collection, the population under study, and the methods used to analyze data.

What Makes a Good Statistical Question?

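A good statistical question is clear, focused, and can be answered by collecting data. It anticipates variability in the answers and is relevant to the population of interest. For example, "How tall are the students in this class?" is a statistical question, while "How tall is this one student?" is not.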

Key Statistical Terms

Population, Sample, and Individual

The population is the entire group under study. An individual is a member of the population. A sample is a subset of the population.

Census: Collecting data from every member of the population.
Sample: Collecting data from a subset of the population.

Descriptive vs Inferential Statistics

Descriptive statistics involves organizing, summarizing, and presenting data. Inferential statistics uses sample data to draw conclusions about a population and to measure how reliable those conclusions are.

Descriptive: Tables, graphs, numerical summaries.
Inferential: Estimation, hypothesis testing.

Statistic vs Parameter

A statistic is a numerical summary of a sample. A parameter is a numerical summary of a population.

Example: Average weekly spending from a sample is a statistic; from the entire population, it is a parameter.
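
The distinction can be sketched in a few lines of code. The population below is simulated, and the population size, spending range, sample size, and seed are all assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly spending in dollars for 1,000 people.
population = [round(rng.uniform(20, 200), 2) for _ in range(1000)]

# Parameter: a numerical summary of the entire population.
parameter = mean(population)

# Statistic: the same summary computed from a sample of 50 individuals.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers are usually close but not equal. In practice the parameter is unknown, and the statistic is used to estimate it.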

Variables in Statistics

Variables are characteristics of individuals within the population. They can be classified as qualitative or quantitative.

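Qualitative (categorical) variables: Classify individuals by an attribute or category; arithmetic on the values is not meaningful.
Quantitative variables: Numerical measures; the values can be added, subtracted, or averaged meaningfully.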

Levels of Measurement

Nominal, Ordinal, Interval, Ratio

Variables can be classified by their level of measurement:

Nominal: Categories only, no order (e.g., car model).
Ordinal: Categories with order (e.g., language ability: beginner, intermediate, fluent).
Interval: Ordered, differences have meaning, no true zero (e.g., temperature in Celsius).
Ratio: Ordered, differences and ratios have meaning, true zero (e.g., height).
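
The difference between the interval and ratio levels can be checked with a short calculation. The sketch below uses two arbitrary Celsius readings and converts them to Kelvin, a temperature scale with a true zero that is brought in here only for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval level) to Kelvin (ratio level, true zero)."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0  # arbitrary example readings

# Differences are meaningful at the interval level.
difference = warm_c - cool_c

# The ratio of Celsius readings is only arithmetic: 20 C is not "twice as hot" as 10 C.
naive_ratio = warm_c / cool_c

# On a scale with a true zero, the ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio is 2.0, but the Kelvin ratio is only about 1.035, which shows why ratios should not be interpreted for interval-level data.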

Discrete vs Continuous Variables

Discrete variables take a finite or countable number of values, typically counts. Continuous variables can take any value within a range, so their possible values cannot be counted.

Discrete: Number of students in a class.
Continuous: Height, weight, spending (recorded to the nearest cent, but usually treated as continuous).

Collecting Data: Methods and Designs

Observational Studies vs Experiments

Data can be collected through observational studies (no intervention) or experiments (apply treatment and observe effects).

Observational study: Observe and measure characteristics without modifying subjects.
Experiment: Apply treatment and observe effects.

Confounding and Lurking Variables

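Confounding occurs when the effects of two or more variables on the outcome cannot be separated, so it is unclear which one caused the observed effect. A lurking variable is a variable that was not considered in the study but still influences the outcome.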

Design of Experiments

Replication: Applying each treatment to enough subjects that chance differences between individuals do not obscure the treatment effect.
Blinding: Subjects do not know whether they receive the treatment or a placebo.
Double-blind: Both subjects and experimenters are unaware of treatment assignment.
Randomization: Assign subjects to groups by chance.
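
Randomization can be sketched as shuffling the list of subjects and splitting it in half. The subject labels, group sizes, and seed below are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the assignment is repeatable

# Hypothetical pool of ten subjects.
subjects = [f"subject_{i}" for i in range(1, 11)]

# Shuffle a copy so that chance, not the researcher, decides group membership.
shuffled = subjects[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print("treatment:", treatment)
print("control:  ", control)
```

Because the split is made by chance, characteristics the researcher did not measure tend to balance out between the two groups.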

Sampling Methods

Simple Random Sample

Every possible sample of size n has the same chance of being chosen.
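
A minimal sketch of a simple random sample of 5 students, assuming a hypothetical class roster of 30 labeled students:

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical class roster of 30 students.
roster = [f"student_{i:02d}" for i in range(1, 31)]

# random.sample draws without replacement; every set of 5 students is equally likely.
srs = rng.sample(roster, 5)

print(srs)
```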

Systematic Sampling

Select a random starting point and then every kth element in the population.
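
A minimal sketch of systematic sampling, assuming a hypothetical roster of 30 students and a target of 5, so that k = 30 / 5 = 6:

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical class roster of 30 students, in list order.
roster = [f"student_{i:02d}" for i in range(1, 31)]

n = 5                     # desired sample size
k = len(roster) // n      # sampling interval: every 6th student
start = rng.randrange(k)  # random starting position among the first k students

# Take the student at the start position and every kth student after that.
systematic_sample = roster[start::k]

print(systematic_sample)
```

The interval works out evenly here because 30 is a multiple of 5. With other roster sizes the slice can return one more element than planned, so the result would need trimming.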

Convenience Sampling

Use data that are easy to obtain, often leading to bias.

Stratified Sampling

Divide the population into subgroups (strata) whose members share a characteristic, then draw a random sample from each subgroup.
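
A minimal sketch of stratified sampling. The strata (students grouped by year), their sizes, and the choice to draw the same number from each stratum are assumptions for illustration; sampling in proportion to stratum size is another common choice.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical strata: the class split by year of study.
strata = {
    "first_year": [f"first_year_{i}" for i in range(1, 13)],    # 12 students
    "second_year": [f"second_year_{i}" for i in range(1, 11)],  # 10 students
    "third_year": [f"third_year_{i}" for i in range(1, 9)],     # 8 students
}

per_stratum = 2  # assumed equal allocation

# Draw a simple random sample within every stratum.
stratified_sample = {
    name: rng.sample(members, per_stratum) for name, members in strata.items()
}

for name, chosen in stratified_sample.items():
    print(name, chosen)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample does not promise.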

Cluster Sampling

Divide the population into clusters, randomly select clusters, and sample all members within selected clusters.
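
A minimal sketch of cluster sampling. The clusters (six lab tables of five students each) and the number of clusters selected are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: six lab tables with five students each.
clusters = {
    f"table_{t}": [f"table_{t}_student_{s}" for s in range(1, 6)]
    for t in range(1, 7)
}

num_clusters = 2  # assumed number of clusters to select

# Randomly choose whole clusters, not individual students.
chosen_tables = rng.sample(sorted(clusters), num_clusters)

# Include every member of each chosen cluster.
cluster_sample = [student for table in chosen_tables for student in clusters[table]]

print(chosen_tables)
print(cluster_sample)
```

In contrast to stratified sampling, only some groups are used, but everyone in a selected group is included.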

Multistage Sampling

Combine several sampling methods in stages.

Types of Observational Studies

Cross-sectional study: Data collected at one point in time.
Retrospective (case-control) study: Data are collected by looking back in time, through past records or interviews.
Prospective (cohort) study: Data are collected going forward in time from groups (cohorts) that share common factors.

Controlling Effects of Variables

Completely Randomized Design: Subjects assigned to groups randomly.
Randomized Block Design: Subjects grouped by similar characteristics, then randomly assigned treatments within blocks.
Matched Pairs Design: Subjects matched in pairs based on similarity, then assigned to different treatments.
Rigorously Controlled Design: Subjects carefully assigned to ensure similarity across treatment groups.
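
As one illustration of these designs, the sketch below shows a randomized block design: subjects are first grouped into blocks, then randomized to treatments within each block. The subject labels, the blocking characteristic (age group), and the block sizes are hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical subjects, each tagged with a blocking characteristic.
subjects = [
    (f"subject_{i}", "under_40" if i <= 6 else "40_and_over")
    for i in range(1, 13)
]

# Step 1: group subjects into blocks by the shared characteristic.
blocks = defaultdict(list)
for name, age_group in subjects:
    blocks[age_group].append(name)

# Step 2: randomize to treatment or control separately within each block.
assignment = {}
for age_group, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for name in shuffled[:half]:
        assignment[name] = "treatment"
    for name in shuffled[half:]:
        assignment[name] = "control"

for name, age_group in subjects:
    print(name, age_group, assignment[name])
```

Each age group contributes equally to both treatment groups, so age cannot be confounded with the treatment.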

Sampling Errors and Bias

Sampling Error

The difference between a sample result and the true population result that arises from chance variation in which individuals are selected.
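
Sampling error can be seen by drawing several random samples from the same population and comparing each sample mean with the population mean. The population below is simulated; its size, mean, spread, the sample size, and the seed are assumptions for illustration only.

```python
import random
from statistics import mean

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in centimeters.
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = mean(population)

# Draw five random samples of 100 and record how far each sample mean misses.
errors = []
for _ in range(5):
    sample = rng.sample(population, 100)
    errors.append(mean(sample) - true_mean)

for i, error in enumerate(errors, start=1):
    print(f"sample {i}: sampling error = {error:+.2f} cm")
```

The errors differ from sample to sample even though every sample was drawn correctly, which is what separates sampling error from nonsampling error.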

Nonsampling Error

Results from human error, biased questions, incorrect data entry, or inappropriate statistical methods.

Nonrandom Sampling Error

Occurs when nonrandom methods (e.g., convenience sampling) are used.

Sources of Bias in Sampling

Undercoverage: Some population members are inadequately represented.
Non-response: Selected individuals do not respond.
Response bias: Respondents provide inaccurate answers.
Voluntary response bias: Respondents choose whether to participate, so people with strong opinions are overrepresented.

Summary Table: Levels of Measurement

Level	Description	Example
Nominal	Categories only, no order	Car model
Ordinal	Categories with order	Language ability
Interval	Ordered, differences have meaning, no true zero	Temperature (Celsius)
Ratio	Ordered, differences and ratios have meaning, true zero	Height

Review Questions

What is the difference between a population and a sample?
What are the characteristics of a good statistical question?
What is the difference between qualitative (categorical) and quantitative variables?
What is an observational unit (an individual)?
What is the difference between a parameter and a statistic?
Describe how each sampling method could be used to pull a random sample of 5 students from a class.