Chapter 7: Survey Sampling and Inference – Study Notes
Survey Sampling and Inference
Learning Objectives
Estimate a population proportion from a sample proportion and quantify the likely error.
Understand how random sampling reduces bias.
Apply the Central Limit Theorem (CLT) for sample proportions to approximate probabilities.
Find, interpret, and use confidence intervals for a single population proportion.
7.1 Learning about the World through Surveys
Idea of Sampling: Examine a Part of the Whole
Population: The entire group of individuals we want to study.
Sample: A smaller group selected from the population for study.
Goal: Learn about the population by studying the sample.
Problem: It is usually impractical or impossible to collect data on the entire population.
Compromise: Use a sample to make inferences about the population.
Parameter and Statistic
Parameter: A numerical value that describes a characteristic of the population (e.g., population mean, population proportion).
Statistic: A numerical value calculated from the sample data (e.g., sample mean, sample proportion).
Statistics are used to estimate parameters.
Survey Terminology
Population: The group of interest.
Sample: The subset of the population actually observed or measured.
Parameter: A fixed (but usually unknown) value describing the population.
Statistic (Estimator): A value calculated from the sample, used to estimate the parameter.
Census: A survey in which every member of the population is measured.
| Term | What It Refers To | Size or Nature | Example |
|---|---|---|---|
| Population | The entire group being studied | Usually large | All U.S. first ladies |
| Sample | A subset of the population | Smaller | 10 first ladies chosen for a research study |
| Parameter | A numerical value that describes a population | Fixed but usually unknown | The true average age of all U.S. first ladies when they got married |
| Statistic | A numerical value that describes a sample | Can vary from sample to sample | The average age of the 10 sampled first ladies |
Statistical Inference
Statistical inference: Drawing conclusions about a population based on a sample.
Always involves uncertainty; measuring this uncertainty is a key part of statistics.
Example: Survey of College Students
Population: All US college students.
Sample: 1000 surveyed students.
Parameter of interest: Proportion of all US college students who study alone.
Statistic: $\hat{p} = 0.42$ (proportion in the sample who study alone).
Statistical inference: Estimate that 42% of all US college students prefer to study alone.
| Concept | Parameter | Statistic |
|---|---|---|
| Definition | A numerical value that describes a population | A numerical value that describes a sample |
| Scope | Entire population | Subset (sample) of the population |
| Known or Unknown? | Usually unknown | Usually known |
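The parameter/statistic distinction can be sketched with a small simulation. This is only an illustration: the population of 100,000 students and the 40% value are invented, and the names `population`, `sample_proportion`, and `estimates` are hypothetical. In a real survey the parameter would not be known.

```python
import random

def sample_proportion(responses):
    """Return the fraction of 1s in a list of 0/1 responses."""
    return sum(responses) / len(responses)

# Hypothetical population: 100,000 students, exactly 40% coded 1 ("studies alone").
# The 40% figure is an assumption for illustration, not a value from the notes.
population = [1] * 40_000 + [0] * 60_000
p = sample_proportion(population)        # parameter: fixed, normally unknown

rng = random.Random(7)                   # seeded so the run is repeatable
estimates = []
for _ in range(5):
    sample = rng.sample(population, 1000)        # 1000 students, without replacement
    estimates.append(sample_proportion(sample))  # statistic: changes with each sample

print("parameter p =", p)
print("sample proportions:", estimates)
```

Each run of the loop gives a slightly different sample proportion, while `p` never changes. That variation is the uncertainty that statistical inference has to measure.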
Example: Genetically Modified Foods
Population: All American adults.
Sample: 2002 American adults surveyed.
Parameter: Percentage of all American adults who believe GMOs are safe to eat.
Statistic: 37% (percentage of the sample who felt this way).
Statistics vs. Parameters
Statistics are knowable from data; parameters are typically unknown and estimated using statistics.
Estimates involve uncertainty.
Notation
| Statistics (Sample) | Parameters (Population) |
|---|---|
| Sample mean: $\bar{x}$ | Population mean: $\mu$ |
| Sample standard deviation: $s$ | Population standard deviation: $\sigma$ |
| Sample variance: $s^2$ | Population variance: $\sigma^2$ |
| Sample proportion: $\hat{p}$ | Population proportion: $p$ |
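For reference, the sample statistics listed above are usually computed as follows. This is a sketch using the standard textbook definitions, with the assumption that the sample values are written $x_1, \dots, x_n$ and that $n$ is the sample size; the notes themselves do not spell out these formulas.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}, \qquad
\hat{p} = \frac{\text{number in the sample with the characteristic}}{n}
```

The population parameters (mean, standard deviation, variance, proportion) describe the whole population in the same way, but they are normally unknown because the full population is not observed.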
Bias in Sampling
Types of Bias
Bias: A method is biased if it tends to produce estimates that are systematically too high or too low compared with the true population value.
Sampling Bias: Occurs when the sample is not representative of the population (e.g., convenience sampling, voluntary response sampling).
Measurement Bias: Results from survey questions that do not produce true answers (e.g., confusing wording, non-neutral language, misleading questions).
Examples of Bias
Convenience sampling (e.g., sampling only friends or people nearby) is prone to bias because the people who are easy to reach rarely represent the whole population.
Voluntary response samples (e.g., internet polls) are biased toward those with strong opinions.
Measurement bias can occur if people inflate or deflate their responses (e.g., income reporting).
Important Questions to Avoid Sampling Bias
What percentage of people asked actually participated?
Were any segments of the population left out?
Did the researcher choose participants, or did people self-select?
Examples: Identifying Bias
Surveying only Facebook friends (convenience bias).
Loaded questions (measurement bias).
Internet polls (voluntary response bias).
Sampling only business phone numbers (sampling bias).
Surveying only grocery store shoppers (sampling bias).
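The effect of voluntary response bias can be sketched with a simulation. Everything here is hypothetical: the population of 50,000, the 30% true proportion, and the response rates (50% for people who hold the opinion, 10% for everyone else) are invented to show the mechanism, not taken from any real survey.

```python
import random

rng = random.Random(42)   # seeded so the run is repeatable

# Hypothetical population of 50,000 people; 30% hold the opinion coded 1.
population = [1] * 15_000 + [0] * 35_000
true_p = sum(population) / len(population)

# Simple random sample: every person has the same chance of being chosen.
srs = rng.sample(population, 2000)
srs_estimate = sum(srs) / len(srs)

# Voluntary response: assume people holding the opinion are five times as
# likely to answer (50% vs. 10%). These rates are invented for illustration.
respondents = [x for x in population
               if rng.random() < (0.5 if x == 1 else 0.1)]
voluntary_estimate = sum(respondents) / len(respondents)

print(f"true proportion:     {true_p:.3f}")
print(f"random sample:       {srs_estimate:.3f}")
print(f"voluntary responses: {voluntary_estimate:.3f}")
```

Under these assumptions the voluntary-response estimate lands far above the true proportion even though it is based on more responses than the random sample. A larger sample does not repair a biased selection method.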
Random Sampling
Simple Random Sampling (SRS)
Every member of the population is equally likely to be chosen, and every possible sample of a given size is equally likely to be selected.
Sampling is done without replacement (once selected, an individual cannot be selected again).
Example: Selecting an SRS
Assign numbers to each individual.
Use a random process (e.g., random digits) to select the sample.
| Name | Number |
|---|---|
| Alberto | 1 |
| Justin | 2 |
| Michael | 3 |
| Audrey | 4 |
| Brandy | 5 |
| Nicole | 6 |
Random digits are used to select three names, ensuring each possible group of three is equally likely.
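The same selection can be sketched in Python, with the standard library's `random.sample` standing in for a table of random digits. The helper name `simple_random_sample` and the seed are hypothetical choices for this illustration.

```python
import random

names = ["Alberto", "Justin", "Michael", "Audrey", "Brandy", "Nicole"]

# Step 1: assign a number to each individual (1 through 6, as in the table).
numbered = {i + 1: name for i, name in enumerate(names)}

def simple_random_sample(frame, n, rng):
    """Pick n distinct numbers from frame and return the matching names.

    rng.sample draws without replacement, so no one is chosen twice and
    every group of n numbers is equally likely.
    """
    chosen_numbers = rng.sample(sorted(frame), n)
    return [frame[k] for k in chosen_numbers]

# Step 2: use a random process to select three people.
rng = random.Random(1)    # seeded only so the example is repeatable
chosen = simple_random_sample(numbered, 3, rng)
print(chosen)
```

Removing the seed (or using a different one) yields a different group of three, which is the point: the method, not the researcher, decides who is included.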
7.2 Measuring the Quality of a Survey
Evaluating Surveys
Statisticians evaluate the method used, not the outcome of a single survey.
Results vary from sample to sample, so the focus is on how reliable the estimation method is over many repetitions.
The Goal: Accuracy and Precision
Accuracy: The method measures what it is intended to measure (low bias).
Precision: The method yields consistent results when repeated (low variability/standard error).
| Concept | Meaning | Measured by | Ideal Outcome |
|---|---|---|---|
| Accuracy | Closeness to the true value | Bias (difference between estimate and true parameter) | Low bias |
| Precision | Consistency of repeated measurements | Variability (standard deviation or standard error) | Low variability |
Visualizing Accuracy and Precision
Accurate and precise: Estimates are close to the true value and to each other.
Accurate but not precise: Estimates are close to the true value on average, but vary widely.
Precise but not accurate: Estimates are consistent but far from the true value.
Neither accurate nor precise: Estimates are inconsistent and far from the true value.
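The four cases can be sketched by simulating many repeated surveys and summarizing the estimates. This is an illustration under stated assumptions: the true proportion of 0.50, the biased method that effectively reaches a group where the proportion is 0.60, and the sample sizes 1000 and 25 are all hypothetical. Each response is drawn independently, which approximates random sampling from a very large population.

```python
import random
import statistics

def simulate(p_reached, n, reps, rng):
    """Simulate reps surveys of size n and return the sample proportions.

    p_reached is the proportion in the group the method actually reaches;
    it differs from the population proportion when the method is biased.
    """
    return [sum(rng.random() < p_reached for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(0)
true_p = 0.50   # hypothetical population proportion

# Four hypothetical methods: (proportion actually reached, sample size)
methods = {
    "accurate and precise":  (0.50, 1000),
    "accurate, not precise": (0.50, 25),
    "precise, not accurate": (0.60, 1000),
    "neither":               (0.60, 25),
}

summary = {}
for name, (p_reached, n) in methods.items():
    estimates = simulate(p_reached, n, reps=500, rng=rng)
    bias = statistics.mean(estimates) - true_p   # accuracy: average error
    spread = statistics.stdev(estimates)         # precision: variability
    summary[name] = (bias, spread)
    print(f"{name:22s} bias = {bias:+.3f}   spread = {spread:.3f}")
```

In this sketch bias comes from whom the method reaches, while spread comes mostly from sample size. That is why both properties have to be judged from the method over many repetitions rather than from a single survey result.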