Chapter 9: Sample Surveys – Principles, Methods, and Biases

Sample Surveys

Why Do We Need Sampling?

In statistics, sampling is essential for studying large populations efficiently. A population is the complete collection of individuals under study, while a census attempts to gather data from every member of the population. However, censuses are often costly, time-consuming, or impractical. Instead, we study a sample, a subset of individuals selected from the population, to make inferences about the whole.

Population: The entire group of interest.
Sample: A subset of the population, ideally representative.
Parameter: A numerical summary describing a population (e.g., mean tuition of all students).
Statistic: A numerical summary describing a sample (e.g., mean tuition of sampled students).
Population parameter: A parameter that is part of a model for the population.
Statistics as estimates: We use statistics to estimate population parameters.

Example: To estimate how much UBC students pay for tuition, we might interview 500 students. Here, the population is all UBC students, the sample is the 500 interviewed, the parameter is the true mean tuition for all students, and the statistic is the mean tuition from the sample.
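The distinction in the example above can be illustrated with a small simulation. This is a minimal sketch: the population of 60,000 tuition amounts is made up (drawn around an assumed mean of 7,000), not real UBC data, and all variable names are chosen for illustration only.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: tuition paid by 60,000 students (made-up numbers).
population = [random.gauss(7000, 1500) for _ in range(60_000)]

# Parameter: computed from every member of the population (usually unknown).
parameter = statistics.mean(population)

# Statistic: computed from a simple random sample of 500 students.
sample = random.sample(population, 500)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.0f}")
print(f"statistic (sample mean):     {statistic:.0f}")
```

In practice only the statistic can be computed; the simulation exposes the parameter solely so the two can be compared.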

Randomization – The Key to Obtaining a Representative Sample

Randomization is crucial in sampling and inference. It helps ensure that the sample reflects the population's characteristics, minimizing bias.

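Randomization: Selecting individuals by chance, so that no one's judgment or convenience decides who is included. On average, this makes the sample resemble the population, even on characteristics we did not think to consider.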
Representative sample: A sample whose characteristics closely match those of the population.
Bias: A systematic tendency of a sampling method to over- or underestimate the population parameter, for example because the sample is not representative.

It Is the Sample Size That Matters

The reliability of a sample depends on its size, not on the size of the population or the fraction of the population sampled (provided the population is much larger than the sample). Larger samples tend to yield more reliable estimates, but only if the sample is representative.

Sample size: The number of individuals in the sample. Larger sizes reduce sampling variability.
Sampling variability: The natural variation in sample statistics from sample to sample.
Key point: A large but biased sample is still unreliable.
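The claim that sample size, rather than population size, drives reliability can be checked by simulation. The sketch below is illustrative only: the populations are synthetic (normal with an assumed mean of 50 and standard deviation of 10), and `sd_of_sample_means` is a hypothetical helper written for this example.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

def sd_of_sample_means(pop_size, n, reps=300):
    """Draw `reps` simple random samples of size n from a synthetic
    population and return the standard deviation of their means."""
    population = [random.gauss(50, 10) for _ in range(pop_size)]
    means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

# Same sample size, very different population sizes.
small_pop = sd_of_sample_means(pop_size=10_000, n=100)
large_pop = sd_of_sample_means(pop_size=500_000, n=100)

# Same large population, four times the sample size.
bigger_n = sd_of_sample_means(pop_size=500_000, n=400)

print(f"N = 10,000,  n = 100: SD of sample means = {small_pop:.2f}")
print(f"N = 500,000, n = 100: SD of sample means = {large_pop:.2f}")
print(f"N = 500,000, n = 400: SD of sample means = {bigger_n:.2f}")
```

The first two results should be close to each other even though the populations differ in size by a factor of 50, while quadrupling the sample size should roughly halve the spread of the sample means.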

How to Sample?

Key Definitions

Sampling frame: The list of individuals from which the sample is drawn. It should match the population as closely as possible, because individuals missing from the frame cannot be selected.
Sampling variability: Differences in sample statistics due to random selection. Larger samples reduce this variability.

Example: Drawing two different samples from the same population will likely yield different results due to sampling variability.

Sampling Methods

Simple Random Sampling (SRS): Each individual, and each possible sample of size n, has an equal chance of being selected.
Stratified Sampling: The population is divided into strata (groups sharing a characteristic), and SRS is performed within each stratum. Results are combined for analysis.
Proportional Allocation: In stratified sampling, the size of the SRS drawn from each stratum is proportional to that stratum's share of the population.
Cluster Sampling: The population is divided into clusters (natural groupings). A random sample of clusters is selected, and all individuals in chosen clusters are sampled (one-stage), or a further SRS is performed within clusters (two-stage).
Multistage Sampling: Combines multiple sampling methods or stages, such as two-stage cluster sampling.
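Systematic Sampling: Selects every kth individual from the sampling frame, starting from a randomly chosen individual among the first k. Effective if the order of the list is unrelated to the variable being measured.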
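The sampling methods listed above can be sketched in a few lines each. The example below is a rough illustration, not a production implementation: the sampling frame of 1,000 students, the faculty names (used as strata), the residence halls (used as clusters), and all function names are hypothetical.

```python
import random
from collections import defaultdict

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 students, each with a faculty (stratum)
# and a residence hall (cluster). All names and sizes are made up.
faculties = ["Arts"] * 500 + ["Science"] * 300 + ["Engineering"] * 200
frame = [
    {"id": i, "faculty": faculties[i], "hall": f"Hall {i % 20}"}
    for i in range(1000)
]

def simple_random_sample(frame, n):
    # Every set of n individuals is equally likely to be chosen.
    return random.sample(frame, n)

def systematic_sample(frame, k):
    # Random start among the first k positions, then every k-th individual.
    start = random.randrange(k)
    return frame[start::k]

def stratified_sample(frame, n, key):
    # Proportional allocation: an SRS from each stratum, sized by its share.
    # Rounding can make the total differ slightly from n in general.
    strata = defaultdict(list)
    for person in frame:
        strata[person[key]].append(person)
    sample = []
    for members in strata.values():
        n_stratum = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, n_stratum))
    return sample

def cluster_sample(frame, n_clusters, key):
    # One-stage: choose whole clusters at random and keep everyone in them.
    clusters = sorted({person[key] for person in frame})
    chosen = set(random.sample(clusters, n_clusters))
    return [person for person in frame if person[key] in chosen]

srs = simple_random_sample(frame, 100)
systematic = systematic_sample(frame, k=10)
stratified = stratified_sample(frame, 100, key="faculty")
clustered = cluster_sample(frame, n_clusters=2, key="hall")

print(len(srs), len(systematic), len(stratified), len(clustered))
```

A two-stage cluster sample could be obtained by applying `simple_random_sample` to the individuals within each chosen hall instead of keeping all of them.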

Sampling Methods Comparison Table

Method	Description	Advantages	Disadvantages
Simple Random Sampling (SRS)	Randomly select individuals; each has equal chance	Unbiased, easy to analyze	May be impractical for large populations
Stratified Sampling	Divide into strata, sample within each	Reduces variability, ensures representation	Requires knowledge of strata
Cluster Sampling	Divide into clusters, sample clusters	Cost-efficient, practical	May increase variability if clusters differ markedly from one another
Systematic Sampling	Select every kth individual	Simple, quick	Risk of bias if list is ordered
Multistage Sampling	Combine multiple methods/stages	Flexible, practical for large populations	Complex to design and analyze

Bad Sampling Procedures, Biases, and More

Sampling must be carefully designed to avoid bias. Common sources of bias include:

Undercoverage: Some groups are excluded or underrepresented in the sampling frame.
Convenience Sampling: Individuals are selected based on ease of access, not randomness.
Voluntary Response Bias: Individuals with strong opinions are more likely to participate, skewing results.
Nonresponse Bias: Those who do not respond may differ systematically from respondents.
Response Bias: Survey responses are influenced by question wording, misunderstanding, or reluctance to answer truthfully.

Types of Bias Table

Type of Bias	Description	Example
Undercoverage	Excludes certain groups from sampling frame	Surveying only library visitors to estimate student library use
Convenience Sampling	Samples based on accessibility	Surveying neighbors for housing prices
Voluntary Response Bias	Participants self-select, often with strong opinions	Call-in polls
Nonresponse Bias	Non-respondents differ from respondents	Mail-in questionnaires
Response Bias	Responses influenced by question phrasing or reluctance	Surveying about sensitive behaviors (e.g., impaired driving)
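To see why a large biased sample can be worse than a small random one, consider a simulated call-in poll. Everything in this sketch is assumed for illustration: the population of 100,000, the true support level of 40%, and the response rates (supporters are taken to be three times as likely to call in as opponents).

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = supports a proposal, 0 = opposes it.
# True support is set to roughly 40% for illustration.
population = [1 if random.random() < 0.40 else 0 for _ in range(100_000)]
true_support = statistics.mean(population)

def responds(opinion):
    # Assumed response rates: 30% of supporters call in, 10% of opponents.
    return random.random() < (0.30 if opinion == 1 else 0.10)

# Voluntary response sample: whoever chooses to call in.
call_in = [opinion for opinion in population if responds(opinion)]

# A much smaller simple random sample for comparison.
srs = random.sample(population, 500)

print(f"true support:           {true_support:.2f}")
print(f"call-in poll (n={len(call_in)}): {statistics.mean(call_in):.2f}")
print(f"SRS (n={len(srs)}):            {statistics.mean(srs):.2f}")
```

Under these assumptions the call-in poll has many more respondents than the SRS yet lands much further from the true value, because self-selection shifts the estimate in one direction no matter how many people respond.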

Key Formulas and Concepts

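Sample Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, where $n$ is the sample size and $y_i$ are the sampled values.
Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} y_i$, where $N$ is the population size and the sum runs over every individual in the population.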
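Sampling Variability: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$, the standard deviation of the sample mean across repeated simple random samples of size $n$, where $\sigma$ is the population standard deviation (assuming the population is much larger than the sample).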
