Sample Surveys and Sampling Methods in Statistics
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Sample Surveys
Why Do We Need Sampling?
In statistics, the population refers to the complete collection of individuals under study. While a census aims to gather data from every member of the population, it is often costly, time-consuming, or impractical. Instead, researchers use a sample, a subset of individuals selected from the population, to make inferences about the whole group.
Population: The entire group of interest in a study.
Census: Data collection from every member of the population.
Sample: A subset of the population, ideally representative of the whole.
Bias: Systematic deviation from the true population characteristics due to nonrepresentative sampling.
A parameter is a numerical summary of a population (e.g., mean, proportion), while a statistic is the corresponding summary calculated from a sample. Statistics are used to estimate population parameters.
Parameter: Numerical summary of a population (e.g., population mean $\mu$, proportion $p$, standard deviation $\sigma$).
Statistic: Numerical summary of a sample (e.g., sample mean $\bar{x}$, sample proportion $\hat{p}$, sample standard deviation $s$).
Example
Suppose we want to estimate the average tuition fees paid by UBC students. Interviewing 500 randomly selected students provides a sample. The population is all UBC students, the sample is the 500 students, the parameter is the mean tuition fee for all UBC students, and the statistic is the mean tuition fee for the 500 sampled students.
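To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The tuition figures are simulated, not real UBC data. The population size of 60,000 and the uniform fee range are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: simulated annual tuition fees (NOT real UBC data)
population = [random.uniform(5_000, 15_000) for _ in range(60_000)]

# Parameter: computed from every member of the population (a census)
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of 500 students
sample = random.sample(population, 500)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

The parameter is fixed (and, in a real study, unknown); the statistic changes from one sample to the next and is used to estimate the parameter.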
Randomization: The Key to a Representative Sample
Randomization is essential for obtaining a representative sample. Letting chance, rather than the researcher or the respondents, decide who is selected (in a simple random sample, every individual has an equal chance) guards against bias and tends to make the sample resemble the population, even on characteristics nobody thought to measure.
Randomization: The process of selecting individuals by chance, reducing bias.
Inference: Drawing conclusions about a population based on sample data.
Sample Size Considerations
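The reliability of a sample depends on its size and its representativeness. What matters is the number of individuals in the sample, not the size of the population, provided the population is much larger than the sample. A larger random sample reduces sampling variability and so increases reliability, but increasing the size does not correct bias caused by a poor sampling method.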
Sampling Variability: Differences in sample characteristics from one sample to another.
Representativeness: The degree to which a sample reflects the population.
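The claim that sample size, not population size, drives reliability can be checked with a small simulation. This is a sketch under assumptions: a hypothetical population in which 30% of members have some trait, 1,000 repeated simple random samples per setting, and the hypothetical helper name `spread_of_p_hat`.

```python
import random
import statistics

random.seed(2)  # reproducible draws

def spread_of_p_hat(pop_size, n, p=0.3, reps=1000):
    """Standard deviation of the sample proportion across `reps` independent SRSs of size n."""
    k = round(p * pop_size)  # hypothetical number of population members with the trait
    population = [1] * k + [0] * (pop_size - k)  # 1 = has the trait, 0 = does not
    p_hats = [sum(random.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(p_hats)

for pop_size in (10_000, 1_000_000):
    for n in (100, 1_000):
        print(f"N={pop_size:>9,}  n={n:>5}  SD of p-hat ~ {spread_of_p_hat(pop_size, n):.4f}")
```

For a given n, the printed spreads should be nearly the same for both population sizes, while going from n = 100 to n = 1,000 should shrink the spread by roughly a factor of $\sqrt{10}$.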
How to Sample?
Key Definitions
Sampling Frame: The list of individuals from which the sample is drawn (e.g., a roster of students).
Sampling Variability: The difference in characteristics from sample to sample. Larger sample sizes reduce variability.
Sampling Methods
There are several methods for selecting samples, each with its own advantages and limitations.
1. Simple Random Sampling (SRS)
In SRS, n individuals are chosen at random from the population so that every individual has the same chance of selection and every possible sample of size n is equally likely.
Definition: Every possible sample of size n has an equal chance of being selected, so every member of the population also has an equal chance.
Example: Drawing names from a hat.
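Drawing names from a hat can be mimicked in Python with the standard library's `random.sample`, which draws without replacement so that every subset of size n is equally likely. The roster of 1,000 student IDs and the function name `simple_random_sample` are hypothetical, for illustration only.

```python
import random

# Hypothetical sampling frame: a roster of 1,000 student IDs (illustrative only)
roster = [f"S{i:04d}" for i in range(1, 1001)]

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n without replacement: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

srs = simple_random_sample(roster, 25, seed=42)
print(srs[:5])
```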
2. Stratified Sampling
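The population is divided into strata (groups of individuals sharing a characteristic), and a simple random sample is drawn from each stratum. This guarantees that every stratum is represented and, when individuals within a stratum are similar to one another, reduces sampling variability compared with an SRS of the same size.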
Stratum: A subgroup of the population with a shared characteristic.
Proportional Allocation: The sample size from each stratum is proportional to its size in the population.
Example: Sampling students from each faculty at a university.
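The sketch below shows stratified sampling with proportional allocation. The faculty names and enrolment counts are hypothetical. Rounding each stratum's share to a whole number can make the sizes not add up to n, so this sketch uses largest-remainder rounding; that choice is ours, not something the notes prescribe.

```python
import math
import random

def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to their population sizes (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(q) for s, q in exact.items()}
    # Give any leftover units to the strata with the largest fractional parts
    leftover = n - sum(alloc.values())
    for s in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Take a separate SRS within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(m) for s, m in strata.items()}, n)
    return {s: rng.sample(members, alloc[s]) for s, members in strata.items()}

# Hypothetical faculties and enrolments (illustrative numbers, not real data)
strata = {
    "Arts": [f"A{i}" for i in range(4000)],
    "Science": [f"S{i}" for i in range(3000)],
    "Applied Science": [f"E{i}" for i in range(2000)],
    "Commerce": [f"C{i}" for i in range(1000)],
}
sample = stratified_sample(strata, 200, seed=7)
print({s: len(v) for s, v in sample.items()})
# {'Arts': 80, 'Science': 60, 'Applied Science': 40, 'Commerce': 20}
```

Arts holds 40% of the hypothetical population, so it receives 40% of the 200 sampled students, and so on for the other faculties.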
3. Cluster Sampling
When natural groupings exist, the population is divided into clusters. A random sample of clusters is selected, and all individuals within those clusters are included (one-stage), or a random sample within clusters is taken (two-stage).
Cluster: A natural grouping within the population (e.g., schools, neighborhoods).
One-stage Cluster Sample: All individuals in selected clusters are sampled.
Two-stage Cluster Sample: A random sample of individuals within selected clusters is taken.
Example: Surveying all students in randomly selected schools.
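A minimal sketch of one-stage cluster sampling follows. Here the random draw is made from the list of clusters, not from the list of individuals. The 20 schools of 30 students each and the function name `one_stage_cluster_sample` are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Randomly select m whole clusters and include every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster labels
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical frame: 20 schools, each with 30 students (illustrative only)
schools = {
    f"School{j:02d}": [f"School{j:02d}-st{i:02d}" for i in range(30)]
    for j in range(20)
}

sample = one_stage_cluster_sample(schools, 4, seed=3)
print(sorted(sample), sum(len(v) for v in sample.values()))
```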
4. Multistage Sampling
Multistage sampling involves more than one stage or sampling procedure, such as combining cluster and random sampling.
Example: Two-stage cluster sampling is a type of multistage sampling.
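Two-stage cluster sampling chains two random draws, which is what makes it multistage. This sketch reuses a hypothetical frame of 20 schools with 30 students each. The function name and the choice of 10 students per selected school are assumptions for illustration.

```python
import random

def two_stage_cluster_sample(clusters, m, k, seed=None):
    """Stage 1: SRS of m clusters. Stage 2: SRS of k individuals inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)              # stage 1: pick clusters
    return {c: rng.sample(clusters[c], k) for c in chosen}  # stage 2: pick individuals

# Hypothetical frame: 20 schools, each with 30 students (illustrative only)
schools = {
    f"School{j:02d}": [f"School{j:02d}-st{i:02d}" for i in range(30)]
    for j in range(20)
}

sample = two_stage_cluster_sample(schools, m=4, k=10, seed=3)
print({c: len(v) for c, v in sample.items()})
```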
5. Systematic Sampling
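Systematic sampling selects every kth individual from the sampling frame, starting from a randomly chosen individual among the first k (typically k = N/n). This method is valid only if the list has no hidden order, such as a pattern that repeats every k entries.
Definition: Starting from a random point, select every kth individual from a list.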
Example: Choosing every 10th student from a roster.
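Choosing every 10th student with a random start can be sketched as below. For simplicity this assumes the frame size N is a whole multiple of the sample size n, so k = N / n. The roster and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start among the first k positions, then every k-th item.

    Assumes len(frame) is a multiple of n, so the interval is k = N // n.
    """
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return frame[start::k][:n]

# Hypothetical roster of 1,000 students; n = 100 gives k = 10 (every 10th student)
roster = [f"S{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(roster, 100, seed=5)
print(sample[:3])
```

Only the starting point is random; after that, the list order decides who is chosen. This is why a hidden pattern with period k in the roster can bias the result.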
Sampling Biases and Problems
Common Sources of Bias
Undercoverage: When the sampling frame excludes certain individuals, leading to a biased sample. Example: Surveying only students in one library to estimate library usage.
Convenience Sampling: Selecting individuals based on easy availability, which may not be representative. Example: Surveying neighbors to estimate housing prices.
Nonresponse Bias: When individuals who do not respond differ from those who do, leading to bias. Example: Mail-in questionnaires with low response rates.
Voluntary Response Bias: Individuals with strong opinions are more likely to respond, skewing results. Example: Call-in polls.
Response Bias: When survey responses are influenced by question wording or reluctance to answer truthfully. Example: Sensitive questions about illegal behavior.
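The simulation below illustrates why these biases are serious: unlike sampling variability, bias does not shrink as the sample grows. It models nonresponse bias under stated assumptions. In a hypothetical population, 40% of students use the library weekly. We assume users answer a library survey with probability 0.6 and non-users with probability 0.2. These numbers are invented for illustration.

```python
import random

rng = random.Random(11)

# Hypothetical population: 40% of 100,000 students use the library weekly (coded 1)
population = [1] * 40_000 + [0] * 60_000
TRUE_P = 0.4

# Assumed response rates: users are more willing to answer a library survey
RESPONSE_RATE = {1: 0.6, 0: 0.2}

def estimate_with_nonresponse(n):
    """Contact an SRS of n students; only those who respond are counted."""
    contacted = rng.sample(population, n)
    respondents = [y for y in contacted if rng.random() < RESPONSE_RATE[y]]
    return sum(respondents) / len(respondents)

for n in (500, 5_000, 50_000):
    print(n, round(estimate_with_nonresponse(n), 3))
```

Under these assumptions, the respondents' proportion should settle near $0.4 \cdot 0.6 / (0.4 \cdot 0.6 + 0.6 \cdot 0.2) \approx 0.667$ instead of the true 0.4, however large n becomes, even though the initial contact list was a genuine SRS.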
Summary Table: Sampling Methods
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Simple Random Sampling | Randomly select individuals; every sample of size n is equally likely | Unbiased, easy to analyze | May be impractical for large populations |
| Stratified Sampling | Divide into strata, take an SRS from each | Reduces variability, ensures representation | Requires knowledge of strata |
| Cluster Sampling | Randomly select clusters, sample all or some individuals within | Cost-effective, practical | Higher variability if clusters differ from one another |
| Multistage Sampling | Combine multiple sampling methods | Flexible, practical for large populations | Complex design and analysis |
| Systematic Sampling | Select every k-th individual after a random start | Simple, quick | Biased if the list has a hidden order |
Key Formulas
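Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population Proportion: $p = \frac{\text{number of individuals in the population with the characteristic}}{N}$
Sample Proportion: $\hat{p} = \frac{\text{number of individuals in the sample with the characteristic}}{n}$
Here N is the population size and n is the sample size.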
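The same two formulas serve for both the parameter and the statistic. The only difference is whether they are applied to all N population values or to the n sampled values. The tiny population of six test scores and the "scored 70 or more" characteristic are hypothetical.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are (N or n)."""
    return sum(values) / len(values)

def proportion(values, has_trait):
    """Share of values for which has_trait(x) is True."""
    return sum(1 for x in values if has_trait(x)) / len(values)

# Tiny hypothetical population of N = 6 test scores, and a sample of n = 3 from it
population = [62, 75, 81, 90, 55, 73]
sample = [62, 90, 55]

mu = mean(population)                           # population mean (parameter)
x_bar = mean(sample)                            # sample mean (statistic)
p = proportion(population, lambda x: x >= 70)   # population proportion scoring 70+
p_hat = proportion(sample, lambda x: x >= 70)   # sample proportion scoring 70+

print(round(mu, 2), x_bar, round(p, 3), round(p_hat, 3))  # 72.67 69.0 0.667 0.333
```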
Conclusion
Effective sampling is crucial for reliable statistical inference. Understanding sampling methods and potential biases helps ensure that sample statistics provide accurate estimates of population parameters.