Sampling Methods and Bias in Statistics: Study Guide
Sampling in Statistics
What is a Random Sample?
In a random sample, every individual in the population has an equal chance of being selected. This is crucial for obtaining representative data and minimizing bias.
Definition: A sample is random if every individual in the population has an equal chance of being selected.
Importance: Random sampling prevents bias and makes the sample representative of the population.
Example: To study student opinions about cafeteria food, pick names from the entire student list using a random number generator. NOT a random sample: Asking only people who eat at the cafeteria.
Sampling Methods
1. Simple Random Sample (SRS)
Simple Random Sampling is the most basic form of random sampling, where every possible group of n individuals has the same chance of being chosen.
Definition: Every group of n individuals has the same chance of being chosen.
Example: A teacher puts all 25 student names in a bag, shakes it, and pulls out 5 names.
Other methods: Use a random number generator or a table of random digits to select individuals.
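The bag-of-names example can be sketched in Python. This is a minimal illustration, not a prescribed method: the roster names, the helper name simple_random_sample, and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every group of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical roster of 25 students, standing in for the names in the bag.
students = [f"Student {i}" for i in range(1, 26)]

chosen = simple_random_sample(students, 5, seed=1)
print(chosen)
```

Sampling is done without replacement, so no student can be picked twice, just as a name pulled from the bag cannot be pulled again.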
2. Stratified Sampling
Stratified sampling involves dividing the population into subgroups (strata) based on shared characteristics, then randomly sampling from each stratum. This ensures representation from all important subgroups.
Definition: Population is divided into strata (e.g., by major, age, gender), and a random sample is taken from each stratum.
When to use: When groups differ in important ways, but all should be represented.
Example: College students grouped by major. Randomly pick 10 students from Science, 10 from Business, 10 from Arts.
Benefit: Ensures all major groups are represented.
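A short Python sketch of the college-major example follows. The enrollment lists and their sizes are invented for illustration, and stratified_sample is a hypothetical helper name.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical enrollment lists; the sizes are made up for this example.
strata = {
    "Science": [f"Sci-{i}" for i in range(120)],
    "Business": [f"Bus-{i}" for i in range(80)],
    "Arts": [f"Art-{i}" for i in range(60)],
}

sample = stratified_sample(strata, 10, seed=2)
for major, picked in sample.items():
    print(major, len(picked))
```

Every major contributes exactly 10 students, so no stratum can be left out by chance.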
3. Cluster Sampling
Cluster sampling is used when the population is spread out. The population is divided into clusters, some clusters are randomly chosen, and all individuals in those clusters are included in the sample.
Definition: Population is divided into clusters (groups), then some clusters are randomly chosen. Everyone in those clusters is included.
When to use: When the population is too spread out to sample individuals easily.
Example: A college has 50 classes. Randomly select 5 classes and survey all students in those classes.
Benefit: Usually faster and cheaper than stratified sampling, because data are collected from only a few groups rather than from every group.
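The class example can be sketched as below. The class rosters (20 students per class) and the helper name cluster_sample are assumptions made only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include everyone in them."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    surveyed = [person for cid in chosen_ids for person in clusters[cid]]
    return chosen_ids, surveyed

# Hypothetical college: 50 classes with 20 students each.
classes = {
    f"Class {c}": [f"C{c}-S{s}" for s in range(20)]
    for c in range(1, 51)
}

chosen_classes, surveyed = cluster_sample(classes, 5, seed=3)
print(chosen_classes)
print(len(surveyed))
```

The randomness applies to the classes, not to the students: once a class is chosen, all of its students are surveyed.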
4. Systematic Sampling
Systematic sampling selects every kth individual from a list after a random starting point. It is easy to implement but requires that the list has no hidden pattern.
Definition: Select every kth individual from a list after a random starting point.
Example: A list of 500 customers is in random order. Choose a random starting point among the first 10 (say, the 3rd person), then take every 10th person after that (3rd, 13th, 23rd, ...), for a sample of 50.
Benefit: Easy to implement, but must ensure the list has no hidden pattern.
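A minimal Python sketch of the customer-list example, assuming the customers are simply numbered 1 to 500. The helper name systematic_sample is hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every kth."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # 0-based index in 0 .. k-1
    return start, items[start::k]  # slice step k selects every kth item

# Hypothetical list of 500 customers, identified by position 1..500.
customers = list(range(1, 501))

start, picked = systematic_sample(customers, 10, seed=4)
print(start + 1)   # 1-based position of the first customer chosen
print(picked[:3])  # the first few customers in the sample
```

Only the starting point is random. After that the selection is fixed, which is why a hidden pattern in the list (for example, every 10th customer being a weekend shopper) would bias the sample.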
Comparison: Stratified vs. Cluster Sampling
Both methods divide the population into groups, but differ in how samples are selected:
Stratified Sampling: Randomly sample individuals from every group (stratum).
Cluster Sampling: Randomly select entire groups (clusters), then include all individuals within those groups.
Non-Random Samples & Bias
Non-random sampling methods can introduce bias, making the sample unrepresentative of the population.
Voluntary Response Bias: People choose themselves to respond, often those with strong opinions. Example: Online product reviews—usually only very happy or very angry customers respond.
Convenience Sampling: Choose people who are easy to reach. Example: Surveying only the first row of students in a class.
Response Bias: People give inaccurate answers, often due to embarrassment or social desirability. Example: Underreporting drinking habits.
Nonresponse Bias: Some people do not respond, and those who do may differ from those who don’t. Example: Mail survey about income—those with very high or low income may not respond.
Undercoverage: Some groups are missed in the sample. Example: Telephone survey using landlines misses younger people who only use cell phones.
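To see how undercoverage can distort a result, here is a small sketch built on entirely made-up data. It assumes one person at each age from 18 to 77 and assumes, purely for illustration, that only people aged 40 or older have a landline.

```python
from statistics import mean

# Synthetic population (invented): one person per age, 18 through 77.
# Illustrative assumption: only people aged 40+ still have a landline.
population = [
    {"age": age, "has_landline": age >= 40}
    for age in range(18, 78)
]

true_mean_age = mean([p["age"] for p in population])

# A landline-only survey can reach only this part of the population.
reachable = [p for p in population if p["has_landline"]]
reachable_mean_age = mean([p["age"] for p in reachable])

print("Mean age, whole population:", true_mean_age)
print("Mean age, landline owners: ", reachable_mean_age)
```

Even a perfectly random sample of landline owners would overestimate the average age here, because the younger group never had a chance of being selected.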
Sampling Frame
The sampling frame is the list from which the sample is actually chosen. It is important that the sampling frame closely matches the population of interest.
Definition: The list from which the sample is actually chosen.
Example: If the population is all registered voters, the sampling frame might be the list of people who voted in the last election. That frame leaves out registered voters who did not vote, so it only approximates the population.
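One way to check how well a frame matches the population is to compare the two lists directly. The sketch below uses invented voter IDs and invented counts; it is only meant to show the comparison.

```python
# Hypothetical voter IDs; the counts are invented for illustration.
registered_voters = set(range(1, 1001))   # population of interest
voted_last_election = set(range(1, 601))  # sampling frame

# People in the population who can never be selected from this frame.
missed = registered_voters - voted_last_election

# Share of the population that the frame actually covers.
covered = registered_voters & voted_last_election
coverage = len(covered) / len(registered_voters)

print(len(missed), "registered voters are missing from the frame")
print("Coverage:", coverage)
```

The closer the coverage is to 1, the more closely the frame matches the population of interest.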
Retrospective vs. Prospective Studies
These terms describe the timing of data collection in observational studies.
Retrospective Study: Looks at past data. Example: Checking high school GPA of today’s business owners.
Prospective Study: Collects data into the future. Example: Following a group of freshmen through 4 years of college.
Observational vs. Experimental Studies
Observational Study: Observe without interference. Example: Record how many hours students study per week.
Experimental Study: Researchers actively impose a treatment or intervention and observe the outcome.
Examples: Identifying Sampling Methods
Below are examples of how to identify different sampling methods in practice:
Stratified Sampling: A sample of 100 undergraduates is taken by organizing students by classification (freshman, sophomore, junior, senior), then selecting 25 from each group.
Systematic Sampling: A random number generator is used to select a student from an alphabetical list, then every 50th student is included in the sample.
Simple Random Sample: A completely random method is used to select 75 students, with each student having the same probability of being chosen at any stage.
Convenience Sampling: An administrative assistant stands in front of the library and asks the first 100 students encountered.
Summary Table: Sampling Methods
| Sampling Method | How It Works | When to Use | Example |
|---|---|---|---|
| Simple Random Sample (SRS) | Every group of n individuals has an equal chance of being chosen | General use, when no subgroups need special representation | Randomly select names from a list |
| Stratified Sampling | Divide into strata, randomly sample from each | When subgroups differ and all must be represented | Sample by major, age, or class year |
| Cluster Sampling | Divide into clusters, randomly select clusters, include everyone in them | When population is spread out geographically | Sample all students in selected classes |
| Systematic Sampling | Select every kth individual after a random start | When a list with no hidden pattern is available | Every 10th customer on a list |
| Convenience Sampling | Sample those easiest to reach | Quick, but not representative | First 100 people at the library |
| Voluntary Response | People choose to participate | Easy to collect, but biased toward people with strong opinions | Online reviews |