Chapter 1: Data Collection – Foundations of Statistical Studies
Study Guide - Smart Notes
Chapter 1: Data Collection
Section 1.1: What is Statistics?
Statistics is the science of collecting, organizing, summarizing, and analyzing data to draw conclusions and answer questions. It also provides a measure of confidence in the conclusions drawn, so results are often reported with a stated level of certainty, such as "95% confidence." These methods support informed decision-making across many disciplines.
Key Point 1: Statistics involves both descriptive and inferential methods to analyze data and make predictions or generalizations about a population.
Key Point 2: Statistical studies typically begin with a question and use data to answer it, often by examining a sample rather than the entire population.
Example: To estimate the average age of Santa Monica College (SMC) students, one could survey a subset (sample) of students rather than the entire student body (population).
1.1.1 Definitions in Statistics
Population: The complete collection of all elements (individuals or objects) to be studied.
Sample: A subset of the population selected for actual study.
Parameter: A numerical value describing a characteristic of the population.
Statistic: A numerical value describing a characteristic of a sample (used to estimate a parameter).
Census: A study that collects data from every member of the population.
Individual/Element: Each person or object in the study.
Variable: A characteristic or property measured or observed for each individual in the study.
Additional info: The value of a statistic can vary from sample to sample, but the parameter is fixed for a given population.
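The difference between a fixed parameter and a varying statistic can be seen in a small simulation. This is a minimal sketch using made-up ages (not real SMC data); the population size, age range, sample size, and seed are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 students (made-up data).
rng = random.Random(42)  # arbitrary seed so the run is repeatable
population = [rng.randint(17, 60) for _ in range(10_000)]

# Parameter: computed from every member of the population, so it is one fixed number.
population_mean = statistics.mean(population)

# Statistic: computed from a sample, so it changes from one sample to the next.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

print(f"parameter (population mean): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"statistic (mean of sample {i}): {m:.2f}")
```

Each sample mean lands near the population mean but not exactly on it, which is why a statistic is described as an estimate of the parameter.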
1.1.2 Types of Variables
Qualitative (Categorical) Variable: Classifies individuals by an attribute or category (e.g., gender, major).
Quantitative Variable: A numerical measure for which arithmetic such as adding or averaging gives meaningful results (e.g., age, number of siblings).
Example: In a survey of high school seniors, 'proposed major' is qualitative, while 'age in years' and 'number of siblings' are quantitative.
1.1.3 Types of Quantitative Variables
Discrete Variable: Takes on a finite or countable number of values, typically the result of counting (e.g., number of books).
Continuous Variable: Can take on any value within a range, typically the result of measuring (e.g., weight, height).
1.1.4 Levels of Measurement
Nominal: Categories with no inherent order (e.g., favorite color).
Ordinal: Categories with a meaningful order but not equal intervals (e.g., course grades).
Interval: Ordered, equal intervals, but no true zero (e.g., temperature in Celsius).
Ratio: Ordered, equal intervals, and a true zero (e.g., income, weight).
| Level | Order | Equal Intervals | True Zero | Example |
|---|---|---|---|---|
| Nominal | No | No | No | Gender |
| Ordinal | Yes | No | No | Course Grades |
| Interval | Yes | Yes | No | Year, Temperature (°C) |
| Ratio | Yes | Yes | Yes | Weight, Income |
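One way to see why a true zero matters: on an interval scale, a statement like "twice as much" changes when the units change, while on a ratio scale it does not. The sketch below is only an illustration; the helper functions and the chosen values (10 and 20) are assumptions, not part of the original notes.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_lb(kg: float) -> float:
    """Convert a weight from kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale (no true zero): the ratio depends on the unit used.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale (true zero): the ratio is the same in any unit.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(f"20 °C vs 10 °C: {ratio_celsius:.2f} in Celsius, {ratio_fahrenheit:.2f} in Fahrenheit")
print(f"20 kg vs 10 kg: {ratio_kg:.2f} in kilograms, {ratio_lb:.2f} in pounds")
```

Because 20 °C is "2 times" 10 °C only in Celsius (the same temperatures give 1.36 in Fahrenheit), ratios of interval-level data are not meaningful, whereas 20 kg is twice 10 kg in every unit.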
Section 1.2: Types of Statistical Studies
1.2.1 Key Definitions
Response Variable: The main variable of interest in a study (the outcome being measured).
Observational Study: The researcher observes and measures variables without influencing them.
Explanatory Variable: A variable that may explain or influence changes in the response variable (also called a predictor or independent variable).
Designed Study (Experiment): The researcher assigns treatments to groups and observes the effects on the response variable.
Example: Measuring blood pressure in women aged 50-59 can be observational (just recording values) or experimental (assigning exercise levels and measuring effects).
1.2.2 Confounding and Lurking Variables
Confounding: Occurs when the effects of two or more explanatory variables on the response variable cannot be separated from one another.
Lurking Variable: An unmeasured variable that may influence both the explanatory and response variables.
Example: In a soda price experiment, the day of the week could be a lurking variable affecting sales.
Additional info: Because lurking variables cannot be ruled out, observational studies can show an association between variables but cannot by themselves establish causation.
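The soda example can be turned into a small simulation to show how a lurking variable distorts a comparison. Everything here is hypothetical: the effect sizes, the share of discounted days, and the noise level are assumptions made up for illustration. The day type is recorded in the simulation only so its influence can be shown; in a real study a lurking variable would not have been considered.

```python
import random
import statistics

rng = random.Random(11)  # arbitrary seed so the run is repeatable

# Hypothetical model: a weekend adds 40 sales; the discount adds only 5.
WEEKEND_EFFECT = 40
DISCOUNT_EFFECT = 5

days = []
for _ in range(2_000):
    weekend = rng.random() < 2 / 7
    # Discounts are offered mostly on weekends, so price and day are tangled together.
    discount = rng.random() < (0.8 if weekend else 0.2)
    sales = 100 + WEEKEND_EFFECT * weekend + DISCOUNT_EFFECT * discount + rng.gauss(0, 5)
    days.append((weekend, discount, sales))

def discount_gap(rows):
    """Mean sales on discounted days minus mean sales on regular-price days."""
    with_discount = [s for _, d, s in rows if d]
    without_discount = [s for _, d, s in rows if not d]
    return statistics.fmean(with_discount) - statistics.fmean(without_discount)

naive_gap = discount_gap(days)                             # ignores the day of the week
weekday_gap = discount_gap([r for r in days if not r[0]])  # weekdays only
weekend_gap = discount_gap([r for r in days if r[0]])      # weekends only

print(f"gap ignoring day type: {naive_gap:.1f}")
print(f"gap on weekdays only:  {weekday_gap:.1f}")
print(f"gap on weekends only:  {weekend_gap:.1f}")
```

Ignoring the day of the week makes the discount look far more effective than the 5 extra sales built into the model, because discounted days are mostly weekends. Comparing within the same kind of day brings the gap back close to the built-in effect.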
Section 1.3: Sampling Methods
1.3.1 Importance of Sampling
Choosing a representative sample is crucial for valid results. A poor sample can lead to misleading conclusions.
1.3.2 Simple Random Sampling
Every possible sample of size n from a population of size N has an equal chance of being selected.
Methods include drawing names from a hat or using random number generators.
Example: Selecting 3 committee members from 12 volunteers by random draw.
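The committee example can be sketched with Python's standard library. The volunteer labels and the seed are placeholders; `random.sample` draws without replacement, which plays the role of pulling names from a hat.

```python
import math
import random

# Placeholder labels for the 12 volunteers.
volunteers = [f"Volunteer {i}" for i in range(1, 13)]

# Number of different committees of size 3 that could be formed: C(12, 3).
n_possible = math.comb(len(volunteers), 3)
print(n_possible)  # 220

# Draw one simple random sample of size 3 (no person can be picked twice).
rng = random.Random(2024)  # arbitrary seed so the run is repeatable
committee = rng.sample(volunteers, k=3)
print(committee)
```

Since all 220 possible committees are equally likely, each one has a 1/220 chance of being the committee selected.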
1.3.3 Other Sampling Methods
Systematic Sampling: Choose a random starting point among the first k individuals on a numbered list, then select every kth individual after it.
Stratified Sampling: Divide the population into subgroups (strata) and randomly sample from each.
Cluster Sampling: Divide the population into clusters, randomly select clusters, and include all individuals from chosen clusters.
Convenience Sampling: Use individuals who are easiest to reach (not statistically sound).
Voluntary (Self-Selected) Sample: Individuals choose to participate, often leading to bias.
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random | Every possible sample of size n equally likely | Names from a hat |
| Systematic | Every kth individual after a random start | Every 10th person on a list |
| Stratified | Random sample from each subgroup | Sample from each grade level |
| Cluster | Randomly select clusters, include everyone in each chosen cluster | Randomly select classrooms |
| Convenience | Whoever is easiest to reach | Students in a classroom |
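The three probability-based methods in the table can be sketched side by side. The roster below (120 students, four grade levels, 12 classrooms) and the function names are hypothetical, invented only to make the differences concrete.

```python
import random

rng = random.Random(7)  # arbitrary seed so the run is repeatable

# Hypothetical roster: 120 students, 30 per grade level (9-12),
# seated in 12 classrooms of 10 students each.
roster = [
    {"id": i, "grade": 9 + i // 30, "classroom": i // 10}
    for i in range(120)
]

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(items, key, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum from each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(items, key, n_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them."""
    cluster_ids = sorted({item[key] for item in items})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [item for item in items if item[key] in chosen]

systematic = systematic_sample(roster, k=10, rng=rng)
stratified = stratified_sample(roster, key="grade", n_per_stratum=5, rng=rng)
clustered = cluster_sample(roster, key="classroom", n_clusters=3, rng=rng)

print(len(systematic), len(stratified), len(clustered))  # 12 20 30
```

Note the contrast between the last two: stratified sampling takes a few individuals from every subgroup, while cluster sampling takes every individual from a few subgroups.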
Section 1.5: Bias in Sampling
1.5.1 Types of Bias
Sampling Bias: The sampling method favors certain groups over others.
Non-response Bias: Individuals selected for the sample do not respond, and their views may differ from those of the individuals who do respond.
Response Bias: Survey answers do not reflect true opinions because of interviewer error, misrepresented answers, question wording, question order, or data entry errors.
1.5.2 Types of Error
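Non-sampling Error: Error that comes from the study design or data collection (for example, a biased sampling method, non-response, response bias, or data entry mistakes). It can occur even in a census.
Sampling Error: Natural variation that arises because a sample, not the whole population, is used, so the sample gives incomplete information about the population.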
Additional info: Reducing bias is essential for reliable statistical inference.
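A simulation can suggest why bias is the more serious problem: sampling error shrinks as the sample grows, while error caused by a biased sampling method does not. The population below (weekly study hours, with a group of "library regulars") is entirely made up, and sampling only from the library stands in for a convenience-style sample.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed so the run is repeatable

# Hypothetical population: weekly study hours for 5,000 students.
# The 1,000 "library regulars" study more on average than the other 4,000.
regulars = [rng.gauss(20, 4) for _ in range(1_000)]
others = [rng.gauss(10, 4) for _ in range(4_000)]
population = regulars + others
true_mean = statistics.fmean(population)

def typical_error(pool, n, reps=200):
    """Average distance between a sample mean and the true population mean."""
    errors = [
        abs(statistics.fmean(rng.sample(pool, n)) - true_mean)
        for _ in range(reps)
    ]
    return statistics.fmean(errors)

results = {}
for n in (25, 100, 400):
    random_err = typical_error(population, n)  # sampling error only
    library_err = typical_error(regulars, n)   # sampling bias plus sampling error
    results[n] = (random_err, library_err)
    print(f"n={n:3d}  simple random: {random_err:.2f}  library only: {library_err:.2f}")
```

In this sketch the simple random samples get closer to the true mean as n increases, but the library-only samples stay far off no matter how large they are. A bigger sample does not repair a biased sampling method.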
Section 1.6: Experimental Design
1.6.1 Experiments and Treatments
An experiment is a controlled study to determine the effect of varying one or more explanatory variables (factors) on a response variable. Each combination of the values (levels) of the factors is called a treatment. A control group provides the baseline against which the treatments are compared.
Placebo: An inactive treatment used to control for psychological effects.
Single Blind Experiment: Subjects do not know which treatment they receive, but researchers do.
Double Blind Experiment: Neither subjects nor researchers know who receives which treatment.
Example: Comparing two new headache medications with a control group receiving the standard medication.
Additional info: Proper experimental design includes randomization, control, and replication to ensure valid results.
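Randomization in the headache example could look like the sketch below, which shuffles the subjects and splits them evenly among the three groups. The subject IDs, the group size of 10, and the seed are assumptions for illustration; the notes do not specify how many subjects are involved.

```python
import random

rng = random.Random(3)  # arbitrary seed so the run is repeatable

# 30 hypothetical subjects, identified only by a code.
subjects = [f"S{i:02d}" for i in range(1, 31)]
treatments = ["new medication A", "new medication B", "standard medication (control)"]

# Randomization: chance alone decides who receives which treatment.
shuffled = subjects[:]
rng.shuffle(shuffled)

# Replication: each treatment is applied to several subjects, not just one.
group_size = len(shuffled) // len(treatments)
groups = {
    treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
    for i, treatment in enumerate(treatments)
}

for treatment, members in groups.items():
    print(f"{treatment}: {members}")
```

In a double blind version of this design, the table linking subject codes to treatments would be kept from both the subjects and the researchers who record the outcomes until the data are collected.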