Chapter 1: Data Collection – Foundations of Statistical Studies

Section 1.1: What is Statistics?

Statistics is the science of collecting, organizing, summarizing, and analyzing data to draw conclusions and answer questions. It also provides a measure of confidence in those conclusions, so results are often stated with a degree of certainty, such as "95% confidence." This makes the field essential for informed decision-making across disciplines.

Key Point 1: Statistics uses descriptive methods to organize and summarize data, and inferential methods to generalize from a sample to a population.
Key Point 2: Statistical studies typically begin with a question and use data to answer it, often by examining a sample rather than the entire population.
Example: To estimate the average age of Santa Monica College (SMC) students, one could survey a subset (sample) of students rather than the entire student body (population).

1.1.1 Definitions in Statistics

Population: The complete collection of all elements (individuals or objects) to be studied.
Sample: A subset of the population selected for actual study.
Parameter: A numerical value describing a characteristic of the population.
Statistic: A numerical value describing a characteristic of a sample (used to estimate a parameter).
Census: A study that collects data from every member of the population.
Individual/Element: Each person or object in the study.
Variable: A characteristic or property measured or observed for each individual in the study.

Additional info: The value of a statistic can vary from sample to sample, but the parameter is fixed for a given population.
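
Sketch: The difference between a fixed parameter and a varying statistic can be illustrated with a small simulation. The ages below are made-up values, not real SMC data, and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: ages of 10,000 students (made-up data).
population = [random.randint(17, 60) for _ in range(10_000)]

# Parameter: computed from the whole population, so it is one fixed number.
parameter = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"population mean (parameter): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each sample mean lands near the population mean, but no two samples are expected to give exactly the same value.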

1.1.2 Types of Variables

Qualitative Variable: Describes an attribute or category (e.g., gender, major).
Quantitative Variable: Represents a measurable quantity (e.g., age, number of siblings).

Example: In a survey of high school seniors, 'proposed major' is qualitative, while 'age in years' and 'number of siblings' are quantitative.

1.1.3 Types of Quantitative Variables

Discrete Variable: Takes on countable values (e.g., number of books).
Continuous Variable: Can take on any value within a range (e.g., weight, height).

1.1.4 Levels of Measurement

Nominal: Categories with no inherent order (e.g., favorite color).
Ordinal: Categories with a meaningful order but not equal intervals (e.g., course grades).
Interval: Ordered, equal intervals, but no true zero (e.g., temperature in Celsius).
Ratio: Ordered, equal intervals, and a true zero (e.g., income, weight).

Level	Order	Equal Intervals	True Zero	Example
Nominal	No	No	No	Gender
Ordinal	Yes	No	No	Course Grades
Interval	Yes	Yes	No	Year, Temperature (°C)
Ratio	Yes	Yes	Yes	Weight, Income
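
Sketch: One way to see why a "true zero" matters is to compare ratios on an interval scale and on a ratio scale. The temperatures and weights below are arbitrary illustrative values. The Kelvin offset (273.15) and the pound conversion factor are standard constants, and the function names are hypothetical.

```python
import math


def celsius_to_kelvin(c):
    """Shift the zero point: Kelvin has a true zero, Celsius does not."""
    return c + 273.15


def kg_to_lb(kg):
    """Rescale only: both kilograms and pounds share the same true zero."""
    return kg * 2.20462


low_c, high_c = 10.0, 20.0

# Interval scale: differences are meaningful and survive the change of zero.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are not meaningful: "20 is twice 10" disappears in Kelvin.
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

# Ratio scale: "twice as heavy" holds in any unit of weight.
ratio_kg = 20.0 / 10.0
ratio_lb = kg_to_lb(20.0) / kg_to_lb(10.0)

print("differences agree:", math.isclose(diff_c, diff_k))
print(f"temperature ratio in C: {ratio_c:.3f}, in K: {ratio_k:.3f}")
print(f"weight ratio in kg: {ratio_kg:.3f}, in lb: {ratio_lb:.3f}")
```

The temperature ratio changes when the zero point moves, so 20 °C is not "twice as hot" as 10 °C. The weight ratio is the same in either unit.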

Section 1.2: Types of Statistical Studies

1.2.1 Key Definitions

Response Variable: The main variable of interest in a study (the outcome being measured).
Observational Study: The researcher observes and measures variables without influencing them.
Explanatory Variable: A variable that may explain or influence changes in the response variable (also called a predictor or independent variable).
Designed Study (Experiment): The researcher assigns treatments to groups and observes the effects on the response variable.

Example: Measuring blood pressure in women aged 50-59 can be observational (just recording values) or experimental (assigning exercise levels and measuring effects).

1.2.2 Confounding and Lurking Variables

Confounding: Occurs when the effects of two or more explanatory variables on the response variable cannot be separated from one another.
Lurking Variable: A variable not considered in the study that affects the response variable and is often related to the explanatory variable as well.

Example: In a soda price experiment, the day of the week could be a lurking variable affecting sales.

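Additional info: Observational studies can reveal associations between variables, but they cannot by themselves establish causation, because a lurking variable may be responsible for the association.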

Section 1.3: Sampling Methods

1.3.1 Importance of Sampling

Choosing a representative sample is crucial for valid results. A poor sample can lead to misleading conclusions.

1.3.2 Simple Random Sampling

Every possible sample of size n from a population of size N has an equal chance of being selected.
Methods include drawing names from a hat or using random number generators.

Example: Selecting 3 committee members from 12 volunteers by random draw.
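
Sketch: The committee example can be carried out with a random number generator instead of a hat. The volunteer names are placeholders and the seed is arbitrary. `math.comb` counts how many distinct committees are possible, each of which is equally likely under simple random sampling.

```python
import math
import random

# Placeholder names for the 12 volunteers.
volunteers = [f"Volunteer {i}" for i in range(1, 13)]

# Number of distinct samples of size 3 from 12: C(12, 3) = 220.
n_possible = math.comb(len(volunteers), 3)

# random.sample draws without replacement, so nobody is picked twice.
rng = random.Random(42)  # seeded only to make the illustration repeatable
committee = rng.sample(volunteers, 3)

print(f"possible committees: {n_possible}")
print(f"each has probability 1/{n_possible}")
print("selected:", committee)
```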

1.3.3 Other Sampling Methods

Systematic Sampling: Select every kth individual from a numbered list.
Stratified Sampling: Divide the population into subgroups (strata) and randomly sample from each.
Cluster Sampling: Divide the population into clusters, randomly select clusters, and include all individuals from chosen clusters.
Convenience Sampling: Use individuals who are easiest to reach (not statistically sound).
Voluntary (Self-Selected) Sample: Individuals choose to participate, often leading to bias.

Sampling Method	Description	Example
Simple Random	Every possible sample of size n equally likely	Names from a hat
Systematic	Every k-th individual	Every 10th person on a list
Stratified	Random sample from each subgroup	Sample from each grade level
Cluster	Randomly select clusters, sample all in cluster	Randomly select classrooms
Convenience	Whoever is easiest to reach	Students in a classroom
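
Sketch: The three probability-based methods other than simple random sampling can be compared on one made-up sampling frame. The frame below (120 students, 30 per grade level, classrooms of 10) and the sample sizes are assumptions chosen only for illustration. The systematic sample uses a random starting point among the first k individuals, which is the usual way to begin.

```python
import random

rng = random.Random(0)  # seeded only to make the illustration repeatable

# Hypothetical frame: 120 students, grades 9-12, classrooms of 10 students.
students = [
    {"id": i, "grade": 9 + i // 30, "classroom": i // 10}
    for i in range(120)
]

# Systematic: random start among the first k, then every k-th individual.
k = 10
start = rng.randrange(k)
systematic = students[start::k]

# Stratified: a simple random sample of 3 from each grade level (stratum).
stratified = []
for grade in sorted({s["grade"] for s in students}):
    stratum = [s for s in students if s["grade"] == grade]
    stratified.extend(rng.sample(stratum, 3))

# Cluster: randomly pick 2 classrooms and include everyone in them.
classrooms = sorted({s["classroom"] for s in students})
chosen = rng.sample(classrooms, 2)
cluster = [s for s in students if s["classroom"] in chosen]

print("systematic ids:", [s["id"] for s in systematic])
print("stratified ids:", [s["id"] for s in stratified])
print("cluster classrooms:", chosen, "size:", len(cluster))
```

Stratified sampling takes a few individuals from every subgroup, while cluster sampling takes every individual from a few subgroups.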

Section 1.5: Bias in Sampling

1.5.1 Types of Bias

Sampling Bias: The sampling method favors certain groups over others.
Non-response Bias: Selected individuals do not respond, and their views may differ from those of the respondents.
Response Bias: Survey answers do not reflect true opinions because of interviewer error, misrepresented answers, question wording, question order, or data-entry errors.

1.5.2 Types of Error

Non-sampling Error: Errors due to bias in the study design or data collection.
Sampling Error: Natural variation because a sample, not the whole population, is used.

Additional info: Reducing bias is essential for reliable statistical inference.
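
Sketch: The simulation below contrasts sampling error with sampling bias. The population is made up: 8,000 "day" students aged 18 to 25 and 2,000 "evening" students aged 25 to 55. The biased method is assumed to reach day students only. All names and numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # seeded only to make the illustration repeatable

# Made-up population with two groups of different ages.
day = [rng.randint(18, 25) for _ in range(8_000)]
evening = [rng.randint(25, 55) for _ in range(2_000)]
population = day + evening
true_mean = statistics.mean(population)

n = 1_000

# Simple random sample: off only by sampling error.
srs_mean = statistics.mean(rng.sample(population, n))

# Biased method: evening students can never be selected.
biased_mean = statistics.mean(rng.sample(day, n))

srs_error = srs_mean - true_mean
biased_error = biased_mean - true_mean

print(f"true mean age:       {true_mean:.2f}")
print(f"simple random error: {srs_error:+.2f}")
print(f"biased method error: {biased_error:+.2f}")
```

Sampling error in the simple random sample shrinks as n grows. The error from the biased method does not, because a larger sample from the wrong group is still from the wrong group.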

Section 1.6: Experimental Design

1.6.1 Experiments and Treatments

An experiment is a controlled study to determine the effect of varying one or more explanatory variables (factors) on a response variable. Each combination of factor levels is called a treatment. A control group provides the baseline against which the treatments are compared.

Placebo: An inactive treatment used to control for psychological effects.
Single Blind Experiment: Subjects do not know which treatment they receive, but researchers do.
Double Blind Experiment: Neither subjects nor researchers know who receives which treatment.

Example: Comparing two new headache medications with a control group receiving the standard medication.

Additional info: Proper experimental design includes randomization, control, and replication to ensure valid results.
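
Sketch: Randomization in the headache example can be done by shuffling the subjects and splitting them into equal groups. The 30 subject labels, the group size, and the treatment names are placeholders, not details from the course materials.

```python
import random

rng = random.Random(2024)  # seeded only to make the illustration repeatable

# 30 hypothetical subjects, identified by label only.
subjects = [f"S{i:02d}" for i in range(1, 31)]
treatments = [
    "new medication A",
    "new medication B",
    "standard medication (control)",
]

# Randomization: chance alone decides who receives which treatment.
shuffled = subjects[:]
rng.shuffle(shuffled)

# Replication: each treatment is applied to several subjects (10 here).
group_size = len(shuffled) // len(treatments)
assignment = {
    t: sorted(shuffled[i * group_size:(i + 1) * group_size])
    for i, t in enumerate(treatments)
}

for treatment, group in assignment.items():
    print(f"{treatment}: {group}")
```

In a double blind version, the labels rather than the treatment names would be shared with subjects and researchers until the data are collected.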