Chapter 1: Data Collection – Foundations of Statistical Studies

Section 1.1: What is Statistics?

Statistics is the science of collecting, organizing, summarizing, and analyzing data to draw conclusions and answer questions. It also provides a measure of confidence in those conclusions, which is why results are often reported with a stated level of certainty, such as "95% confidence." These methods support informed decision-making across many disciplines.

  • Key Point 1: Statistics uses both descriptive and inferential methods. Descriptive methods organize and summarize data; inferential methods use sample results to make predictions or generalizations about a population.

  • Key Point 2: Statistical studies typically begin with a question and use data to answer it, often by examining a sample rather than the entire population.

  • Example: To estimate the average age of Santa Monica College (SMC) students, one could survey a subset (sample) of students rather than the entire student body (population).

1.1.1 Definitions in Statistics

  • Population: The complete collection of all elements (individuals or objects) to be studied.

  • Sample: A subset of the population selected for actual study.

  • Parameter: A numerical value describing a characteristic of the population.

  • Statistic: A numerical value describing a characteristic of a sample (used to estimate a parameter).

  • Census: A study that collects data from every member of the population.

  • Individual/Element: Each person or object in the study.

  • Variable: A characteristic or property measured or observed for each individual in the study.

Additional info: The value of a statistic can vary from sample to sample, but the parameter is fixed for a given population.
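
The difference between a fixed parameter and a varying statistic can be seen in a short simulation. The sketch below is illustrative only: the population of student ages is made-up data, and the names `population`, `population_mean`, and `sample_means` are assumptions introduced for this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 5,000 students (made-up data).
population = [rng.randint(17, 60) for _ in range(5000)]

# Parameter: computed from the whole population, so it is one fixed number.
population_mean = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.mean(rng.sample(population, 50))
    for _ in range(5)
]

print(f"parameter (population mean): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"statistic (mean of sample {i}): {m:.2f}")
```

Each sample mean lands near the population mean but rarely equals it, which is why a statistic is treated as an estimate of the parameter.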

1.1.2 Types of Variables

  • Qualitative Variable: Describes an attribute or category (e.g., gender, major).

  • Quantitative Variable: Represents a measurable quantity (e.g., age, number of siblings).

Example: In a survey of high school seniors, 'proposed major' is qualitative, while 'age in years' and 'number of siblings' are quantitative.

1.1.3 Types of Quantitative Variables

  • Discrete Variable: Takes on countable values (e.g., number of books).

  • Continuous Variable: Can take on any value within a range (e.g., weight, height).

1.1.4 Levels of Measurement

  • Nominal: Categories with no inherent order (e.g., favorite color).

  • Ordinal: Categories with a meaningful order but not equal intervals (e.g., course grades).

  • Interval: Ordered, equal intervals, but no true zero (e.g., temperature in Celsius).

  • Ratio: Ordered, equal intervals, and a true zero (e.g., income, weight).

Level      Order   Equal Intervals   True Zero   Example
Nominal    No      No                No          Gender
Ordinal    Yes     No                No          Course Grades
Interval   Yes     Yes               No          Year, Temperature (°C)
Ratio      Yes     Yes               Yes         Weight, Income
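
One way to see why a true zero matters is to check whether a ratio survives a change of units. The sketch below is a small arithmetic illustration, not part of the original notes; the helper `celsius_to_fahrenheit` and the sample values are assumptions chosen for the example.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: zero degrees is an arbitrary point, not "no temperature".
c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

ratio_celsius = c_high / c_low       # 2.0
ratio_fahrenheit = f_high / f_low    # 68 / 50 = 1.36

# Ratio scale: zero weight means no weight, so ratios keep their meaning.
KG_PER_LB = 0.45359237
lb_low, lb_high = 100.0, 200.0
ratio_pounds = lb_high / lb_low
ratio_kilograms = (lb_high * KG_PER_LB) / (lb_low * KG_PER_LB)

print(ratio_celsius, ratio_fahrenheit)   # the two ratios disagree
print(ratio_pounds, ratio_kilograms)     # the two ratios agree
```

Because the temperature ratio depends on the unit, saying that 20 °C is "twice as hot" as 10 °C is not meaningful, while saying that 200 lb is twice as heavy as 100 lb is.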

Section 1.2: Types of Statistical Studies

1.2.1 Key Definitions

  • Response Variable: The main variable of interest in a study (the outcome being measured).

  • Observational Study: The researcher observes and measures variables without influencing them.

  • Explanatory Variable: A variable that may explain or influence changes in the response variable.

  • Designed Study (Experiment): The researcher assigns treatments to groups and observes the effects on the response variable.

Example: Measuring blood pressure in women aged 50-59 can be observational (just recording values) or experimental (assigning exercise levels and measuring effects).

1.2.2 Confounding and Lurking Variables

  • Confounding: Occurs when the effects of two or more explanatory variables on the response variable cannot be distinguished from one another.

  • Lurking Variable: An unmeasured variable that may influence both the explanatory and response variables.

Example: In a study of how soda price affects sales, the day of the week could be a lurking variable, since sales may differ between weekdays and weekends regardless of price.

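Additional info: Observational studies can reveal associations but cannot, on their own, establish causation, because a lurking variable may be responsible for the observed relationship.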
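
The soda example can be sketched as a simulation in which price has no effect at all, yet appears to. Everything here is a made-up model for illustration: the sales figures, the `simulate_sales` helper, and the assumption that the discount is offered only on weekends are not from the original notes.

```python
import random
import statistics

rng = random.Random(5)

def simulate_sales(is_weekend):
    """Made-up model: sales depend only on the day type, never on price."""
    base = 150 if is_weekend else 100
    return base + rng.gauss(0, 10)

records = []
for day in range(140):  # 20 weeks of daily records
    is_weekend = day % 7 in (5, 6)
    # Flawed design: the discounted price is offered only on weekends.
    price = "discount" if is_weekend else "regular"
    records.append({"price": price, "sales": simulate_sales(is_weekend)})

discount_mean = statistics.mean(
    r["sales"] for r in records if r["price"] == "discount"
)
regular_mean = statistics.mean(
    r["sales"] for r in records if r["price"] == "regular"
)

# Price looks effective, but the whole gap comes from the day of the week.
apparent_effect = discount_mean - regular_mean
print(f"apparent effect of the discount: {apparent_effect:.1f} units per day")
```

Because price and day type change together in this design, their effects cannot be separated from these records alone.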

Section 1.3: Sampling Methods

1.3.1 Importance of Sampling

Choosing a representative sample is crucial for valid results. A poor sample can lead to misleading conclusions.

1.3.2 Simple Random Sampling

  • Every possible sample of size n from a population of size N has an equal chance of being selected.

  • Methods include drawing names from a hat or using random number generators.

Example: Selecting 3 committee members from 12 volunteers by random draw.
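
The committee example can be sketched with Python's standard library. The volunteer labels and the seed are placeholders introduced for illustration; `random.sample` draws without replacement, so no volunteer is chosen twice.

```python
import math
import random

# Hypothetical labels for the 12 volunteers.
volunteers = [f"Volunteer {i}" for i in range(1, 13)]

rng = random.Random(2024)  # fixed seed so the draw is repeatable

# Simple random sample: every set of 3 volunteers is equally likely.
committee = rng.sample(volunteers, k=3)

# Number of distinct 3-person committees that could have been drawn.
possible_committees = math.comb(len(volunteers), 3)  # 220

print(committee)
print(f"each committee has probability 1/{possible_committees}")
```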

1.3.3 Other Sampling Methods

  • Systematic Sampling: Select every kth individual from a numbered list.

  • Stratified Sampling: Divide the population into subgroups (strata) and randomly sample from each.

  • Cluster Sampling: Divide the population into clusters, randomly select clusters, and include all individuals from chosen clusters.

  • Convenience Sampling: Use individuals who are easiest to reach (not statistically sound).

  • Voluntary (Self-Selected) Sample: Individuals choose to participate, often leading to bias.

Sampling Method   Description                                                   Example
Simple Random     Every possible sample of size n is equally likely             Names from a hat
Systematic        Every k-th individual                                         Every 10th person on a list
Stratified        Random sample from each subgroup                              Sample from each grade level
Cluster           Randomly select clusters, include everyone in each cluster    Randomly select classrooms
Convenience       Whoever is easiest to reach                                   Students in a classroom
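
The systematic, stratified, and cluster methods can be sketched side by side. This is a minimal illustration under stated assumptions: the 120-student `roster`, its `grade` and `classroom` fields, and the three helper functions are hypothetical names introduced here, not part of the original notes.

```python
import random

rng = random.Random(7)

# Hypothetical roster: 120 students, 30 per grade level (9-12),
# grouped into 12 classrooms of 10.
roster = [
    {"id": i, "grade": 9 + i // 30, "classroom": i // 10}
    for i in range(120)
]

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then take every k-th."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(items, key, per_stratum, rng):
    """Draw a simple random sample of fixed size from each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(items, key, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    clusters = sorted({item[key] for item in items})
    chosen = set(rng.sample(clusters, n_clusters))
    return [item for item in items if item[key] in chosen]

systematic = systematic_sample(roster, k=10, rng=rng)
stratified = stratified_sample(roster, key="grade", per_stratum=5, rng=rng)
clustered = cluster_sample(roster, key="classroom", n_clusters=3, rng=rng)

print(len(systematic), len(stratified), len(clustered))
```

The stratified sample is guaranteed to include every grade level, while the cluster sample may miss some grades entirely because whole classrooms are chosen at random.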

Section 1.5: Bias in Sampling

1.5.1 Types of Bias

  • Sampling Bias: The sampling method favors certain groups over others.

  • Non-response Bias: Some selected individuals do not respond, and their views may differ from those of the individuals who do.

  • Response Bias: Survey answers do not reflect true opinions due to interviewer error, misrepresented answers, question wording, order, or data entry errors.

1.5.2 Types of Error

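  • Non-sampling Error: Error that comes from the study design or data collection process, such as sampling bias, non-response bias, response bias, or data entry mistakes. It can occur even in a census.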

  • Sampling Error: Natural variation because a sample, not the whole population, is used.

Additional info: Reducing bias is essential for reliable statistical inference.
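
The contrast between sampling error and sampling bias can be sketched with a simulation. The population below is made-up data, and the split into `day` and `evening` students is an assumption introduced only to give the biased sample a group to favor.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population (made-up data): 3,000 day students aged 18-25
# and 2,000 evening students aged 30-55.
day = [rng.randint(18, 25) for _ in range(3000)]
evening = [rng.randint(30, 55) for _ in range(2000)]
population = day + evening
population_mean = statistics.mean(population)

# Sampling error: a simple random sample misses the parameter by chance.
random_mean = statistics.mean(rng.sample(population, 200))

# Sampling bias: surveying only day students favors one group, so the
# estimate is pulled in the same direction no matter how many are asked.
biased_mean = statistics.mean(rng.sample(day, 200))

print(f"population mean age:      {population_mean:.2f}")
print(f"simple random sample:     {random_mean:.2f}")
print(f"day-students-only sample: {biased_mean:.2f}")
```

A larger random sample would tend to reduce the sampling error, but a larger sample of day students alone would not remove the bias.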

Section 1.6: Experimental Design

1.6.1 Experiments and Treatments

An experiment is a controlled study to determine the effect of varying one or more explanatory variables (factors) on a response variable. Each combination of factors is called a treatment. A control group is necessary for comparison.

  • Placebo: An inactive treatment used to control for psychological effects.

  • Single-Blind Experiment: Subjects do not know which treatment they receive, but researchers do.

  • Double-Blind Experiment: Neither subjects nor researchers know who receives which treatment.

Example: Comparing two new headache medications with a control group receiving the standard medication.

Additional info: Proper experimental design includes randomization, control, and replication to ensure valid results.
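
Randomization for the headache medication example can be sketched as a random assignment of subjects to treatment groups. The 30 subject labels, the group names, and the equal group sizes are assumptions made for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the assignment is repeatable

# Hypothetical subjects and treatment groups.
subjects = [f"Subject {i}" for i in range(1, 31)]
treatments = ["Medication A", "Medication B", "Standard medication (control)"]

# Randomization: shuffle a copy of the list so chance alone decides
# which subject ends up in which group.
shuffled = subjects[:]
rng.shuffle(shuffled)

# Replication: deal the shuffled subjects out in turn so that each
# treatment is applied to several subjects (10 per group here).
groups = {
    treatment: shuffled[i::len(treatments)]
    for i, treatment in enumerate(treatments)
}

for treatment, members in groups.items():
    print(treatment, len(members))
```

In a double-blind version, the group labels would be replaced by codes so that neither subjects nor the researchers recording results could tell the treatments apart.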
