Chapter 1: Data Collection – Foundations of Statistics
Chapter 1: Data Collection
Introduction to Statistics
Statistics is the science of collecting, organizing, summarizing, and analyzing data. It provides methods for making decisions and inferences about populations based on sample data.
Data: Information collected from observations, measurements, or responses.
Population: The entire group of individuals or items of interest.
Sample: A subset of the population selected for study.
Individual: A single member of the population or sample.
Example: If the population is all UCLA transfer students, a sample could be the SMC Math Department students who transferred to UCLA. An individual is one of those transfer students.
Parameters and Statistics
Numerical summaries are used to describe populations and samples:
Parameter: A numerical summary of a population (e.g., mean GPA of all UCLA transfer students).
Statistic: A numerical summary of a sample (e.g., mean GPA of SMC Math Dept. students who transferred).
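As a minimal sketch of the distinction, the Python snippet below computes the same summary (a mean GPA) once for a whole population and once for a sample drawn from it. The GPA values are randomly generated, made-up data, and the names `population_gpas` and `sample_gpas` are hypothetical.

```python
import random
import statistics

# Hypothetical population: GPAs of 1,000 transfer students (made-up data).
rng = random.Random(42)
population_gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(1000)]

# Parameter: a numerical summary of the entire population.
population_mean = statistics.mean(population_gpas)

# Statistic: the same summary computed from a sample of 50 students.
sample_gpas = rng.sample(population_gpas, 50)
sample_mean = statistics.mean(sample_gpas)

print(f"Parameter (population mean GPA): {population_mean:.3f}")
print(f"Statistic (sample mean GPA):     {sample_mean:.3f}")
```

The two numbers are usually close but not identical. In practice the parameter is unknown and the statistic is used to estimate it.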
Types of Statistics
Descriptive Statistics: Methods for organizing and summarizing data, such as graphs and tables.
Inferential Statistics: Methods for making generalizations or predictions about a population based on sample data.
Types of Data
Qualitative (Categorical) Data: Consists of names, labels, or categories (e.g., student ID numbers, which cannot be meaningfully added or averaged).
Quantitative Data: Consists of numerical values representing counts or measurements.
Types of Quantitative Data
Discrete Data: Countable values (e.g., number of sand pebbles on a beach).
Continuous Data: Measured values that can take any value within a range (e.g., a height anywhere between 2 and 3 feet).
Levels of Measurement
Data can be classified by the following levels of measurement:
Nominal: Categories with no inherent order (e.g., business product tools).
Ordinal: Categories with a meaningful order but no consistent difference between ranks (e.g., ranking graduate schools).
Interval: Numerical values with meaningful differences but no true zero, so ratios are not meaningful (e.g., calendar years in American history, such as 1900 and 1950).
Ratio: Like interval, but with a true zero, allowing for ratios (e.g., height, weight, age).
Methods of Data Collection
Observational Study: Observes individuals and measures variables without influencing them.
Experiment: Applies a treatment to individuals and observes the effect.
Census: Collects data from every individual in the population.
Sampling Methods
Sampling methods are used to select a subset of the population for study:
Simple Random Sample (SRS): Every possible sample of size n has an equal chance of being selected, so every individual is equally likely to be chosen. Can be drawn with or without replacement.
Stratified Sampling: Population is divided into groups (strata), and a random sample is taken from each group.
Cluster Sampling: Population is divided into clusters, some clusters are randomly selected, and all individuals in chosen clusters are surveyed.
Systematic Sampling: Starting from a randomly chosen point, every kth individual is selected from a list (e.g., every other guest, k = 2).
Convenience Sampling: Sample is chosen based on ease of access (e.g., surveying a TV or radio station's audience).
Voluntary Response Sampling: Individuals choose to participate (often leads to bias).
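The four probability-based methods above can be sketched in Python with the standard `random` module. This is an illustrative sketch, not a prescribed procedure: the function names, the 500 numbered students, and the groupings into grade levels and classes are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n individuals without replacement; every size-n subset is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum from every stratum."""
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every individual in them."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [person for name in picked for person in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick a random start among the first k individuals, then take every kth."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical data: 500 students identified by number.
rng = random.Random(0)
students = list(range(1, 501))
# Strata: 4 grade levels of 125 students each (assumed grouping).
grades = {g: students[(g - 1) * 125 : g * 125] for g in range(1, 5)}
# Clusters: 20 classes of 25 students each (assumed grouping).
classes = {f"class_{i}": students[i * 25 : (i + 1) * 25] for i in range(20)}

srs = simple_random_sample(students, 30, rng)
strat = stratified_sample(grades, 10, rng)
clus = cluster_sample(classes, 2, rng)
syst = systematic_sample(students, 5, rng)

print(len(srs), len(strat), len(clus), len(syst))
```

Note the structural difference: stratified sampling takes some individuals from every group, while cluster sampling takes every individual from some groups.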
Bias in Sampling
Bias occurs when a sample does not accurately represent the population.
Sampling Bias: Certain groups are favored over others.
Undercoverage: Some members of the population are inadequately represented (e.g., only surveying honors students).
Nonresponse Bias: Individuals selected do not respond, and those who do respond may differ from those who do not.
Response Bias: Survey responses are inaccurate due to question wording or interviewer influence.
Data Entry Error: Mistakes in recording or entering data.
Note: Nonresponse bias, response bias, and data entry error are types of non-sampling error. Sampling error results from using a sample to estimate a population parameter.
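A small simulation can illustrate the difference between sampling error and bias from undercoverage. The sketch below assumes a made-up population of 200 honors students and 800 other students with invented weekly study hours. The numbers and the helper `average_estimate` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population with made-up weekly study hours.
honors = [rng.gauss(20, 3) for _ in range(200)]
others = [rng.gauss(12, 3) for _ in range(800)]
population = honors + others
true_mean = statistics.mean(population)

def average_estimate(frame, n, repeats):
    """Average of the sample means over many repeated samples from a frame."""
    means = [statistics.mean(rng.sample(frame, n)) for _ in range(repeats)]
    return statistics.mean(means)

# Sampling from the whole population: individual estimates vary (sampling
# error), but on average they land near the true mean.
full_frame_avg = average_estimate(population, 50, 500)

# Sampling only honors students: the estimates are systematically too high,
# no matter how many samples are taken (undercoverage).
honors_only_avg = average_estimate(honors, 50, 500)

print(f"True mean:             {true_mean:.2f}")
print(f"Full-population frame: {full_frame_avg:.2f}")
print(f"Honors-only frame:     {honors_only_avg:.2f}")
```

Sampling error shrinks as the sample size grows. Bias from a flawed sampling frame does not.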
Designing Experiments
Experiment: A controlled study to determine the effect of explanatory variables (factors) on a response variable.
Treatment: The condition applied to subjects (e.g., medicine to lower blood pressure).
Placebo: An inactive treatment used as a control.
Single-Blind Experiment: Subjects do not know which treatment they receive, but administrators do.
Double-Blind Experiment: Neither subjects nor administrators know who receives which treatment.
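Confounding Variables: Variables not accounted for in the study that are related to both the explanatory and response variables, so their effects on the outcome cannot be separated from the treatment's.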
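Random assignment to treatment and placebo groups, along with a simple form of blinding, might be sketched as follows. The subject IDs, function names, and the A/B coding scheme are assumptions made for illustration.

```python
import random

def randomize_subjects(subject_ids, rng):
    """Randomly split subjects into equal-sized treatment and placebo groups."""
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]), "placebo": sorted(shuffled[half:])}

def blind_codes(groups, rng):
    """Hide group names behind arbitrary codes; the key stays with a third party."""
    codes = ["A", "B"]
    rng.shuffle(codes)
    key = dict(zip(codes, groups))  # maps each code to a group name
    labels = {sid: code for code, name in key.items() for sid in groups[name]}
    return labels, key

rng = random.Random(7)
subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subjects
groups = randomize_subjects(subjects, rng)
labels, key = blind_codes(groups, rng)

# Subjects and administrators would see only the codes in `labels`;
# `key` is revealed after the data are collected.
print(labels["S01"], len(groups["treatment"]), len(groups["placebo"]))
```

Because chance alone decides who receives the treatment, variables that were not considered tend to balance out across the two groups.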
Rounding and Decimal Rules
When converting percentages to decimals, divide by 100 (e.g., 73% of 125: $0.73 \times 125 = 91.25$).
Follow rounding rules: If the digit in the next decimal place is 5 or greater, round up; otherwise, round down.
For sample sizes, always round up to the next whole number.
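These rules can be checked with a short Python sketch. The helper names are hypothetical. One caveat worth knowing: Python's built-in `round` rounds exact halves to the nearest even digit, so the sketch uses the `decimal` module to get the "5 or greater rounds up" behavior described above.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def percent_of(percent, total):
    """Convert a percent to a decimal (divide by 100) and apply it to a total."""
    return percent * total / 100

def round_half_up(value, places):
    """Round a positive value so a next digit of 5 or greater rounds up."""
    quantum = Decimal(10) ** -places  # places=1 gives Decimal('0.1')
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

def round_up_sample_size(value):
    """Sample sizes are rounded up to the next whole number."""
    return math.ceil(value)

amount = percent_of(73, 125)
print(amount)                        # 91.25
print(round_half_up(amount, 1))      # 91.3
print(round_up_sample_size(amount))  # 92
print(round(2.5), round_half_up(2.5, 0))  # 2 3
```

If a calculation calls for 91.25 individuals, the sample size becomes 92, because rounding down would leave the sample slightly too small.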
Summary Table: Sampling Methods Comparison
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random Sample | Every possible sample of size n has an equal chance of selection | Randomly select 30 students from a list of 500 |
| Stratified Sample | Divide population into groups, randomly sample from each | Sample students from each grade level |
| Cluster Sample | Divide into clusters, randomly select clusters, survey all in selected clusters | Survey all students in randomly chosen classes |
| Systematic Sample | Select every kth individual | Survey every 5th person entering a building |
| Convenience Sample | Sample the easiest-to-reach individuals | Survey people in a nearby café |
| Voluntary Response Sample | Individuals choose to participate | Online poll open to all website visitors |
Key Formulas
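Sample Proportion: $\hat{p} = \frac{x}{n}$, where $x$ is the number of individuals in the sample with the characteristic of interest and $n$ is the sample size.
Percent to Decimal: $\text{decimal} = \frac{\text{percent}}{100}$ (e.g., $73\% = 0.73$).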
Sample Size Calculation:
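As a brief illustration of the sample proportion, the sketch below divides the number of sampled individuals with a characteristic by the sample size. The function name and the counts (45 out of 60) are hypothetical.

```python
def sample_proportion(x, n):
    """Return x / n: the share of the n sampled individuals with the characteristic."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    return x / n

# Hypothetical survey: 45 of 60 sampled students plan to transfer.
p_hat = sample_proportion(45, 60)
print(p_hat)  # 0.75
```

The result is a statistic: it describes the sample and serves as an estimate of the corresponding population proportion.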
Additional info:
When designing surveys or experiments, always consider potential sources of bias and error.
Proper randomization and control are essential for valid statistical inference.