Introduction to Statistics: Data, Variables, and the Five W’s
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Section 1.2: Data and the Five W’s and H
Understanding Data Collection
To analyze any data set, it is crucial to identify the context in which the data were collected. This is summarized by the Five W’s and one H:
Who: The subjects or cases being studied.
What: The variables measured or recorded.
When: The time period during which the data were collected.
Where: The location or setting of the study.
Why: The purpose or motivation for collecting the data.
How: The method or process used to collect the data.

Example: In a study on muscle hypertrophy, researchers collected data on young men performing resistance training to determine the effects of lifting lighter versus heavier weights.
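One way to treat the Five W's and H as a checklist is to record them as fields and list whichever ones are still unknown. The sketch below is illustrative only. The `StudyContext` class and its `missing` method are hypothetical names. Only the Who and Why are filled in, because those are the only parts the hypertrophy example states explicitly.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class StudyContext:
    """Hypothetical record of the Five W's and H for one data set."""
    who: Optional[str] = None
    what: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of the W's (or H) that have not been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Only the details stated in the example are recorded; the rest stay unknown.
hypertrophy = StudyContext(
    who="young men performing resistance training",
    why="compare the effects of lifting lighter versus heavier weights",
)
print(hypertrophy.missing())  # ['what', 'when', 'where', 'how']
```

Any unknown field is a question to answer before analyzing the data.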


Components of a Data Set
A data set consists of cases (rows) and variables (columns). Each case provides a set of measurements or responses for the variables of interest.
Cases: The individual units or subjects from which data are collected.
Variables: The characteristics or attributes measured for each case.
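To make the row and column picture concrete, here is a minimal sketch that stores a tiny data set as a list of dictionaries. Each dictionary is one case (row) and each key is one variable (column). The dog names and values are made up for illustration.

```python
# A tiny hypothetical data set: each dict is one case (row),
# and each key is one variable (column).
dogs = [
    {"name": "Biscuit", "breed": "Beagle", "weight_lb": 24.0, "age_yr": 3},
    {"name": "Luna", "breed": "Poodle", "weight_lb": 50.5, "age_yr": 7},
    {"name": "Max", "breed": "Beagle", "weight_lb": 28.0, "age_yr": 5},
]

variables = list(dogs[0].keys())  # column names
n_cases = len(dogs)               # number of rows

print(n_cases, variables)  # 3 ['name', 'breed', 'weight_lb', 'age_yr']

# One variable's values across all cases (a single column).
weights = [case["weight_lb"] for case in dogs]
print(weights)  # [24.0, 50.5, 28.0]
```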
Section 1.3: Variables and Their Types
Identifying Variables
Variables are characteristics or attributes that can take on different values among cases in a data set. Understanding the type of variable is essential for choosing appropriate statistical methods.
Variability: The degree to which data values differ among cases.
Examples of Variables: Size, fur color, weight, eye color, height, breed, collar presence, age, tongue position, ear length, paw size, fur length, name, fastest speed, number of unique fur colors.

Types of Variables
Categorical Variables (Qualitative): Place cases into groups or categories. Responses are words or labels. Can be nominal (no order) or ordinal (ordered categories).
Quantitative Variables (Numerical): Measured or recorded as numbers. Calculations with these numbers make sense. Can be discrete (countable) or continuous (measurable).

Examples:
Categorical: Size (Small, Medium, Large), Fur Color, Eye Color, Breed, Collar (yes/no), Name, Tongue (out/not out), Inversion (yes/no).
Quantitative: Weight (pounds), Height (inches), Age (years), Ear Length (inches), Fur Length (inches), Fastest Speed (miles per hour), Number of unique colors.
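The rule "calculations with these numbers make sense" suggests a quick check. Categorical values are summarized by counting how many cases fall in each category, while quantitative values support arithmetic such as an average. The values below are hypothetical and chosen only to illustrate the contrast.

```python
from collections import Counter
from statistics import mean

# Hypothetical values for two of the dog variables listed above.
fur_color = ["brown", "black", "brown", "white", "brown"]  # categorical
weight_lb = [24.0, 50.5, 28.0, 61.0, 12.5]                  # quantitative

# Categorical: the natural summary is a count per category.
color_counts = Counter(fur_color)
print(color_counts["brown"])  # 3

# Quantitative: an average is meaningful.
print(mean(weight_lb))  # 35.2
```

Averaging fur colors would be meaningless, and counting "how many dogs weigh exactly 28.0 pounds" is rarely useful. That asymmetry is what separates the two types.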
Practice: Identifying Variable Types in a Data Set
Given a roller coaster data set with variables such as Name, Park, Type, Duration, Speed, Height, Drop, Length, Inversion, and Number of Inversions, students are asked to classify variables as categorical or quantitative.
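Example Answer: 4 categorical variables (Name, Park, Type, Inversion) and 6 quantitative variables (Duration, Speed, Height, Drop, Length, Number of Inversions). Inversion records only whether a coaster has inversions (yes/no), so it is categorical; Number of Inversions is a count, so it is quantitative.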
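The classification can be written as a lookup table and tallied. This is a small sketch, assuming Inversion is recorded as yes/no. That matches the "Inversion (yes/no)" entry in the categorical examples above, which makes it a label rather than a number.

```python
from collections import Counter

# Variable types for the roller coaster data set.
coaster_types = {
    "Name": "categorical",
    "Park": "categorical",
    "Type": "categorical",
    "Inversion": "categorical",              # yes/no response
    "Duration": "quantitative",
    "Speed": "quantitative",
    "Height": "quantitative",
    "Drop": "quantitative",
    "Length": "quantitative",
    "Number of Inversions": "quantitative",  # a count (discrete)
}

type_counts = Counter(coaster_types.values())
print(type_counts["categorical"], type_counts["quantitative"])  # 4 6
```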
Visualizing Data: Graphs and Tables
Different types of graphs are used to display categorical and quantitative data:
Bar Charts: Used for a categorical variable to show the count (frequency) or percentage of cases in each category.
Scatterplots: Used to show the relationship between two quantitative variables.
Boxplots: Used to display the distribution of a quantitative variable, often side by side for different groups.
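The choices above amount to a simple rule of thumb keyed on variable types. The sketch below encodes only those rules. `suggest_graph` is a hypothetical helper, not a library function, and real analyses have more options than the three graphs listed here.

```python
def suggest_graph(*types: str) -> str:
    """Suggest a display for one or two variables, given their types.

    Hypothetical helper encoding only the rules of thumb in these notes.
    """
    kinds = sorted(types)
    if kinds == ["categorical"]:
        return "bar chart"
    if kinds == ["quantitative"]:
        return "boxplot"  # distribution of a single quantitative variable
    if kinds == ["quantitative", "quantitative"]:
        return "scatterplot"
    if kinds == ["categorical", "quantitative"]:
        return "side-by-side boxplots"
    raise ValueError(f"no rule in these notes for variable types {types}")


print(suggest_graph("categorical"))                   # bar chart
print(suggest_graph("quantitative", "quantitative"))  # scatterplot
print(suggest_graph("quantitative", "categorical"))   # side-by-side boxplots
```

Sorting the types means the order of the arguments does not matter. Any combination the notes do not cover raises an error instead of guessing.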



Tables: Describing and Summarizing Data
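Tables summarize and compare data across groups or categories. For example, a table of baseline characteristics compares the groups in an experiment before any treatment is applied. The table below comes from the two training groups (HR and LR) of the hypertrophy study. Each entry is a group average with a ± measure of spread, and P is the p-value comparing the groups. Similar averages and large P-values suggest that random assignment produced comparable groups.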
| Variable | HR (n = 24) | LR (n = 25) | P |
|---|---|---|---|
| Age, yr | 23 ± 2 | 23 ± 2 | 0.73 |
| Training age, yr | 4.2 ± 2.0 | 4.6 ± 1.8 | 0.54 |
| Total body mass, kg | 88 ± 4 | 88 ± 4 | 0.81 |
| Height, m | 1.80 ± 0.1 | 1.80 ± 0.1 | 0.81 |
| BMI, kg/m² | 26.8 ± 2.1 | 26.8 ± 2.1 | 0.99 |
| Lean mass, kg | 67.6 ± 7.2 | 67.9 ± 7.1 | 0.99 |
| Total fat mass, kg | 14.9 ± 2.4 | 14.8 ± 2.4 | 0.97 |
| Leg press 1RM, kg | 357 ± 25 | 351 ± 23 | 0.87 |
| Bench press 1RM, kg | 96 ± 13 | 92 ± 14 | 0.41 |
| Shoulder press 1RM, kg | 91 ± 5 | 92 ± 4 | 0.87 |
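A table like this is built by summarizing one quantitative variable separately within each group. The sketch below uses made-up ages, not the study's raw data. It also assumes the ± values are standard deviations, which is a common convention even though the table itself does not say so.

```python
from statistics import mean, stdev
from typing import Dict, List

# Hypothetical ages (years) for two small groups; NOT the study's data.
groups: Dict[str, List[int]] = {
    "Group A": [21, 23, 25, 22, 24],
    "Group B": [20, 24, 23, 26, 22],
}


def summarize(values: List[int]) -> str:
    """Format a group as 'mean ± sample standard deviation', one decimal place."""
    return f"{mean(values):.1f} ± {stdev(values):.1f}"


for name, ages in groups.items():
    print(name, summarize(ages))
```

Both groups have the same average age but different spreads. The ± part of each entry is what shows that difference.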

Special Cases: Recoding Variables
Sometimes, quantitative variables are recoded into categorical variables for analysis. For example, the year a roller coaster opened can be grouped into decades, which turns a quantitative variable into an ordered (ordinal) categorical one.
| Lower End (Year) | Upper End (Year) | Recoded Value | Number of Rows |
|---|---|---|---|
| 1970 | 1979 | 1970s | 15 |
| 1980 | 1989 | 1980s | 12 |
| 1990 | 1999 | 1990s | 70 |
| 2000 | 2009 | 2000s | 45 |
| 2010 | 2019 | 2010s | 16 |
| 2020 | 2023 | 2020s | 7 |
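The decade recoding takes one line of integer arithmetic: drop the last digit of the year and append an "s". This is a minimal sketch. `decade_label` is a hypothetical helper, and the opening years are made up rather than taken from the roller coaster data set.

```python
from collections import Counter


def decade_label(year: int) -> str:
    """Recode a year (quantitative) into a decade label (categorical)."""
    start = (year // 10) * 10  # lower end of the decade, e.g. 1994 -> 1990
    return f"{start}s"


# Hypothetical opening years, not the actual roller coaster data.
years_opened = [1976, 1994, 1999, 2003, 2015, 1991, 2021]
decade_counts = Counter(decade_label(y) for y in years_opened)
print(decade_counts["1990s"])  # 3
```

Tallying the recoded labels this way produces the "Number of Rows" column of a table like the one above.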


Special Variable Types: Ordinal and Identifier Variables
Ordinal Variables: Categorical variables with a meaningful order (e.g., Likert scale ratings).
Identifier Variables: Labels that identify individual cases or locations (e.g., ZIP codes). They may be written as numbers, but they are categorical and have no meaningful order or quantitative interpretation.
Example: ZIP codes are categorical identifiers. Even though they are made of digits, averaging or ranking them has no meaning.
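The difference between the two special types shows up in how the values are stored. For an ordinal variable, the category order is kept explicitly so that sorting follows the meaning of the labels. For an identifier such as a ZIP code, the value is kept as text so that nothing numeric happens to it. The five-point Likert labels and the ZIP code below are assumed example values.

```python
# Ordinal: a Likert item has categories with a meaningful order.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
rank = {label: i for i, label in enumerate(LIKERT)}

responses = ["Agree", "Neutral", "Strongly agree", "Disagree"]
ordered = sorted(responses, key=rank.__getitem__)
print(ordered)  # ['Disagree', 'Neutral', 'Agree', 'Strongly agree']

# Identifier: store ZIP codes as text so that leading zeros survive.
zip_as_text = "02138"
zip_as_int = int(zip_as_text)
print(zip_as_int)  # 2138; converting to a number drops the leading zero
```

Sorting the Likert responses alphabetically would put "Agree" first. The explicit ranking keeps the intended order.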
Summary
Statistics is the science of learning from data and making decisions under uncertainty.
Understanding the context of data collection (the Five W’s and H) is essential for proper analysis.
Variables can be categorical or quantitative, and their correct identification is crucial for analysis.
Tables and graphs are fundamental tools for summarizing and visualizing data.
Special variable types include ordinal and identifier variables, which require careful interpretation.