Exploring Data with Graphs and Numerical Summaries
Section 2.1 – Different Types of Data
This section introduces the foundational concepts of variables and their classification, which is essential for understanding how to explore and summarize data in statistics.
Variables
Definition: A variable is any characteristic or attribute that is observed or measured in a study.
Examples: Height, weight, annual income, GPA, rainfall, class rank, pizza topping, car make/model, eye color, shoe brand/style.
Types of Variables
Categorical Variables: Each observation belongs to one of a set of distinct categories.
Examples: Class rank (Freshman, Sophomore, etc.), pizza topping, car make/model, eye color, shoe brand/style.
Key Feature: The relative number of observations in each category (i.e., the distribution across categories).
Quantitative Variables: Each observation takes a numerical value representing different magnitudes of the variable.
Examples: Height, weight, annual income, GPA, rainfall.
Key Features: The center (e.g., mean, median) and variability (e.g., range, standard deviation) of the data.
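As a rough illustration of the center and variability measures named above, the following Python sketch computes them with the standard-library `statistics` module. The height values and variable names are made up for this example and are not taken from the course materials.

```python
import statistics

# Hypothetical heights in centimeters (made-up values for illustration only)
heights_cm = [160.0, 165.5, 170.2, 172.0, 180.3]

# Measures of center
center_mean = statistics.mean(heights_cm)
center_median = statistics.median(heights_cm)

# Measures of variability
value_range = max(heights_cm) - min(heights_cm)
sample_sd = statistics.stdev(heights_cm)  # sample standard deviation (divides by n - 1)

print("mean:", round(center_mean, 2))
print("median:", center_median)
print("range:", round(value_range, 2))
print("standard deviation:", round(sample_sd, 2))
```

These summaries only make sense for a quantitative variable. Averaging a categorical variable such as eye color has no meaning.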
Subtypes of Quantitative Variables
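Discrete Quantitative Variables: Possible values form a set of separate numbers, such as 0, 1, 2, 3, ... (often counts). Any variable with a finite number of possible values is discrete, but a discrete variable can also have an unbounded list of separate values, as many counts do.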
Examples: Number of siblings, number of students in a class, number of cars in a parking lot, daily number of people getting the flu, number of days with temperature above a threshold.
Continuous Quantitative Variables: Possible values form an interval, meaning the variable can take on any value within a range. Continuous variables have an infinite continuum of possible values.
Examples: Height, weight, age, temperature, distance, time, speed, rainfall amount.
Subtypes of Categorical Variables
Nominal Variables: Categories do not have an inherent order.
Examples: College major, blood type, sports jersey number, hair color.
Ordinal Variables: Categories have a meaningful order or ranking.
Examples: Grades (A, B, C, etc.), education level (High School, Bachelor's, Doctorate), review ratings (1–5 stars), economic class.
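One practical consequence of the nominal/ordinal distinction is that ordinal categories can be sorted in a meaningful way. The Python sketch below assumes a hypothetical ordering of education levels and a made-up list of responses. It sorts the responses by rank and picks the middle category, which would not be meaningful for a nominal variable such as blood type.

```python
# Assumed ordering of an ordinal variable, from lowest to highest level
EDUCATION_ORDER = ["High School", "Bachelor's", "Doctorate"]
rank = {level: position for position, level in enumerate(EDUCATION_ORDER)}

# Hypothetical survey responses (made-up data for illustration)
responses = ["Doctorate", "High School", "Bachelor's", "High School", "Bachelor's"]

# Sort by the category's rank rather than alphabetically
sorted_responses = sorted(responses, key=lambda level: rank[level])

# With an odd number of responses, the middle element is the median category
middle_category = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)
print("middle category:", middle_category)
```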
Distribution of a Variable
The distribution of a variable describes how observations are spread across the range of possible values. Understanding the distribution is crucial for summarizing and interpreting data.
Frequency Table: A table listing possible values for a variable alongside the number of observations for each value.
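Proportion: The proportion of observations in a category is the category's frequency divided by the total number of observations:
$$\text{proportion} = \frac{\text{number of observations in the category}}{\text{total number of observations}}$$
Multiplying a proportion by 100 gives the corresponding percentage. This measure is useful for comparing the relative sizes of categories, especially in categorical data.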
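The frequency table and proportion described above can be sketched in a few lines of Python. The eye-color data and the helper name `frequency_table` are hypothetical and chosen only for illustration. The counting uses `collections.Counter` from the standard library.

```python
from collections import Counter

# Hypothetical eye-color observations (made-up data for illustration)
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

def frequency_table(observations):
    """Return {category: (count, proportion)} for a list of categorical observations."""
    counts = Counter(observations)
    total = len(observations)
    # proportion = count in the category / total number of observations
    return {category: (count, count / total) for category, count in counts.items()}

table = frequency_table(eye_colors)

# Print one row per category: name, count, proportion
for category, (count, proportion) in sorted(table.items()):
    print(f"{category:<8}{count:>3}{proportion:>8.3f}")
```

The proportions across all categories add up to 1, which is a quick check that no observation was dropped or double counted.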
Summary Table: Types of Variables
| Type | Subtype | Description | Examples |
|---|---|---|---|
| Quantitative | Discrete | Separate values, often counts | Number of siblings, number of students in a class |
| Quantitative | Continuous | Any value within an interval (measurements) | Height, weight, rainfall, temperature |
| Categorical | Nominal | No inherent order among categories | Blood type, hair color, college major |
| Categorical | Ordinal | Categories have a meaningful order | Grades, education level, review ratings |
Additional info: Understanding the type of variable is essential for choosing appropriate graphical and numerical summaries, as well as for selecting statistical methods for analysis.