Organizing Data: Types of Variables and Frequency Distributions
Study Guide - Smart Notes
Chapter 2: Organizing Data
Introduction
This chapter introduces fundamental concepts in statistics related to organizing and classifying data. It covers the definition and types of variables, distinctions between qualitative and quantitative data, and methods for summarizing categorical data using frequency tables and charts.
Variables and Data
Definition of Variables
Variable: A characteristic or property that can take on different values for different individuals or objects. Examples include height, weight, number of siblings, sex, marital status, and zipcode.
Data: The observed values of variables collected from individuals or objects.
Example Table: Types of Variables
| height | weight | siblings | sex | marriage | zipcode |
|---|---|---|---|---|---|
| 5 | 110 | 0 | female | married | 97219 |
| 6.3 | 180 | 4 | male | married | 97405 |
| 5.7 | 145 | 2 | male | single | 97219 |
Types of Variables
Qualitative vs. Quantitative Variables
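Qualitative (Categorical) Variables: Variables that describe qualities or categories and are non-numerically valued. Examples: sex, marital status, zipcode. A zipcode is written with digits, but the digits act as a label for a location rather than a quantity, so arithmetic on zipcodes (such as averaging them) is not meaningful.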
Quantitative Variables: Variables that are numerically valued and represent counts or measurements. Examples: height, weight, number of siblings.
Subtypes of Quantitative Variables
Discrete Variables: Quantitative variables that take on a countable number of distinct values, often representing counts. Example: number of siblings.
Continuous Variables: Quantitative variables that can take on any value within a given range, typically representing measurements. Example: height, weight.
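The classification can be made concrete with a short Python sketch over the example table. The types are assigned by hand from the definitions above, not detected automatically, and the names `VariableType`, `COLUMN_TYPES`, and `column_mean` are illustrative choices, not part of the source material. Averaging is used here only as one example of arithmetic that makes sense for quantitative variables and not for qualitative ones.

```python
from enum import Enum


class VariableType(Enum):
    QUALITATIVE = "qualitative"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


# Types assigned by hand, following the definitions in this section.
COLUMN_TYPES = {
    "height": VariableType.CONTINUOUS,
    "weight": VariableType.CONTINUOUS,
    "siblings": VariableType.DISCRETE,
    "sex": VariableType.QUALITATIVE,
    "marriage": VariableType.QUALITATIVE,
    "zipcode": VariableType.QUALITATIVE,  # digits used as labels, not quantities
}

# The three rows of the example table.
ROWS = [
    {"height": 5, "weight": 110, "siblings": 0,
     "sex": "female", "marriage": "married", "zipcode": "97219"},
    {"height": 6.3, "weight": 180, "siblings": 4,
     "sex": "male", "marriage": "married", "zipcode": "97405"},
    {"height": 5.7, "weight": 145, "siblings": 2,
     "sex": "male", "marriage": "single", "zipcode": "97219"},
]


def is_quantitative(column):
    """True for discrete and continuous variables, False for qualitative ones."""
    return COLUMN_TYPES[column] is not VariableType.QUALITATIVE


def column_mean(column):
    """Average a column of the example table, refusing qualitative variables."""
    if not is_quantitative(column):
        raise TypeError(f"{column!r} is qualitative; its mean is not meaningful")
    values = [row[column] for row in ROWS]
    return sum(values) / len(values)


print(column_mean("siblings"))  # 2.0
```

Calling `column_mean("zipcode")` raises a `TypeError`, which mirrors the point that zipcodes are labels even though they look numeric.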
Classification Diagram
Variable
├── Qualitative
└── Quantitative
    ├── Discrete
    └── Continuous
Examples: Identifying Variable Types
Number of people in your household: Quantitative, Discrete
Height of waterfalls: Quantitative, Continuous
Finishing time of marathon runners: Quantitative, Continuous
Order of finish in a running competition: Qualitative (ordinal)
Global Industry Classification Standard (GICS) code: Qualitative (the code labels an industry group; it is not a quantity)
Organizing Qualitative Data
Frequency Distribution Table
A frequency distribution table summarizes categorical data by listing each category and the number of observations in each.
Example: Stocks Classified by GICS Sector
| Stocks | GICS_Sector |
|---|---|
| Apple | Information Technology |
| Microsoft | Information Technology |
| Johnson & Johnson | Healthcare |
| Pfizer | Healthcare |
| Exxon Mobil | Energy |
| Chevron | Energy |
| Walmart | Consumer Staples |
| Procter & Gamble | Consumer Staples |
| Coca-Cola | Consumer Staples |
| Amazon | Consumer Discretionary |
| Tesla | Consumer Discretionary |
| Home Depot | Consumer Discretionary |
| JP Morgan Chase | Financials |
| Goldman Sachs | Financials |
| Visa | Financials |
| Berkshire Hathaway | Financials |
| AT&T | Communication Services |
| Verizon | Communication Services |
| Duke Energy | Utilities |
| American Electric Power | Utilities |
Grouping Data by Category
| GICS_Sector | Stocks |
|---|---|
| Communication Services | AT&T, Verizon |
| Consumer Discretionary | Amazon, Tesla, Home Depot |
| Consumer Staples | Walmart, Procter & Gamble, Coca-Cola |
| Energy | Exxon Mobil, Chevron |
| Financials | JP Morgan Chase, Goldman Sachs, Visa, Berkshire Hathaway |
| Healthcare | Johnson & Johnson, Pfizer |
| Information Technology | Apple, Microsoft |
| Utilities | Duke Energy, American Electric Power |
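The grouping step above can be sketched in Python as follows. This is a minimal illustration that assumes the raw data is held as (stock, sector) pairs; the names `STOCKS` and `group_by_sector` are hypothetical and not from the source material.

```python
from collections import defaultdict

# (stock, sector) pairs copied from the raw table above.
STOCKS = [
    ("Apple", "Information Technology"),
    ("Microsoft", "Information Technology"),
    ("Johnson & Johnson", "Healthcare"),
    ("Pfizer", "Healthcare"),
    ("Exxon Mobil", "Energy"),
    ("Chevron", "Energy"),
    ("Walmart", "Consumer Staples"),
    ("Procter & Gamble", "Consumer Staples"),
    ("Coca-Cola", "Consumer Staples"),
    ("Amazon", "Consumer Discretionary"),
    ("Tesla", "Consumer Discretionary"),
    ("Home Depot", "Consumer Discretionary"),
    ("JP Morgan Chase", "Financials"),
    ("Goldman Sachs", "Financials"),
    ("Visa", "Financials"),
    ("Berkshire Hathaway", "Financials"),
    ("AT&T", "Communication Services"),
    ("Verizon", "Communication Services"),
    ("Duke Energy", "Utilities"),
    ("American Electric Power", "Utilities"),
]


def group_by_sector(pairs):
    """Collect stock names under their sector, with sectors in alphabetical order."""
    groups = defaultdict(list)
    for stock, sector in pairs:
        groups[sector].append(stock)
    # Sorting the sectors reproduces the row order of the grouped table.
    return dict(sorted(groups.items()))


grouped = group_by_sector(STOCKS)
for sector, names in grouped.items():
    print(f"{sector}: {', '.join(names)}")
```

Within each sector the stocks keep the order in which they appear in the raw table, which matches the grouped table.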
Frequency Table: Counting Observations per Category
| GICS_Sector | Frequency |
|---|---|
| Communication Services | 2 |
| Consumer Discretionary | 3 |
| Consumer Staples | 3 |
| Energy | 2 |
| Financials | 4 |
| Healthcare | 2 |
| Information Technology | 2 |
| Utilities | 2 |
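Counting observations per category is a one-step operation in Python. The sketch below assumes the sector column of the raw table is available as a plain list of labels (the name `SECTORS` is illustrative) and uses the standard-library `collections.Counter` to tally them.

```python
from collections import Counter

# Sector label of each of the 20 stocks, in the order of the raw table.
SECTORS = [
    "Information Technology", "Information Technology",
    "Healthcare", "Healthcare",
    "Energy", "Energy",
    "Consumer Staples", "Consumer Staples", "Consumer Staples",
    "Consumer Discretionary", "Consumer Discretionary", "Consumer Discretionary",
    "Financials", "Financials", "Financials", "Financials",
    "Communication Services", "Communication Services",
    "Utilities", "Utilities",
]

# Counter maps each distinct label to the number of times it occurs.
frequency = Counter(SECTORS)

for sector in sorted(frequency):
    print(f"{sector:<25}{frequency[sector]}")
```

The counts add up to the number of observations (20 here), which is a quick check that no row was missed.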
Relative Frequency Table
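Relative frequency expresses the proportion of observations in each category, calculated as:
$$\text{Relative frequency} = \frac{\text{Frequency of the category}}{\text{Total number of observations}}$$
For example, Financials contains 4 of the 20 stocks, so its relative frequency is 4/20 = 0.20. The relative frequencies of all categories sum to 1.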
| GICS_Sector | Relative_Frequency |
|---|---|
| Communication Services | 0.10 |
| Consumer Discretionary | 0.15 |
| Consumer Staples | 0.15 |
| Energy | 0.10 |
| Financials | 0.20 |
| Healthcare | 0.10 |
| Information Technology | 0.10 |
| Utilities | 0.10 |
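The values in this table can be reproduced by dividing each category's count by the total number of observations. The following sketch starts from the frequency table; the variable names are illustrative.

```python
# Counts copied from the frequency table.
FREQUENCY = {
    "Communication Services": 2,
    "Consumer Discretionary": 3,
    "Consumer Staples": 3,
    "Energy": 2,
    "Financials": 4,
    "Healthcare": 2,
    "Information Technology": 2,
    "Utilities": 2,
}

# Total number of observations: the sum of all category counts.
total = sum(FREQUENCY.values())

# Relative frequency = category count / total number of observations.
relative_frequency = {
    sector: count / total for sector, count in FREQUENCY.items()
}

for sector, share in relative_frequency.items():
    print(f"{sector:<25}{share:.2f}")
```

Because every observation belongs to exactly one category, the relative frequencies sum to 1 (up to floating-point rounding).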
Pie Chart Representation
A pie chart visually represents categorical data, dividing a circle into wedge-shaped pieces proportional to the relative frequencies of each category.
Each sector's wedge size reflects its proportion in the dataset.
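Since a full circle spans 360 degrees, a wedge proportional to a relative frequency has a central angle of relative frequency × 360. The notes do not state this formula explicitly, so the sketch below is an illustration of the proportionality rather than the method used to draw the original chart. It computes only the angles; a plotting library would be needed to draw the wedges.

```python
# Relative frequencies copied from the relative frequency table.
RELATIVE_FREQUENCY = {
    "Communication Services": 0.10,
    "Consumer Discretionary": 0.15,
    "Consumer Staples": 0.15,
    "Energy": 0.10,
    "Financials": 0.20,
    "Healthcare": 0.10,
    "Information Technology": 0.10,
    "Utilities": 0.10,
}

FULL_CIRCLE_DEGREES = 360

# Each wedge takes the same share of the circle as its category takes of the data.
wedge_angles = {
    sector: share * FULL_CIRCLE_DEGREES
    for sector, share in RELATIVE_FREQUENCY.items()
}

for sector, angle in wedge_angles.items():
    print(f"{sector:<25}{angle:.0f} degrees")
```

Financials, the largest sector, gets a 72-degree wedge, and the eight wedges together fill the whole circle.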
Summary Table: Types of Variables
| Type | Description | Examples |
|---|---|---|
| Qualitative | Non-numerical, describes categories or qualities | Sex, Marital Status, Zipcode, GICS Sector |
| Quantitative - Discrete | Numerical, countable values | Number of siblings, Number of people in household |
| Quantitative - Continuous | Numerical, measurable values within a range | Height, Weight, Marathon finishing time |
Key Points
Variables are classified as qualitative or quantitative, with quantitative variables further divided into discrete and continuous types.
Frequency and relative frequency tables are essential tools for summarizing categorical data.
Pie charts provide a visual summary of the distribution of categorical data.
Example Application
In portfolio analysis, stocks can be grouped by sector, and the frequency and relative frequency of each sector can be calculated to understand the composition of the portfolio.
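As a rough sketch of this application, the function below takes a mapping from stock to sector and returns each sector's frequency and relative frequency. The five-stock portfolio is hypothetical, chosen only for illustration, with sector labels taken from the GICS table in this chapter; the function name `sector_composition` is likewise an assumption.

```python
from collections import Counter


def sector_composition(holdings):
    """Return {sector: (frequency, relative_frequency)} for a stock -> sector mapping."""
    counts = Counter(holdings.values())
    total = sum(counts.values())
    return {
        sector: (count, count / total)
        for sector, count in sorted(counts.items())
    }


# Hypothetical portfolio, not from the source material.
portfolio = {
    "Apple": "Information Technology",
    "Microsoft": "Information Technology",
    "Pfizer": "Healthcare",
    "Visa": "Financials",
    "Goldman Sachs": "Financials",
}

composition = sector_composition(portfolio)
for sector, (count, share) in composition.items():
    print(f"{sector}: {count} stocks, {share:.0%} of the portfolio")
```

Note that this measures composition by number of stocks only; it says nothing about how much money is invested in each sector.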
Additional info: The relative frequencies in the table are computed from a total of 20 stocks, which is the sum of the counts in the frequency table. The summary table of variable types was added for completeness.