Organizing and Visualizing Variables: Structured Study Notes for Business Statistics
Organizing and Visualizing Variables
Introduction
This chapter focuses on methods for organizing and visualizing both categorical and numerical variables, which are foundational skills in business statistics. Effective data representation is crucial for accurate analysis, interpretation, and decision-making.
Why Data Representation Is Important
Significance of Data Representation
Condenses Raw Data: Summarizes large datasets into manageable forms for analysis.
Facilitates Interpretation: Enables quick visual understanding of data patterns and distributions.
Highlights Key Characteristics: Reveals where data are concentrated, clustered, or outlying.
Organizing Categorical Data
Summary Tables
A summary table displays the frequencies or percentages of items in each category, allowing for easy comparison between categories.
Definition: Table listing counts or percentages for each category.
Example: Devices Millennials Use to Watch Movies or TV Shows.
| Devices Used To Watch Movies or TV Shows | Percent |
|---|---|
| Television Set | 49% |
| Tablet | 9% |
| Smartphone | 10% |
| Laptop / Desktop | 32% |
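A summary table can be built by tallying raw category labels. The sketch below is one minimal way to do that in Python. The `responses` list and the `summary_table` helper are hypothetical, invented for illustration; the notes give only the final percentages, not the underlying survey responses.

```python
from collections import Counter

def summary_table(responses):
    """Return {category: (count, percent)}, ordered by descending count."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}

# Hypothetical raw responses, invented purely for illustration.
responses = [
    "Television Set", "Laptop / Desktop", "Television Set", "Smartphone",
    "Television Set", "Laptop / Desktop", "Tablet", "Television Set",
    "Laptop / Desktop", "Television Set",
]

table = summary_table(responses)
for category, (count, percent) in table.items():
    print(f"{category:<18} {count:>3} {percent:5.1f}%")
```

Each row pairs a category with its count and its share of all responses, which is the same layout as the table above.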
Contingency Tables
Contingency tables organize data for two or more categorical variables, showing joint frequencies and revealing patterns or relationships.
Rows: Categories of one variable
Columns: Categories of another variable
| | No Errors | Errors | Total |
|---|---|---|---|
| Small Amount | 170 | 20 | 190 |
| Medium Amount | 100 | 40 | 140 |
| Large Amount | 65 | 5 | 70 |
| Total | 335 | 65 | 400 |
Contingency Table Percentages
Overall Total: Percentage of each cell relative to the grand total.
Row Total: Percentage of each cell relative to its row total.
Column Total: Percentage of each cell relative to its column total.
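Example Calculation: For the Small Amount / No Errors cell (170 items), 170/400 = 42.50% of the overall total, 170/190 = 89.47% of its row total, and 170/335 = 50.75% of its column total.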
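The three percentage bases (overall, row, column) can be sketched in a few lines of Python. The counts are copied from the contingency table above; the helper name `cell_percentages` is an assumption made for this illustration.

```python
# Cell counts copied from the contingency table above.
counts = {
    "Small Amount":  {"No Errors": 170, "Errors": 20},
    "Medium Amount": {"No Errors": 100, "Errors": 40},
    "Large Amount":  {"No Errors": 65,  "Errors": 5},
}

# Marginal totals derived from the cell counts.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

def cell_percentages(row, col):
    """Express one cell as a percentage of the grand, row, and column totals."""
    n = counts[row][col]
    return {
        "overall": 100 * n / grand_total,
        "row": 100 * n / row_totals[row],
        "column": 100 * n / col_totals[col],
    }

pct = cell_percentages("Small Amount", "No Errors")
print({basis: round(value, 2) for basis, value in pct.items()})
```

The same cell gives three different percentages depending on the base, so any reported figure should state which total it is relative to.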
Organizing Numerical Data
Ordered Array
An ordered array is a sequence of data arranged from smallest to largest, useful for identifying the range and outliers.
| Day Students | Night Students |
|---|---|
| 16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42 | 18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 42 |
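As a small sketch, an ordered array and its range can be computed with Python's built-in `sorted`. The values are the Day Students and Night Students data from the table above; the helper names `ordered_array` and `value_range` are assumptions for this example.

```python
day_students = [16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42]
night_students = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 42]

def ordered_array(values):
    """Return the values arranged from smallest to largest."""
    return sorted(values)

def value_range(values):
    """Range = largest value minus smallest value."""
    ordered = ordered_array(values)
    return ordered[-1] - ordered[0]

print(value_range(day_students), value_range(night_students))
```

Once the data are ordered, the smallest and largest values sit at the two ends, so the range and any unusually extreme values are easy to spot.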
Frequency Distribution
A frequency distribution is a summary table where data are grouped into numerically ordered classes.
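Class Interval: Width of each class; a common starting point is (largest value - smallest value) / number of classes, rounded up to a convenient number.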
Class Boundaries: Define the limits for each class grouping.
Class Midpoints: Average of lower and upper class boundaries.
| Class | Midpoints | Frequency |
|---|---|---|
| 10 but less than 20 | 15 | 3 |
| 20 but less than 30 | 25 | 6 |
| 30 but less than 40 | 35 | 5 |
| 40 but less than 50 | 45 | 4 |
| 50 but less than 60 | 55 | 2 |
| Total | | 20 |
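The raw values behind the table above are not included in these notes, so the sketch below groups the Night Students values from the ordered array section instead. The class width of 10 and the starting boundary of 10 are assumptions chosen to mirror the "10 but less than 20" class style; the function name `frequency_distribution` is also hypothetical.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Tally values into classes of the form 'lo but less than hi'.

    Returns a list of (lo, hi, midpoint, frequency) tuples.
    """
    rows = []
    for i in range(num_classes):
        lo = lower + i * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)
        rows.append((lo, hi, (lo + hi) / 2, freq))
    return rows

night_students = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 42]
rows = frequency_distribution(night_students, lower=10, width=10, num_classes=4)

for lo, hi, midpoint, freq in rows:
    print(f"{lo} but less than {hi} | {midpoint:g} | {freq}")
```

Using half-open classes (lower boundary included, upper boundary excluded) ensures every value falls into exactly one class.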
Relative and Percent Frequency Distribution
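Relative Frequency: Number of values in the class divided by the total number of values.
Percentage: Relative frequency multiplied by 100%.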
| Class | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| 10 but less than 20 | 3 | 0.15 | 15% |
| 20 but less than 30 | 6 | 0.30 | 30% |
| 30 but less than 40 | 5 | 0.25 | 25% |
| 40 but less than 50 | 4 | 0.20 | 20% |
| 50 but less than 60 | 2 | 0.10 | 10% |
| Total | 20 | 1.00 | 100% |
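The relative frequency and percentage columns follow directly from the class frequencies. A minimal sketch, using the frequencies from the table above (variable names are illustrative):

```python
# Class frequencies copied from the table above (classes 10-20 through 50-60).
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Relative frequency: share of all values that fall in each class.
relative = [f / total for f in frequencies]

# Percentage: the same share expressed out of 100.
percent = [100 * f / total for f in frequencies]

for f, r, p in zip(frequencies, relative, percent):
    print(f"{f:>2} | {r:.2f} | {p:.0f}%")
```

Relative frequencies always sum to 1 and percentages to 100, which is a quick check on the arithmetic.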
Cumulative Frequency Distribution
Cumulative Frequency: Sum of frequencies up to and including the current class.
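Cumulative Percentage: Cumulative frequency divided by the total number of values, multiplied by 100%; equivalently, the sum of percentages up to and including the current class.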
| Class | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage |
|---|---|---|---|---|
| 10 but less than 20 | 3 | 15% | 3 | 15% |
| 20 but less than 30 | 6 | 30% | 9 | 45% |
| 30 but less than 40 | 5 | 25% | 14 | 70% |
| 40 but less than 50 | 4 | 20% | 18 | 90% |
| 50 but less than 60 | 2 | 10% | 20 | 100% |
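The cumulative columns are running totals of the frequency column. One way to sketch this is with `itertools.accumulate` from the Python standard library, again using the frequencies from the table above:

```python
from itertools import accumulate

# Class frequencies copied from the table above.
frequencies = [3, 6, 5, 4, 2]

# Running total of frequencies, up to and including each class.
cumulative = list(accumulate(frequencies))

# Each running total as a percentage of the final total.
cumulative_percent = [100 * c / cumulative[-1] for c in cumulative]

print(cumulative)
print(cumulative_percent)
```

The last cumulative frequency equals the total number of values, so the last cumulative percentage is always 100%.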
Visualizing Categorical Data
Bar Chart
Definition: Visualizes a categorical variable as a series of bars, with length representing frequency or percentage.
Application: Useful for comparing categories side by side.
Pie Chart
Definition: Circle divided into slices representing categories; size of slice proportional to percentage.
Application: Effective for showing parts of a whole.
Doughnut Chart
Definition: Similar to pie chart but with a blank center; slices represent categories.
Application: Shows the same part-to-whole proportions as a pie chart; the choice between the two is largely stylistic.
Pareto Chart
Definition: Vertical bar chart with categories in descending order of frequency; includes cumulative line.
Purpose: Separates the "vital few" from the "trivial many" (Pareto principle: 80/20 rule).
| Cause | Frequency | Percent | Cumulative Percent |
|---|---|---|---|
| Warped card jammed | 365 | 50.41% | 50.41% |
| Card unreadable | 234 | 32.32% | 82.73% |
| ATM malfunctions | 32 | 4.42% | 87.15% |
| ATM out of cash | 28 | 3.87% | 91.02% |
| Invalid amount requested | 23 | 3.18% | 94.20% |
| Wrong keystroke | 23 | 3.18% | 97.38% |
| Lack of funds in account | 19 | 2.62% | 100.00% |
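The Pareto table above can be reproduced by sorting the causes by frequency and accumulating percentages. The sketch below uses the counts from the table. Treating 80% as the cutoff for the "vital few" is a rule-of-thumb assumption, not a fixed definition.

```python
from itertools import accumulate

# Frequencies copied from the table above.
causes = {
    "Warped card jammed": 365,
    "Card unreadable": 234,
    "ATM malfunctions": 32,
    "ATM out of cash": 28,
    "Invalid amount requested": 23,
    "Wrong keystroke": 23,
    "Lack of funds in account": 19,
}

# Order categories from most to least frequent (ties keep their input order).
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)
total = sum(causes.values())
running = list(accumulate(n for _, n in ordered))

# One row per cause: (cause, frequency, percent, cumulative percent).
pareto = [
    (cause, n, round(100 * n / total, 2), round(100 * cum / total, 2))
    for (cause, n), cum in zip(ordered, running)
]

# "Vital few": causes needed to reach the assumed 80% cumulative threshold.
vital_few = []
for cause, n, pct, cum_pct in pareto:
    vital_few.append(cause)
    if cum_pct >= 80:
        break

for row in pareto:
    print(row)
print(vital_few)
```

Here two of the seven causes account for more than 80% of the incidents, which is the pattern a Pareto chart is designed to expose.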
Visualizing Numerical Data
Histogram
Definition: Vertical bar chart of a frequency distribution for a numerical variable; adjacent bars touch, with no gaps between them.
Axes: Horizontal axis shows class boundaries or midpoints; vertical axis shows frequency, relative frequency, or percentage.
Stem-and-Leaf Display
Definition: Data are split into "stems" (leading digits) and "leaves" (trailing digits) to show distribution and concentration.
Application: Useful for small datasets and identifying outliers.
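A stem-and-leaf display for two-digit values can be sketched by splitting each value into its tens digit (stem) and units digit (leaf). The example below uses the Day Students values from the ordered array section; the function name `stem_and_leaf` is an assumption, and the split by tens only suits data in this numeric range.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem); units digits become the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

day_students = [16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42]
display = stem_and_leaf(day_students)

for stem in sorted(display):
    leaves = "".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

The length of each row of leaves shows where values concentrate, while every individual value remains readable from the display.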
Visualizing Two Numerical Variables
Scatter Plot
Definition: Plots paired observations from two numerical variables; X-axis for one variable, Y-axis for the other.
Purpose: Examines relationships or correlations between variables.
Time Series Plot
Definition: Plots values of a numeric variable over time; time on X-axis, variable on Y-axis.
Purpose: Identifies trends, cycles, or patterns over time.
Summarizing a Mix of Variables
Multidimensional Contingency Tables
Definition: Tallies responses for three or more categorical variables.
Application: Reveals complex patterns and relationships in multidimensional data.
Best Practice: Limit to three or four variables for clarity.
Avoiding Common Errors in Data Visualization
Common Graphical Errors
Selective Summarization: Presenting only part of the data can mislead.
Improperly Constructed Charts: Issues include missing labels, axes not starting at zero, or broken axes.
Chartjunk: Unnecessary decorations or effects that distract from the data.
Best Practices for Constructing Visualizations
Use the simplest possible visualization.
Include a title and label all axes.
Begin vertical axes at zero and use a constant scale.
Avoid 3D effects and chartjunk.
Use consistent coloring for comparable charts.
Avoid uncommon chart types unless necessary.
Chapter Summary
Organizing and visualizing categorical variables using tables and charts.
Organizing and visualizing numerical variables using arrays and distributions.
Summarizing a mix of variables with multidimensional tables.
Avoiding common errors in data visualization for accurate interpretation.