Organizing and Visualizing Variables: Structured Study Notes for Business Statistics

Organizing and Visualizing Variables

Introduction

This chapter focuses on methods for organizing and visualizing both categorical and numerical variables, which are foundational skills in business statistics. Effective data representation is crucial for accurate analysis, interpretation, and decision-making.

Why Data Representation Is Important

Significance of Data Representation

  • Condenses Raw Data: Summarizes large datasets into manageable forms for analysis.

  • Facilitates Interpretation: Enables quick visual understanding of data patterns and distributions.

  • Highlights Key Characteristics: Reveals where data are concentrated, clustered, or outlying.

Organizing Categorical Data

Summary Tables

A summary table displays the frequencies or percentages of items in each category, allowing for easy comparison between categories.

  • Definition: Table listing counts or percentages for each category.

  • Example: Devices Millennials Use to Watch Movies or TV Shows.

Devices Used to Watch Movies or TV Shows

| Device | Percent |
|---|---|
| Television Set | 49% |
| Tablet | 9% |
| Smartphone | 10% |
| Laptop / Desktop | 32% |
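A summary table can be produced by counting the responses in each category and dividing each count by the total. The sketch below is a minimal illustration, assuming the raw answers are available as a Python list of category labels. The `summary_table` helper and the ten sample responses are hypothetical and are not the survey data behind the table above.

```python
from collections import Counter

def summary_table(responses):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(responses)
    n = len(responses)
    # most_common() orders by count; ties keep the order first encountered.
    return [(cat, cnt, 100 * cnt / n) for cat, cnt in counts.most_common()]

# Hypothetical responses, for illustration only.
responses = ["Television Set", "Laptop / Desktop", "Television Set", "Smartphone",
             "Television Set", "Tablet", "Laptop / Desktop", "Television Set",
             "Laptop / Desktop", "Television Set"]

rows = summary_table(responses)
for category, count, percent in rows:
    print(f"{category:<18}{count:>4}{percent:>7.1f}%")
```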

Contingency Tables

Contingency tables organize data for two or more categorical variables, showing joint frequencies and revealing patterns or relationships.

  • Rows: Categories of one variable

  • Columns: Categories of another variable

| | No Errors | Errors | Total |
|---|---|---|---|
| Small Amount | 170 | 20 | 190 |
| Medium Amount | 100 | 40 | 140 |
| Large Amount | 65 | 5 | 70 |
| Total | 335 | 65 | 400 |

Contingency Table Percentages

  • Overall Total: Percentage of each cell relative to the grand total.

  • Row Total: Percentage of each cell relative to its row total.

  • Column Total: Percentage of each cell relative to its column total.

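Example calculation, using the Small Amount / Errors cell (20):

  • Overall total: 20 ÷ 400 = 5.00%

  • Row total: 20 ÷ 190 ≈ 10.53%

  • Column total: 20 ÷ 65 ≈ 30.77%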
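The three kinds of contingency-table percentages differ only in the denominator. This sketch applies all three to the cell counts from the contingency table above. `percent_tables` is a hypothetical helper written for illustration, and it assumes every row has the same number of columns.

```python
# Cell counts from the contingency table (columns: No Errors, Errors).
table = {
    "Small Amount": (170, 20),
    "Medium Amount": (100, 40),
    "Large Amount": (65, 5),
}

def percent_tables(table):
    """Percent of grand total, of row total, and of column total for every cell."""
    n_cols = len(next(iter(table.values())))
    row_totals = {row: sum(cells) for row, cells in table.items()}
    col_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    grand_total = sum(row_totals.values())
    overall = {r: [100 * c / grand_total for c in cells] for r, cells in table.items()}
    by_row = {r: [100 * c / row_totals[r] for c in cells] for r, cells in table.items()}
    by_col = {r: [100 * c / col_totals[j] for j, c in enumerate(cells)]
              for r, cells in table.items()}
    return overall, by_row, by_col

overall, by_row, by_col = percent_tables(table)
for name, pct in [("overall", overall), ("row", by_row), ("column", by_col)]:
    print(name, {r: [round(p, 2) for p in cells] for r, cells in pct.items()})
```

Each set of row percentages sums to 100%, as does each column of column percentages and the whole overall table.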

Organizing Numerical Data

Ordered Array

An ordered array is a sequence of data arranged from smallest to largest, useful for identifying the range and outliers.

  • Day Students: 16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42

  • Night Students: 18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 42
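Building an ordered array amounts to a sort. This sketch, using a hypothetical `ordered_array` helper, starts from the Night Students values listed in scrambled order and returns them sorted, along with the range (largest minus smallest value).

```python
def ordered_array(values):
    """Return the values sorted smallest to largest, plus the range (max - min)."""
    arr = sorted(values)
    return arr, arr[-1] - arr[0]

# Night Students values from above, deliberately listed out of order.
night_scrambled = [32, 18, 41, 20, 19, 28, 42, 18, 23, 33, 21, 19]
arr, data_range = ordered_array(night_scrambled)
print(arr)
print("range:", data_range)
```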

Frequency Distribution

A frequency distribution is a summary table where data are grouped into numerically ordered classes.

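  • Class Interval: The width of each class, computed as (highest value − lowest value) ÷ number of classes and rounded up to a convenient number.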

  • Class Boundaries: Define the limits for each class grouping.

  • Class Midpoints: Average of the lower and upper class boundaries; for example, (10 + 20) ÷ 2 = 15.
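The class-width and midpoint rules can be written out directly. In this sketch the helper names and the highest and lowest values (58 and 12) are hypothetical. The width is simply rounded up to the next whole number; in practice you might round up further to a convenient value such as 5 or 10.

```python
import math

def class_width(highest, lowest, num_classes):
    """Width = (highest - lowest) / number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / num_classes)

def class_midpoints(first_lower, width, num_classes):
    """Midpoint of each class = (lower boundary + upper boundary) / 2."""
    bounds = [(first_lower + width * i, first_lower + width * (i + 1))
              for i in range(num_classes)]
    return [(lo + hi) / 2 for lo, hi in bounds]

width = class_width(58, 12, 5)          # (58 - 12) / 5 = 9.2, rounded up to 10
print(width)
print(class_midpoints(10, width, 5))    # classes 10-20, 20-30, ..., 50-60
```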

| Class | Midpoint | Frequency |
|---|---|---|
| 10 but less than 20 | 15 | 3 |
| 20 but less than 30 | 25 | 6 |
| 30 but less than 40 | 35 | 5 |
| 40 but less than 50 | 45 | 4 |
| 50 but less than 60 | 55 | 2 |
| Total | | 20 |
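The raw data behind the 20-observation table are not given, so this sketch applies the same "but less than" classes (lower boundary 10, width 10) to the Day Students values from the ordered array instead. `frequency_distribution` is a hypothetical helper: each value goes into the class whose lower boundary it meets or exceeds and whose upper boundary it falls below.

```python
def frequency_distribution(values, first_lower, width, num_classes):
    """Count values in classes of the form 'lower but less than upper'."""
    freqs = [0] * num_classes
    for v in values:
        i = int((v - first_lower) // width)  # index of the class containing v
        if not 0 <= i < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        freqs[i] += 1
    return freqs

day = [16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42]
freqs = frequency_distribution(day, first_lower=10, width=10, num_classes=4)
for i, f in enumerate(freqs):
    lower = 10 + 10 * i
    print(f"{lower} but less than {lower + 10}: {f}")
```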

Relative and Percent Frequency Distribution

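  • Relative Frequency: Class frequency divided by the total number of values, n (for example, 3 ÷ 20 = 0.15).

  • Percentage: Relative frequency × 100% (for example, 0.15 × 100% = 15%).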

| Class | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| 10 but less than 20 | 3 | 0.15 | 15% |
| 20 but less than 30 | 6 | 0.30 | 30% |
| 30 but less than 40 | 5 | 0.25 | 25% |
| 40 but less than 50 | 4 | 0.20 | 20% |
| 50 but less than 60 | 2 | 0.10 | 10% |
| Total | 20 | 1.00 | 100% |
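Relative frequencies and percentages follow from dividing each class frequency by the total. This sketch reuses the class frequencies from the table above; `relative_and_percent` is a hypothetical helper name.

```python
def relative_and_percent(freqs):
    """Relative frequency = f / n; percentage = 100 * f / n."""
    n = sum(freqs)
    relative = [f / n for f in freqs]
    percent = [100 * f / n for f in freqs]
    return relative, percent

freqs = [3, 6, 5, 4, 2]  # class frequencies from the table above
relative, percent = relative_and_percent(freqs)
print([round(r, 2) for r in relative])
print(percent)
```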

Cumulative Frequency Distribution

  • Cumulative Frequency: Sum of frequencies up to and including the current class.

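  • Cumulative Percentage: Cumulative frequency ÷ n × 100% (equivalently, the sum of the class percentages up to and including the current class).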

| Class | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage |
|---|---|---|---|---|
| 10 but less than 20 | 3 | 15% | 3 | 15% |
| 20 but less than 30 | 6 | 30% | 9 | 45% |
| 30 but less than 40 | 5 | 25% | 14 | 70% |
| 40 but less than 50 | 4 | 20% | 18 | 90% |
| 50 but less than 60 | 2 | 10% | 20 | 100% |
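Cumulative values are running totals, which Python's `itertools.accumulate` computes directly. The sketch below reproduces the cumulative columns from the class frequencies in the table above; `cumulative_distribution` is a hypothetical helper.

```python
from itertools import accumulate

def cumulative_distribution(freqs):
    """Return cumulative frequencies and cumulative percentages."""
    n = sum(freqs)
    cum_freq = list(accumulate(freqs))           # running totals of the frequencies
    cum_pct = [100 * c / n for c in cum_freq]    # each running total as a percent of n
    return cum_freq, cum_pct

freqs = [3, 6, 5, 4, 2]  # class frequencies from the table above
cum_freq, cum_pct = cumulative_distribution(freqs)
for i, (cf, cp) in enumerate(zip(cum_freq, cum_pct)):
    lower = 10 + 10 * i
    print(f"{lower} but less than {lower + 10}: {cf:>3} {cp:>6.1f}%")
```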

Visualizing Categorical Data

Bar Chart

  • Definition: Visualizes a categorical variable as a series of bars, with length representing frequency or percentage.

  • Application: Useful for comparing categories side by side.

Pie Chart

  • Definition: Circle divided into slices representing categories; size of slice proportional to percentage.

  • Application: Effective for showing parts of a whole.

Doughnut Chart

  • Definition: Similar to pie chart but with a blank center; slices represent categories.

  • Application: Used for visualizing proportions in a more modern format.

Pareto Chart

  • Definition: Vertical bar chart with categories in descending order of frequency, plus a line showing the cumulative percentage.

  • Purpose: Separates the "vital few" from the "trivial many" (Pareto principle: 80/20 rule).

| Cause | Frequency | Percent | Cumulative Percent |
|---|---|---|---|
| Warped card jammed | 365 | 50.41% | 50.41% |
| Card unreadable | 234 | 32.32% | 82.73% |
| ATM malfunctions | 32 | 4.42% | 87.15% |
| ATM out of cash | 28 | 3.87% | 91.02% |
| Invalid amount requested | 23 | 3.18% | 94.20% |
| Wrong keystroke | 23 | 3.18% | 97.38% |
| Lack of funds in account | 19 | 2.62% | 100.00% |
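The Pareto table can be derived from the raw cause counts: sort by descending frequency, then compute each percentage and the running cumulative percentage. This sketch takes the counts from the table (listed in arbitrary order) and rounds to two decimals; `pareto_table` is a hypothetical helper. Cumulative percentages are computed from cumulative counts rather than by adding rounded percentages, so rounding errors do not build up.

```python
def pareto_table(counts):
    """Sort categories by descending frequency and add percent and cumulative percent."""
    # sorted() is stable even with reverse=True, so tied causes keep their input order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, freq in ordered:
        running += freq
        rows.append((cause, freq,
                     round(100 * freq / total, 2),
                     round(100 * running / total, 2)))
    return rows

counts = {
    "ATM out of cash": 28,
    "Card unreadable": 234,
    "Invalid amount requested": 23,
    "Lack of funds in account": 19,
    "ATM malfunctions": 32,
    "Warped card jammed": 365,
    "Wrong keystroke": 23,
}
rows = pareto_table(counts)
for cause, freq, pct, cum in rows:
    print(f"{cause:<26}{freq:>5}{pct:>8.2f}%{cum:>8.2f}%")
```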

Visualizing Numerical Data

Histogram

  • Definition: Vertical bar chart of a frequency distribution; adjacent bars touch, with no gaps between them.

  • Axes: Horizontal axis shows class boundaries or midpoints; vertical axis shows frequency, relative frequency, or percentage.

Stem-and-Leaf Display

  • Definition: Data are split into "stems" (leading digits) and "leaves" (trailing digits) to show distribution and concentration.

  • Application: Useful for small datasets and identifying outliers.
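For two-digit data, a stem-and-leaf display splits each value into its tens digit (stem) and units digit (leaf). This sketch applies that split to the Day Students values from the ordered array. `stem_and_leaf` is a hypothetical helper that assumes non-negative integers and lists every stem between the smallest and largest, even one with no leaves.

```python
def stem_and_leaf(values):
    """Return display lines using tens digits as stems and units digits as leaves."""
    values = sorted(values)
    lines = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = [str(v % 10) for v in values if v // 10 == stem]
        lines.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return lines

day = [16, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 25, 27, 32, 38, 42]
lines = stem_and_leaf(day)
print("\n".join(lines))
```

The longest row of leaves (stem 2) shows where the values concentrate, and the short rows for stems 3 and 4 show the stretch toward higher values.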

Visualizing Two Numerical Variables

Scatter Plot

  • Definition: Plots paired observations from two numerical variables; X-axis for one variable, Y-axis for the other.

  • Purpose: Examines relationships or correlations between variables.

Time Series Plot

  • Definition: Plots values of a numeric variable over time; time on X-axis, variable on Y-axis.

  • Purpose: Identifies trends, cycles, or patterns over time.

Summarizing a Mix of Variables

Multidimensional Contingency Tables

  • Definition: Tallies responses for three or more categorical variables.

  • Application: Reveals complex patterns and relationships in multidimensional data.

  • Best Practice: Limit to three or four variables for clarity.
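A multidimensional contingency table is a tally over combinations of categories. One minimal way to hold it, sketched below, is a `Counter` keyed by tuples. The three variables (fund type, market cap, risk level) and the six records are hypothetical. Dropping one variable and re-tallying collapses the table to an ordinary two-way contingency table.

```python
from collections import Counter

# Hypothetical records: (fund type, market cap, risk level). Illustration only.
records = [
    ("Growth", "Large", "Low"),
    ("Growth", "Small", "High"),
    ("Value", "Large", "Low"),
    ("Growth", "Large", "Low"),
    ("Value", "Small", "High"),
    ("Value", "Small", "Low"),
]

# Three-way tally: one count per (type, cap, risk) combination.
three_way = Counter(records)

# Collapse to a two-way table (type x risk) by ignoring market cap.
two_way = Counter((fund_type, risk) for fund_type, _cap, risk in records)

for combo, count in sorted(three_way.items()):
    print(combo, count)
print(dict(two_way))
```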

Avoiding Common Errors in Data Visualization

Common Graphical Errors

  • Selective Summarization: Presenting only part of the data can mislead.

  • Improperly Constructed Charts: Issues include missing labels, axes not starting at zero, or broken axes.

  • Chartjunk: Unnecessary decorations or effects that distract from the data.

Best Practices for Constructing Visualizations

  • Use the simplest possible visualization.

  • Include a title and label all axes.

  • Begin vertical axes at zero and use a constant scale.

  • Avoid 3D effects and chartjunk.

  • Use consistent coloring for comparable charts.

  • Avoid uncommon chart types unless necessary.

Chapter Summary

  • Organizing and visualizing categorical variables using tables and charts.

  • Organizing and visualizing numerical variables using arrays and distributions.

  • Summarizing a mix of variables with multidimensional tables.

  • Avoiding common errors in data visualization for accurate interpretation.
