Chapter 2 - Organizing and Visualizing Variables in Business Statistics

Chapter 2: Organizing and Visualizing Variables

Introduction

Organizing and visualizing variables is a foundational step in business statistics, enabling analysts to summarize, interpret, and communicate data effectively. This chapter covers methods for handling categorical and numerical variables, visualizing relationships, and avoiding common pitfalls in data presentation.

Organizing Categorical Variables

Summary Tables

  • Definition: A summary table displays the frequency or percentage of each category in a categorical variable.

  • Purpose: To compare how common each category is within a dataset.

  • Example: Device usage among millennials for watching movies/TV: 32% laptop/desktop, 10% smartphone, 9% tablet, 49% television.
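A summary table can be built by counting each category and dividing by the total. The sketch below is illustrative only: the `summary_table` helper and the list of 100 responses are assumptions, constructed so that the percentages match the device-usage example.

```python
from collections import Counter

def summary_table(values):
    """Return {category: (frequency, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical raw responses, sized to reproduce the example percentages.
responses = (["laptop/desktop"] * 32 + ["smartphone"] * 10
             + ["tablet"] * 9 + ["television"] * 49)

table = summary_table(responses)
for category, (freq, pct) in table.items():
    print(f"{category:<15} {freq:>4} {pct:6.1f}%")
```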

Contingency Tables

  • Definition: A contingency table (cross-tabulation) displays the joint distribution of two or more categorical variables.

  • Structure: Rows and columns represent different variables; each cell shows the frequency or percentage for a unique combination.

  • Example Table:

Fund Type   Low Risk   Average Risk   High Risk   Total
Growth      20.59%     49.67%         29.74%      100%
Value       48.55%     41.62%         9.83%       100%

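  • Interpretation: Each row sums to 100%, so the cells are row percentages. Growth funds are more likely than value funds to be high risk (29.74% vs. 9.83%), while value funds are more likely than growth funds to be low risk (48.55% vs. 20.59%). Within growth funds, average risk is the most common level (49.67%).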

Calculating Percentages

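  • Overall Percentage: \( \dfrac{\text{cell frequency}}{\text{grand total}} \times 100\% \)

  • Row Percentage: \( \dfrac{\text{cell frequency}}{\text{row total}} \times 100\% \)

  • Column Percentage: \( \dfrac{\text{cell frequency}}{\text{column total}} \times 100\% \)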
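The three percentage types differ only in the denominator: the grand total, the row total, or the column total. A minimal sketch follows. The fund counts are hypothetical (the example table reports percentages, not counts), and `contingency_percentages` is an illustrative helper name.

```python
def contingency_percentages(counts):
    """counts: {row_label: {col_label: frequency}}.

    Returns (overall, row_pct, col_pct), each shaped like `counts`.
    """
    row_totals = {r: sum(cols.values()) for r, cols in counts.items()}
    col_totals = {}
    for cols in counts.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())

    overall = {r: {c: 100 * n / grand_total for c, n in cols.items()}
               for r, cols in counts.items()}
    row_pct = {r: {c: 100 * n / row_totals[r] for c, n in cols.items()}
               for r, cols in counts.items()}
    col_pct = {r: {c: 100 * n / col_totals[c] for c, n in cols.items()}
               for r, cols in counts.items()}
    return overall, row_pct, col_pct

# Hypothetical counts of funds by type and risk level.
fund_counts = {
    "Growth": {"Low": 10, "Average": 25, "High": 15},
    "Value":  {"Low": 20, "Average": 15, "High": 5},
}
overall, row_pct, col_pct = contingency_percentages(fund_counts)
print(row_pct["Growth"]["Average"])  # share of growth funds that are average risk
print(col_pct["Value"]["High"])      # share of high-risk funds that are value funds
```

Row percentages answer "of the growth funds, what share is high risk?", while column percentages answer "of the high-risk funds, what share is growth?". The two questions generally have different answers.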

Organizing Numerical Variables

Ordered Arrays

  • Definition: An ordered array is a list of numerical values arranged from smallest to largest.

  • Purpose: To quickly identify minimum, maximum, and the spread of data.

  • Example: Exam scores: 63, 64, 68, 71, 75, 88, 94.

Frequency Distributions

  • Definition: A frequency distribution groups data into intervals (classes) and counts the number of values in each interval.

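  • Class Interval Width Formula: \( \text{interval width} = \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}} \), usually rounded up to a convenient number.

  • Class Midpoint Formula: \( \text{midpoint} = \dfrac{\text{lower class boundary} + \text{upper class boundary}}{2} \)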
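As a worked illustration, the sketch below applies these ideas to the exam scores from the ordered-array example. The choice of four classes, a starting boundary of 60, and a width rounded up to 10 are assumptions made for the example, and each class is treated as including its lower boundary but not its upper one.

```python
def class_width(values, num_classes):
    """Raw interval width: range divided by the number of classes."""
    return (max(values) - min(values)) / num_classes

def frequency_distribution(values, start, width, num_classes):
    """Count the values falling in each class [lower, upper)."""
    classes = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        classes.append({"lower": lower, "upper": upper,
                        "midpoint": (lower + upper) / 2, "frequency": freq})
    return classes

scores = [63, 64, 68, 71, 75, 88, 94]
raw_width = class_width(scores, 4)   # (94 - 63) / 4 = 7.75, rounded up to 10 below
dist = frequency_distribution(scores, start=60, width=10, num_classes=4)
for c in dist:
    print(f"{c['lower']} but less than {c['upper']}: "
          f"midpoint {c['midpoint']}, frequency {c['frequency']}")
```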

Relative Frequency and Percentage Distributions

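  • Relative Frequency: \( \text{relative frequency} = \dfrac{\text{number of values in the class}}{\text{total number of values}} \)

  • Percentage: \( \text{percentage} = \text{relative frequency} \times 100\% \)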

  • Purpose: To compare groups of different sizes.
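To see why percentages help when group sizes differ, consider the sketch below. Both sets of class frequencies are hypothetical: the raw counts look very different, yet the percentage distributions turn out to be identical.

```python
def relative_frequencies(frequencies):
    """Each class frequency divided by the total number of values."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def percentages(frequencies):
    """Relative frequencies expressed on a 0-100 scale."""
    return [100 * p for p in relative_frequencies(frequencies)]

# Hypothetical class frequencies for two groups of different sizes.
group_a = [5, 10, 5]      # 20 values in total
group_b = [25, 50, 25]    # 100 values in total

pct_a = percentages(group_a)
pct_b = percentages(group_b)
print(pct_a)
print(pct_b)
```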

Cumulative Percentage Distribution

  • Definition: Shows the percentage of values less than a specific amount by successively adding class percentages.

  • Example Calculation: If 8% of meals cost $20–$30 and 6% cost $30–$40, and $20–$30 is the lowest class (no meal costs less than $20), then 8% + 6% = 14% of meals cost less than $40.
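The successive addition can be sketched as a running total. In the snippet below, the first two class percentages come from the meal-cost example and are assumed to be the two lowest classes. The remaining two classes are hypothetical values added so that the total reaches 100%.

```python
from itertools import accumulate

def cumulative_percentages(class_percentages):
    """Running total: entry i is the percentage of values
    below the upper boundary of class i."""
    return list(accumulate(class_percentages))

upper_boundaries = [30, 40, 50, 60]   # dollars
class_pcts = [8, 6, 36, 50]           # last two values are hypothetical
cum = cumulative_percentages(class_pcts)

for bound, pct in zip(upper_boundaries, cum):
    print(f"less than ${bound}: {pct}%")
```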

Visualizing Categorical Variables

Bar Charts

  • Definition: Uses bars to represent the frequency or percentage of each category.

  • Best For: Comparing sizes of categories directly.

Pie and Doughnut Charts

  • Definition: Show how each category contributes to the whole as slices of a circle.

  • Slice Size Formula: \( \text{slice angle} = \dfrac{\text{category percentage}}{100\%} \times 360^\circ \)

  • Best For: Emphasizing proportions of the total.

  • Tip: Avoid 3D and exploded charts to prevent misinterpretation.
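Because a full circle is 360 degrees, each slice angle is the category's share of the total times 360. The sketch below applies this to the device-usage percentages from the summary-table example. The `slice_angles` helper is an illustrative name, not part of any charting library.

```python
def slice_angles(percentages):
    """Map each category's percentage of the whole to a slice angle in degrees."""
    return {cat: 360 * pct / 100 for cat, pct in percentages.items()}

device_share = {"laptop/desktop": 32, "smartphone": 10,
                "tablet": 9, "television": 49}
angles = slice_angles(device_share)
for cat, angle in angles.items():
    print(f"{cat:<15} {angle:6.1f} degrees")
```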

Pareto Charts

  • Definition: Combines a bar chart (categories in descending order) with a cumulative percentage line.

  • Pareto Principle: Roughly 80% of effects come from 20% of causes.

  • Best For: Identifying the most significant categories ("vital few").
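The data behind a Pareto chart is a table sorted by descending frequency with a running cumulative percentage. A minimal sketch, using hypothetical complaint counts and an illustrative `pareto_table` helper:

```python
def pareto_table(counts):
    """Return rows of (category, frequency, percentage, cumulative percentage),
    ordered from the most to the least frequent category."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    running = 0
    for category, n in ordered:
        running += n
        rows.append((category, n, 100 * n / total, 100 * running / total))
    return rows

# Hypothetical complaint counts by cause.
complaints = {"wrong item": 15, "late delivery": 50, "other": 5, "damaged item": 30}
rows = pareto_table(complaints)
for category, n, pct, cum_pct in rows:
    print(f"{category:<15} {n:>3} {pct:5.1f}% {cum_pct:6.1f}%")
```

In this made-up data the first two causes account for 80% of complaints, which is the kind of "vital few" pattern the cumulative line makes visible.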

Side-by-Side Charts

  • Definition: Compare two categorical variables by grouping bars of one variable by the categories of another.

  • Best For: Highlighting differences and similarities between groups.

Visualizing Numerical Variables

Stem-and-Leaf Displays

  • Definition: Splits each value into a "stem" and a "leaf" to show distribution and individual data points.

  • Example: For 74, stem = 7, leaf = 4.

  • Tip: A stem-and-leaf display rotated 90 degrees resembles a histogram.
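For two-digit values such as the exam scores in the ordered-array example, the split can be sketched with integer division and remainder. The `stem_and_leaf` helper below is illustrative and assumes non-negative integers, with the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens and above);
    leaves are the units digits, in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include empty stems so gaps in the data remain visible.
    return {stem: stems.get(stem, [])
            for stem in range(min(stems), max(stems) + 1)}

scores = [63, 64, 68, 71, 75, 88, 94]
display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```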

Histograms

  • Definition: A bar chart for numerical data, with bars representing class intervals and no gaps between bars.

  • Best For: Showing the distribution and concentration of values.

Percentage Polygons

  • Definition: Plots class midpoints (X-axis) against class percentages (Y-axis), connecting points with lines.

  • Best For: Comparing distributions across groups.

Cumulative Percentage Polygons (Ogives)

  • Definition: Plots cumulative percentages against class boundaries to show the proportion of data below each value.

  • Interpretation: If one group's ogive is to the right of another's, it has higher values overall.

Visualizing Two Numerical Variables

Scatter Plots

  • Definition: Plots pairs of numerical variables (X, Y) to reveal relationships or correlations.

  • Example: NBA team revenue (X) vs. team value (Y) shows a strong positive relationship.

  • Regression Line: A straight line can be fitted to model the relationship: \( \hat{Y} = b_0 + b_1 X \), where \( b_0 \) is the intercept and \( b_1 \) is the slope.
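The notes do not say how the line is fitted. Ordinary least squares is one common choice, and the sketch below assumes it. The data points are hypothetical and lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical (X, Y) pairs lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_line(xs, ys)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```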

Time-Series Plots

  • Definition: Plots a numerical variable over time to reveal trends, cycles, or patterns.

  • Example: Movie revenues from 1995 to 2016 show a consistent upward trend.

Organizing and Visualizing a Mix of Variables

Multidimensional Contingency Tables

  • Definition: Tables summarizing data for three or more variables (categorical or numeric).

  • Limitation: Only one summary statistic (e.g., mean) can be shown for each combination when including a numerical variable.

  • Example Table:

Fund Type   Risk Level   Mean 10YrReturn (%)
Growth      Low          8.06
Growth      Average      7.78
Growth      High         7.19
Value       Low          6.45
Value       Average      6.52
Value       High         5.97
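A table like this can be produced by grouping records on the categorical variables and computing one summary statistic per group. The sketch below uses a handful of hypothetical fund records, not the data behind the table, and an illustrative `mean_by_groups` helper.

```python
from collections import defaultdict
from statistics import mean

def mean_by_groups(records, group_keys, value_key):
    """Mean of `value_key` for each combination of the `group_keys` values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in group_keys)].append(rec[value_key])
    return {key: mean(vals) for key, vals in groups.items()}

# Hypothetical fund records.
funds = [
    {"fund_type": "Growth", "risk": "Low",  "return_10yr": 8.0},
    {"fund_type": "Growth", "risk": "Low",  "return_10yr": 9.0},
    {"fund_type": "Value",  "risk": "Low",  "return_10yr": 6.0},
    {"fund_type": "Value",  "risk": "Low",  "return_10yr": 7.0},
    {"fund_type": "Value",  "risk": "High", "return_10yr": 6.0},
]
means = mean_by_groups(funds, ["fund_type", "risk"], "return_10yr")
for (fund_type, risk), avg in means.items():
    print(f"{fund_type:<7} {risk:<5} {avg:.2f}")
```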

Advanced Visualizations

  • Colored Scatter Plots: Show two numerical variables and one categorical variable (by color).

  • Bubble Charts: Add a third numerical variable by varying point size.

  • Pivots and Treemaps: Summarize and visualize hierarchical or multidimensional data.

  • Sparklines: Mini time-series plots for quick trend comparison across variables.

Filtering and Querying Data

  • Filtering: Selecting rows that meet specific criteria (e.g., funds with 5-star ratings).

  • Querying: Interactive filtering, possibly limiting columns as well as rows.

  • Tools: Excel filters, slicers, and software-specific features (JMP, Minitab).

  • Purpose: Focus analysis on relevant subsets for clearer insights.
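The distinction between filtering (rows only) and querying (rows and, optionally, columns) can be sketched in a few lines. The fund records, field names, and both helper functions below are hypothetical. They mimic what spreadsheet filters do rather than reproducing any specific tool.

```python
def filter_rows(rows, predicate):
    """Keep every row for which predicate(row) is true; all columns are retained."""
    return [row for row in rows if predicate(row)]

def query(rows, predicate, columns):
    """Keep matching rows and only the requested columns."""
    return [{col: row[col] for col in columns}
            for row in rows if predicate(row)]

# Hypothetical fund records.
funds = [
    {"name": "Fund A", "fund_type": "Growth", "stars": 5, "return_10yr": 8.1},
    {"name": "Fund B", "fund_type": "Value",  "stars": 3, "return_10yr": 6.4},
    {"name": "Fund C", "fund_type": "Value",  "stars": 5, "return_10yr": 6.9},
]

five_star = filter_rows(funds, lambda f: f["stars"] == 5)
five_star_names = query(funds, lambda f: f["stars"] == 5, ["name", "fund_type"])
print(five_star)
print(five_star_names)
```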

Pitfalls in Organizing and Visualizing Variables

Obscuring Data

  • Too much detail or overly complex tables/charts can make interpretation difficult.

  • Overly complex legends or multidimensional tables may hide important patterns.

Creating False Impressions

  • Selective summarization (e.g., showing only one year of data) can mislead.

  • Improper chart design (e.g., misleading pie slices, axes not starting at zero) distorts interpretation.

Chartjunk

  • Unnecessary decorative elements obscure or distort data.

  • Best practice: Use clear, standard chart types and accurate labeling.

Software Guides (Excel, JMP, Minitab)

Excel

  • PivotTables for summary and contingency tables.

  • FREQUENCY function for distributions.

  • Insert charts for bar, pie, histogram, scatter, and time-series plots.

  • Slicers for interactive filtering.

JMP

  • Tabulate and Graph Builder for interactive summaries and visualizations.

  • Distribution function for histograms and stem-and-leaf displays.

  • Drag-and-drop interface for flexible analysis.

Minitab

  • Tally Individual Variables for summary tables.

  • Cross Tabulation for contingency tables.

  • Histogram and Bar Chart tools for visualizations.

  • Subset Worksheet for filtering data.

Conclusion

Effective organization and visualization of variables are essential for accurate data analysis and interpretation in business statistics. By choosing appropriate methods and avoiding common pitfalls, analysts can ensure their findings are clear, reliable, and actionable.

Additional info: This summary integrates textbook-style explanations, formulas, and practical examples, and includes guidance for using common statistical software tools.
