Chapter 2: Organizing and Visualizing Variables

Objectives

This chapter introduces foundational techniques for organizing and visualizing both categorical and numerical variables in statistics. Students will learn to:

  • Organize and visualize categorical variables.

  • Organize and visualize numerical variables.

  • Summarize a mix of variable types.

  • Avoid common errors in organizing and visualizing data.

Organizing Data: Tabular and Visual Summaries

Purpose of Summaries

  • Tabular summaries guide further exploration and can facilitate decision making.

  • Visual summaries enable rapid review of large data sets and highlight significant patterns.

  • Often, the steps of organizing and visualizing data occur together in statistical analysis.

Organizing Categorical Data

Summary Tables

A summary table tallies the frequencies or percentages of items in each category, allowing for easy comparison between categories.

  • Used for a single categorical variable.

  • Displays counts or proportions for each category.

Example Table:

Devices Used To Watch Movies or TV Shows    Percentage
Television Set                              49%
Tablet                                      9%
Smartphone                                  10%
Laptop / Desktop                            32%
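A summary table like this one can be tallied directly from raw responses. The sketch below is a minimal illustration: the `responses` list is invented (it is not the survey data behind the table), and `summary_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw responses, invented for illustration only.
responses = ["TV", "Laptop", "TV", "Tablet", "Smartphone",
             "TV", "Laptop", "TV", "Laptop", "TV"]

def summary_table(values):
    """Return {category: (count, percentage)} for one categorical variable."""
    counts = Counter(values)
    total = len(values)
    # most_common() lists categories from most to least frequent.
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}

table = summary_table(responses)
for category, (count, pct) in table.items():
    print(f"{category:<12}{count:>3}{pct:>7.1f}%")
```

Each row reports both the count and the percentage, so either form of the table can be read from the same tally.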

Contingency Tables

A contingency table is used to organize data for two or more categorical variables, showing the joint distribution of their responses.

  • Rows represent categories of one variable; columns represent categories of another.

  • Helps identify patterns and relationships between variables.

Example: A contingency table might show the frequency of invoices categorized by size (small, medium, large) and the presence or absence of errors.
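As a rough sketch of the invoice example, the joint counts can be built by tallying (size, error) pairs. The invoice records and the name `contingency_table` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical invoices as (size, has_error) pairs, invented for illustration.
invoices = [
    ("small", False), ("small", True), ("medium", False),
    ("large", True), ("medium", False), ("small", False),
    ("large", False), ("medium", True),
]

def contingency_table(pairs):
    """Count joint occurrences of (row_category, column_category)."""
    return Counter(pairs)

cells = contingency_table(invoices)
sizes = ["small", "medium", "large"]

# Rows are invoice sizes; columns are the absence or presence of errors.
print(f"{'Size':<8}{'No error':>10}{'Error':>8}{'Total':>8}")
for size in sizes:
    no_err, err = cells[(size, False)], cells[(size, True)]
    print(f"{size:<8}{no_err:>10}{err:>8}{no_err + err:>8}")
```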

Organizing Numerical Data

Ordered Array

An ordered array is a sequence of data arranged from smallest to largest value.

  • Shows the range (minimum to maximum).

  • Helps identify outliers.
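A minimal sketch of an ordered array, using made-up values: sorting exposes the minimum and maximum at the two ends, and unusually small or large values stand out there.

```python
# Hypothetical values, invented for illustration.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30]

ordered = sorted(temps)                      # the ordered array
minimum, maximum = ordered[0], ordered[-1]   # the ends of the array
data_range = maximum - minimum               # range = maximum - minimum

print(ordered)
print(f"min={minimum}, max={maximum}, range={data_range}")
```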

Frequency Distribution

A frequency distribution summarizes data by grouping values into classes and counting the number of observations in each class.

  • Choose the number of classes (typically 5–15).

  • Class interval width: (largest value − smallest value) / number of classes, usually rounded up to a convenient value.

  • Class boundaries should not overlap.

Example: For 20 winter days, daily high temperatures are grouped into intervals (e.g., 10–19, 20–29, etc.), and frequencies are counted.
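The grouping step can be sketched as follows. The 20 temperatures are invented for illustration (they are not the data set the example refers to), and the function names are hypothetical. The width is assumed to be the range divided by the number of classes, then rounded up to a convenient value.

```python
# Hypothetical daily high temperatures for 20 days, invented for illustration.
temps = [14, 22, 31, 18, 27, 45, 39, 25, 33, 52,
         11, 29, 36, 41, 23, 48, 57, 34, 26, 19]

def class_width(values, num_classes):
    """Approximate interval width: range divided by the number of classes."""
    return (max(values) - min(values)) / num_classes

def frequency_distribution(values, lower, width, num_classes):
    """Count values in classes [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < num_classes:   # values outside all classes are ignored
            counts[k] += 1
    return counts

raw_width = class_width(temps, 5)   # (57 - 11) / 5
width = 10                          # rounded up to a convenient value
freqs = frequency_distribution(temps, lower=10, width=width, num_classes=5)

for k, f in enumerate(freqs):
    low = 10 + k * width
    print(f"{low}-{low + width - 1}: {f}")
```

Because each class is a half-open interval, every value falls into exactly one class, which is what non-overlapping class boundaries require.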

Relative and Percent Frequency Distribution

  • Relative frequency: class frequency / total number of values.

  • Percent frequency: relative frequency × 100%.
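A short sketch of both calculations, assuming the usual definitions (relative frequency is the class frequency divided by the total; percent frequency is that proportion times 100). The class frequencies are hypothetical.

```python
# Hypothetical class frequencies, invented for illustration.
freqs = [4, 6, 5, 3, 2]
total = sum(freqs)

relative = [f / total for f in freqs]        # proportions that sum to 1
percent = [100 * f / total for f in freqs]   # percentages that sum to 100

for f, r, p in zip(freqs, relative, percent):
    print(f"{f:>3}{r:>8.2f}{p:>8.1f}%")
```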

Cumulative Frequency Distribution

  • Cumulative frequency: Sum of frequencies up to a given class.

  • Cumulative percent: (cumulative frequency / total number of values) × 100%.
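A running total gives the cumulative frequencies; dividing each by the overall total and multiplying by 100 gives the cumulative percentages. A minimal sketch with hypothetical class frequencies:

```python
from itertools import accumulate

# Hypothetical class frequencies, invented for illustration.
freqs = [4, 6, 5, 3, 2]

cumulative = list(accumulate(freqs))   # running sum up to each class
total = cumulative[-1]                 # the last running sum is the total
cumulative_pct = [100 * c / total for c in cumulative]

print(cumulative)
print(cumulative_pct)
```

The final cumulative percentage is always 100, which is a quick check on the arithmetic.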

Visualizing Categorical Data

Bar Chart

A bar chart displays categories as bars, with bar length representing frequency or percentage. Bars are separated by gaps.

Pie Chart

A pie chart divides a circle into slices representing categories, with slice size proportional to category percentage.

Doughnut Chart

A doughnut chart is similar to a pie chart but with a blank center, representing categories as segments of the ring.

Pareto Chart

A Pareto chart is a vertical bar chart with categories ordered by descending frequency, often accompanied by a cumulative percentage polygon. It is used to separate the "vital few" categories from the "trivial many."
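The numbers behind a Pareto chart can be prepared by sorting the categories and accumulating percentages. This is only a sketch: the defect counts are invented, and `pareto_table` is a hypothetical helper.

```python
# Hypothetical defect counts by cause, invented for illustration.
counts = {"Scratches": 12, "Dents": 45, "Misalignment": 8, "Paint": 30, "Other": 5}

def pareto_table(counts):
    """Sort categories by descending count and attach cumulative percentages."""
    total = sum(counts.values())
    running = 0
    rows = []
    for cat, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((cat, n, 100 * running / total))
    return rows

rows = pareto_table(counts)
for cat, n, cum_pct in rows:
    print(f"{cat:<14}{n:>4}{cum_pct:>8.1f}%")
```

In this made-up data the first two categories account for 75% of all defects, which is the kind of "vital few" pattern the chart is meant to reveal.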

Visualizing Numerical Data

Stem-and-Leaf Display

A stem-and-leaf display organizes data by splitting each value into a "stem" (leading digits) and "leaf" (trailing digits), showing distribution and concentration.
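A minimal sketch of the stem/leaf split, assuming non-negative integers with the units digit as the leaf and everything to its left as the stem. The values are invented, and stems that have no leaves are simply omitted in this simplified version.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (units digit)."""
    display = defaultdict(list)
    for v in sorted(values):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

# Hypothetical values, invented for illustration.
values = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30]
display = stem_and_leaf(values)

for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```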

Histogram

A histogram is a vertical bar chart of a frequency distribution, with no gaps between bars. The horizontal axis shows class boundaries or midpoints; the vertical axis shows frequency, relative frequency, or percentage.

Frequency Polygon

A frequency polygon plots each class's frequency at the class midpoint and connects the points with line segments. It is useful for comparing two or more groups on the same chart.

Percentage Polygon and Ogive

  • Percentage polygon: Connects class midpoints at their percent frequencies.

  • Ogive (cumulative percentage polygon): Plots cumulative percentages against class boundaries, useful for comparing distributions.
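The plotted points for both charts can be computed before any drawing. The sketch below uses hypothetical classes (lower boundaries 10 through 50, width 10) and assumes the ogive plots, at each class boundary, the percentage of values that fall below that boundary.

```python
from itertools import accumulate

# Hypothetical classes and frequencies, invented for illustration.
lower_bounds = [10, 20, 30, 40, 50]
width = 10
freqs = [4, 6, 5, 3, 2]
total = sum(freqs)

# Percentage polygon: one point per class at (midpoint, percent frequency).
polygon = [(lb + width / 2, 100 * f / total)
           for lb, f in zip(lower_bounds, freqs)]

# Ogive: one point per boundary at (boundary, percent of values below it).
boundaries = lower_bounds + [lower_bounds[-1] + width]
below = [0] + list(accumulate(freqs))
ogive = [(b, 100 * c / total) for b, c in zip(boundaries, below)]

print(polygon)
print(ogive)
```

The ogive starts at 0% at the first boundary and reaches 100% at the last one, so it never decreases from left to right.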

Visualizing Two Numerical Variables

Scatter Plot

A scatter plot displays paired observations from two numerical variables, with one variable on the X-axis and the other on the Y-axis. It is used to examine possible relationships between the two variables.

Time Series Plot

A time series plot shows patterns in a numerical variable over time, with time on the X-axis and the variable on the Y-axis.

Organizing and Visualizing a Mix of Variables

Multidimensional Contingency Tables

These tables organize responses for three or more categorical variables, revealing complex patterns and relationships. Typically limited to three or four variables for clarity.

Advanced Visualizations

  • Colored scatter plots: Use color to represent categorical variables in addition to numerical axes.

  • Bubble charts: Use the size of points to represent a third variable.

  • Treemaps: Visualize hierarchical data using nested rectangles.

  • Sparklines: Compact time-series visualizations embedded in tables.
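To make the sparkline idea concrete, here is a rough text-only sketch that maps each value in a series to one of eight Unicode block characters. The function name and the scaling rule are assumptions for illustration; spreadsheet sparklines are drawn graphically rather than with characters.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

def sparkline(values):
    """Return a one-line text sketch of a numerical series."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # a flat series: use the lowest bar
        return BLOCKS[0] * len(values)
    scale = (len(BLOCKS) - 1) / (hi - lo)
    # Scale each value to an index from 0 to 7 and pick the matching bar.
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)

# Hypothetical monthly values, invented for illustration.
print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))
```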

Filtering and Querying Data

Data Filtering and Querying

  • Filtering: Selects rows matching specific criteria.

  • Querying: Retrieves data based on conditions, possibly selecting specific columns.

  • Slicers in Excel: Interactive panels for filtering PivotTable data by variable values.
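The difference between filtering and querying can be sketched on a small list of records. The records, field names, and helper functions below are all hypothetical and only illustrate the idea; they are not Excel features.

```python
# Hypothetical records, invented for illustration.
funds = [
    {"name": "Fund A", "type": "Growth", "risk": "Low", "return_pct": 6.1},
    {"name": "Fund B", "type": "Value", "risk": "High", "return_pct": 9.4},
    {"name": "Fund C", "type": "Growth", "risk": "High", "return_pct": 11.8},
    {"name": "Fund D", "type": "Value", "risk": "Low", "return_pct": 4.3},
]

def filter_rows(rows, predicate):
    """Filtering: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def query(rows, predicate, columns):
    """Querying: keep matching rows, but return only the requested columns."""
    return [{col: row[col] for col in columns}
            for row in rows if predicate(row)]

growth = filter_rows(funds, lambda r: r["type"] == "Growth")
high_risk = query(funds, lambda r: r["risk"] == "High", ["name", "return_pct"])

print([r["name"] for r in growth])
print(high_risk)
```

Filtering returns complete rows, while the query returns only the selected columns for the rows that match.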

Common Pitfalls in Organizing and Visualizing Data

Potential Errors

  • Obscuring data or creating false impressions.

  • Selective summarization (showing only part of the data).

  • Improperly constructed charts (e.g., pie chart issues, axes not starting at zero, chartjunk).

  • Information overload.

Best Practices

  • Use the simplest possible visualization.

  • Include titles and label all axes.

  • Begin vertical axes at zero and use a constant scale.

  • Avoid 3D effects and chartjunk.

  • Use consistent colors across charts that are meant to be compared.

  • Avoid uncommon chart types unless necessary.

Chapter Summary

  • Organizing and visualizing categorical variables.

  • Organizing and visualizing numerical variables.

  • Summarizing a mix of variables.

  • Avoiding common errors in organizing and visualizing variables.