Organizing and Summarizing Data: Graphical and Tabular Methods in Statistics

Organizing and Summarizing Data

Introduction

Organizing and summarizing data are foundational skills in statistics, enabling researchers to extract meaningful information from raw data. This chapter covers methods for organizing qualitative and quantitative data, constructing various graphical displays, and recognizing potential misrepresentations in statistical graphics.

Organizing Qualitative Data

Frequency and Relative Frequency Distributions

  • Frequency Distribution: A table that lists each category of qualitative data and the number of occurrences for each category.

  • Relative Frequency: The proportion (or percent) of observations within a category, calculated as $\text{relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$.

  • Relative Frequency Distribution: A table listing each category with its relative frequency.
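The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `frequency_table` and the sample responses are hypothetical, invented only to show the arithmetic.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, relative_frequency)} for qualitative data."""
    counts = Counter(observations)
    total = sum(counts.values())  # sum of all frequencies
    return {category: (n, n / total) for category, n in counts.items()}

# Hypothetical sample, invented for illustration only.
responses = ["bachelor", "high school", "bachelor", "associate",
             "high school", "high school", "bachelor", "graduate"]

table = frequency_table(responses)
for category, (freq, rel) in table.items():
    print(f"{category:<12} {freq:>3} {rel:.3f}")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check that no observation was dropped or counted twice.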

Bar Graphs and Pareto Charts

  • Bar Graph: Displays categories on one axis and frequencies or relative frequencies on the other. Bars are of equal width and do not touch.

  • Pareto Chart: A bar graph with bars ordered from highest to lowest frequency or relative frequency.

  • Side-by-Side Bar Graphs: Used to compare two data sets, typically using relative frequencies for fair comparison.
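As a rough sketch of the two ideas above, the snippet below orders categories the way a Pareto chart would and shows why relative frequencies make a fairer comparison between groups of different sizes. The helper names and the counts are hypothetical, chosen only for illustration.

```python
def relative_frequencies(counts):
    """Convert {category: frequency} into {category: relative frequency}."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def pareto_order(counts):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts for two groups of different sizes (not real data).
group_a = {"red": 30, "blue": 50, "green": 20}     # 100 observations
group_b = {"red": 150, "blue": 200, "green": 150}  # 500 observations

print(pareto_order(group_a))

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)
for category in group_a:
    print(f"{category:<6} A={rel_a[category]:.2f}  B={rel_b[category]:.2f}")
```

Here "red" has five times the raw count in group B, yet its relative frequency is the same in both groups, so a side-by-side graph of raw frequencies would overstate the difference.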

Example: The side-by-side bar graph below compares educational attainment in 1990 and 2021, illustrating changes in the proportion of adults with various education levels.

Side-by-side bar graph of educational attainment in 1990 vs 2021

Horizontal bar graphs are preferable when category names are lengthy.

Horizontal side-by-side bar graph of educational attainment in 1990 vs 2021

Pie Charts

  • Pie Chart: A circular graph divided into sectors, each representing a category. The area of each sector is proportional to the category's frequency.

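  • To determine the angle for each sector, multiply the category's relative frequency by 360°: $\text{angle} = \text{relative frequency} \times 360^\circ$.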
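A short sketch of the sector calculation, assuming the usual rule that a sector's central angle equals the category's relative frequency times 360 degrees. The function name `sector_angles` and the counts are hypothetical.

```python
def sector_angles(counts):
    """Return the central angle in degrees for each category's pie sector."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}

# Hypothetical frequencies, invented for illustration only.
counts = {"A": 10, "B": 25, "C": 15}
angles = sector_angles(counts)
print(angles)  # {'A': 72.0, 'B': 180.0, 'C': 108.0}
```

The angles sum to 360 degrees, just as the relative frequencies sum to 1.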

Example: The pie chart below shows the distribution of educational attainment among U.S. adults in 2021.

Pie chart of educational attainment in 2021

Organizing Quantitative Data

Discrete vs. Continuous Data

  • Discrete Data: Consists of countable values (e.g., number of arrivals).

  • Continuous Data: Can take any value within a range (e.g., heights, weights).

Frequency Distributions for Quantitative Data

  • For discrete data with few values, each value forms a class.

  • For continuous data or discrete data with many values, group data into intervals (classes).

  • Class Limits: The lower class limit is the smallest value that belongs to a class; the upper class limit is the largest value that belongs to it.

  • Class Width: The difference between consecutive lower class limits.
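The grouping step can be sketched as follows. This is one possible implementation under stated assumptions: classes have equal width, each class covers values from its lower limit up to (but not including) the next lower limit, and the function name and data are hypothetical.

```python
def class_frequencies(data, first_lower_limit, class_width, num_classes):
    """Count observations per class; returns (lower class limit, frequency) pairs."""
    lower_limits = [first_lower_limit + i * class_width for i in range(num_classes)]
    counts = [0] * num_classes
    for x in data:
        index = int((x - first_lower_limit) // class_width)
        if 0 <= index < num_classes:  # values outside all classes are ignored
            counts[index] += 1
    return list(zip(lower_limits, counts))

# Hypothetical observations, invented for illustration only.
data = [12, 15, 21, 22, 27, 30, 34, 35, 38, 41, 44, 49]
classes = class_frequencies(data, first_lower_limit=10, class_width=10, num_classes=4)
for lower, freq in classes:
    print(f"{lower}-{lower + 9}: {freq}")
```

With a class width of 10, the lower class limits are 10, 20, 30, and 40, and the difference between any two consecutive lower limits is the class width.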

Histograms

  • Histogram: A graphical display of data using adjacent rectangles to show the frequency or relative frequency of classes. Rectangles touch each other, unlike bar graphs.

Dot Plots

  • Dot Plot: Each observation is represented by a dot above its value on a number line. Useful for small data sets.

Identifying the Shape of a Distribution

  • Uniform Distribution: The frequency of each value or class is roughly the same across the range.

  • Bell-Shaped Distribution: Highest frequency in the middle, tails off symmetrically.

  • Skewed Right: The tail to the right of the peak is longer than the tail to the left.

  • Skewed Left: The tail to the left of the peak is longer than the tail to the right.

Additional Displays of Quantitative Data

Stem-and-Leaf Plots

  • Stem-and-Leaf Plot: Displays data by splitting each value into a "stem" (all but the rightmost digit) and a "leaf" (the rightmost digit).

  • Allows retrieval of raw data from the plot.
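A minimal sketch of the construction, assuming non-negative integer data so that the leaf is the ones digit and the stem is everything to its left. The function name and the values are hypothetical; data with one decimal place could be handled the same way after multiplying by 10.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its leaves in ascending order (non-negative integers)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leaf = rightmost digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations, invented for illustration only.
scores = [23, 31, 27, 45, 38, 31, 20, 49, 36]
plot = stem_and_leaf(scores)

for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
print("Legend: 2 | 3 represents 23")
```

Because every leaf is kept, the raw data can be rebuilt from the plot as `10 * stem + leaf`, which is the property the second bullet describes.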

Example: The following stem-and-leaf plot shows the percentage of persons living in poverty by state in 2021.

Stem-and-leaf plot of poverty percentages (unordered leaves)

After arranging leaves in ascending order and adding a legend, the plot becomes:

Stem-and-leaf plot of poverty percentages (ordered leaves with legend)

Technology can also be used to generate stem-and-leaf plots:

Stem-and-leaf plot generated by software

Frequency Polygons

  • Frequency Polygon: A graph using points connected by line segments to represent class frequencies. Points are plotted at class midpoints.

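  • Class Midpoint: The sum of consecutive lower class limits divided by 2: $\text{class midpoint} = \dfrac{\text{lower class limit} + \text{next lower class limit}}{2}$.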
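The points of a frequency polygon can be computed as sketched below. The sketch assumes the class midpoint is the average of consecutive lower class limits and that all classes share one width; the function name, limits, and frequencies are hypothetical.

```python
def class_midpoints(lower_limits, class_width):
    """Midpoint of each class: (lower limit + next lower limit) / 2."""
    return [(lower + (lower + class_width)) / 2 for lower in lower_limits]

# Hypothetical classes 10-19, 20-29, 30-39, 40-49 with invented frequencies.
lower_limits = [10, 20, 30, 40]
frequencies = [2, 3, 4, 3]

midpoints = class_midpoints(lower_limits, class_width=10)
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)  # [(15.0, 2), (25.0, 3), (35.0, 4), (45.0, 3)]
```

Connecting these (midpoint, frequency) pairs with line segments gives the polygon.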

Cumulative Frequency and Relative Frequency Tables

  • Cumulative Frequency Distribution: Shows, for each class, the total number of observations less than or equal to that class's upper class limit.

  • Cumulative Relative Frequency Distribution: Shows, for each class, the proportion (or percent) of observations less than or equal to that class's upper class limit.

Ogives

  • Ogive: A graph of cumulative frequency or cumulative relative frequency versus the upper class limits, connected by line segments.
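Cumulative tables and ogive points are running totals of the class frequencies, as this small sketch shows. The upper class limits and frequencies are hypothetical values used only for illustration.

```python
from itertools import accumulate

# Hypothetical classes 10-19, 20-29, 30-39, 40-49 with invented frequencies.
upper_limits = [19, 29, 39, 49]
frequencies = [2, 3, 4, 3]

cumulative = list(accumulate(frequencies))  # running total: [2, 5, 9, 12]
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

# An ogive plots each upper class limit against its cumulative (relative) frequency.
ogive_points = list(zip(upper_limits, cumulative_relative))
for limit, proportion in ogive_points:
    print(f"<= {limit}: {proportion:.3f}")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.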

Time-Series Graphs

  • Time-Series Data: Values measured at different points in time.

  • Time-Series Plot: Plots time on the horizontal axis and the variable's value on the vertical axis, connecting points with line segments.

Example: The time-series plot below shows the Partisan Conflict Index in the U.S. federal government from 2004 to 2022.

Time-series plot of Partisan Conflict Index

Graphical Misrepresentations of Data

Common Ways Graphs Can Mislead

  • Improper Category Definitions: Combining or splitting categories inappropriately can mislead viewers about the distribution of data.

  • Manipulating Vertical Scale: Not starting the vertical axis at zero can exaggerate differences between groups.

  • Using Area or Volume Incorrectly: Representing data with areas or volumes (e.g., pictograms) can distort perceived differences.

  • Three-Dimensional Effects: 3D graphs can make some categories appear larger than they are due to perspective.
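The effect of a truncated vertical scale can be quantified with a little arithmetic. In this sketch, the function `apparent_ratio` and the two values are hypothetical; it compares the ratio of drawn bar heights when the axis starts at zero with the ratio when the axis starts higher.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis begins at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)

# Hypothetical values, invented for illustration only.
a, b = 50.0, 45.0

print(apparent_ratio(a, b))        # axis starts at zero: about 1.11
print(apparent_ratio(a, b, 40.0))  # axis starts at 40: 2.0
```

The true values differ by about 11%, but with the axis starting at 40 the first bar is drawn twice as tall as the second.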

Example: The following bar graph shows the number of people in poverty over time. If the vertical axis does not start at zero, the decrease may appear more dramatic than it is.

Bar graph of number in poverty with truncated vertical axis

To avoid misleading readers, the graph should clearly indicate any truncation of the axis:

Bar graph of number in poverty with axis break symbol

Alternatively, a time-series plot of the percent in poverty focuses on trends rather than absolute numbers:

Time-series plot of percent in poverty

Guidelines for Constructing Good Graphics

  • Title and label axes clearly, including units and data sources.

  • Avoid distortion and minimize white space.

  • Indicate any truncation of scales.

  • Avoid clutter and unnecessary design elements.

  • Prefer two-dimensional graphs over three-dimensional ones.

  • Avoid relative graphs that show no underlying data or scales.

Additional info: This summary covers the main methods for organizing and summarizing data, including frequency tables, bar graphs, pie charts, histograms, dot plots, stem-and-leaf plots, frequency polygons, ogives, and time-series plots, as well as guidelines for avoiding misleading graphics. These concepts are essential for effective data analysis and interpretation in statistics.
