Displaying Data with Graphs: Introduction to Statistical Data Types and Graphical Methods

Displaying Distributions with Graphs

Introduction

Understanding how to display and interpret data graphically is a foundational skill in statistics. This section introduces the types of data, how to classify variables, and the main graphical methods used to summarize and visualize data distributions.

Individuals and Variables

Definitions

Individuals: The objects described in a data set. These can be people, animals, plants, or things (e.g., freshmen, newborns, golden retrievers, fields of corn, cells).
Variable: Any property or characteristic that can take different values for different individuals (e.g., age, gender, blood pressure, blood type, leaf length, flower color).

Types of Variables

Quantitative Variables

Represent a measurable quantity for each individual.
Examples: Age (in years), blood pressure (in mm Hg), leaf length (in cm).
It is meaningful to calculate averages and other numerical summaries.

Categorical Variables

Describe a characteristic or quality of each individual.
Examples: Gender (male, female), blood type (A, B, AB, O), flower color (white, yellow, red).
Summarized by counts or proportions of individuals in each category.

Classifying Variables

Steps to Classify Variables

Identify the n individuals in the sample or population.
Determine what is being recorded about those individuals.
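Decide if the recorded value is a number for which arithmetic is meaningful (quantitative) or a statement/category (categorical). A numeric code used only as a label, such as a zip code, is still categorical.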

Example Table: Diagnoses and Age at Death

Individuals studied	Diagnosis	Age at death
Patient A	Heart disease	56
Patient B	Stroke	70
Patient C	Stroke	75
Patient D	Lung cancer	60
Patient E	Heart disease	80
Patient F	Accident	73
Patient G	Diabetes	69

Diagnosis: Categorical variable (description).
Age at death: Quantitative variable (meaningful number).
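
The classification decides which summaries make sense: counts for the categorical variable, an average for the quantitative one. A minimal sketch in Python, assuming the table is stored as a list of tuples (the names patients, diagnosis_counts, and mean_age are illustrative and not from the source):

```python
from collections import Counter
from statistics import mean

# Rows from the example table: (individual, diagnosis, age at death)
patients = [
    ("Patient A", "Heart disease", 56),
    ("Patient B", "Stroke", 70),
    ("Patient C", "Stroke", 75),
    ("Patient D", "Lung cancer", 60),
    ("Patient E", "Heart disease", 80),
    ("Patient F", "Accident", 73),
    ("Patient G", "Diabetes", 69),
]

# Categorical variable: summarize with a count per category.
diagnosis_counts = Counter(diagnosis for _, diagnosis, _ in patients)

# Quantitative variable: an average is meaningful.
mean_age = mean(age for _, _, age in patients)

print(diagnosis_counts)
print(mean_age)
```

Averaging the diagnoses would be meaningless, which is the practical reason for classifying each variable before summarizing it.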

Examples: Classifying Variables in Research

Experimental Example: Mice and Metastases

Researchers grafted human cancerous cells onto 20 healthy adult mice.
Ten mice were injected with tumor-specific antibodies (anti-CD47) and ten with a control (IgG).
Variables recorded: Treatment group (categorical), presence of metastases (categorical), number of metastases (quantitative).

Sample Data Table

Mouse	Treatment	Presence of metastases	Number of metastases
1	IgG	yes	1
2	IgG	yes	1
3	IgG	yes	2
4	IgG	yes	2
5	IgG	yes	2
6	IgG	yes	3
7	IgG	yes	3
8	IgG	yes	3
9	IgG	yes	3
10	IgG	yes	4
11	anti-CD47	no	0
12	anti-CD47	no	0
13	anti-CD47	no	0

Individuals: Each mouse.
Variables: Treatment (categorical), presence of metastases (categorical), number of metastases (quantitative).
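
As a rough illustration of how these variables might be summarized by treatment group, the sketch below uses only the 13 rows shown in the sample table (mice 14 to 20 are not listed, so the anti-CD47 summary is incomplete). The tuple layout and variable names are assumptions made for this example.

```python
from collections import defaultdict

# (mouse, treatment, presence of metastases, number of metastases)
# Only the rows shown in the sample table; the remaining mice are not listed.
mice = [
    (1, "IgG", "yes", 1), (2, "IgG", "yes", 1), (3, "IgG", "yes", 2),
    (4, "IgG", "yes", 2), (5, "IgG", "yes", 2), (6, "IgG", "yes", 3),
    (7, "IgG", "yes", 3), (8, "IgG", "yes", 3), (9, "IgG", "yes", 3),
    (10, "IgG", "yes", 4),
    (11, "anti-CD47", "no", 0), (12, "anti-CD47", "no", 0),
    (13, "anti-CD47", "no", 0),
]

# Group the quantitative variable by the categorical treatment variable.
counts_by_group = defaultdict(list)
for _, treatment, _, n_metastases in mice:
    counts_by_group[treatment].append(n_metastases)

for treatment, counts in counts_by_group.items():
    with_metastases = sum(1 for c in counts if c > 0)  # categorical summary
    mean_number = sum(counts) / len(counts)            # quantitative summary
    print(treatment, len(counts), with_metastases, mean_number)
```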

Graphing Categorical Data

Bar Graphs

Each category is represented by a bar.
The height of the bar shows the count (frequency) or percent of individuals in that category.

Pie Charts

Show how a single categorical variable breaks down into its components.
Each slice represents the proportion or percent of the whole for each category.

Example

Bar graph: Number of mice exhibiting metastases in each group (IgG vs. anti-CD47).
Pie chart: Proportion of individuals with different blood types in a population.
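
A bar graph can be sketched without any plotting library. The example below is a minimal text version, assuming the diagnosis counts from the earlier patient table; the function name text_bar_graph is hypothetical. The bars are drawn horizontally, so bar length plays the role of height.

```python
def text_bar_graph(counts):
    """Return one line per category; the bar length equals the count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {'#' * n} {n}" for label, n in counts.items()]

# Counts of each diagnosis in the seven-patient example table.
diagnosis_counts = {
    "Heart disease": 2,
    "Stroke": 2,
    "Lung cancer": 1,
    "Accident": 1,
    "Diabetes": 1,
}

for line in text_bar_graph(diagnosis_counts):
    print(line)
```

A pie chart of the same data would use each count divided by the total (7) as the slice size, since the categories together make up the whole sample.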

Graphing Quantitative Data

Histograms

Summarize the distribution of a single quantitative variable.
The range of values is divided into equal-size intervals (bins or classes).
The height of each bar shows the frequency (count) or relative frequency (percent) of data points in each interval.

Dotplots

Display each data point as a dot along a single axis.
Useful for small data sets to show the exact values and their distribution.
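
A dotplot can be sketched in plain text by stacking one dot per observation above its value. The version below is a simplified illustration that assumes small, single-digit integer values (so the columns line up); it uses the number of metastases for the ten IgG mice in the sample table, and text_dotplot is a hypothetical helper name.

```python
from collections import Counter

def text_dotplot(values):
    """Return the lines of a text dotplot: stacked dots above an integer axis."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    lines = []
    # Draw from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("o" if counts[v] >= level else " " for v in axis)
        lines.append(row.rstrip())
    lines.append(" ".join(str(v) for v in axis))
    return lines

igg_metastases = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
for line in text_dotplot(igg_metastases):
    print(line)
```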

Time Plots

Used for data collected over time (time series).
The horizontal axis represents time; the vertical axis shows the variable of interest.
Trends and cyclical patterns can be observed.

Making and Interpreting Histograms

Steps to Create a Histogram

Divide the range of the quantitative variable into equal-size intervals (classes).
Count the number of data points in each interval.
Draw a bar for each interval; the height represents the count or percent.
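
The three steps above can be sketched as follows, using the ages at death from the patient table. The class boundaries (width 5, starting at 55) and the convention that each class includes its left endpoint but not its right are assumptions chosen for this example; histogram_counts is a hypothetical helper.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in values:
        index = int((x - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[index] += 1
    return counts

ages = [56, 70, 75, 60, 80, 73, 69]

# Step 1: classes 55-60, 60-65, ..., 80-85. Step 2: count values per class.
counts = histogram_counts(ages, start=55, width=5, n_classes=6)

# Step 3: draw one bar per class (here as text, one '#' per observation).
for i, count in enumerate(counts):
    low = 55 + 5 * i
    print(f"{low}-{low + 5}: {'#' * count}")
```

Whatever endpoint convention is chosen, it should be applied consistently so that every observation lands in exactly one class.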

Choosing the Number of Classes

Too few classes: Overly summarized, may hide important features.
Too many classes: Too detailed, may be hard to interpret.
Start with 5 to 10 classes and adjust as needed.
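
To see how the number of classes changes the picture, the sketch below counts the same seven ages at death with 2, 4, and 12 equal-width classes spanning the minimum to the maximum. The helper class_counts is hypothetical, it places the maximum value in the last class, and it assumes the values are not all equal. With so few observations this is only an illustration of the trade-off.

```python
def class_counts(values, n_classes):
    """Count values in n_classes equal-width classes from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for x in values:
        # The maximum would index one past the end, so clamp it into the last class.
        index = min(int((x - lo) / width), n_classes - 1)
        counts[index] += 1
    return counts

ages = [56, 70, 75, 60, 80, 73, 69]
for k in (2, 4, 12):
    print(k, class_counts(ages, k))
```

With 2 classes nearly everything is lumped into one bar, and with 12 classes most bars hold one observation or none, so neither shows the overall pattern well.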

Interpreting Histograms

Look for the overall pattern (shape, center, spread) and for outliers.
Shape: Unimodal, bimodal, symmetric, skewed left, skewed right.
Center: Approximate midpoint of the data.
Spread: Range of values taken by the variable.
Outliers: Observations that fall outside the overall pattern.

Common Distribution Shapes

Symmetric: Left and right halves are mirror images.
Skewed left: Left side (lower values) extends farther out.
Skewed right: Right side (higher values) extends farther out.
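
One crude way to express "which side extends farther out" in code is to compare how far the smallest and largest values lie from the median. This is only a heuristic sketch, not a standard test: the function name describe_shape and the 10% tolerance are assumptions made for illustration, and a single outlier can dominate the result, so it does not replace looking at the graph.

```python
from statistics import median

def describe_shape(values, tolerance=0.1):
    """Label a distribution by comparing the lengths of its two tails."""
    m = median(values)
    left_tail = m - min(values)    # how far the lower values extend
    right_tail = max(values) - m   # how far the higher values extend
    total = left_tail + right_tail
    if total == 0 or abs(left_tail - right_tail) <= tolerance * total:
        return "roughly symmetric"
    return "skewed left" if left_tail > right_tail else "skewed right"

print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 2, 2, 3, 10]))
print(describe_shape([1, 8, 9, 9, 10]))
```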

Summary Table: Types of Variables and Graphs

Variable Type	Examples	Appropriate Graphs
Quantitative	Age, blood pressure, leaf length	Histogram, dotplot, time plot
Categorical	Gender, blood type, flower color	Bar graph, pie chart

Key Formulas

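Relative Frequency: $\text{relative frequency} = \dfrac{\text{count in category (or class)}}{\text{total number of individuals}}$
Percent: $\text{percent} = \text{relative frequency} \times 100\%$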
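
Relative frequency is the count in a category or class divided by the total number of individuals, and the percent is that value times 100. A minimal sketch, using the number of metastases for the ten IgG mice in the sample table (relative_frequencies is a hypothetical helper name):

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its count divided by the total count."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

igg_metastases = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
rel_freq = relative_frequencies(igg_metastases)

for value, rf in sorted(rel_freq.items()):
    # Percent is the relative frequency expressed out of 100.
    print(value, rf, f"{100 * rf:.0f}%")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no observation was missed or counted twice.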

Example: Interpreting a Bar Graph

Suppose a bar graph shows the number of individuals in four age groups who currently use marijuana.
Individuals: People in the sample.
Variables: Age group (categorical), marijuana use (categorical: yes/no).
These data could be shown as a pie chart only if a single categorical variable is summarized, so that the categories are parts of one whole.

Example: Interpreting a Time Plot

Monthly atmospheric CO2 levels recorded over several decades.
Time plot shows trends (e.g., increasing CO2 over time) and possible seasonal cycles.

Conclusion

Proper classification of variables and selection of appropriate graphical methods are essential for effective data analysis in statistics. Understanding the types of variables and how to visualize them helps reveal patterns, trends, and outliers in data, forming the basis for further statistical inference.