Displaying Data with Graphs: Introduction to Statistical Data Types and Graphical Methods
Study Guide - Smart Notes
Displaying Distributions with Graphs
Introduction
Understanding how to display and interpret data graphically is a foundational skill in statistics. This section introduces the types of data, how to classify variables, and the main graphical methods used to summarize and visualize data distributions.
Individuals and Variables
Definitions
Individuals: The objects described in a data set. These can be people, animals, plants, or things (e.g., freshmen, newborns, golden retrievers, fields of corn, cells).
Variable: Any property or characteristic that can take different values for different individuals (e.g., age, gender, blood pressure, blood type, leaf length, flower color).
Types of Variables
Quantitative Variables
Represent a measurable quantity for each individual.
Examples: Age (in years), blood pressure (in mm Hg), leaf length (in cm).
It is meaningful to calculate averages and other numerical summaries.
Categorical Variables
Describe a characteristic or quality of each individual.
Examples: Gender (male, female), blood type (A, B, AB, O), flower color (white, yellow, red).
Summarized by counts or proportions of individuals in each category.
Classifying Variables
Steps to Classify Variables
Identify the n individuals in the sample or population.
Determine what is being recorded about those individuals.
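Decide if the recorded value is a number for which arithmetic such as averaging makes sense (quantitative) or a label that places the individual in a group (categorical). Numeric codes such as ID numbers or ZIP codes are still categorical.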
Example Table: Diagnoses and Age at Death
| Individuals studied | Diagnosis | Age at death (years) |
|---|---|---|
| Patient A | Heart disease | 56 |
| Patient B | Stroke | 70 |
| Patient C | Stroke | 75 |
| Patient D | Lung cancer | 60 |
| Patient E | Heart disease | 80 |
| Patient F | Accident | 73 |
| Patient G | Diabetes | 69 |
Diagnosis: Categorical variable (description).
Age at death: Quantitative variable (meaningful number).
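The classification steps can be sketched in Python using the rows of the table above. The helper name `classify_variable` and the rule "all values numeric means quantitative" are illustrative assumptions, not part of the notes; numeric codes such as ID numbers would fool this rule, so a human check is still needed.

```python
# Rows copied from the "Diagnoses and Age at Death" table.
patients = [
    {"individual": "Patient A", "diagnosis": "Heart disease", "age_at_death": 56},
    {"individual": "Patient B", "diagnosis": "Stroke", "age_at_death": 70},
    {"individual": "Patient C", "diagnosis": "Stroke", "age_at_death": 75},
    {"individual": "Patient D", "diagnosis": "Lung cancer", "age_at_death": 60},
    {"individual": "Patient E", "diagnosis": "Heart disease", "age_at_death": 80},
    {"individual": "Patient F", "diagnosis": "Accident", "age_at_death": 73},
    {"individual": "Patient G", "diagnosis": "Diabetes", "age_at_death": 69},
]


def classify_variable(values):
    """Hypothetical helper: 'quantitative' if every value is a number.

    Simplifying assumption: numeric codes (IDs, ZIP codes) would be
    mislabeled as quantitative by this rule.
    """
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "categorical"


# Step 1: the individuals; step 2: what was recorded; step 3: classify.
n_individuals = len(patients)
variable_types = {
    name: classify_variable([row[name] for row in patients])
    for name in ("diagnosis", "age_at_death")
}
print(n_individuals, variable_types)
```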
Examples: Classifying Variables in Research
Experimental Example: Mice and Metastases
Researchers grafted human cancerous cells onto 20 healthy adult mice.
Ten mice were injected with tumor-specific antibodies (anti-CD47) and ten with a control (IgG).
Variables recorded: Treatment group (categorical), presence of metastases (categorical), number of metastases (quantitative).
Sample Data Table (13 of the 20 mice shown)
| Mouse | Treatment | Presence of metastases | Number of metastases |
|---|---|---|---|
| 1 | IgG | yes | 1 |
| 2 | IgG | yes | 1 |
| 3 | IgG | yes | 2 |
| 4 | IgG | yes | 2 |
| 5 | IgG | yes | 2 |
| 6 | IgG | yes | 3 |
| 7 | IgG | yes | 3 |
| 8 | IgG | yes | 3 |
| 9 | IgG | yes | 3 |
| 10 | IgG | yes | 4 |
| 11 | anti-CD47 | no | 0 |
| 12 | anti-CD47 | no | 0 |
| 13 | anti-CD47 | no | 0 |
Individuals: Each mouse.
Variables: Treatment (categorical), presence of metastases (categorical), number of metastases (quantitative).
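As a rough illustration of how each variable type is summarized, the sketch below counts the categorical variables and averages the quantitative one. It uses only the 13 rows shown in the sample table, so the numbers describe those rows and not the full study of 20 mice. The variable names are illustrative.

```python
from collections import Counter
from statistics import mean

# (treatment, presence of metastases, number of metastases)
# Only the 13 rows listed in the sample table, not all 20 mice.
mice = [
    ("IgG", "yes", 1), ("IgG", "yes", 1), ("IgG", "yes", 2),
    ("IgG", "yes", 2), ("IgG", "yes", 2), ("IgG", "yes", 3),
    ("IgG", "yes", 3), ("IgG", "yes", 3), ("IgG", "yes", 3),
    ("IgG", "yes", 4),
    ("anti-CD47", "no", 0), ("anti-CD47", "no", 0), ("anti-CD47", "no", 0),
]

# Categorical variables are summarized by counts per category.
presence_counts = Counter((treatment, presence) for treatment, presence, _ in mice)

# The quantitative variable can be averaged within each treatment group.
mean_by_group = {
    group: mean(n for treatment, _, n in mice if treatment == group)
    for group in ("IgG", "anti-CD47")
}

print(presence_counts)
print(mean_by_group)
```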
Graphing Categorical Data
Bar Graphs
Each category is represented by a bar.
The height of the bar shows the count, frequency, or percent of individuals in that category.
Pie Charts
Show how a single categorical variable breaks down into its components.
Each slice represents the proportion or percent of the whole for each category.
Example
Bar graph: Number of mice exhibiting metastases in each group (IgG vs. anti-CD47).
Pie chart: Proportion of individuals with different blood types in a population.
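To make the idea of "one bar per category, height equal to the count" concrete, here is a minimal text-only sketch. It reuses the diagnosis column from the earlier patient table; the function name `text_bar_graph` is a hypothetical helper, and a real analysis would normally use a plotting library instead.

```python
from collections import Counter

# Diagnosis column from the "Diagnoses and Age at Death" table.
diagnoses = [
    "Heart disease", "Stroke", "Stroke", "Lung cancer",
    "Heart disease", "Accident", "Diabetes",
]


def text_bar_graph(values):
    """Return one line per category; bar length equals the category count."""
    counts = Counter(values)
    width = max(len(category) for category in counts)
    return [
        f"{category:<{width}} | {'#' * count} {count}"
        for category, count in counts.most_common()
    ]


bar_lines = text_bar_graph(diagnoses)
print("\n".join(bar_lines))
```

Bars are drawn horizontally here only because that is easier in plain text; the length of each bar plays the role of the height described above.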
Graphing Quantitative Data
Histograms
Summarize the distribution of a single quantitative variable.
The range of values is divided into equal-size intervals (bins or classes).
The height of each bar shows the frequency (count) or relative frequency (percent) of data points in each interval.
Dotplots
Display each data point as a dot along a single axis.
Useful for small data sets to show the exact values and their distribution.
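A dotplot can be approximated in plain text by stacking one mark per observation above its value. The sketch below uses the number of metastases for the 13 mice listed in the sample table; `text_dotplot` is a hypothetical helper that assumes small, single-digit integer values so the columns line up.

```python
from collections import Counter

# Number of metastases for the 13 mice listed in the sample table.
metastases = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 0, 0, 0]


def text_dotplot(values):
    """Return text rows: stacked 'o' marks above an axis of integer values.

    Assumes single-digit integers so that each column is one character wide.
    """
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        # Draw a dot in this row if the value occurs at least `level` times.
        row = " ".join("o" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))
    return rows


dot_rows = text_dotplot(metastases)
print("\n".join(dot_rows))
```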
Time Plots
Used for data collected over time (time series).
The horizontal axis represents time; the vertical axis shows the variable of interest.
Trends and cyclical patterns can be observed.
Making and Interpreting Histograms
Steps to Create a Histogram
Divide the range of the quantitative variable into equal-size intervals (classes).
Count the number of data points in each interval.
Draw a bar for each interval; the height represents the count or percent.
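The three steps above can be sketched as follows, using the ages at death from the earlier patient table. The starting value (50), class width (10), and the convention that each class includes its left endpoint but not its right are assumptions chosen for this illustration.

```python
# Ages at death (years) from the "Diagnoses and Age at Death" table.
ages_at_death = [56, 70, 75, 60, 80, 73, 69]


def histogram_counts(values, start, width, n_classes):
    """Count values in equal-width classes [left, left + width)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return counts


# Step 1: four classes of width 10 starting at 50.
# Step 2: count the observations in each class.
class_counts = histogram_counts(ages_at_death, start=50, width=10, n_classes=4)

# Step 3: draw one bar per class; bar length is the count.
for i, count in enumerate(class_counts):
    left = 50 + 10 * i
    print(f"{left} to <{left + 10}: {'#' * count} ({count})")
```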
Choosing the Number of Classes
Too few classes: Overly summarized, may hide important features.
Too many classes: Too detailed, may be hard to interpret.
Start with 5 to 10 classes and adjust as needed.
Interpreting Histograms
Look for the overall pattern (shape, center, spread) and for outliers.
Shape: Unimodal, bimodal, symmetric, skewed left, skewed right.
Center: Approximate midpoint of the data.
Spread: Range of values taken by the variable.
Outliers: Observations that fall outside the overall pattern.
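Center and spread can also be put into numbers. The sketch below uses the median as one reasonable choice of "midpoint" and the minimum-to-maximum range as the spread; both choices are assumptions for illustration, since the notes do not prescribe a specific measure. The data are the ages at death from the earlier patient table.

```python
from statistics import median

# Ages at death (years) from the "Diagnoses and Age at Death" table.
ages_at_death = [56, 70, 75, 60, 80, 73, 69]

summary = {
    "center_median": median(ages_at_death),  # middle value when sorted
    "minimum": min(ages_at_death),
    "maximum": max(ages_at_death),
    "range": max(ages_at_death) - min(ages_at_death),
}
print(summary)
```

With only seven observations, shape and outliers are better judged by looking at a graph than by a numerical rule.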
Common Distribution Shapes
Symmetric: Left and right halves are mirror images.
Skewed left: Left side (lower values) extends farther out.
Skewed right: Right side (higher values) extends farther out.
Summary Table: Types of Variables and Graphs
| Variable Type | Examples | Appropriate Graphs |
|---|---|---|
| Quantitative | Age, blood pressure, leaf length | Histogram, dotplot, time plot |
| Categorical | Gender, blood type, flower color | Bar graph, pie chart |
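Key Formulas
Relative Frequency: $\text{relative frequency} = \dfrac{\text{count in the category (or class)}}{\text{total number of individuals}}$
Percent: $\text{percent} = \text{relative frequency} \times 100\%$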
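As a worked illustration, relative frequency is a category's count divided by the total number of individuals, and percent is that value times 100. The sketch below applies this to the diagnosis column of the earlier patient table; rounding to one decimal place is an arbitrary choice for display.

```python
from collections import Counter

# Diagnosis column from the "Diagnoses and Age at Death" table.
diagnoses = [
    "Heart disease", "Stroke", "Stroke", "Lung cancer",
    "Heart disease", "Accident", "Diabetes",
]

counts = Counter(diagnoses)
total = len(diagnoses)

# relative frequency = count in category / total number of individuals
relative_frequency = {category: n / total for category, n in counts.items()}

# percent = relative frequency * 100
percent = {category: round(rf * 100, 1) for category, rf in relative_frequency.items()}

for category in counts:
    print(f"{category}: {counts[category]}/{total} = {percent[category]}%")
```

The relative frequencies across all categories add up to 1, which is a quick check that no individual was missed or counted twice.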
Example: Interpreting a Bar Graph
Suppose a bar graph shows the number of individuals in four age groups who currently use marijuana.
Individuals: People in the sample.
Variables: Age group (categorical), marijuana use (categorical: yes/no).
These data could also be shown in a pie chart, but only when the slices are parts of a single whole, for example the age-group breakdown among current users.
Example: Interpreting a Time Plot
Monthly atmospheric CO2 levels recorded over several decades.
Time plot shows trends (e.g., increasing CO2 over time) and possible seasonal cycles.
Conclusion
Proper classification of variables and selection of appropriate graphical methods are essential for effective data analysis in statistics. Understanding the types of variables and how to visualize them helps reveal patterns, trends, and outliers in data, forming the basis for further statistical inference.