Describing and Comparing Distributions Using Stemplots
Study Guide - Smart Notes
Describing and Comparing Distributions
Introduction to Data Collection
In statistics, collecting and analyzing data is essential for understanding patterns and making informed decisions. One common classroom example is recording the number of pairs of shoes each student owns; the class then serves as a sample from a larger population of students.
Data Collection: Students record their responses (e.g., number of pairs of shoes) on a whiteboard or data sheet.
Variable Type: The variable "number of pairs of shoes" is quantitative because it represents numerical values, not categories.
Example: If students report 3, 5, 7, and 10 pairs, these are quantitative data points.
Stemplots (Stem-and-Leaf Plots)
A stemplot is a graphical method used to display quantitative data, particularly useful for small to moderate-sized data sets. It helps visualize the distribution, center, and spread of the data.
Stem: Represents all but the final digit of each data value.
Leaf: The last digit of each data value.
Key: Always include a key to explain how to read the stemplot (e.g., "3 | 2 = 32 pairs of shoes").
Include Empty Stems: Even if a stem has no leaves, include it to show gaps in the data.
Orientation: Rotating a stemplot 90° counterclockwise makes it resemble a histogram, with the stems acting as the bins along the horizontal axis.
Example Stemplot:
| Stem | Leaves |
|---|---|
| 0 | 5 7 |
| 1 | 0 2 2 3 4 5 5 5 6 7 |
| 2 | 0 2 3 5 5 6 7 |
| 3 | 2 |
Key: 3 | 2 = 32 pairs of shoes
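The stem and leaf rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard library routine: the function name `stemplot` is hypothetical, the data list is read back from the example stemplot, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

# The 20 values shown in the example stemplot.
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        # Stem = all but the final digit, leaf = the final digit.
        leaves[v // 10].append(v % 10)
    lowest, highest = min(leaves), max(leaves)
    rows = []
    # Walk every stem from lowest to highest so empty stems still appear.
    for stem in range(lowest, highest + 1):
        row = f"{stem} | {' '.join(str(d) for d in leaves.get(stem, []))}"
        rows.append(row.rstrip())
    return rows

for row in stemplot(shoes):
    print(row)
print("Key: 3 | 2 = 32 pairs of shoes")
```

Calling `stemplot([5, 32])` keeps the empty stems 1 and 2 in the output, which is how a gap in the data stays visible.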
Describing Distributions
When describing a distribution, consider the following characteristics:
Shape: Is the distribution symmetric, skewed left, or skewed right?
Center: What is the typical or middle value? (e.g., mean or median)
Variability (Spread): How spread out are the data? (e.g., range, interquartile range)
Outliers: Are there any unusually high or low values?
Skewed Right: Most data are on the lower end, with a tail extending to the right (higher values).
Skewed Left: Most data are on the higher end, with a tail extending to the left (lower values).
Symmetric: Data are evenly distributed around the center.
Example: In the shoe data, if most students have between 10 and 20 pairs, but a few have 30 or more, the distribution is skewed right.
Identifying Outliers
Outliers are data points that fall unusually far above or below the rest of the data. They can be spotted visually in a stemplot, or flagged with a numerical rule such as the 1.5 × IQR rule: a value is a possible outlier if it lies more than 1.5 times the interquartile range above the third quartile or below the first quartile.
Example: If most students have fewer than 20 pairs of shoes, but one student has 50, 50 is a possible outlier.
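One way to apply the 1.5 × IQR rule is sketched below. Several details are assumptions made for illustration: the function names are hypothetical, the data set is the example stemplot's values with one added student who owns 50 pairs, and the quartiles are taken as the medians of the lower and upper halves of the ordered data. Other quartile conventions exist and can give slightly different fences.

```python
from statistics import median

# Hypothetical data: the example stemplot's values plus one student with 50 pairs.
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32, 50]

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs 2+ values)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above it; the middle value is skipped when n is odd
    return median(lower), median(upper)

def possible_outliers(values):
    """Return the values outside the 1.5 x IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

print(quartiles(shoes))          # Q1 and Q3
print(possible_outliers(shoes))  # values beyond the fences
```

For this data the quartiles are 12.5 and 25, so the upper fence is 43.75. The value 50 is flagged, while 32 is not.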
Splitting Stems
To provide a clearer picture of the distribution, especially when many values share the same stem, each stem can be split in two. For example, a stem of '2' can be split into '2L' (leaves 0-4, values 20-24) and '2H' (leaves 5-9, values 25-29).
Example: Splitting the stem '0' into '0L' (0-4) and '0H' (5-9) can show more detail in the distribution.
Effect: Splitting stems does not change the overall shape (e.g., still skewed right), but makes the distribution clearer.
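Splitting can be sketched by binning values in groups of five instead of ten. The function name `split_stemplot` and the 'L'/'H' labelling code are illustrative assumptions; the data are the values from the example stemplot.

```python
# The 20 values shown in the example stemplot.
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

def split_stemplot(values):
    """Return stemplot rows with each stem split into L (0-4) and H (5-9)."""
    bins = {}
    for v in sorted(values):
        # v // 5 numbers the half-stems: 0-4 -> 0, 5-9 -> 1, 10-14 -> 2, ...
        bins.setdefault(v // 5, []).append(v % 10)
    rows = []
    for k in range(min(bins), max(bins) + 1):
        label = f"{k // 2}{'L' if k % 2 == 0 else 'H'}"
        leaves = " ".join(str(d) for d in bins.get(k, []))
        rows.append(f"{label} | {leaves}".rstrip())
    return rows

for row in split_stemplot(shoes):
    print(row)
```

With the shoe data this produces six rows, from `0H | 5 7` to `3L | 2`. The longest rows are still near the low end with a tail toward higher values, which is consistent with the right skew noted earlier.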
Comparing Distributions: Back-to-Back Stemplots
Back-to-back stemplots are used to compare two related distributions, such as the percent of people wearing seat belts in states with different laws.
Purpose: To visually compare the shape, center, and spread of two groups.
Example: Comparing primary enforcement states vs. secondary enforcement states for seat belt usage.
| Primary Enforcement | Stem | Secondary Enforcement |
|---|---|---|
| 7 6 5 | 5 | 4 5 6 9 |
| 3 2 0 | 6 | 0 1 8 |
| 4 2 0 | 7 | 0 1 |
Key: 5 | 4 = 54% seat belt usage. Leaves on the left side are read outward from the stem, so 7 6 5 | 5 means 55%, 56%, and 57%.
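A back-to-back stemplot can be sketched the same way as a single stemplot, with the left group's leaves written in reverse so they read outward from the shared stem. The function name `back_to_back` is hypothetical, and the two lists assume that the values in the comparison above are whole-number percentages.

```python
# Assumed reading of the comparison above: whole-number percentages.
primary = [55, 56, 57, 60, 62, 63, 70, 72, 74]
secondary = [54, 55, 56, 59, 60, 61, 68, 70, 71]

def leaves_by_stem(values):
    """Map each stem (tens digit and above) to its sorted leaves as strings."""
    out = {}
    for v in sorted(values):
        out.setdefault(v // 10, []).append(str(v % 10))
    return out

def back_to_back(left, right):
    """Return rows of the form 'left leaves | stem | right leaves'."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    # Reverse the left leaves so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(reversed(l.get(s, []))) for s in stems}
    width = max(len(t) for t in left_text.values())
    return [
        f"{left_text[s]:>{width}} | {s} | {' '.join(r.get(s, []))}".rstrip()
        for s in stems
    ]

for row in back_to_back(primary, secondary):
    print(row)
```

Sharing one column of stems puts both groups on the same scale, so differences in center and spread can be read directly across each row.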
Summary Table: Describing Distributions
| Characteristic | Description | Example |
|---|---|---|
| Shape | Skewed left, skewed right, symmetric | Skewed right: shoe data with a few high values |
| Center | Typical or middle value (mean, median) | Median number of pairs of shoes |
| Variability | How spread out the data are (range, IQR) | Values from 5 to 50 pairs (range = 45) |
| Outliers | Unusually high or low values | One student with 50 pairs |
Key Formulas
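Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the $n$ data values
Median: Middle value when data are ordered
Range: $\text{Range} = \text{maximum} - \text{minimum}$
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles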
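As a worked check of these formulas, the sketch below computes each summary for the 20 values in the example stemplot. The function name `summarize` is hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data, which is one of several conventions in use.

```python
from statistics import mean, median

# The 20 values shown in the example stemplot.
shoes = [5, 7, 10, 12, 12, 13, 14, 15, 15, 15,
         16, 17, 20, 22, 23, 25, 25, 26, 27, 32]

def summarize(values):
    """Return mean, median, range, and IQR for a list of 2+ numbers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],   # maximum - minimum
        "iqr": q3 - q1,
    }

print(summarize(shoes))
```

For this data the mean is 17.55, the median is 15.5, the range is 27, and the IQR is 11.5. The mean sitting above the median fits the right-skewed shape of the shoe data.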
Applications
Stemplots: Useful for small data sets to quickly visualize distribution.
Back-to-Back Stemplots: Effective for comparing two groups, such as different states or classes.
Describing Distributions: Essential for summarizing data and making comparisons in research and real-world contexts.
Additional info: Mathematical rules for identifying outliers (such as the 1.5*IQR rule) will be covered in more detail in later lessons.