Frequency Distributions and Histograms: Study Notes
Frequency Distributions & Histograms
Introduction
Frequency distributions and histograms are foundational tools in statistics for organizing, summarizing, and visualizing data. They help reveal patterns, trends, and the overall distribution of data sets, making complex data easier to interpret and analyze.
Frequency Distributions
Key Concepts
Frequency Distribution: A table that displays the number of occurrences (frequency) of each value or group of values in a data set.
Purpose: To organize large data sets, making it easier to understand the distribution and identify patterns.
Tables for Categorical Variables
When dealing with categorical data, frequency tables help summarize the counts for each category.
The table must include at least two columns: the variable(s) of interest and the counts (frequencies) for each value.
Variable: In statistics, a variable is any characteristic, number, or quantity that can be measured or counted. The value of a variable can "vary" from one entity to another.
Example: A survey asks 55 students to choose their favorite type of fries from options such as French Fries, Curly Fries, Potato Wedges, etc. The resulting data can be organized into a frequency table showing the count for each type.
Sample Frequency Table for Categorical Data
| Type of Fries | Frequency |
|---|---|
| French Fries | 12 |
| Curly Fries | 10 |
| Potato Wedges | 8 |
| Seasoned Fries | 15 |
| Chili Cheese Fries | 10 |
Relative Frequency Distributions
Relative Frequency: The proportion of the total number of data values that fall within a particular category or class.
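Formula: $\text{relative frequency} = \dfrac{\text{frequency of the category or class}}{\text{total number of data values}}$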
Sample Relative Frequency Table
| Type of Fries | Relative Frequency |
|---|---|
| French Fries | 0.22 |
| Curly Fries | 0.18 |
| Potato Wedges | 0.15 |
| Seasoned Fries | 0.27 |
| Chili Cheese Fries | 0.18 |
Additional info: Relative frequencies are often expressed as decimals or percentages.
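The relative frequency calculation can be sketched in a few lines of Python. This is a minimal illustration that reuses the counts from the sample fries table; the function name `relative_frequencies` is a hypothetical helper, not part of the original notes.

```python
# Frequencies copied from the sample fries table (55 students in total).
frequencies = {
    "French Fries": 12,
    "Curly Fries": 10,
    "Potato Wedges": 8,
    "Seasoned Fries": 15,
    "Chili Cheese Fries": 10,
}


def relative_frequencies(freqs):
    """Divide each count by the total number of data values."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


rel = relative_frequencies(frequencies)

# Print each relative frequency as a decimal and as a percentage.
for category, value in rel.items():
    print(f"{category:<20} {value:.2f}  ({value:.0%})")
```

Rounded to two decimals, the results agree with the sample relative frequency table, and the unrounded values sum to 1.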
Tables for Quantitative Variables
Quantitative data is organized into classes or intervals, and frequency tables summarize how many data values fall into each class.
Definitions
Lower Class Limit: The smallest value that can belong to a class.
Upper Class Limit: The largest value that can belong to a class.
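Class Boundaries: The numbers used to separate classes, without gaps between them. For example, the boundary between the classes 20-29 and 30-39 is 29.5.
Class Midpoint: The value in the middle of a class, calculated as: $\text{class midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$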
Class Width: The difference between two consecutive lower (or upper) class limits.
Sample Frequency Table for Quantitative Data
| Class Interval | Frequency |
|---|---|
| 20-29 | 12 |
| 30-39 | 8 |
| 40-49 | 6 |
| 50-59 | 4 |
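As a rough sketch, the definitions above can be applied to the classes in this sample table. The variable names are illustrative, and the boundary calculation assumes the classes are evenly spaced with a constant gap between one upper limit and the next lower limit (a gap of 1 here, for whole-number data).

```python
# Class limits copied from the sample table: (lower limit, upper limit).
classes = [(20, 29), (30, 39), (40, 49), (50, 59)]

# Class width: difference between two consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

# Class midpoint: average of the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Class boundaries: halfway across the gap between adjacent classes,
# so that neighboring classes share a boundary with no gap between them.
gap = classes[1][0] - classes[0][1]
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in classes]

print("Class width:", class_width)
for (lower, upper), mid, (b_low, b_high) in zip(classes, midpoints, boundaries):
    print(f"{lower}-{upper}: midpoint {mid}, boundaries {b_low} to {b_high}")
```

Note that the class width is 10, not 9: it is measured between consecutive lower limits, not from a class's lower limit to its own upper limit.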
Steps to Construct a Frequency Distribution (Quantitative Data)
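Select the number of classes: Usually between 5 and 20. The ideal number can be approximated by Sturges' formula: $k \approx 1 + \log_2 n$, where $k$ is the number of classes and $n$ is the number of data values.
Calculate the class width: $\text{class width} \approx \dfrac{\text{maximum value} - \text{minimum value}}{\text{number of classes}}$. Round up to a convenient number.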
Choose the lower class limit for the first class: Use the minimum value or a convenient value below it.
List lower class limits in a vertical column: Add the class width to each lower class limit to get the next one.
Determine upper class limits: For whole-number data, each upper class limit is one less than the next lower class limit.
Tally data values: Count how many data values fall into each class.
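The steps above can be sketched in Python as follows. This is one possible implementation under stated assumptions, not a definitive procedure: it assumes whole-number data, starts the first class at the minimum value, and rounds the class width up to the next whole number rather than to a "convenient" value. The function name `build_frequency_distribution` and the `sample` data are hypothetical.

```python
import math


def build_frequency_distribution(data, num_classes=None):
    """Return a list of (lower_limit, upper_limit, frequency) tuples.

    Assumes whole-number data, so each upper class limit is one less
    than the next lower class limit.
    """
    n = len(data)

    # Step 1: number of classes, from Sturges' formula unless supplied.
    if num_classes is None:
        num_classes = math.ceil(1 + math.log2(n))

    # Step 2: class width, rounded up to a whole number.
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)
    # If the classes would stop short of the maximum, widen them by 1.
    if low + width * num_classes - 1 < high:
        width += 1

    # Steps 3-4: lower class limits, starting at the minimum value.
    lower_limits = [low + i * width for i in range(num_classes)]

    # Step 5: upper class limits (one less than the next lower limit).
    upper_limits = [lower + width - 1 for lower in lower_limits]

    # Step 6: tally the data values into their classes.
    frequencies = [0] * num_classes
    for value in data:
        frequencies[(value - low) // width] += 1

    return list(zip(lower_limits, upper_limits, frequencies))


# Hypothetical sample data, made up purely for illustration.
sample = [21, 25, 33, 47, 52, 58, 29, 30, 41, 36, 24, 55]
table = build_frequency_distribution(sample)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

In practice, the class width and first lower limit are often adjusted by hand to rounder values (for example, a width of 10 starting at 20), which this sketch does not attempt.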
Example: Constructing a Frequency Distribution
Suppose a sample of 50 people in Los Angeles is asked about their daily commute time (in minutes). The data is grouped into classes of width 15 minutes, starting at 0.
| Daily Commute Time (minutes) | Frequency |
|---|---|
| 0-14 | 6 |
| 15-29 | 24 |
| 30-44 | 8 |
| 45-59 | 6 |
| 60-74 | 4 |
| 75-89 | 2 |
Cumulative Frequency Distribution
A cumulative frequency distribution shows the sum of frequencies for all classes up to and including the current class.
| Daily Commute Time (minutes) | Cumulative Frequency |
|---|---|
| Less than 15 | 6 |
| Less than 30 | 30 |
| Less than 45 | 38 |
| Less than 60 | 44 |
| Less than 75 | 48 |
| Less than 90 | 50 |
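A cumulative frequency column is a running total of the frequency column. The sketch below reproduces the table above from the commute-time frequencies using `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from the commute-time table (classes 0-14, 15-29, ..., 75-89).
frequencies = [6, 24, 8, 6, 4, 2]

# Each class is described by the value that all of its data fall below.
less_than = [15, 30, 45, 60, 75, 90]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

for bound, total in zip(less_than, cumulative):
    print(f"Less than {bound}: {total}")
```

The last cumulative frequency always equals the total number of data values, which is 50 in this example.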
Critical Thinking: Using Frequency Distributions to Understand Data
Frequency distributions can help determine if data is approximately normally distributed. A normal distribution has the following characteristics:
The frequencies increase to a maximum (the mode) and then decrease symmetrically.
The distribution is approximately symmetric about the center.
Example: If a frequency distribution is bell-shaped and symmetric, it may suggest a normal distribution.
Histograms
Key Concept
A histogram is a graphical representation of a frequency distribution. It consists of adjacent bars of equal width, where the height of each bar represents the frequency (or relative frequency) of data within each class.
Definitions
Histogram: A bar graph with bars of equal width drawn adjacent to each other (unless there are gaps in the data).
The horizontal axis represents classes of data values.
The vertical axis represents frequencies or relative frequencies.
The height of each bar corresponds to the frequency (or relative frequency) for that class.
Example: Constructing a Histogram
Given a frequency distribution, plot a histogram by drawing bars for each class interval, with heights corresponding to the frequencies.
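As a minimal sketch, the commute-time frequency distribution can be drawn as a text histogram without any plotting library. The bars here run horizontally rather than vertically, purely for convenience in a terminal, and the function name `text_histogram` is a hypothetical helper.

```python
# Classes and frequencies from the commute-time frequency distribution.
labels = ["0-14", "15-29", "30-44", "45-59", "60-74", "75-89"]
frequencies = [6, 24, 8, 6, 4, 2]


def text_histogram(class_labels, freqs, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    label_width = max(len(label) for label in class_labels)
    lines = []
    for label, freq in zip(class_labels, freqs):
        lines.append(f"{label:>{label_width}} | {symbol * freq} {freq}")
    return lines


lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```

Because each class gets exactly one row and the rows are adjacent, the layout mirrors the equal-width, no-gap bars of a histogram.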
Relative Frequency Histogram
Similar to a histogram, but the vertical axis represents relative frequencies instead of raw counts. This allows for comparison between data sets of different sizes.
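To illustrate why relative frequencies make data sets of different sizes comparable, the sketch below compares the commute-time sample of 50 with a hypothetical second sample of 200 people. The second sample is invented for illustration only (each count is four times the original), and `to_relative` is a hypothetical helper.

```python
# Frequencies from the commute-time table (50 people).
sample_a = [6, 24, 8, 6, 4, 2]

# Hypothetical larger sample (200 people) with the same shape.
sample_b = [24, 96, 32, 24, 16, 8]


def to_relative(freqs):
    """Convert raw counts to proportions of the total."""
    total = sum(freqs)
    return [count / total for count in freqs]


rel_a = to_relative(sample_a)
rel_b = to_relative(sample_b)

# Raw bar heights differ by a factor of four, but relative heights match.
for a, b in zip(rel_a, rel_b):
    print(f"{a:.2f}  {b:.2f}")
```

Plotted as ordinary histograms, the second sample's bars would be four times taller; plotted as relative frequency histograms, the two would look the same.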
Summary Table: Types of Frequency Distributions
| Type | Description | Example |
|---|---|---|
| Frequency Distribution | Shows counts for each class/category | Number of students preferring each type of fries |
| Relative Frequency Distribution | Shows proportion or percentage for each class/category | Percentage of students preferring each type of fries |
| Cumulative Frequency Distribution | Shows running total of frequencies up to each class | Number of people with commute times less than a given value |
Bar Graphs vs. Histograms
Bar Graphs: Used for categorical data; bars are separated by spaces.
Histograms: Used for quantitative data; bars are adjacent with no gaps (unless there are gaps in the data).
Additional info: Bar graphs emphasize differences between categories, while histograms emphasize the distribution of numerical data.