Frequency Distributions and Histograms: Study Notes


Introduction

Frequency distributions and histograms are foundational tools in statistics for organizing, summarizing, and visualizing data. They help reveal patterns, trends, and the overall distribution of data sets, making complex data easier to interpret and analyze.

Frequency Distributions

Key Concepts

Frequency Distribution: A table that displays the number of occurrences (frequency) of each value or group of values in a data set.
Purpose: To organize large data sets, making it easier to understand the distribution and identify patterns.

Tables for Categorical Variables

When dealing with categorical data, frequency tables help summarize the counts for each category.

The table must include at least two columns: the variable(s) of interest and the counts (frequencies) for each value.
Variable: In statistics, a variable is any characteristic, number, or quantity that can be measured or counted. The value of a variable can "vary" from one entity to another.

Example: A survey asks 55 students to choose their favorite type of fries from options such as French Fries, Curly Fries, Potato Wedges, etc. The resulting data can be organized into a frequency table showing the count for each type.

Sample Frequency Table for Categorical Data

Type of Fries	Frequency
French Fries	12
Curly Fries	10
Potato Wedges	8
Seasoned Fries	15
Chili Cheese Fries	10
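A frequency table like this one can be produced from raw survey responses by counting each category. The notes contain no code, so this is a minimal Python sketch. The list `responses` is a hypothetical reconstruction that repeats each type as many times as the table reports, because the individual survey answers are not given.

```python
from collections import Counter

# Hypothetical raw responses: each fries type repeated to match the table's counts
# (the actual individual survey answers are not given in the notes).
responses = (
    ["French Fries"] * 12
    + ["Curly Fries"] * 10
    + ["Potato Wedges"] * 8
    + ["Seasoned Fries"] * 15
    + ["Chili Cheese Fries"] * 10
)

# Counter tallies how many times each category occurs (the frequency of each value).
frequency_table = Counter(responses)

for fries_type, count in frequency_table.items():
    print(f"{fries_type}\t{count}")

# The frequencies must add up to the number of students surveyed.
print(f"Total\t{sum(frequency_table.values())}")
```

`Counter.most_common()` would list the same categories ordered from most to least frequent, which is convenient when the table should highlight the most popular choice.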

Relative Frequency Distributions

Relative Frequency: The proportion of the total number of data values that fall within a particular category or class.
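Formula: $\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sum of all frequencies}} = \dfrac{f}{n}$, where $f$ is the frequency of the category or class and $n$ is the total number of data values.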

Sample Relative Frequency Table

Type of Fries	Relative Frequency
French Fries	0.22
Curly Fries	0.18
Potato Wedges	0.15
Seasoned Fries	0.27
Chili Cheese Fries	0.18

Additional info: Relative frequencies are often expressed as decimals or percentages.
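The formula can be applied directly to the fries counts. This Python sketch (the variable names are illustrative) divides each frequency by the total of 55 and prints both the decimal and the percentage form.

```python
# Frequencies from the categorical frequency table for the fries survey.
frequencies = {
    "French Fries": 12,
    "Curly Fries": 10,
    "Potato Wedges": 8,
    "Seasoned Fries": 15,
    "Chili Cheese Fries": 10,
}

n = sum(frequencies.values())  # total number of responses (55)

# Relative frequency = class frequency / sum of all frequencies.
relative = {fries_type: f / n for fries_type, f in frequencies.items()}

for fries_type, r in relative.items():
    # Decimal rounded to 2 places, and the same value as a percentage.
    print(f"{fries_type}\t{r:.2f}\t{r:.1%}")
```

The unrounded relative frequencies always sum to exactly 1; after rounding to two decimals the total can drift slightly from 1.00, although here it happens to be exact.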

Tables for Quantitative Variables

Quantitative data is organized into classes or intervals, and frequency tables summarize how many data values fall into each class.

Definitions

Lower Class Limit: The smallest value that can belong to a class.
Upper Class Limit: The largest value that can belong to a class.
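Class Boundaries: The numbers used to separate classes without the gaps created by class limits. Each boundary lies halfway between the upper limit of one class and the lower limit of the next, e.g., 29.5 separates the classes 20-29 and 30-39.
Class Midpoint: The value in the middle of a class, calculated as: $\text{class midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$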

Class Width: The difference between two consecutive lower (or upper) class limits.

Sample Frequency Table for Quantitative Data

Class Interval	Frequency
20-29	12
30-39	8
40-49	6
50-59	4
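The definitions of class width, boundaries, and midpoint can be checked against this table. The Python sketch below uses a hypothetical helper, `describe_classes`, and assumes the classes are listed in increasing order and equally spaced, with at least two classes.

```python
def describe_classes(limits):
    """Return the class width and, for each class, its boundaries and midpoint.

    limits: list of (lower limit, upper limit) pairs, increasing and equally spaced.
    """
    width = limits[1][0] - limits[0][0]  # difference between consecutive lower limits
    gap = limits[1][0] - limits[0][1]    # distance from one upper limit to the next lower limit
    rows = []
    for lower, upper in limits:
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit halfway across the gap between neighboring classes.
            "boundaries": (lower - gap / 2, upper + gap / 2),
            "midpoint": (lower + upper) / 2,
        })
    return width, rows


# Classes and frequencies from the quantitative table above.
freqs = [12, 8, 6, 4]
width, rows = describe_classes([(20, 29), (30, 39), (40, 49), (50, 59)])

print("class width:", width)
for row, f in zip(rows, freqs):
    lo_b, hi_b = row["boundaries"]
    print(f"{row['limits'][0]}-{row['limits'][1]}\tboundaries {lo_b}-{hi_b}\t"
          f"midpoint {row['midpoint']}\tfrequency {f}")
```

For whole-number limits the gap is 1, so each boundary is simply a limit plus or minus 0.5 (19.5-29.5 for the first class, whose midpoint is 24.5).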

Steps to Construct a Frequency Distribution (Quantitative Data)

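Select the number of classes: Usually between 5 and 20. A reasonable starting value is given by Sturges' formula: $k \approx 1 + \log_2 n$ (equivalently $1 + 3.322\log_{10} n$), where $n$ is the number of data values.
Calculate the class width: $\text{class width} \approx \dfrac{\text{maximum data value} - \text{minimum data value}}{\text{number of classes}}$, then round up to a convenient number.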
Choose the lower class limit for the first class: Use the minimum value or a convenient value below it.
List lower class limits in a vertical column: Add the class width to each lower class limit to get the next one.
Determine upper class limits: For whole-number data, each upper class limit is one less than the next lower class limit (e.g., 20-29 is followed by 30-39).
Tally data values: Count how many data values fall into each class.
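The steps above can be sketched in Python. The function name `build_frequency_distribution` and its parameters are hypothetical; the sketch assumes whole-number data (so each upper limit is one less than the next lower limit) and falls back on Sturges' formula only when no class count is supplied. Because the width is rounded up, the classes may need one more or one fewer class than requested to span the data exactly; the loop simply keeps adding classes until the maximum is covered.

```python
import math


def build_frequency_distribution(data, num_classes=None, first_lower=None):
    """Group whole-number data into classes; returns (width, [((lower, upper), count), ...])."""
    n = len(data)
    lo, hi = min(data), max(data)

    # Step 1: number of classes (Sturges' formula, rounded up) if none is given.
    if num_classes is None:
        num_classes = math.ceil(1 + math.log2(n))

    # Step 2: class width = range / number of classes, rounded up (at least 1).
    width = max(1, math.ceil((hi - lo) / num_classes))

    # Step 3: first lower limit is the minimum, or a convenient value below it.
    start = lo if first_lower is None else first_lower
    if start > lo:
        raise ValueError("first lower limit must not exceed the minimum value")

    # Step 4: add the width repeatedly to get the next lower limits,
    # stopping once the last class reaches the maximum value.
    lower_limits = [start]
    while lower_limits[-1] + width <= hi:
        lower_limits.append(lower_limits[-1] + width)

    # Step 5: each upper limit is one less than the next lower limit.
    classes = [(low, low + width - 1) for low in lower_limits]

    # Step 6: tally how many data values fall in each class.
    return width, [((low, up), sum(1 for x in data if low <= x <= up)) for low, up in classes]


# Hypothetical data for illustration: the whole numbers 1 through 20.
width, table = build_frequency_distribution(list(range(1, 21)))
for (low, up), count in table:
    print(f"{low}-{up}\t{count}")
```

With 20 values Sturges' formula suggests 6 classes, giving a width of 4 after rounding up, and the data then fit into 5 classes (1-4, 5-8, ..., 17-20) of 4 values each.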

Example: Constructing a Frequency Distribution

Suppose a sample of 50 people in Los Angeles is asked about their daily commute time (in minutes). The data is grouped into classes of width 15 minutes, starting at 0.

Daily Commute Time (minutes)	Frequency
0-14	6
15-29	24
30-44	8
45-59	6
60-74	4
75-89	2

Cumulative Frequency Distribution

A cumulative frequency distribution shows the sum of frequencies for all classes up to and including the current class.

Daily Commute Times (minutes)	Cumulative Frequency
Less than 15	6
Less than 30	30
Less than 45	38
Less than 60	44
Less than 75	48
Less than 90	50
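Each cumulative frequency is a running total of the class frequencies, which Python's `itertools.accumulate` computes directly. In this sketch the "less than" values are the next class's lower limit, matching the table above.

```python
from itertools import accumulate

# Commute-time frequencies from the notes, with the "less than" value for each class.
less_than = [15, 30, 45, 60, 75, 90]
frequencies = [6, 24, 8, 6, 4, 2]

# Running total: each entry adds the current class frequency to all earlier ones.
cumulative = list(accumulate(frequencies))

for bound, c in zip(less_than, cumulative):
    print(f"Less than {bound}\t{c}")
```

The final cumulative frequency (50) equals the sample size, which is a quick check that no class was missed.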

Critical Thinking: Using Frequency Distributions to Understand Data

Frequency distributions can help determine whether data are approximately normally distributed. A frequency distribution of roughly normal data typically has the following characteristics:

The frequencies start low, increase to a maximum (the mode), and then decrease back to low values.
The distribution is approximately symmetric, with the frequencies before the maximum roughly mirroring those after it.

Example: If a frequency distribution is bell-shaped and symmetric, it may suggest a normal distribution.
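The two characteristics can be checked mechanically on a list of class frequencies. This Python sketch is a rough heuristic for illustration only, not a formal test of normality; the helper names `rises_then_falls` and `mirror_gap` are hypothetical.

```python
def rises_then_falls(freqs):
    """True if frequencies never decrease before the peak and never increase after it."""
    peak = freqs.index(max(freqs))  # position of the (first) modal class
    before, after = freqs[: peak + 1], freqs[peak:]
    return all(a <= b for a, b in zip(before, before[1:])) and all(
        a >= b for a, b in zip(after, after[1:])
    )


def mirror_gap(freqs):
    """Largest relative difference between classes mirrored about the middle.

    0 means perfectly symmetric; values near 1 mean strongly lopsided.
    """
    gaps = []
    for i in range(len(freqs) // 2):
        left, right = freqs[i], freqs[-1 - i]
        gaps.append(abs(left - right) / max(left, right, 1))
    return max(gaps, default=0.0)


# Commute-time frequencies from the notes.
commute = [6, 24, 8, 6, 4, 2]
print(rises_then_falls(commute))      # True: a single peak at 15-29 minutes
print(round(mirror_gap(commute), 2))  # 0.83
```

The commute-time frequencies rise to one peak and then fall, but they are far from symmetric: the long tail of longer commutes on the right has no counterpart on the left, so this distribution does not look approximately normal.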

Histograms

Key Concept

A histogram is a graphical representation of a frequency distribution. It consists of adjacent bars of equal width, where the height of each bar represents the frequency (or relative frequency) of data within each class.

Definitions

Histogram: A bar graph with bars of equal width drawn adjacent to each other (unless there are gaps in the data).
The horizontal axis represents classes of data values.
The vertical axis represents frequencies or relative frequencies.
The height of each bar corresponds to the frequency (or relative frequency) for that class.

Example: Constructing a Histogram

Given a frequency distribution, draw one bar per class: each bar spans the class from its lower class boundary to its upper class boundary (so neighboring bars touch), and its height equals the class frequency.
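Without a plotting library, a quick text version shows the same idea. This Python sketch (the helper `text_histogram` is hypothetical) draws the commute-time histogram rotated 90 degrees: classes run down the left side, and bar length rather than height shows the frequency.

```python
def text_histogram(labels, freqs, symbol="#"):
    """Return lines of a sideways text histogram, one bar per class."""
    label_width = max(len(label) for label in labels)
    return [
        f"{label:>{label_width}} | {symbol * freq} {freq}"
        for label, freq in zip(labels, freqs)
    ]


# Commute-time classes (minutes) and frequencies from the notes.
labels = ["0-14", "15-29", "30-44", "45-59", "60-74", "75-89"]
freqs = [6, 24, 8, 6, 4, 2]

for line in text_histogram(labels, freqs):
    print(line)
```

Each symbol stands for one person; for larger samples, one symbol could represent several observations instead.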

Relative Frequency Histogram

Similar to a histogram, but the vertical axis represents relative frequencies instead of raw counts. This allows for comparison between data sets of different sizes.
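To see why relative frequencies make different-sized data sets comparable, this Python sketch converts counts to proportions. The commute frequencies come from the notes; `sample_b` is a hypothetical sample of 200 people with exactly four times the counts, used only for illustration.

```python
def relative_frequencies(freqs):
    """Convert class frequencies to proportions of the total."""
    total = sum(freqs)
    return [f / total for f in freqs]


# Commute-time frequencies from the notes (n = 50).
sample_a = [6, 24, 8, 6, 4, 2]
# Hypothetical larger sample (n = 200) with four times the counts.
sample_b = [4 * f for f in sample_a]

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)

# The raw counts differ, but the relative frequencies (the bar heights of a
# relative frequency histogram) are identical.
print(rel_a)           # [0.12, 0.48, 0.16, 0.12, 0.08, 0.04]
print(rel_a == rel_b)  # True
```

A frequency histogram of `sample_b` would have bars four times taller, while the two relative frequency histograms would look exactly the same.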

Summary Table: Types of Frequency Distributions

Type	Description	Example
Frequency Distribution	Shows counts for each class/category	Number of students preferring each type of fries
Relative Frequency Distribution	Shows proportion or percentage for each class	Percentage of students preferring each type of fries
Cumulative Frequency Distribution	Shows running total of frequencies up to each class	Number of people with commute times less than a given value

Bar Graphs vs. Histograms

Bar Graphs: Used for categorical data; bars are separated by spaces.
Histograms: Used for quantitative data; bars are adjacent with no gaps (unless there are gaps in the data).

Additional info: Bar graphs emphasize differences between categories, while histograms emphasize the distribution of numerical data.