Descriptive Statistics: Frequency Distributions, Graphs, Measures of Central Tendency, and Variation

Study Guide - Smart Notes

2.1 Frequency Distributions and Their Graphs

Frequency Distributions

A frequency distribution is a table that organizes data into classes or intervals, showing the number of entries (frequency) in each class. This method helps reveal patterns in large data sets by grouping data into manageable intervals.

Class: An interval grouping data entries.
Frequency (f): The count of data entries in a class.
Lower Class Limit: The smallest value that can belong to a class.
Upper Class Limit: The largest value that can belong to a class.
Class Width: The difference between lower (or upper) limits of consecutive classes.
Range: The difference between the maximum and minimum data entries.

Guidelines for Constructing a Frequency Distribution:

Decide on the number of classes (typically 5–20).
Calculate class width: $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$ (round up if necessary).
Determine class limits, ensuring no overlap.
Tally data entries into appropriate classes.
Count tallies to find frequencies.

Example Frequency Distribution Table:

Class	Frequency (f)
155–190	3
191–226	2
227–262	5
263–298	6
299–334	7
335–370	4
371–406	3
Total	30
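As a rough illustration of the guidelines, the class limits in the table above can be regenerated from the smallest lower limit (155), the class width (36), and the number of classes (7). The helper names `class_limits` and `tally` are made up for this sketch, and `sample_entries` is a hypothetical list, not the data set behind the table.

```python
def class_limits(minimum, class_width, num_classes):
    """Return (lower, upper) limits for integer data so that classes never overlap."""
    limits = []
    for i in range(num_classes):
        lower = minimum + i * class_width
        upper = lower + class_width - 1  # one less than the next lower limit
        limits.append((lower, upper))
    return limits


def tally(data, limits):
    """Count how many entries fall inside each class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in limits]


limits = class_limits(minimum=155, class_width=36, num_classes=7)

# Hypothetical entries chosen only to exercise the tally step.
sample_entries = [155, 190, 191, 300, 406]
frequencies = tally(sample_entries, limits)

for (lower, upper), f in zip(limits, frequencies):
    print(f"{lower}-{upper}\t{f}")
```

Subtracting 1 from the next lower limit only works for integer data; data with decimals would need a smaller step.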

Additional Features of Frequency Distributions

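Midpoint: $\text{midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
Relative Frequency: $\text{relative frequency} = \dfrac{f}{n}$, where $n$ is the sample size.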
Cumulative Frequency: The sum of the frequencies for that class and all previous classes.

Expanded Frequency Distribution Table:

Class	Frequency (f)	Midpoint	Relative Frequency	Cumulative Frequency
155–190	3	172.5	0.10	3
191–226	2	208.5	0.07	5
227–262	5	244.5	0.17	10
263–298	6	280.5	0.20	16
299–334	7	316.5	0.23	23
335–370	4	352.5	0.13	27
371–406	3	388.5	0.10	30
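The extra columns can be derived from the class limits and frequencies alone. The following is a minimal sketch that recomputes the midpoint, relative frequency, and cumulative frequency columns of the table above; the variable names are chosen for this example.

```python
from itertools import accumulate

# Class limits and frequencies copied from the table above.
limits = [(155, 190), (191, 226), (227, 262), (263, 298),
          (299, 334), (335, 370), (371, 406)]
frequencies = [3, 2, 5, 6, 7, 4, 3]

n = sum(frequencies)  # sample size

midpoints = [(lower + upper) / 2 for lower, upper in limits]
relative = [f / n for f in frequencies]
cumulative = list(accumulate(frequencies))  # running total of frequencies

for (lower, upper), f, m, r, c in zip(limits, frequencies, midpoints, relative, cumulative):
    print(f"{lower}-{upper}\t{f}\t{m}\t{r:.2f}\t{c}")
```

The relative frequencies should sum to 1 (up to rounding), and the last cumulative frequency should equal the sample size.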

Graphs of Frequency Distributions

Frequency Histogram: Uses bars to represent frequencies. Bars touch, and the horizontal axis uses class boundaries (not limits).
Class Boundaries: For integer data, subtract 0.5 from lower limits and add 0.5 to upper limits.
Frequency Polygon: A line graph that plots each class frequency above its class midpoint and connects the points with line segments.
Relative Frequency Histogram: Like a histogram, but the vertical axis shows relative frequencies.
Ogive (Cumulative Frequency Graph): A line graph plotting cumulative frequency at each upper class boundary.
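To make the boundary rule concrete, the sketch below computes class boundaries for the integer classes of the example table in Section 2.1 and then lists the points an ogive would plot. Starting the ogive at the lower boundary of the first class with a cumulative frequency of 0 is a common convention and is assumed here.

```python
# Class limits and frequencies from the example table in Section 2.1.
limits = [(155, 190), (191, 226), (227, 262), (263, 298),
          (299, 334), (335, 370), (371, 406)]
frequencies = [3, 2, 5, 6, 7, 4, 3]

# Integer data: subtract 0.5 from lower limits, add 0.5 to upper limits.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

# Ogive points: (upper class boundary, cumulative frequency),
# preceded by a starting point at cumulative frequency 0 (assumed convention).
ogive_points = [(boundaries[0][0], 0)]
running_total = 0
for (_, upper_boundary), f in zip(boundaries, frequencies):
    running_total += f
    ogive_points.append((upper_boundary, running_total))

print(boundaries)
print(ogive_points)
```

Because each upper boundary equals the next lower boundary, the histogram bars drawn on these boundaries touch.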

Interpretation: Graphs help visualize data distribution, identify patterns, and spot outliers or clusters.

2.2 More Graphs and Displays

Graphing Quantitative Data Sets

Stem-and-Leaf Plot: Splits each data value into a "stem" (all but the last digit) and a "leaf" (last digit). Retains original data and helps identify outliers.
Dot Plot: Plots each data value as a dot above a number line. Useful for small to moderate data sets.

Example: For the data set [75, 49, 104, ...], a stem-and-leaf plot with stems 1–14 and leaves as single digits organizes the data and reveals that most values are between 20 and 80.
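A stem-and-leaf plot can be sketched in a few lines. The function name `stem_and_leaf` and the entries in `sample` are hypothetical and are not the data set from the example; the sketch assumes non-negative integers.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (all but the last digit) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 104 -> stem 10, leaf 4
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical entries, for illustration only.
sample = [23, 27, 31, 35, 35, 48, 102]
plot = stem_and_leaf(sample)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

In this made-up sample, the lone leaf on stem 10 stands apart from the rest, which is how a possible outlier shows up in the plot.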

Graphing Qualitative Data Sets

Pie Chart: Circle divided into sectors representing categories. Area of each sector is proportional to frequency or percent.
Pareto Chart: Vertical bar graph with bars in decreasing order of frequency. Highlights most significant categories.

Type of Degree	Number (thousands)	Relative Frequency
Associate’s	1037	0.255
Bachelor’s	2013	0.494
Master’s	834	0.205
Doctoral	188	0.046
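The relative frequencies in the table, the bar order of a Pareto chart, and the central angle of each pie sector (360° times the relative frequency) can all be derived from the counts. This is a small sketch using the degree counts above; the variable names are chosen for this example.

```python
# Counts (in thousands) copied from the table above.
degrees = {
    "Associate’s": 1037,
    "Bachelor’s": 2013,
    "Master’s": 834,
    "Doctoral": 188,
}

total = sum(degrees.values())
relative = {name: count / total for name, count in degrees.items()}

# Pareto chart: categories in decreasing order of frequency.
pareto_order = sorted(degrees, key=degrees.get, reverse=True)

# Pie chart: central angle of each sector in degrees.
angles = {name: 360 * rf for name, rf in relative.items()}

for name in pareto_order:
    print(f"{name}\t{degrees[name]}\t{relative[name]:.3f}\t{angles[name]:.1f}")
```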

Graphing Paired Data Sets

Scatter Plot: Plots paired data as points in the coordinate plane to show relationships between two quantitative variables.
Time Series Chart: Plots data collected at regular intervals over time, connecting points with line segments.

Example: A scatter plot of petal length vs. petal width shows that as petal length increases, petal width tends to increase.

2.3 Measures of Central Tendency

Mean, Median, and Mode

Measures of central tendency describe the center of a data set.

Mean: The arithmetic average. For a population: $\mu = \dfrac{\sum x}{N}$; for a sample: $\bar{x} = \dfrac{\sum x}{n}$
Median: The middle value when the data are ordered. If the data set has an even number of entries, the median is the mean of the two middle values.
Mode: The value(s) that occur most frequently. Data can be unimodal, bimodal, or have no mode.

Outlier: A data entry far removed from other entries, which can significantly affect the mean.
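The effect of an outlier is easy to see numerically. The sketch below uses Python's standard `statistics` module on a small hypothetical data set, then appends one extreme entry and recomputes the mean and median.

```python
import statistics

# Hypothetical entries, for illustration only.
data = [2, 3, 3, 4, 5, 7]

mean = statistics.mean(data)        # sum of entries divided by their count
median = statistics.median(data)    # even count: mean of the two middle values
modes = statistics.multimode(data)  # list of all most frequent values

# Add a single outlier and recompute.
with_outlier = data + [60]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, modes)
print(mean_out, median_out)
```

Here the outlier triples the mean (4 to 12) while the median moves only from 3.5 to 4, which is why the median is often preferred when outliers are present.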

Weighted Mean and Mean of Grouped Data

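Weighted Mean: $\bar{x} = \dfrac{\sum (x \cdot w)}{\sum w}$, where $w$ is the weight for each value $x$.
Mean of Grouped Data: $\bar{x} = \dfrac{\sum (x \cdot f)}{n}$, where $x$ is the class midpoint, $f$ is the class frequency, and $n = \sum f$.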
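The grouped mean is a weighted mean in which the class midpoints are the values and the class frequencies are the weights, so one helper covers both formulas. The sketch below applies it to the frequency distribution from Section 2.1 and to a hypothetical pair of scores and weights; `weighted_mean` is a name chosen for this example.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Midpoints and frequencies from the expanded table in Section 2.1.
midpoints = [172.5, 208.5, 244.5, 280.5, 316.5, 352.5, 388.5]
frequencies = [3, 2, 5, 6, 7, 4, 3]
grouped_mean = weighted_mean(midpoints, frequencies)

# Hypothetical scores with weights of 25% and 75%.
course_score = weighted_mean([80, 90], [0.25, 0.75])

print(round(grouped_mean, 1))  # 287.7
print(course_score)            # 87.5
```

The grouped mean is an approximation, because every entry in a class is treated as if it equaled the class midpoint.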

The Shapes of Distributions

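Symmetric: The two halves of the distribution are approximately mirror images. For a symmetric, unimodal distribution, the mean, median, and mode are equal (or nearly so).
Uniform: All classes have equal or approximately equal frequencies. A uniform distribution is also symmetric.
Skewed Left (negatively skewed): The tail extends to the left. Typically, Mean < Median < Mode.
Skewed Right (positively skewed): The tail extends to the right. Typically, Mean > Median > Mode.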

Note: The mean is pulled in the direction of skewness.

2.4 Measures of Variation

Range

Range: $\text{Range} = \text{maximum data entry} - \text{minimum data entry}$

Variance and Standard Deviation

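Deviation: $x - \mu$ (population) or $x - \bar{x}$ (sample)
Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$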

Interpretation: Standard deviation measures the typical distance of data entries from the mean. Larger standard deviation indicates more spread.
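The population and sample formulas differ only in the divisor: $N$ for a population, $n - 1$ for a sample. The sketch below spells out both calculations on a hypothetical data set; the function names are chosen for this example, and Python's `statistics.pvariance` and `statistics.variance` should give the same results.

```python
import math


def sum_of_squares(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def population_variance(data):
    return sum_of_squares(data) / len(data)        # divide by N


def sample_variance(data):
    return sum_of_squares(data) / (len(data) - 1)  # divide by n - 1


# Hypothetical entries with mean 5, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = math.sqrt(population_variance(data))  # population standard deviation
s = math.sqrt(sample_variance(data))          # sample standard deviation

print(population_variance(data), sigma)
print(sample_variance(data), s)
```

For the same entries, the sample standard deviation is always slightly larger than the population value because of the smaller divisor.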

Empirical Rule (68–95–99.7 Rule)

For bell-shaped (normal) distributions:
About 68% of data within 1 standard deviation of the mean.
About 95% within 2 standard deviations.
About 99.7% within 3 standard deviations.
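Given a mean and standard deviation, the three intervals of the Empirical Rule follow directly. This is a minimal sketch with a hypothetical mean of 100 and standard deviation of 15; it only applies when the distribution is roughly bell-shaped, and `empirical_rule_intervals` is a name chosen for this example.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return (k, approximate percent, lower bound, upper bound) for k = 1, 2, 3."""
    percents = {1: 68, 2: 95, 3: 99.7}
    return [(k, pct, mean - k * std_dev, mean + k * std_dev)
            for k, pct in percents.items()]


# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
for k, pct, low, high in empirical_rule_intervals(100, 15):
    print(f"about {pct}% of entries lie between {low} and {high} (k = {k})")
```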

Chebychev’s Theorem

For any data set, at least $1 - \dfrac{1}{k^2}$ of the data lie within $k$ standard deviations of the mean ($k > 1$).
For $k = 2$, at least 75% of data within 2 standard deviations.
For $k = 3$, at least 88.9% within 3 standard deviations.
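The bound $1 - \dfrac{1}{k^2}$ is simple to evaluate for any $k > 1$, including non-integer values. The sketch below reproduces the two percentages quoted above; `chebychev_minimum` is a name chosen for this example.

```python
def chebychev_minimum(k):
    """Minimum fraction of entries within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebychev_minimum(k):.1%} of the data")
```

Unlike the Empirical Rule, this bound holds for a distribution of any shape, so it is usually a conservative estimate for bell-shaped data.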

Standard Deviation for Grouped Data

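For frequency distributions, use class midpoints ($x$) and frequencies ($f$):
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$ (sample standard deviation), where $n = \sum f$.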
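As a worked illustration, the sketch below estimates the sample standard deviation of the frequency distribution from Section 2.1, using $s = \sqrt{\sum (x - \bar{x})^2 f / (n - 1)}$ with class midpoints standing in for the individual entries. The variable names are chosen for this example.

```python
import math

# Midpoints and frequencies from the expanded table in Section 2.1.
midpoints = [172.5, 208.5, 244.5, 280.5, 316.5, 352.5, 388.5]
frequencies = [3, 2, 5, 6, 7, 4, 3]

n = sum(frequencies)
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Each squared deviation is weighted by its class frequency.
sum_sq = sum((x - grouped_mean) ** 2 * f for x, f in zip(midpoints, frequencies))
grouped_std = math.sqrt(sum_sq / (n - 1))

print(round(grouped_mean, 1))  # 287.7
print(round(grouped_std, 1))   # 63.0
```

Both results are approximations, since the original entries inside each class are replaced by the class midpoint.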

Coefficient of Variation (CV)

Compares variation between data sets with different units or means.
Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$

Example: If the mean height of a basketball team is 74.2 inches with a standard deviation of 3.3 inches, $CV = \dfrac{3.3}{74.2} \cdot 100\% \approx 4.4\%$.
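Because the coefficient of variation is unit-free, it allows a comparison between measurements in different units. The sketch below computes the CV for the heights in the example and compares it with a hypothetical set of weights (mean 187.8 pounds, standard deviation 17.7 pounds); the weight figures are assumptions made for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percent of the mean."""
    return std_dev / mean * 100


# Heights from the example above (inches).
cv_heights = coefficient_of_variation(3.3, 74.2)

# Hypothetical weights (pounds), assumed for comparison only.
cv_weights = coefficient_of_variation(17.7, 187.8)

print(f"heights: {cv_heights:.1f}%")
print(f"weights: {cv_weights:.1f}%")
```

With these numbers, the weights vary more relative to their mean than the heights do, even though inches and pounds cannot be compared directly.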

Summary Table: Key Formulas

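Measure	Population Formula	Sample Formula
Mean	$\mu = \dfrac{\sum x}{N}$	$\bar{x} = \dfrac{\sum x}{n}$
Variance	$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$	$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Standard Deviation	$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$	$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Coefficient of Variation	$CV = \dfrac{\sigma}{\mu} \cdot 100\%$	$CV = \dfrac{s}{\bar{x}} \cdot 100\%$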

Key Points

Frequency distributions and their graphs (histograms, polygons, ogives) organize and visualize data.
Measures of central tendency (mean, median, mode) summarize the center of a data set.
Measures of variation (range, variance, standard deviation, coefficient of variation) describe the spread of data.
Empirical Rule and Chebychev’s Theorem help interpret standard deviation in context.
Graphical and numerical summaries together provide a comprehensive understanding of data sets.
