Descriptive Statistics: Data Organization, Visualization, and Measures of Central Tendency & Variation

Study Guide - Smart Notes

Qualitative and Quantitative Data

Qualitative Data: Software Survey Example

Qualitative data refers to non-numeric information that describes categories or qualities. In the example, a survey of 35 businesses records the primary software used in their offices.

  • Definition: Qualitative data categorizes or describes attributes (e.g., software brand).

  • Frequency Table: Counts the number of occurrences for each category.

| Software | Frequency |
| --- | --- |
| Microsoft | 20 |
| Aldus | 3 |
| WordPerfect | 6 |
| Lotus | 6 |

  • Pareto Chart: A bar graph that displays frequencies in descending order to highlight the most common categories.

  • Application: Useful for identifying the most popular software among businesses.
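As a rough illustration of the Pareto ordering described above, the following Python sketch sorts the survey counts from most to least frequent and prints a text bar for each category. The function name `pareto_order` and the text-bar rendering are illustrative choices, not part of the original notes.

```python
# Frequencies copied from the software survey table.
software_counts = {"Microsoft": 20, "Aldus": 3, "WordPerfect": 6, "Lotus": 6}


def pareto_order(counts):
    """Return (category, frequency) pairs in descending order of frequency."""
    # sorted() is stable, so tied categories keep their original table order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ordered = pareto_order(software_counts)
total = sum(software_counts.values())

for category, freq in ordered:
    # One '#' per business gives a crude horizontal bar chart.
    print(f"{category:<12}{freq:>3}  {'#' * freq}")
print(f"{'Total':<12}{total:>3}")
```

The tallest bar comes first, so the most popular software is visible at a glance.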

Frequency Distributions and Histograms

Quantitative Data: Ages of Presidents at Inauguration

Quantitative data consists of numerical values. The ages of U.S. presidents at inauguration are grouped into classes to create a frequency distribution.

  • Frequency Distribution: Organizes data into intervals (classes) and counts the number of observations in each.

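  • Class Boundaries: The values that separate adjacent classes with no gaps, found by extending each class limit by half a unit (e.g., the class 42-44 has boundaries 41.5-44.5).

  • Cumulative Frequency: The running total of frequencies up to each upper class boundary.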

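| Class (Age) | Frequency | Class Boundaries | Cumulative Frequency |
| --- | --- | --- | --- |
| 42-44 | 6 | 41.5-44.5 | 6 |
| 45-48 | 11 | 44.5-48.5 | 17 |
| 49-53 | 17 | 48.5-53.5 | 34 |
| 54-58 | 8 | 53.5-58.5 | 42 |
| 59-63 | 3 | 58.5-63.5 | 45 |
| 64-68 | 3 | 63.5-68.5 | 48 |
| 69-83 | 3 | 68.5-83.5 | 51 |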

  • Histogram: A bar graph representing the frequency distribution of quantitative data. Each bar's height corresponds to the frequency of the class interval.

  • Ogive: A line graph of cumulative frequency, useful for determining percentiles and medians.
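The class boundaries and cumulative frequencies in the table can be reproduced with a short Python sketch. It assumes ages are recorded in whole years, so each boundary lies half a unit beyond the class limits. It also locates the class containing the median, which is the kind of question an ogive helps answer. The variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies copied from the presidents' ages table.
classes = [(42, 44), (45, 48), (49, 53), (54, 58), (59, 63), (64, 68), (69, 83)]
frequencies = [6, 11, 17, 8, 3, 3, 3]

# Assumption: ages are whole years, so each boundary sits 0.5 beyond the limits.
boundaries = [(low - 0.5, high + 0.5) for low, high in classes]

# Running total of frequencies: the heights plotted on an ogive.
cumulative = list(accumulate(frequencies))

# The median is the (n + 1) / 2-th ordered value; find the first class whose
# cumulative frequency reaches that position.
n = cumulative[-1]
median_position = (n + 1) / 2
median_class = next(c for c, cf in zip(classes, cumulative) if cf >= median_position)

for (low, high), f, (lb, ub), cf in zip(classes, frequencies, boundaries, cumulative):
    print(f"{low}-{high}: f={f:>2}  boundaries={lb}-{ub}  cumulative={cf}")
print("Median class:", median_class)
```

With 51 observations the median is the 26th ordered value, which falls in the 49-53 class because the cumulative frequency first reaches 26 there.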

Graph Analysis

  • Concentration: Data is concentrated mainly in three classes (44.5 to 58.5 years), which together hold 36 of the 51 observations.

  • Shape: The histogram is unimodal (one peak), not symmetric, and not uniform.

  • Outliers: Two outliers are present; the rest of the data follows a roughly bell-shaped (approximately normal) pattern.

Measures of Central Tendency and Spread

Home Runs: Babe Ruth Example

Measures of central tendency and spread summarize data sets. The number of home runs Babe Ruth hit each year is analyzed using these measures.

  • Mean: The arithmetic average.

  • Median: The middle value when data is ordered.

  • Quartiles: Values that divide the data into four equal parts (Q1, Q2, Q3).

  • Percentiles: Indicate the relative standing of a value within the data set.

  • Box Plot: A graphical summary showing minimum, Q1, median, Q3, and maximum.

| Statistic | Value |
| --- | --- |
| Mean | 43.9 |
| Median | 46 |
| Q1 | 35 |
| Q3 | 54 |
| Min | 22 |
| Max | 60 |

  • Interpretation: The distribution is roughly symmetric around the center; the mean (43.9) and median (46) differ by only about 2 home runs, with the slightly lower mean hinting at a mild left skew.
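The summary statistics can be computed with Python's standard `statistics` module. The notes give only the summary values, so the raw data below is an assumption: it uses Ruth's commonly published season totals for 1920-1934. Quartile conventions also differ between textbooks and software; this sketch relies on the module's default "exclusive" method.

```python
import statistics

# ASSUMPTION: commonly published season totals for 1920-1934; the notes
# list only the summary statistics, not the raw data.
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

mean = statistics.mean(home_runs)
median = statistics.median(home_runs)

# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
# The default method="exclusive" interpolates at positions i * (n + 1) / 4.
q1, q2, q3 = statistics.quantiles(home_runs, n=4)

# Five-number summary: the values a box plot displays.
five_number = (min(home_runs), q1, median, q3, max(home_runs))

print(f"Mean:   {mean:.1f}")
print(f"Median: {median}")
print(f"Five-number summary: {five_number}")
```

Under these assumptions the results agree with the table: mean 43.9, median 46, Q1 35, Q3 54, minimum 22, and maximum 60.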

Estimating Mean and Modal Class from Grouped Data

Grouped Data: Age Classes Example

When data is grouped into intervals, the mean can be estimated using class midpoints and frequencies.

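  • Estimated Mean Formula: $\bar{x} \approx \frac{\sum f x}{\sum f}$, where $f$ is the frequency of a class and $x$ is its midpoint.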

  • Modal Class: The class interval with the highest frequency.

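| Class (Age) | Frequency ($f$) | Midpoint ($x$) | $f \cdot x$ |
| --- | --- | --- | --- |
| 48-52 | 12 | 50.0 | 600.0 |
| 53-57 | 20 | 55.0 | 1100.0 |
| 58-62 | 16 | 60.0 | 960.0 |
| 63-67 | 30 | 65.0 | 1950.0 |
| 68-72 | 25 | 70.0 | 1750.0 |
| 73-77 | 17 | 75.0 | 1275.0 |
| Total | 120 | | 7635.0 |

  • Estimated Mean: $\bar{x} \approx \frac{7635}{120} \approx 63.6$ years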

  • Modal Class: 63-67 (highest frequency)
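The grouped-data estimate can be checked with a few lines of Python. This sketch recomputes each midpoint, each frequency-times-midpoint product, and both column totals directly from the class limits and frequencies in the table; the variable names are illustrative.

```python
# Class limits and frequencies copied from the grouped age table.
classes = [(48, 52), (53, 57), (58, 62), (63, 67), (68, 72), (73, 77)]
frequencies = [12, 20, 16, 30, 25, 17]

# Midpoint of each class: the average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high in classes]

# Each class contributes frequency * midpoint to the estimated total.
products = [f * x for f, x in zip(frequencies, midpoints)]

total_f = sum(frequencies)
total_fx = sum(products)
estimated_mean = total_fx / total_f

# Modal class: the class with the highest frequency.
modal_class = classes[frequencies.index(max(frequencies))]

print(f"Sum of f:   {total_f}")
print(f"Sum of f*x: {total_fx}")
print(f"Estimated mean: {estimated_mean:.1f} years")
print(f"Modal class: {modal_class[0]}-{modal_class[1]}")
```

Summing the six products gives 7635, so the estimated mean is 7635 / 120 = 63.625, or about 63.6 years.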

Measures of Variation: Coefficient of Variation

Comparing Product Consistency

The coefficient of variation (CVar) is a standardized measure of dispersion, useful for comparing variability between data sets with different units or means.

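  • Formula: $\text{CVar} = \frac{s}{\bar{x}} \times 100\%$, where $s$ is the standard deviation and $\bar{x}$ is the mean.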

  • Application: Used to compare the consistency of product weights between two companies.

| Company | Mean ($\bar{x}$) | Standard Deviation ($s$) | Coefficient of Variation |
| --- | --- | --- | --- |
| American | 5.0 lb | 0.2 lb | 4% |
| Canadian | 305 g | 7 g | 2.3% |

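  • Interpretation: The American company shows more relative variation in product weights, since its coefficient of variation is larger (4% versus 2.3%). Because the measure is unit-free, pounds and grams can be compared directly.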
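A minimal Python sketch of the comparison follows, using the means and standard deviations from the table. The helper name `coefficient_of_variation` is an illustrative choice. No unit conversion is needed because the units cancel within each ratio.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the coefficient of variation as a percentage of the mean."""
    return std_dev / mean * 100


# Values copied from the table; each pair shares a unit (lb or g), which cancels.
american_cv = coefficient_of_variation(mean=5.0, std_dev=0.2)
canadian_cv = coefficient_of_variation(mean=305, std_dev=7)

# The larger percentage indicates less consistent product weights.
more_variable = "American" if american_cv > canadian_cv else "Canadian"

print(f"American: {american_cv:.1f}%")
print(f"Canadian: {canadian_cv:.1f}%")
print(f"More relative variation: {more_variable}")
```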

Summary Table: Key Statistical Concepts

| Concept | Definition | Formula |
| --- | --- | --- |
| Mean | Arithmetic average | $\bar{x} = \frac{\sum x}{n}$ |
| Median | Middle value of the ordered data | - |
| Mode | Most frequent value | - |
| Modal Class | Class with highest frequency | - |
| Coefficient of Variation | Relative measure of spread | $\text{CVar} = \frac{s}{\bar{x}} \times 100\%$ |
| Frequency Distribution | Table of counts per interval | - |
| Histogram | Bar graph of frequency distribution | - |
| Ogive | Line graph of cumulative frequency | - |
| Box Plot | Graph of the five-number summary | - |

Additional info: Some explanations and table entries have been expanded for clarity and completeness. All formulas are provided in LaTeX format for academic reference.
