
Measures of Center and Data Representation in Statistics

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Chapter 3: Analyzing and Representing Data with Measures

Measures of Center

Measures of center are statistical values that describe the central tendency of a data set. The three most common measures are the mean, median, and mode. These measures help summarize and interpret data distributions.

  • Mean (Average): The mean is the arithmetic average of a set of values. It is calculated by summing all the values and dividing by the number of values.

  • Median: The median is the middle value when the data are arranged in order. If there is an even number of values, the median is the average of the two middle values.

  • Mode: The mode is the value that appears most frequently in the data set.

Formulas:

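  • Mean: $\bar{x} = \dfrac{\sum x_i}{n}$ or, for frequency tables: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$, where $x_i$ is a data value, $n$ is the number of values, and $f_i$ is the frequency of $x_i$.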

Example: For the data set: 3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11

  • Mean: Add all values and divide by the number of values: 67 / 11 ≈ 6.09.

  • Median: Arrange values in order and find the middle value. With 11 values, the median is the 6th value, which is 5.

  • Mode: Identify the value that occurs most frequently (in this case, 4).
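The three calculations above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Data set from the example above
data = [3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11]

mean = statistics.mean(data)      # sum of values (67) divided by count (11)
median = statistics.median(data)  # middle (6th) value of the 11 sorted values
mode = statistics.mode(data)      # value that occurs most often

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"mode   = {mode}")
```

Rounded to two decimals, the mean prints as 6.09, alongside a median of 5 and a mode of 4.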

Frequency Distribution Table

A frequency distribution table organizes data values and their corresponding frequencies. This helps visualize how often each value occurs.

Value | Frequency
3 | 1
4 | 4
5 | 1
6 | 1
7 | 1
8 | 1
11 | 2
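A frequency table like this one can be built directly from the raw data, and the mean can then be computed from the table instead of the individual values. The sketch below assumes the example data set 3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11 and uses `collections.Counter`; the names `freq` and `mean_from_table` are illustrative.

```python
from collections import Counter

# Example data set from these notes
data = [3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11]

# Map each distinct value to the number of times it occurs
freq = Counter(data)

print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")

# Mean from the table: sum of (value * frequency) divided by sum of frequencies
n = sum(freq.values())
weighted_sum = sum(value * count for value, count in freq.items())
mean_from_table = weighted_sum / n

print(f"mean from table = {weighted_sum} / {n} = {mean_from_table:.2f}")
```

The weighted sum is 67 and the frequencies total 11, so the table gives the same mean as adding the raw values one by one.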

Dotplot Representation

A dotplot is a simple graphical display of data values using dots. Each dot represents one observation. Dotplots are useful for visualizing the distribution and identifying clusters, gaps, and outliers.

  • To create a dotplot, place a dot above each value on a number line for every occurrence of that value.

  • Dotplots help in visually identifying the mode and the spread of the data.
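The construction described above can be sketched as a plain-text dotplot. This is only an illustration, assuming the example data set 3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11 and an integer number line; a plotting library would normally be used for a real figure.

```python
from collections import Counter

# Example data set from these notes
data = [3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11]
counts = Counter(data)

lo, hi = min(data), max(data)
tallest = max(counts.values())  # height of the tallest stack of dots

# Build rows from the top of the plot down to the number line
rows = []
for level in range(tallest, 0, -1):
    row = "".join(
        " * " if counts[v] >= level else "   "
        for v in range(lo, hi + 1)
    )
    rows.append(row.rstrip())

# Number line: every integer from the minimum to the maximum, 3 characters wide
axis = "".join(f"{v:^3}" for v in range(lo, hi + 1))

dotplot = "\n".join(rows + [axis])
print(dotplot)
```

The tallest stack sits above 4 (the mode), and the empty columns above 9 and 10 show the gap between the main cluster and the two values at 11.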

Calculating Distances from the Mean

The distance from the mean for each value is the value minus the mean, so values below the mean have negative distances and values above the mean have positive distances. Comparing the negative and positive sums shows how the mean balances the data.

  • Sum of all negative distances: For each value below the mean, subtract the mean from the value, then add the results.

  • Sum of all positive distances: For each value above the mean, subtract the mean from the value, then add the results.

  • Total sum of all distances from the mean: This is always zero, because the negative and positive distances cancel exactly; the mean is the balance point of the data.

Formula:

  • Distance from mean for value $x_i$: $d_i = x_i - \bar{x}$

  • Total sum: $\sum (x_i - \bar{x}) = 0$
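The balancing property can be verified numerically. The sketch below assumes the example data set 3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11 and uses exact fractions so that rounding does not hide the zero total; the variable names are illustrative.

```python
from fractions import Fraction

# Example data set from these notes
data = [3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11]

# Exact mean (67/11), avoiding floating-point rounding
mean = Fraction(sum(data), len(data))

# Signed distance of each value from the mean: value minus mean
deviations = [x - mean for x in data]

negative_sum = sum(d for d in deviations if d < 0)  # values below the mean
positive_sum = sum(d for d in deviations if d > 0)  # values above the mean
total = negative_sum + positive_sum

print(f"sum of negative distances = {float(negative_sum):.2f}")
print(f"sum of positive distances = {float(positive_sum):.2f}")
print(f"total                     = {total}")
```

The negative distances sum to -139/11 (about -12.64) and the positive distances to 139/11 (about 12.64), so the total is exactly 0.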

Histogram Representation

A histogram is a graphical representation of the distribution of numerical data, where data are grouped into ranges (bins) and the frequency of each bin is shown as a bar.

  • Histograms are useful for visualizing the shape of the data distribution, such as skewness, modality, and spread.

  • In the example, the histogram shows the distribution of salaries for professional baseball players, with the mean marked.
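The binning step behind a histogram can be sketched in a few lines. The salary data from the example is not reproduced in these notes, so this illustration reuses the data set 3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11; the bin width of 2 and the starting point at the minimum are assumptions chosen only for the demonstration.

```python
# Example data set from these notes (not the salary data)
data = [3, 4, 4, 4, 4, 5, 6, 7, 8, 11, 11]

bin_width = 2      # assumed bin width, chosen for illustration
start = min(data)  # assumed left edge of the first bin

# Number of bins needed to cover the data up to the maximum value
n_bins = (max(data) - start) // bin_width + 1

# Count how many values fall in each bin [left, left + bin_width)
counts = [0] * n_bins
for x in data:
    counts[(x - start) // bin_width] += 1

# Print one text bar per bin
for i, count in enumerate(counts):
    left = start + i * bin_width
    print(f"[{left:>2}, {left + bin_width:>2})  {'#' * count}")
```

With these choices the bin counts are 5, 2, 2, 0, 2: most values fall in the first bin, and the empty bin [9, 11) separates the bulk of the data from the two values at 11. A different bin width would change the apparent shape, which is why the choice of bins matters when reading a histogram.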

Comparing Mean and Median

In skewed distributions, the mean and median can differ significantly. The median is often a better representative of the "typical" value when the data are skewed or contain outliers.

  • Advantage of the median: Resistant to extreme values (outliers) and skew, so it better represents the center in such cases.

  • Advantage of the mean: Uses all data values; useful for further statistical analysis and calculations.

Measure | Advantage | Disadvantage
Mean | Considers all data values; useful for mathematical analysis | Affected by outliers and skewed data
Median | Resistant to outliers; better for skewed distributions | Does not use all data values; less useful for further calculations

Example Application: In the salary histogram, the median is a better representative of the typical salary because the mean is pulled higher by a few very large salaries (right-skewed distribution).
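The effect of a few very large values can be seen with a small numerical sketch. The salaries below are hypothetical numbers invented for illustration (in thousands of dollars); they are not the data behind the histogram in these notes.

```python
import statistics

# Hypothetical salaries in thousands of dollars (illustrative only):
# six moderate salaries and one very large one
salaries = [500, 550, 600, 650, 700, 750, 20000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

# How many salaries fall below the mean?
below_mean = sum(1 for s in salaries if s < mean)

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"{below_mean} of {len(salaries)} salaries are below the mean")
```

In this made-up sample the single large salary pulls the mean far above the median of 650, and six of the seven salaries fall below the mean. That is the right-skewed pattern described above, where the median is the better summary of a typical value.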

Summary of Key Concepts

  • Mean, median, and mode are measures of center that summarize data distributions.

  • Frequency tables and dotplots help organize and visualize data.

  • Histograms show the shape and spread of data distributions.

  • Choosing the appropriate measure of center depends on the data's distribution and presence of outliers.

Additional info: The notes are based on introductory statistics concepts, focusing on measures of center and graphical data representation. The exercises reinforce understanding through calculation and interpretation.
