Measures of Center and Data Representation in Statistics

Chapter 3: Analyzing and Representing Data with Measures of Center

Introduction

This study guide covers fundamental concepts in descriptive statistics, focusing on measures of center and graphical representation of data. It includes exercises on frequency distributions, dotplots, calculation of mean, median, and mode, and interpretation of histograms.

Frequency Distributions

Tabular Representation of Data

A frequency distribution is a table that displays the number of occurrences (frequency) of each value in a dataset. This helps summarize and visualize the data.

  • Value: The distinct data points in the dataset.

  • Frequency: The number of times each value appears.

Example: For the dataset 3, 4, 4, 4, 4.5, 5, 6, 7, 8, 11, the frequency distribution is:

| Value | Frequency |
|-------|-----------|
| 3     | 1         |
| 4     | 3         |
| 4.5   | 1         |
| 5     | 1         |
| 6     | 1         |
| 7     | 1         |
| 8     | 1         |
| 11    | 1         |
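The tally can be reproduced with a short script. This is a minimal sketch using Python's standard `collections.Counter`; the function name `frequency_distribution` is an illustrative choice, not something from the chapter, and the counts are computed directly from the ten values listed in the example.

```python
from collections import Counter

# The example dataset from this section.
data = [3, 4, 4, 4, 4.5, 5, 6, 7, 8, 11]


def frequency_distribution(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)       # maps each distinct value to its count
    return sorted(counts.items())  # sort rows by value, as in the table


if __name__ == "__main__":
    print("Value  Frequency")
    for value, freq in frequency_distribution(data):
        print(f"{value:>5}  {freq}")
```

The frequencies should add up to the number of observations, which is a quick check that no value was missed or counted twice.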

Graphical Representation: Dotplots

Dotplot Construction

A dotplot is a simple way to visualize the frequency of data values. Each dot represents one observation. Dotplots are useful for small datasets and for identifying clusters, gaps, and outliers.

  • Place dots above each value on a number line according to its frequency.

  • The finished plot shows the shape of the distribution at a glance.

Example: For the dataset above, the dotplot would have three dots above 4 (the value occurs three times in the list) and one dot above each of the other values.
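A rough text version of a dotplot can be printed without any plotting library. The sketch below is an assumption-laden simplification: it turns the plot on its side (one row per distinct value instead of a horizontal number line) and lists only the values that occur, so gaps between values are not drawn to scale. The function name `text_dotplot` is illustrative.

```python
from collections import Counter

# The example dataset from this section.
data = [3, 4, 4, 4, 4.5, 5, 6, 7, 8, 11]


def text_dotplot(values, dot="*"):
    """Return a sideways dotplot: one row per distinct value, one dot per observation."""
    counts = Counter(values)
    rows = []
    for value in sorted(counts):
        # Right-align the value label, then draw one dot per occurrence.
        rows.append(f"{value:>5} | " + dot * counts[value])
    return "\n".join(rows)


if __name__ == "__main__":
    print(text_dotplot(data))
```

The longest row marks the most frequent value, and a value far from the others (here, 11) stands out as a possible outlier.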

Measures of Center

Definition and Calculation

Measures of center are statistical values that describe the central tendency of a dataset. The three main measures are mean, median, and mode.

1. Mean (Average)

  • Definition: The mean is the sum of all data values divided by the number of values.

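  • Formula: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$, where $x_i$ are the data values and $N$ is the number of values.

  • Example: For the dataset above, the sum is 3 + 4 + 4 + 4 + 4.5 + 5 + 6 + 7 + 8 + 11 = 56.5 and N = 10, so the mean is 56.5 / 10 = 5.65.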

2. Median

  • Definition: The median is the middle value when the data are ordered from least to greatest.

  • Calculation: If the number of values (N) is odd, the median is the middle value. If N is even, the median is the average of the two middle values.

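  • Formula for position: $\text{position} = \frac{N + 1}{2}$ in the ordered list. A fractional result such as 5.5 means the median lies halfway between the two neighboring values.

  • Example: For the 10 values above, the position is (10 + 1) / 2 = 5.5, so the median is the average of the 5th and 6th values in the ordered list: (4.5 + 5) / 2 = 4.75.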

3. Mode

  • Definition: The mode is the value that appears most frequently in the dataset.

  • Example: In the dataset above, 4 is the mode because it appears three times, more often than any other value.
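All three measures can be checked with Python's standard `statistics` module. This is a small sketch, not the chapter's prescribed method; `multimode` (available in Python 3.8 and later) is used because it returns every most-frequent value, which matters when the mode is not unique. The helper name `measures_of_center` is illustrative.

```python
import statistics

# The example dataset from this section.
data = [3, 4, 4, 4, 4.5, 5, 6, 7, 8, 11]


def measures_of_center(values):
    """Return the mean, the median, and the list of modes of the values."""
    return (
        statistics.mean(values),       # sum of values divided by N
        statistics.median(values),     # middle value, or average of the two middle values
        statistics.multimode(values),  # all values tied for highest frequency
    )


if __name__ == "__main__":
    mean, median, modes = measures_of_center(data)
    print(f"mean = {mean}, median = {median}, modes = {modes}")
```

For this dataset the mean (5.65) is larger than the median (4.75), which is consistent with the single large value 11 pulling the mean upward.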

Distances from the Mean

Calculating Deviations

The distance from the mean (deviation) for each value is calculated as the difference between the value and the mean.

  • Negative distances: Values less than the mean.

  • Positive distances: Values greater than the mean.

  • The sum of all distances from the mean is always zero.

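Formula: $d_i = x_i - \bar{x}$, where $\bar{x}$ is the mean, so that $\sum_{i=1}^{N} d_i = 0$.

Example: For the dataset above, the mean is 5.65. The negative deviations (from 3, 4, 4, 4, 4.5, and 5) sum to -9.4 and the positive deviations (from 6, 7, 8, and 11) sum to +9.4, so their total is zero.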
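The balancing of negative and positive deviations can be verified numerically. The following is a minimal sketch; the function name `deviation_totals` is illustrative, and because floating-point arithmetic is inexact, the totals are best compared with a tolerance rather than with strict equality.

```python
from statistics import mean

# The example dataset from this section.
data = [3, 4, 4, 4, 4.5, 5, 6, 7, 8, 11]


def deviation_totals(values):
    """Return (sum of negative deviations, sum of positive deviations) from the mean."""
    center = mean(values)
    deviations = [x - center for x in values]  # each value minus the mean
    negative_total = sum(d for d in deviations if d < 0)
    positive_total = sum(d for d in deviations if d > 0)
    return negative_total, positive_total


if __name__ == "__main__":
    negative_total, positive_total = deviation_totals(data)
    print(f"negative: {negative_total:.2f}, positive: {positive_total:.2f}")
    print(f"total: {negative_total + positive_total:.2f}")
```

This is why the mean is often described as the balance point of the data: the total distance below it equals the total distance above it.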

Histogram Interpretation

Understanding Salary Distributions

A histogram is a graphical representation of the distribution of numerical data, where the data are grouped into ranges (bins). The height of each bar shows the frequency of data within each range.

  • In this chapter's exercise, a histogram shows the distribution of salaries for professional baseball players.

  • The mean is marked on the histogram.

  • The median can be estimated by finding the point where half of the data lie below and half lie above.
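Estimating the median from a histogram amounts to adding up bar heights from the left until half of the observations are covered. The sketch below shows that idea; the bin edges and counts are hypothetical numbers chosen for illustration, not the salary data from the exercise, and `median_bin` is an illustrative name.

```python
def median_bin(bin_edges, counts):
    """Return the (low, high) edges of the bin where the running count first reaches half the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no observations")
    half = total / 2
    running = 0
    for i, count in enumerate(counts):
        running += count  # observations at or below the top of this bin
        if running >= half:
            return bin_edges[i], bin_edges[i + 1]


if __name__ == "__main__":
    # Hypothetical histogram: four bins of width 1, bar heights 40, 30, 20, 10.
    edges = [0, 1, 2, 3, 4]
    heights = [40, 30, 20, 10]
    print(median_bin(edges, heights))
```

With these made-up heights, the first bar covers 40 of 100 observations and the first two cover 70, so the median falls in the second bin. A histogram only narrows the median down to a bin; the exact value requires the raw data.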

Comparing Mean and Median

  • Mean: Sensitive to extreme values (outliers); may not represent the typical value in a skewed distribution.

  • Median: Less affected by outliers, better represents the center in skewed distributions.

Example: In salary data, the median may better represent a typical player's salary than the mean, which can be skewed by a few very high salaries.
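The effect of a few very high salaries can be seen with a small made-up example. The salary figures below are hypothetical (in millions of dollars) and are not taken from the exercise; they only illustrate how one extreme value moves the mean but not the median. The function name `mean_and_median` is illustrative.

```python
from statistics import mean, median

# Hypothetical salaries in millions of dollars: six modest values and one very large one.
salaries = [0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 25.0]


def mean_and_median(values):
    """Return the mean and the median of the values."""
    return mean(values), median(values)


if __name__ == "__main__":
    with_star = mean_and_median(salaries)
    without_star = mean_and_median(salaries[:-1])  # drop the 25.0 outlier
    print(f"with outlier:    mean = {with_star[0]:.2f}, median = {with_star[1]:.2f}")
    print(f"without outlier: mean = {without_star[0]:.2f}, median = {without_star[1]:.2f}")
```

In this sketch the single large salary raises the mean above what six of the seven players earn, while the median stays at 0.8 and shifts only slightly when the outlier is removed.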

Advantages of Mean and Median

  • Median: Resistant to outliers and skewed data; provides a better measure of center for non-symmetric distributions.

  • Mean: Uses all data values; useful for further statistical analysis and calculations.

Summary Table: Comparison of Measures of Center

| Measure | Definition                       | Advantages                                    | Disadvantages                      |
|---------|----------------------------------|-----------------------------------------------|------------------------------------|
| Mean    | Arithmetic average of all values | Uses all data; useful for calculations        | Affected by outliers/skewed data   |
| Median  | Middle value in ordered data     | Resistant to outliers/skewed data             | Does not use all data values       |
| Mode    | Most frequent value              | Easy to identify; useful for categorical data | May not be unique or may not exist |

Additional info: The exercises also encourage students to practice graphical representation and interpretation, which are essential skills in introductory statistics.
