Measures of Central Tendency and Variability in Biostatistics

Measures of Central Tendency

Introduction to Central Tendency

Measures of central tendency are statistical tools used to describe the typical value in a dataset. They help summarize quantitative data with a single number, providing researchers and clinicians with a concise representation of the 'typical' subject or observation.

  • Key Measures: Mean, Median, Mode

  • Applications: Used to describe characteristics such as age, socioeconomic status, or general health in public health datasets.

Mean

The mean is the arithmetic average of a set of values. It is calculated by summing all values and dividing by the number of observations.

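  • Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $n$ is the number of observations.

  • Example: For the dataset (7, 4, 4, 5): $\bar{x} = \frac{7 + 4 + 4 + 5}{4} = \frac{20}{4} = 5$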

  • Properties: Sensitive to extreme values (outliers); not robust.

  • Application in R: x <- c(7,4,4,5); mean(x) returns 5.
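
The sensitivity of the mean to extreme values can be seen in a short R sketch. It reuses the dataset from the example; the appended value 100 is a hypothetical outlier introduced only for illustration.

```r
# Data from the example above
x <- c(7, 4, 4, 5)

# Mean from the definition: sum of the values divided by their count
mean_by_hand <- sum(x) / length(x)   # 20 / 4
mean_builtin <- mean(x)

# Hypothetical extreme value (not in the original data) appended to x
x_outlier <- c(x, 100)
mean_outlier <- mean(x_outlier)      # (20 + 100) / 5

print(c(by_hand = mean_by_hand,
        builtin = mean_builtin,
        with_outlier = mean_outlier))
```

A single extreme observation moves the mean from 5 to 24, which is why the mean is described as not robust.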

Median

The median is the midpoint of a dataset, such that half the values are smaller and half are larger. It is less affected by outliers and skewed data.

  • Calculation: Arrange the data in order and find the middle value. If the number of observations ($n$) is odd, the median is the middle value. If $n$ is even, the median is the mean of the two middle values.

  • Example: For (1, 4, 3, 2), sorted as (1, 2, 3, 4), the median is $\frac{2 + 3}{2} = 2.5$

  • Robustness: Median is robust to extreme values.

  • Application in R: x <- c(1,4,3,2); median(x) returns 2.5.
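
The odd and even cases of the calculation rule can be sketched in R as follows. The five-value vector with 100 appended is a hypothetical addition, used here to show the odd case and the median's robustness at the same time.

```r
# Even number of observations: average the two middle values
x_even <- c(1, 4, 3, 2)
sorted <- sort(x_even)                 # 1 2 3 4
n <- length(sorted)
median_even <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# Odd number of observations: take the single middle value
# (100 is a hypothetical extreme value, not from the original example)
x_odd <- c(1, 4, 3, 2, 100)
median_odd <- sort(x_odd)[(length(x_odd) + 1) / 2]

# The built-in median() should agree with both manual results
print(c(median_even, median(x_even)))
print(c(median_odd, median(x_odd)))

# For comparison: the mean of x_odd is pulled toward the extreme value
print(mean(x_odd))
```

With the extreme value included, the median moves only from 2.5 to 3, while the mean rises to 22.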

Mode

The mode is the value that occurs most frequently in a dataset. It can be used for both quantitative and categorical data.

  • Example: For ("M", "F", "M", "F", "F"), the mode is "F" (occurs three times).

  • Application in R: table(Sex) provides frequency counts for a categorical variable; the mode is the category with the highest count.

  • Note: Mode is less commonly used for quantitative data but is useful for categorical data.
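
Base R has no built-in function for the statistical mode (the function named mode() reports an object's storage type instead), so one possible approach is to read the mode off a frequency table. The variable names below are illustrative choices, and the data are those of the example above.

```r
# Data from the example above
sex <- c("M", "F", "M", "F", "F")

# Frequency count for each category
counts <- table(sex)
print(counts)

# Keep every category whose count equals the maximum count.
# If several categories tie, all of them are returned.
modes <- names(counts)[counts == max(counts)]
print(modes)
```

Returning all tied categories is a deliberate choice in this sketch, since a dataset can have more than one mode.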

Comparing Mean and Median

When a dataset is skewed, the mean and median can differ substantially because the mean is pulled toward the long tail. In such cases, the median is usually preferred to describe the typical value.

  • Symmetric Distribution: Mean ≈ Median

  • Left-Skewed: Mean < Median (typically)

  • Right-Skewed: Mean > Median (typically)
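
A small R sketch can illustrate the direction of these relationships. Both vectors are hypothetical values invented for illustration; the second is a mirror image of the first, so its tail points the other way.

```r
# Hypothetical right-skewed data: most values are small, one is large
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)

# Mirror image of the same data, giving a long tail on the left
left_skewed <- 21 - right_skewed

print(c(mean = mean(right_skewed), median = median(right_skewed)))
print(c(mean = mean(left_skewed), median = median(left_skewed)))
```

In the right-skewed vector the mean (4.75) exceeds the median (3); in the left-skewed vector the mean (16.25) falls below the median (18).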

Measures of Variability

Introduction to Variability

Measures of variability describe the spread or dispersion of data. They complement measures of central tendency by indicating how much the data varies.

  • Key Measures: Range, Standard Deviation, Variance, Interquartile Range (IQR)

  • Example: Two patients with the same mean systolic blood pressure (SBP) may have different variability, affecting clinical decisions.

Range

The range is the difference between the largest and smallest values in a dataset.

  • Formula: $\text{Range} = x_{\max} - x_{\min}$

  • Example: For (7, 4, 4, 5): $\text{Range} = 7 - 4 = 3$

  • Application in R: max(x) - min(x)
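
One detail worth knowing when computing the range in R: the built-in range() function returns the minimum and maximum as a pair rather than their difference. A minimal sketch using the dataset from the example:

```r
# Data from the example above
x <- c(7, 4, 4, 5)

# range() returns c(minimum, maximum), not a single number
limits <- range(x)
print(limits)

# Two equivalent ways to get the range as defined in this section
range_a <- max(x) - min(x)
range_b <- diff(range(x))
print(c(range_a, range_b))
```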

Standard Deviation and Variance

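Standard deviation measures how far data points typically lie from the mean. It is the square root of the variance, which is the average squared deviation from the mean (with $n - 1$ in the denominator for sample data).

  • Variance Formula: $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

  • Standard Deviation Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$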

  • Properties: Standard deviation is widely used in biostatistics; a value of 0 indicates no variability.

  • Application in R: sd(y) for a vector y.

  • Note: Variance is not robust to outliers.
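
To connect the definitions to R's built-in functions, the sketch below computes the sample variance and standard deviation step by step and compares them with var() and sd(), which both use the n - 1 denominator. The dataset (7, 4, 4, 5) is borrowed from the earlier examples; the intermediate variable names are illustrative.

```r
x <- c(7, 4, 4, 5)
n <- length(x)

# Deviations of each value from the mean: 2, -1, -1, 0
deviations <- x - mean(x)

# Sample variance: sum of squared deviations divided by (n - 1)
var_by_hand <- sum(deviations^2) / (n - 1)   # 6 / 3

# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)

# Compare with the built-in functions
print(c(by_hand = var_by_hand, builtin = var(x)))
print(c(by_hand = sd_by_hand, builtin = sd(x)))
```

Here the variance is 2 and the standard deviation is the square root of 2, about 1.41.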

Quartiles and Interquartile Range (IQR)

Quartiles divide ordered data into four equal parts. The first quartile (Q1) is the value below which 25% of the data fall, and the third quartile (Q3) is the value below which 75% of the data fall.

  • IQR Formula: $\text{IQR} = Q_3 - Q_1$

  • Application: IQR is a robust measure of variability, less affected by outliers.
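
Quartiles and the IQR can be obtained in R with quantile() and IQR(). The data below are hypothetical values chosen so the quartiles fall on observed points. Note that several conventions exist for computing quartiles, so other software (or a non-default type argument to quantile()) may give slightly different values on other datasets.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# First and third quartiles using R's default quantile method
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
q1 <- q[1]
q3 <- q[2]

# IQR from the definition, and from the built-in function
iqr_by_hand <- q3 - q1
print(c(Q1 = q1, Q3 = q3, IQR = iqr_by_hand, builtin = IQR(x)))
```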

Boxplot and Outliers

A boxplot is a graphical summary displaying the minimum, Q1, median, Q3, and maximum. Outliers are typically defined as values less than $Q_1 - 1.5 \times \text{IQR}$ or greater than $Q_3 + 1.5 \times \text{IQR}$.

  • Interpretation: Outliers should be investigated, as they may indicate errors or true extreme values.
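
The commonly used 1.5 × IQR rule for flagging outliers can be sketched in R as follows. The data are hypothetical, with the value 30 included as a deliberately extreme observation; the names lower_fence and upper_fence are illustrative.

```r
# Hypothetical data with one deliberately extreme value (30)
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# Quartiles and IQR using R's default quantile method
q1 <- quantile(y, 0.25, names = FALSE)
q3 <- quantile(y, 0.75, names = FALSE)
iqr <- q3 - q1

# Cutoffs: values beyond these are flagged as potential outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- y[y < lower_fence | y > upper_fence]
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)

# boxplot(y) would draw the corresponding plot
```

Flagged values are candidates for investigation, not automatic deletions, in line with the interpretation note above.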

Summary Table: Measures of Central Tendency and Variability

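| Measure | Definition | Formula | Robustness |
|---|---|---|---|
| Mean | Arithmetic average | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ | Not robust |
| Median | Middle value | Middle of ordered data | Robust |
| Mode | Most frequent value | N/A | Robust |
| Range | Max - Min | $x_{\max} - x_{\min}$ | Not robust |
| Standard Deviation | Average deviation from mean | $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ | Not robust |
| IQR | Middle 50% spread | $Q_3 - Q_1$ | Robust |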

Additional info:

  • Examples and R code snippets are provided to illustrate calculations.

  • Boxplots and histograms are useful for visualizing distributions and identifying outliers.

  • In biostatistics, understanding both central tendency and variability is crucial for interpreting health data and making informed decisions.