
Descriptive Statistics: Foundations, Data Types, and Data Summarization


Descriptive Statistics

Introduction to Descriptive Statistics

Descriptive statistics are techniques for summarizing, organizing, and simplifying large sets of data. They reduce many individual scores to a few tables, graphs, and summary values, which makes the behavior of individuals and groups easier to understand.

  • Purpose: To summarize and describe the main features of a dataset.

  • Applications: Used in research, business, health sciences, and more to make data understandable and actionable.

Types of Statistics

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Summarize the data collected using graphs, averages, and tables.

  • Inferential Statistics: Use data from a sample to draw conclusions (inferences) about the larger population from which the sample was drawn.

Types of Data and Scales of Measurement

Levels of Measurement

Understanding the type of data is crucial for selecting appropriate statistical methods.

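  • Nominal Scale: Lowest level; numbers, if used, serve only as labels for categories, so arithmetic on them is meaningless (e.g., gender, ethnicity).

  • Ordinal Scale: Data are ranked or ordered, but the distances between ranks are not known to be equal, so differences are not meaningful (e.g., class rankings: the gap between 1st and 2nd need not match the gap between 2nd and 3rd).

  • Interval Scale: Ordered data with equal intervals between values, but the zero point is arbitrary rather than a true absence of the quantity (e.g., temperature in Celsius, where 0 °C does not mean "no temperature").

  • Ratio Scale: Like interval, but with an absolute zero, so ratios are meaningful (e.g., weight, height: 10 kg is twice 5 kg).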
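A quick numeric check shows why ratios are not meaningful on an interval scale. This minimal Python sketch (the function name `celsius_to_fahrenheit` is illustrative) converts two Celsius readings to Fahrenheit: equal differences stay equal after conversion, but the ratio changes. "20 °C is twice as warm as 10 °C" is therefore not a meaningful statement. By contrast, converting a ratio-scale quantity such as kilograms to pounds multiplies every value by the same constant, so ratios are preserved.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert Celsius to Fahrenheit (both are interval scales with arbitrary zeros)."""
    return c * 9 / 5 + 32


# Two Celsius readings where one is numerically "twice" the other.
low_c, high_c = 10.0, 20.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

# Differences convert consistently: +10 degrees C is always +18 degrees F.
print(high_c - low_c, high_f - low_f)  # 10.0 18.0

# Ratios do not survive the change of units: 20/10 = 2, but 68/50 = 1.36.
print(high_c / low_c, high_f / low_f)  # 2.0 1.36
```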

Variables

  • Discrete: Can take only specific values (e.g., number of students).

  • Continuous: Can take any value within a range (e.g., height, weight).

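  • Independent: The variable that is manipulated (or used to categorize participants) in order to observe its effect on another variable.

  • Dependent: The variable measured to assess the effect of the independent variable.

  • Confounding: An uncontrolled variable that may affect the relationship between the independent and dependent variables, offering an alternative explanation for the results.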

Describing Data

Tables and Frequency Distributions

Tables organize data to reveal patterns and facilitate analysis.

  • Frequency Distribution: Shows how often each value (or class of values) occurs.

  • Ungrouped Data: Each distinct score value forms its own class (suited to small datasets or few distinct values).

  • Grouped Data: Scores are grouped into class intervals of equal width (suited to large datasets).
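Grouping scores into equal-width classes is a mechanical step. The Python sketch below assumes integer scores and a class width of 10 (so classes look like 150-159, 160-169); the function name `grouped_frequency` and the weights are hypothetical, not taken from the notes. Empty classes are kept with a frequency of 0 so that gaps in the data remain visible, and rows are listed from the highest class down, as in the example table.

```python
from collections import Counter


def grouped_frequency(scores, width=10):
    """Group integer scores into equal-width classes.

    Returns a list of (lower limit, upper limit, frequency), highest class
    first. Classes with no scores are included with frequency 0.
    """
    # Map each score to the lower limit of its class, e.g. 167 -> 160.
    counts = Counter((x // width) * width for x in scores)
    lowest, highest = min(counts), max(counts)
    rows = []
    for lower in range(highest, lowest - 1, -width):
        rows.append((lower, lower + width - 1, counts.get(lower, 0)))
    return rows


# Hypothetical weights, recorded to the nearest whole unit.
weights = [152, 158, 161, 167, 169, 174, 183, 188, 201]
for lower, upper, f in grouped_frequency(weights):
    print(f"{lower}-{upper}: {f}")
```

For these weights the output runs from 200-209 (f = 1) down to 150-159 (f = 2), with 190-199 listed as 0.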

Example Frequency Distribution Table

Weight Interval    Frequency (f)
240-249            1
230-239            2
220-229            3
210-219            2
200-209            4
190-199            8
180-189            9
170-179            7
160-169            5
150-159            5
140-149            4
130-139            3
Total (N)          53

Key Terms in Frequency Distributions

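  • Gaps between classes: The gap between the upper limit of one class and the lower limit of the next (e.g., from 239 to 240) equals the smallest possible difference between scores (here, 1 unit).

  • Real limits: The boundaries located at the midpoints of the gaps between adjacent classes (e.g., 239.5 separates 230-239 from 240-249), so the class 230-239 has real limits 229.5 and 239.5.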

  • Cumulative frequency: Sum of frequencies up to a given class.

  • Cumulative proportion: Cumulative frequency divided by total frequency.

  • Cumulative percent: Cumulative proportion multiplied by 100.

  • Relative frequency: Frequency of a class divided by total frequency.

Example: Cumulative and Relative Frequencies

Interval  f   Cumulative f   Cumulative Proportion   Cumulative Percent   Relative f
240-249   1   53             1.00                    100                  0.02
230-239   2   52             0.98                    98                   0.04
220-229   3   50             0.94                    94                   0.06
210-219   2   47             0.89                    89                   0.04
200-209   4   45             0.85                    85                   0.08
190-199   8   41             0.77                    77                   0.15
180-189   9   33             0.62                    62                   0.17
170-179   7   24             0.45                    45                   0.13
160-169   5   17             0.32                    32                   0.09
150-159   5   12             0.23                    23                   0.09
140-149   4   7              0.13                    13                   0.08
130-139   3   3              0.06                    6                    0.06
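The cumulative and relative columns can be recomputed mechanically from the class frequencies. In this Python sketch (function and variable names are illustrative), the frequencies are the differences between successive Cumulative f values (3, 7, 12, 17, 24, ..., 53), which total N = 53. The code accumulates from the lowest class upward and rounds proportions to two decimals, matching the rounding used in the table.

```python
# Class intervals from highest to lowest, with frequencies taken as the
# differences between successive Cumulative f values (N = 53).
intervals = ["240-249", "230-239", "220-229", "210-219", "200-209", "190-199",
             "180-189", "170-179", "160-169", "150-159", "140-149", "130-139"]
freqs = [1, 2, 3, 2, 4, 8, 9, 7, 5, 5, 4, 3]


def cumulative_table(freqs):
    """Return (f, cum_f, cum_prop, cum_pct, rel_f) rows, highest class first.

    Cumulative frequency accumulates from the lowest class upward, so the
    list is walked in reverse and the result is flipped back at the end.
    """
    n = sum(freqs)
    rows, running = [], 0
    for f in reversed(freqs):
        running += f
        rows.append((
            f,
            running,                       # cumulative frequency
            round(running / n, 2),         # cumulative proportion
            round(100 * running / n),      # cumulative percent
            round(f / n, 2),               # relative frequency
        ))
    return rows[::-1]


for label, row in zip(intervals, cumulative_table(freqs)):
    print(label, *row)
```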

Describing Data with Graphs

Types of Graphs

  • Bar Graph: Used for categorical (nominal or ordinal) data. Bars are separated to show distinct categories.

  • Histogram: Used for interval or ratio data. Bars touch to indicate continuous data.

  • Line Graph (Frequency Polygon): Plots frequencies at the midpoint of each interval and connects them with lines.

  • Stem and Leaf Display: Shows individual data values while organizing them into groups.
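A stem-and-leaf display is simple to build by hand or in code. This Python sketch assumes non-negative integer scores, uses the tens digit(s) as the stem and the units digit as the leaf, and is applied to the 11 scores from the IQR example in these notes; the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(scores):
    """Build a stem-and-leaf display for non-negative integer scores.

    Stem = tens digit(s), leaf = units digit. Stems with no leaves are still
    listed so that gaps in the data stay visible.
    """
    ordered = sorted(scores)
    stems = {}
    for x in ordered:
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


data = [21, 28, 28, 29, 31, 34, 34, 39, 40, 40, 44]
print("\n".join(stem_and_leaf(data)))
# 2 | 1 8 8 9
# 3 | 1 4 4 9
# 4 | 0 0 4
```

Every original score can be read back from the display (stem 3 with leaf 9 is 39), which is the advantage over a grouped frequency table.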

Measures of Central Tendency

Definition and Calculation

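  • Mean: The arithmetic average: the sum of the scores divided by the number of scores, $\mu = \frac{\sum X}{N}$ for a population and $M = \frac{\sum X}{n}$ for a sample.

  • Median: The middle value when the data are ordered. With an odd number of scores, it is the middle score; with an even number, it is the average of the two middle scores.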

  • Mode: The most frequently occurring value.
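The three definitions translate directly into code. This Python sketch implements them by hand to mirror the rules above (the function names and data are illustrative) and cross-checks against the standard `statistics` module. `modes` returns a list because a distribution can have more than one most-frequent value.

```python
from collections import Counter
import statistics


def mean(scores):
    """Arithmetic average: sum of scores divided by their count."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle score of the ordered data; average of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores):
    """All most-frequent values (there may be several)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, c in counts.items() if c == top)


odd = [3, 1, 4, 1, 5]        # ordered: 1, 1, 3, 4, 5
even = [3, 1, 4, 1, 5, 9]    # ordered: 1, 1, 3, 4, 5, 9
print(mean(odd), median(odd), modes(odd))  # 2.8 3 [1]
print(median(even))                        # 3.5, the average of 3 and 4

# Cross-check against the standard library.
assert median(even) == statistics.median(even)
assert modes(odd) == statistics.multimode(odd)
```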

Choosing the Appropriate Measure

  • Nominal data: Mode

  • Ordinal data: Median

  • Interval/Ratio data: Mean, unless the distribution is strongly skewed or contains extreme scores, in which case the median is preferred

Effect of Distribution Shape

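  • In a symmetric, unimodal distribution such as the normal distribution, mean = median = mode.

  • In a positively skewed distribution (long tail toward high scores), typically mode < median < mean.

  • In a negatively skewed distribution (long tail toward low scores), typically mean < median < mode.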
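The usual ordering can be checked on small hypothetical samples. In the first list below, a few high scores pull the mean above the median; the second list is its mirror image. This is an illustration with made-up data using the standard `statistics` module, not a proof that the ordering always holds.

```python
import statistics

positive = [2, 3, 3, 5, 7, 10]   # long tail toward high values
negative = [2, 5, 7, 9, 9, 10]   # long tail toward low values

for name, data in [("positive", positive), ("negative", negative)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(name, "mode:", mode, "median:", median, "mean:", mean)
```

For the positive list the values are 3, 4 and 5 (mode < median < mean); for the negative list they are 9, 8 and 7 (mean < median < mode).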

Measures of Variability

Definition and Types

Variability reflects the amount by which scores are dispersed or scattered in a distribution.

  • Range: Difference between the largest and smallest score.

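  • Variance: The average squared deviation of the scores from the mean.
    Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
    Sample variance: $s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1}$
    where $SS$ is the sum of squared deviations from the mean, $\mu$ and $M$ are the population and sample means, and $N$ and $n$ are the population and sample sizes.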

  • Standard Deviation (SD): Square root of the variance.

  • Interquartile Range (IQR): Range of the middle 50% of scores. Relatively insensitive to outliers and extreme scores.
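The variance formulas differ only in the divisor: N for a population, n - 1 for a sample. This Python sketch computes range, variance, and standard deviation by hand on a hypothetical dataset (mean 5, SS = 32); the function names are illustrative. The standard library offers equivalents (`statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, `statistics.stdev`).

```python
import math


def sum_of_squares(scores):
    """SS: sum of squared deviations of each score from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


def population_variance(scores):
    return sum_of_squares(scores) / len(scores)          # sigma^2 = SS / N


def sample_variance(scores):
    return sum_of_squares(scores) / (len(scores) - 1)    # s^2 = SS / (n - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean = 5, SS = 32
print("range:", max(scores) - min(scores))                       # 7
print("population variance:", population_variance(scores))       # 4.0
print("population SD:", math.sqrt(population_variance(scores)))  # 2.0
print("sample variance:", sample_variance(scores))               # SS / 7, about 4.571
print("sample SD:", math.sqrt(sample_variance(scores)))
```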

Degrees of Freedom

  • For sample variance, dividing $SS$ by $n - 1$ rather than $n$ gives an unbiased estimate of the population variance. The value $n - 1$ is called the degrees of freedom (df).
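One way to see why n - 1 is used is an exact enumeration over a tiny hypothetical population, [1, 2, 3], taking every possible ordered sample of size 2 drawn with replacement (independent draws). Averaged over all nine samples, SS/(n - 1) equals the population variance exactly, while SS/n underestimates it. This Python sketch uses `fractions.Fraction` to keep the arithmetic exact; it is a demonstration under these assumptions, not a general proof.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples, with replacement


def ss(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


avg_divide_by_n = sum(ss(s) / n for s in samples) / len(samples)
avg_divide_by_df = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print("sigma^2:", sigma2)                        # 2/3
print("average of SS/n:", avg_divide_by_n)        # 1/3, too small on average
print("average of SS/(n-1):", avg_divide_by_df)   # 2/3, matches sigma^2
```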

Formulas for Sum of Squares (SS)

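  • Definition Formula: $SS = \sum (X - \mu)^2$, where $X$ is each score and $\mu$ is the mean (use the sample mean $M$ for a sample).

  • Computation Formula: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$ (use $n$ in place of $N$ for a sample).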
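The two SS formulas always agree; the computational form avoids working out each deviation, which is convenient when the mean is not a whole number. This Python sketch (function names and data are illustrative) evaluates the definitional formula $SS = \sum (X - \mu)^2$ and the computational formula $SS = \sum X^2 - (\sum X)^2 / N$ side by side, using `fractions.Fraction` so the comparison is exact.

```python
from fractions import Fraction


def ss_definitional(scores):
    """SS = sum of (X - mean)^2."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum of X^2 - (sum of X)^2 / N; no individual deviations needed."""
    total = sum(scores)
    return sum(x * x for x in scores) - Fraction(total * total, len(scores))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: sum X = 40, sum X^2 = 232
print(ss_definitional(scores), ss_computational(scores))  # 32 32
```

Here the computational formula gives 232 - 40^2/8 = 232 - 200 = 32, the same value as summing the squared deviations.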

Example: Calculating IQR

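  • Arrange the data in order, find the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile), and compute IQR = Q3 - Q1.

  • For data: 21, 28, 28, 29, 31, 34, 34, 39, 40, 40, 44, Q1 = 28 and Q3 = 40 (the medians of the lower and upper halves, leaving out the overall median 34), so IQR = 40 - 28 = 12.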
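The IQR calculation above can be sketched in Python using the median-of-halves convention, which reproduces Q1 = 28 and Q3 = 40 for these scores. Function names are illustrative. Quartile conventions differ between textbooks and software: for example, NumPy's default linear interpolation in `numpy.percentile` gives Q1 = 28.5 and Q3 = 39.5 for the same data, so IQR values can differ slightly depending on the method.

```python
def median(sorted_scores):
    """Median of an already-sorted list: middle value, or mean of the two middle values."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2


def iqr(scores):
    """Return (Q1, Q3, IQR) using the median-of-halves convention.

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data; when n is odd, the overall median is left out of both halves.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1


data = [21, 28, 28, 29, 31, 34, 34, 39, 40, 40, 44]
print(iqr(data))  # (28, 40, 12)
```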

Summary Table: Measures of Central Tendency and Variability

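Measure              Definition                                   Formula
Mean                 Arithmetic average                           $\mu = \frac{\sum X}{N}$, $M = \frac{\sum X}{n}$
Median               Middle value                                 Arrange data, find middle
Mode                 Most frequent value                          Count occurrences
Range                Difference between largest and smallest      Max - Min
Variance             Mean squared deviation                       $\sigma^2 = \frac{SS}{N}$, $s^2 = \frac{SS}{n - 1}$
Standard Deviation   Square root of variance                      $\sigma = \sqrt{\frac{SS}{N}}$, $s = \sqrt{\frac{SS}{n - 1}}$
IQR                  Interquartile range (middle 50% of scores)   $Q_3 - Q_1$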

These notes provide a foundation for further study in statistics, including inferential methods and hypothesis testing.
