Data Presentation and Descriptive Statistics: Study Notes

Data Presentation and Descriptive Statistics

Introduction

Data presentation and descriptive statistics are foundational concepts in statistics, providing methods to organize, summarize, and describe data sets. Effective data presentation allows for clear communication of essential features, while descriptive statistics offer numerical summaries of data characteristics.

Data Presentation

Importance of Data Presentation

  • Raw data are typically unorganized numerical values.

  • Tabulating and graphing data summarize information, offering a quick overview of key features.

  • Presentations should be simple, self-explanatory, and include descriptive titles, labeled axes, and units.

Frequency Distribution Table

A frequency table displays the number of observations for each value or range of values.

  • Helps in organizing raw data into a more interpretable format.

Example:

| Number of Visits | Frequency |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 3 |
| 5 | 1 |
| Total | 16 |
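
Tallying raw values into a frequency table can be sketched in a few lines of Python. The raw list below is hypothetical: it is simply one set of 16 observations that is consistent with the counts in the table above.

```python
from collections import Counter

# Hypothetical raw data: one entry per observation, chosen to match the table above.
visits = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]

# Count how often each value occurs, then order the values ascending.
frequency = dict(sorted(Counter(visits).items()))

for value, count in frequency.items():
    print(f"{value}\t{count}")
print(f"Total\t{sum(frequency.values())}")
```

The total of the frequency column should always equal the number of observations, which is a quick check that no value was dropped.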

Grouped Frequency Distribution Table

Used for continuous data, grouping values into intervals (classes).

| Birth Weight (kg) | Frequency |
|---|---|
| 1.0 – 1.4 | 1 |
| 1.5 – 1.9 | 4 |
| 2.0 – 2.4 | 7 |
| 2.5 – 2.9 | 7 |
| 3.0 – 3.4 | 10 |
| 3.5 – 3.9 | 3 |
| 4.0 – 4.4 | 1 |
| 4.5 – 4.9 | 2 |

Class Limits and Boundaries

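  • Class limits: The lowest and highest data values that can belong to a class. They are written with the same number of decimal places as the data.

  • Class boundaries: Values that separate adjacent classes, typically halfway between the upper limit of one class and the lower limit of the next. They carry one more decimal place than the data and end in 5.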

| Birth Weight (kg) Class Interval | Class Boundary | Frequency |
|---|---|---|
| 1.0 – 1.4 | 0.95 – 1.45 | 1 |
| 1.5 – 1.9 | 1.45 – 1.95 | 4 |
| 2.0 – 2.4 | 1.95 – 2.45 | 7 |
| 2.5 – 2.9 | 2.45 – 2.95 | 7 |
| 3.0 – 3.4 | 2.95 – 3.45 | 10 |
| 3.5 – 3.9 | 3.45 – 3.95 | 3 |
| 4.0 – 4.4 | 3.95 – 4.45 | 1 |
| 4.5 – 4.9 | 4.45 – 4.95 | 2 |
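
The boundary column can be derived from the class limits. The following is a minimal sketch, assuming the classes are equally spaced so that the gap between one upper limit and the next lower limit is the same everywhere; `Decimal` is used only to keep the decimals exact.

```python
from decimal import Decimal

# Class limits from the birth-weight table, kept as strings so Decimal is exact.
class_limits = [("1.0", "1.4"), ("1.5", "1.9"), ("2.0", "2.4"), ("2.5", "2.9"),
                ("3.0", "3.4"), ("3.5", "3.9"), ("4.0", "4.4"), ("4.5", "4.9")]

# Half the gap between the first upper limit and the second lower limit.
half_gap = (Decimal(class_limits[1][0]) - Decimal(class_limits[0][1])) / 2

# Each boundary sits half a gap outside the corresponding class limit.
class_boundaries = [(Decimal(lo) - half_gap, Decimal(hi) + half_gap)
                    for lo, hi in class_limits]

for (lo, hi), (b_lo, b_hi) in zip(class_limits, class_boundaries):
    print(f"{lo} - {hi}\t{b_lo} - {b_hi}")
```

Because the upper boundary of one class equals the lower boundary of the next, the classes cover the measurement scale without gaps.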

Frequency Distribution with Cumulative Frequency

| Class Interval (kg) | Class Boundary (kg) | Frequency | Cumulative Frequency | Cumulative Percentage (%) |
|---|---|---|---|---|
| 1.0 – 1.4 | 0.95 – 1.45 | 1 | 1 | 2.86 |
| 1.5 – 1.9 | 1.45 – 1.95 | 4 | 5 | 14.29 |
| 2.0 – 2.4 | 1.95 – 2.45 | 7 | 12 | 34.29 |
| 2.5 – 2.9 | 2.45 – 2.95 | 7 | 19 | 54.29 |
| 3.0 – 3.4 | 2.95 – 3.45 | 10 | 29 | 82.86 |
| 3.5 – 3.9 | 3.45 – 3.95 | 3 | 32 | 91.43 |
| 4.0 – 4.4 | 3.95 – 4.45 | 1 | 33 | 94.29 |
| 4.5 – 4.9 | 4.45 – 4.95 | 2 | 35 | 100.00 |
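
The cumulative columns are running totals of the frequency column. A short sketch that reproduces them from the birth-weight frequencies (variable names are illustrative):

```python
from itertools import accumulate

frequencies = [1, 4, 7, 7, 10, 3, 1, 2]  # from the birth-weight table

cumulative = list(accumulate(frequencies))   # running totals
total = cumulative[-1]                       # total number of observations
cumulative_pct = [round(100 * c / total, 2) for c in cumulative]

for f, c, p in zip(frequencies, cumulative, cumulative_pct):
    print(f"{f}\t{c}\t{p:.2f}")
```

The last cumulative frequency always equals the total number of observations, and the last cumulative percentage is always 100.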

Ogive

An ogive is a graph that represents cumulative frequency or cumulative percentage against class boundaries. It is useful for determining medians, quartiles, and percentiles.
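
Reading a percentile off an ogive amounts to linear interpolation between class boundaries. The sketch below uses the birth-weight classes and assumes that observations are spread evenly within each class; the function name `percentile_from_ogive` is illustrative, not a standard library routine.

```python
from itertools import accumulate

# Lower class boundaries and frequencies from the birth-weight table.
lower_boundaries = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45]
frequencies = [1, 4, 7, 7, 10, 3, 1, 2]
class_width = 0.5


def percentile_from_ogive(p):
    """Estimate the p-th percentile by interpolating along the ogive."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    cumulative = list(accumulate(frequencies))
    target = p / 100 * cumulative[-1]   # cumulative count to reach
    previous = 0                        # cumulative count before the current class
    for lower, freq, cum in zip(lower_boundaries, frequencies, cumulative):
        if cum >= target:
            # Assumption: values are evenly spread inside the class.
            return lower + (target - previous) / freq * class_width
        previous = cum
    return lower_boundaries[-1] + class_width  # guard against rounding


median_estimate = percentile_from_ogive(50)
print(round(median_estimate, 2))
```

For these data the estimated median is about 2.84 kg, which falls in the 2.45 – 2.95 class, the first class whose cumulative frequency reaches half of 35.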

Histogram

  • Pictorial representation of a frequency table for continuous data using contiguous bars.

  • Frequencies are represented by the area (and height, if intervals are equal) of each bar.

  • The total area of all bars represents 100% of the frequency.

Example Calculation:

  • Total surface area of histogram: units²

  • Frequency for the 5th bar: units
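
The area rule can be checked numerically. The sketch below is an illustration only: it assumes the birth-weight table as the data set, with equal class widths of 0.5 kg, and takes each bar's area as frequency × class width.

```python
frequencies = [1, 4, 7, 7, 10, 3, 1, 2]  # birth-weight table (assumed data set)
class_width = 0.5                        # kg, distance between class boundaries

bar_areas = [f * class_width for f in frequencies]
total_area = sum(bar_areas)

# Each bar's share of the total area equals its share of the observations.
area_share = [a / total_area for a in bar_areas]

print(total_area)
print(round(area_share[4], 4))  # share of the 5th bar (3.0 - 3.4 kg)
```

With equal widths, the shares are the same whether they are computed from areas or from heights, which is why height alone can be read as frequency in that case.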

Frequency Polygon

  • Line graph connecting the midpoints of class intervals at the heights corresponding to their frequencies.

  • Helps compare distributions and visualize the shape of the data.
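
The points of a frequency polygon are the class midpoints paired with the class frequencies. A small sketch using the birth-weight classes:

```python
class_limits = [(1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9),
                (3.0, 3.4), (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]
frequencies = [1, 4, 7, 7, 10, 3, 1, 2]

# Midpoint of a class = average of its lower and upper limits.
midpoints = [round((lo + hi) / 2, 2) for lo, hi in class_limits]

# (x, y) pairs to join with straight lines.
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```

These pairs can be passed to any plotting tool as the x and y coordinates of a line graph.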

Stem-and-Leaf Plot

  • Displays data to retain individual values while showing distribution shape.

  • Stem: all digits except the right-most digit; Leaf: right-most digit.

  • Useful for small to moderate-sized data sets.

  • Stems can be split into narrower intervals, and a cumulative frequency column can be added to help locate the median.

Example:

| Stem | Leaf |
|---|---|
| 0 | 1 4 6 8 9 |
| 1 | 1 2 3 5 7 9 9 |
| 2 | 0 2 6 |
| 3 | 2 |

Stem unit: 1.0, Leaf unit: 0.1
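
A stem-and-leaf display can be built by splitting each value into its leading digits and its last digit. The sketch below uses the values implied by the example above (stem unit 1.0, leaf unit 0.1) and works in whole leaf units to avoid floating-point surprises.

```python
from collections import defaultdict

# Values read back from the example display (stem unit 1.0, leaf unit 0.1).
data = [0.1, 0.4, 0.6, 0.8, 0.9, 1.1, 1.2, 1.3, 1.5, 1.7, 1.9, 1.9,
        2.0, 2.2, 2.6, 3.2]

stems = defaultdict(list)
for value in sorted(data):
    tenths = round(value * 10)        # express the value in leaf units
    stem, leaf = divmod(tenths, 10)   # stem = leading digits, leaf = last digit
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Sorting the data first keeps the leaves in ascending order within each stem, which makes the median easy to locate by counting.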

Pie Chart

  • Used for categorical (mainly nominal) data.

  • Each sector's area is proportional to the frequency.

  • Common in presentations and published reports.
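
Since sector area is proportional to frequency, each sector's central angle is the category's share of the total multiplied by 360°. The category names and counts below are hypothetical and serve only to illustrate the arithmetic.

```python
# Hypothetical category counts, for illustration only.
counts = {"A": 18, "B": 6, "AB": 3, "O": 9}

total = sum(counts.values())

# Central angle in degrees for each sector.
angles = {category: 360 * n / total for category, n in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles always sum to 360°, just as the percentages sum to 100.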

Bar Chart

  • Represents categories along the horizontal axis; bar heights show frequencies.

  • Bars are separated to emphasize discrete categories.

  • Vertical axis should start at zero to avoid misleading impressions.

Describing Data Distributions

Patterns of Data Distributions

  • Symmetrical (bell-shaped): Data are evenly distributed around the center.

  • Bimodal: Two peaks in the distribution.

  • Rectangular (Uniform): All intervals are equally represented.

  • Positively skewed (skewed right): Tail extends to higher values.

  • Negatively skewed (skewed left): Tail extends to lower values.
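
A common rule of thumb for spotting skew is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail pulls it below. The sketch below applies that heuristic to hypothetical data. It is only an indication and can fail for unusual distributions, so a histogram remains the more reliable check.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule-of-thumb skew indicator based on mean versus median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "no skew indicated"


# Hypothetical data sets, for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 15]
left_tailed = [1, 12, 13, 13, 14, 14, 14, 15]
balanced = [1, 2, 3, 4, 5]

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
print(skew_direction(balanced))
```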

Descriptive Statistics

Measures of Central Tendency

Central tendency measures describe the center or typical value of a data set.

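  • Mean: The arithmetic average. For a sample of size $n$: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; for a population of size $N$: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

  • Median: The middle value in an ordered data set. For odd $n$: the $\left(\frac{n+1}{2}\right)$-th ordered value; for even $n$: the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th ordered values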

  • Mode: The value that occurs most frequently. Data can be unimodal, bimodal, or have no mode.
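
The three measures can be computed with Python's standard `statistics` module. The data below are hypothetical raw observations consistent with the Number of Visits frequency table earlier in these notes.

```python
from statistics import mean, median, multimode

# Hypothetical raw data matching the Number of Visits frequency table.
visits = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]

sample_mean = mean(visits)       # sum of values divided by n
sample_median = median(visits)   # n is even, so the two middle values are averaged
modes = multimode(visits)        # list of all most frequent values

print(sample_mean, sample_median, modes)
```

`multimode` returns a list, so it handles bimodal data by returning both values rather than picking one.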

Comparison Table: Mean, Median, Mode

| Measure | Advantages | Disadvantages |
|---|---|---|
| Mean | Uses all data; suitable for further analysis | Affected by outliers |
| Median | Not affected by outliers; easy to understand | Requires ordering data; less suitable for further analysis |
| Mode | Simple; not affected by outliers | Not always unique; not useful for further analysis |

Measures of Variability (Dispersion)

Variability measures describe the spread or dispersion of data values.

  • Range: Difference between the highest and lowest values.

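  • Standard Deviation (SD): Measures the typical deviation of values from the mean (the square root of the average squared deviation). For a population: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$; for a sample: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$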

  • Percentile: Indicates the value below which a given percentage of observations fall. Quartiles (Q1, Q2, Q3) are special percentiles (25th, 50th, 75th).

Empirical Rule (for Symmetric, Bell-Shaped Distributions)

  • ~68% of data within 1 SD of mean

  • ~95% within 2 SDs

  • ~99.7% within 3 SDs
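
These percentages come from the normal distribution, which is the usual model for bell-shaped data. A quick check, assuming a standard normal distribution and using `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {100 * share:.2f}%")
```

Real data are only approximately normal, so observed shares will differ somewhat from these theoretical values.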

Example: Standard Deviation Calculation

  • Given data: 3.5, 4.2, 5.8, 7.1, 9.6, 12.3

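  • Mean: $\bar{x} = \frac{42.5}{6} \approx 7.08$

  • Sample SD: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{56.35}{5}} \approx 3.36$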
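
The same calculation can be reproduced with the standard `statistics` module, which also makes the difference between the sample and population formulas visible:

```python
from statistics import mean, pstdev, stdev

data = [3.5, 4.2, 5.8, 7.1, 9.6, 12.3]

x_bar = mean(data)     # arithmetic mean
s = stdev(data)        # sample SD: divides the squared deviations by n - 1
sigma = pstdev(data)   # population SD: divides by n

print(round(x_bar, 2), round(s, 2), round(sigma, 2))
```

The sample SD is always slightly larger than the population SD computed from the same values, and the gap shrinks as n grows.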

Percentiles and Interquartile Range (IQR)

  • Percentile: The value below which a certain percent of observations fall.

  • Quartiles: Q1 (25th percentile), Q2 (median, 50th), Q3 (75th percentile).

  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$; measures the spread of the middle 50% of data.

Example:

  • Data: 3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5, 4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5

  • Median = 4.55, Q1 = 3.85, Q3 = 5.45, IQR = 1.6
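
The quartiles in this example are consistent with the "median of each half" convention: split the ordered data at the median, then take the median of the lower and upper halves. A sketch of that convention follows; the helper name `quartiles` is illustrative. Other conventions exist (for example, the default of `statistics.quantiles`) and can give slightly different values.

```python
from statistics import median

data = [3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5,
        4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5]


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]         # values below the median position
    upper = ordered[(n + 1) // 2:]    # values above the median position
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, round(iqr, 2))
```

For an odd number of observations this version leaves the middle value out of both halves, which is one of several accepted choices.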

Choosing the Appropriate Measure

  • For symmetric distributions: use mean and standard deviation.

  • For skewed distributions: use median and percentiles.

Summary Table: When to Use Mean/Median and SD/Percentile

| Distribution Type | Central Tendency | Dispersion |
|---|---|---|
| Symmetric | Mean | Standard Deviation |
| Skewed | Median | Percentile/IQR |

Additional info: These notes cover the foundational aspects of data presentation and descriptive statistics, including graphical and tabular methods, and the main measures of central tendency and variability. They are suitable for introductory statistics courses and exam preparation.
