Data Presentation and Descriptive Statistics: Study Notes

Introduction

Data presentation and descriptive statistics are foundational concepts in statistics, providing methods to organize, summarize, and describe data sets. Effective data presentation allows for clear communication of essential features, while descriptive statistics offer numerical summaries of data characteristics.

Data Presentation

Importance of Data Presentation

Raw data are typically unorganized numerical values.
Tabulating and graphing data summarize information, offering a quick overview of key features.
Presentations should be simple, self-explanatory, and include descriptive titles, labeled axes, and units.

Frequency Distribution Table

A frequency table displays the number of observations for each value or range of values.

Helps in organizing raw data into a more interpretable format.

Example:

Number of Visits	Frequency
1	2
2	4
3	6
4	3
5	1
Total	16
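
A table like this can be tallied directly from raw observations. The sketch below assumes a hypothetical list of raw visit counts, reconstructed so that the tallies match the table (the notes do not give the original raw data).

```python
from collections import Counter

# Hypothetical raw data, reconstructed so the tallies match the table above.
visits = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]

# Count how many times each distinct value occurs.
freq = Counter(visits)

print("Number of Visits\tFrequency")
for value in sorted(freq):
    print(f"{value}\t{freq[value]}")
print(f"Total\t{sum(freq.values())}")
```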

Grouped Frequency Distribution Table

Used for continuous data, grouping values into intervals (classes).

Birth Weight (kg)	Frequency
1.0 – 1.4	1
1.5 – 1.9	4
2.0 – 2.4	7
2.5 – 2.9	7
3.0 – 3.4	10
3.5 – 3.9	3
4.0 – 4.4	1
4.5 – 4.9	2

Class Limits and Boundaries

Class limits: The lowest and highest data values that can belong to a class, written with the same number of decimal places as the data.
Class boundaries: The values that separate adjacent classes, halfway between the upper limit of one class and the lower limit of the next. They carry one more decimal place than the data and end in 5.

Birth Weight (kg) Class Interval	Class Boundary	Frequency
1.0 – 1.4	0.95 – 1.45	1
1.5 – 1.9	1.45 – 1.95	4
2.0 – 2.4	1.95 – 2.45	7
2.5 – 2.9	2.45 – 2.95	7
3.0 – 3.4	2.95 – 3.45	10
3.5 – 3.9	3.45 – 3.95	3
4.0 – 4.4	3.95 – 4.45	1
4.5 – 4.9	4.45 – 4.95	2
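
As a rough illustration, the boundaries in the table can be derived from the class limits by moving each limit outward by half the gap between adjacent classes. The variable names below are illustrative only, and results are rounded to hide floating-point noise.

```python
# Class limits (kg) from the table above, recorded to one decimal place.
class_limits = [(1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9),
                (3.0, 3.4), (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]

# Half of the gap between adjacent classes (1.5 - 1.4 = 0.1, so 0.05).
half_gap = round((class_limits[1][0] - class_limits[0][1]) / 2, 2)

# Move each lower limit down and each upper limit up by half the gap.
class_boundaries = [(round(lo - half_gap, 2), round(hi + half_gap, 2))
                    for lo, hi in class_limits]

for (lo, hi), (b_lo, b_hi) in zip(class_limits, class_boundaries):
    print(f"{lo:.1f} - {hi:.1f}\t{b_lo:.2f} - {b_hi:.2f}")
```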

Frequency Distribution with Cumulative Frequency

Class Interval	Class Boundary	Frequency	Cumulative Frequency	Cumulative Percentage
1.0 – 1.4	0.95 – 1.45	1	1	2.86
1.5 – 1.9	1.45 – 1.95	4	5	14.29
2.0 – 2.4	1.95 – 2.45	7	12	34.29
2.5 – 2.9	2.45 – 2.95	7	19	54.29
3.0 – 3.4	2.95 – 3.45	10	29	82.86
3.5 – 3.9	3.45 – 3.95	3	32	91.43
4.0 – 4.4	3.95 – 4.45	1	33	94.29
4.5 – 4.9	4.45 – 4.95	2	35	100.00
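
The cumulative columns are running totals of the frequency column. A minimal sketch that reproduces them from the frequencies in the table:

```python
from itertools import accumulate

# Frequencies of the eight birth-weight classes, in table order.
frequencies = [1, 4, 7, 7, 10, 3, 1, 2]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Each cumulative frequency as a percentage of all observations.
cumulative_pct = [round(100 * c / total, 2) for c in cumulative]

for f, c, p in zip(frequencies, cumulative, cumulative_pct):
    print(f"{f}\t{c}\t{p:.2f}")
```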

Ogive

An ogive is a graph that represents cumulative frequency or cumulative percentage against class boundaries. It is useful for determining medians, quartiles, and percentiles.
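
Reading a percentile off an ogive amounts to linear interpolation between plotted points. The sketch below applies this to the birth-weight table, assuming observations are spread evenly within each class; `ogive_percentile` is a hypothetical helper name, not a library function.

```python
# Class boundaries and cumulative frequencies from the birth-weight table.
# The ogive starts at the first lower boundary with cumulative frequency 0.
boundaries = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
cum_freq = [0, 1, 5, 12, 19, 29, 32, 33, 35]

def ogive_percentile(p):
    """Estimate the p-th percentile by linear interpolation along the ogive."""
    target = p / 100 * cum_freq[-1]
    for i in range(1, len(cum_freq)):
        if cum_freq[i] >= target:
            # Fraction of the way through the class that contains the target.
            span = cum_freq[i] - cum_freq[i - 1]
            fraction = (target - cum_freq[i - 1]) / span
            width = boundaries[i] - boundaries[i - 1]
            return boundaries[i - 1] + fraction * width
    return boundaries[-1]

for p in (25, 50, 75):
    print(f"P{p}: {ogive_percentile(p):.2f} kg")
```

Under these assumptions the estimated median is about 2.84 kg. It is an estimate from grouped data, not the exact median of the raw observations.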

Histogram

Pictorial representation of a frequency table for continuous data using contiguous bars.
Frequencies are represented by the area (and height, if intervals are equal) of each bar.
The total area of all bars represents 100% of the frequency.

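Example Calculation (using the birth-weight table above, with class width 0.5 kg):

Total surface area of histogram: 35 × 0.5 = 17.5 units²
Area of the 5th bar (3.0 – 3.4 kg): 10 × 0.5 = 5 units², which is 5/17.5 ≈ 28.6% of the total and corresponds to a frequency of 10 out of 35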

Frequency Polygon

Line graph connecting the midpoints of class intervals at the heights corresponding to their frequencies.
Helps compare distributions and visualize the shape of the data.
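
The points of a frequency polygon are the class midpoints paired with the class frequencies. A small sketch using the birth-weight classes from earlier in these notes (plotting is left out; only the coordinates are computed):

```python
# Class limits (kg) and frequencies from the birth-weight table.
class_limits = [(1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9),
                (3.0, 3.4), (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]
frequencies = [1, 4, 7, 7, 10, 3, 1, 2]

# Midpoint of each class: the average of its lower and upper limits.
midpoints = [round((lo + hi) / 2, 2) for lo, hi in class_limits]

# (x, y) coordinates to be joined by straight lines.
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(f"{x:.1f}\t{y}")
```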

Stem-and-Leaf Plot

Displays data to retain individual values while showing distribution shape.
Stem: all digits except the right-most digit; Leaf: right-most digit.
Useful for small to moderate-sized data sets.
Can be split into intervals or include cumulative frequency for median location.

Example:

Stem	Leaf
0	1 4 6 8 9
1	1 2 3 5 7 9 9
2	0 2 6
3	2

Stem unit: 1.0, Leaf unit: 0.1
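
The plot above can be rebuilt programmatically by splitting each value into a stem and a leaf. This sketch uses the values read back from the plot (stem unit 1.0, leaf unit 0.1) and works in whole leaf units to avoid floating-point error.

```python
from collections import defaultdict

# Values read back from the plot above (stem unit 1.0, leaf unit 0.1).
data = [0.1, 0.4, 0.6, 0.8, 0.9, 1.1, 1.2, 1.3, 1.5, 1.7, 1.9, 1.9,
        2.0, 2.2, 2.6, 3.2]

stems = defaultdict(list)
for x in sorted(data):
    tenths = round(x * 10)            # value expressed in leaf units
    stem, leaf = divmod(tenths, 10)   # stem: leading digits, leaf: last digit
    stems[stem].append(leaf)

print("Stem\tLeaf")
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem}\t{leaves}")
```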

Pie Chart

Used for categorical (mainly nominal) data.
Each sector's area is proportional to the frequency.
Common in presentations and published reports.

Bar Chart

Represents categories along the horizontal axis; bar heights show frequencies.
Bars are separated to emphasize discrete categories.
Vertical axis should start at zero to avoid misleading impressions.

Describing Data Distributions

Patterns of Data Distributions

Symmetrical (bell-shaped): Data is evenly distributed around the center.
Bimodal: Two peaks in the distribution.
Rectangular (Uniform): All intervals are equally represented.
Positively skewed (skewed right): Tail extends to higher values.
Negatively skewed (skewed left): Tail extends to lower values.

Descriptive Statistics

Measures of Central Tendency

Central tendency measures describe the center or typical value of a data set.

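Mean: The arithmetic average. For a sample: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; for a population: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Median: The middle value in an ordered data set. For odd $n$: the value at position $\frac{n+1}{2}$; for even $n$: the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2}+1$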
Mode: The value that occurs most frequently. Data can be unimodal, bimodal, or have no mode.
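
The three measures can be computed with Python's standard `statistics` module. The data below are hypothetical raw visit counts chosen to match the frequency table earlier in these notes.

```python
import statistics

# Hypothetical raw visit counts matching the earlier frequency table.
visits = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5]

mean = statistics.mean(visits)        # sum of values divided by n
median = statistics.median(visits)    # n is even: average of the two middle values
modes = statistics.multimode(visits)  # list of the most frequent value(s)

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
```

`multimode` returns a list, so a bimodal data set would yield two values rather than raising an error.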

Comparison Table: Mean, Median, Mode

Measure	Advantages	Disadvantages
Mean	Uses all data; suitable for further analysis	Affected by outliers
Median	Not affected by outliers; easy to understand	Requires ordering data; less suited to further mathematical analysis
Mode	Simple; not affected by outliers	Not always unique; not useful for further analysis
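
The sensitivity of the mean to outliers is easy to demonstrate. In this sketch the values are illustrative and the outlier of 120.0 is made up for the example.

```python
import statistics

# Illustrative values; 120.0 is an invented extreme observation.
values = [3.5, 4.2, 5.8, 7.1, 9.6]
with_outlier = values + [120.0]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

The mean roughly quadruples, while the median moves only from 5.8 to 6.45.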

Measures of Variability (Dispersion)

Variability measures describe the spread or dispersion of data values.

Range: Difference between the highest and lowest values.
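Standard Deviation (SD): Measures the typical distance of observations from the mean (the square root of the variance). For a population: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$; for a sample: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$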
Percentile: Indicates the value below which a given percentage of observations fall. Quartiles (Q1, Q2, Q3) are special percentiles (25th, 50th, 75th).

Empirical Rule (for Symmetric, Bell-Shaped Distributions)

~68% of data within 1 SD of the mean
~95% within 2 SDs
~99.7% within 3 SDs
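
These percentages come from the normal distribution. As a check, the sketch below computes the exact coverage for a standard normal distribution, which is an idealized assumption; real data that are only roughly bell-shaped will match approximately.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k)
            for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.2%}")
```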

Example: Standard Deviation Calculation

Given data: 3.5, 4.2, 5.8, 7.1, 9.6, 12.3
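Mean: $\bar{x} = \frac{42.5}{6} \approx 7.08$
Sample SD: $s = \sqrt{\frac{56.348}{5}} \approx 3.36$, where $56.348$ is the sum of squared deviations from the mean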
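
The calculation for the data above can be checked step by step and compared with the standard library. Note the difference between the sample SD (divide by n − 1) and the population SD (divide by n).

```python
import math
import statistics

data = [3.5, 4.2, 5.8, 7.1, 9.6, 12.3]
n = len(data)

# Step by step: mean, squared deviations, then divide by n - 1.
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)
sample_sd = math.sqrt(squared_deviations / (n - 1))

# Same data treated as a complete population: divide by n instead.
population_sd = math.sqrt(squared_deviations / n)

print(f"mean = {mean:.2f}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")

# The standard library gives the same results.
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))
```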

Percentiles and Interquartile Range (IQR)

Percentile: The value below which a certain percent of observations fall.
Quartiles: Q1 (25th percentile), Q2 (median, 50th), Q3 (75th percentile).
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$; measures the spread of the middle 50% of data.

Example:

Data: 3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5, 4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5
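Median = 4.55, Q1 = 3.85, Q3 = 5.45, IQR = 5.45 − 3.85 = 1.6 (Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data)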
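
The figures above are consistent with the "median of halves" convention, sketched below. This is one of several quartile conventions; other methods, including the defaults of some software, can give slightly different values for Q1 and Q3.

```python
import statistics

data = [3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5,
        4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5]

ordered = sorted(data)
half = len(ordered) // 2

# Split into lower and upper halves; for odd n the middle value is excluded.
lower = ordered[:half]
upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]

q1 = statistics.median(lower)
q2 = statistics.median(ordered)
q3 = statistics.median(upper)
iqr = q3 - q1

print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```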

Choosing the Appropriate Measure

For symmetric distributions: use mean and standard deviation.
For skewed distributions: use median and percentiles.

Summary Table: When to Use Mean/Median and SD/Percentile

Distribution Type	Central Tendency	Dispersion
Symmetric	Mean	Standard Deviation
Skewed	Median	Percentile/IQR
