Data Presentation and Descriptive Statistics: Study Notes
Data Presentation and Descriptive Statistics
Introduction
Data presentation and descriptive statistics are foundational concepts in statistics, providing methods to organize, summarize, and describe data sets. Effective data presentation allows for clear communication of essential features, while descriptive statistics offer numerical summaries of data characteristics.
Data Presentation
Importance of Data Presentation
Raw data are typically unorganized numerical values.
Tabulating and graphing data summarize information, offering a quick overview of key features.
Presentations should be simple, self-explanatory, and include descriptive titles, labeled axes, and units.
Frequency Distribution Table
A frequency table displays the number of observations for each value or range of values.
Helps in organizing raw data into a more interpretable format.
Example:
Number of Visits | Frequency |
|---|---|
1 | 2 |
2 | 4 |
3 | 6 |
4 | 3 |
5 | 1 |
Total | 16 |
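Building a frequency table amounts to counting how often each value occurs. In this minimal Python sketch the raw observations are rebuilt from the table itself (the original 16 observations are not given), so the counts simply reproduce the table.

```python
from collections import Counter

# Raw observations rebuilt from the table: each visit count is repeated
# as many times as its frequency (assumption: the true raw list is not given).
visits = [1] * 2 + [2] * 4 + [3] * 6 + [4] * 3 + [5] * 1

# Tally how many observations take each value.
freq = Counter(visits)

print("Number of Visits | Frequency |")
print("|---|---|")
for value in sorted(freq):
    print(f"{value} | {freq[value]} |")
print(f"Total | {sum(freq.values())} |")
```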
Grouped Frequency Distribution Table
Used for continuous data, grouping values into intervals (classes).
Birth Weight (kg) | Frequency |
|---|---|
1.0 – 1.4 | 1 |
1.5 – 1.9 | 4 |
2.0 – 2.4 | 7 |
2.5 – 2.9 | 7 |
3.0 – 3.4 | 10 |
3.5 – 3.9 | 3 |
4.0 – 4.4 | 1 |
4.5 – 4.9 | 2 |
Class Limits and Boundaries
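Class limits: The lowest and highest data values that can belong to a class. They are recorded with the same number of decimal places as the data.
Class boundaries: Values that separate adjacent classes, placed halfway between the upper limit of one class and the lower limit of the next. They carry one more decimal place than the data and end in 5, so no observation can fall exactly on a boundary.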
Birth Weight (kg) Class Interval | Class Boundary | Frequency |
|---|---|---|
1.0 – 1.4 | 0.95 – 1.45 | 1 |
1.5 – 1.9 | 1.45 – 1.95 | 4 |
2.0 – 2.4 | 1.95 – 2.45 | 7 |
2.5 – 2.9 | 2.45 – 2.95 | 7 |
3.0 – 3.4 | 2.95 – 3.45 | 10 |
3.5 – 3.9 | 3.45 – 3.95 | 3 |
4.0 – 4.4 | 3.95 – 4.45 | 1 |
4.5 – 4.9 | 4.45 – 4.95 | 2 |
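The boundaries in this table can be derived from the class limits alone. This sketch assumes, as the table implies, that birth weights are recorded to 0.1 kg, so adjacent classes are separated by a gap of 0.1 kg and each boundary lies 0.05 kg outside its limit.

```python
# Class limits (kg) from the birth-weight table; data recorded to 0.1 kg.
limits = [(1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9),
          (3.0, 3.4), (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]

gap = 0.1        # distance between one class's upper limit and the next class's lower limit
half = gap / 2   # boundaries sit halfway across that gap

boundaries = []
for lower, upper in limits:
    # round() removes floating-point noise such as 1.4500000000000002
    boundaries.append((round(lower - half, 2), round(upper + half, 2)))

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo:.1f} - {hi:.1f} | {lb:.2f} - {ub:.2f}")
```

Because each upper boundary equals the next class's lower boundary, the classes cover the measurement scale without gaps, which is what a histogram of continuous data needs.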
Frequency Distribution with Cumulative Frequency
Class Interval (kg) | Class Boundary (kg) | Frequency | Cumulative Frequency | Cumulative Percentage (%) |
|---|---|---|---|---|
1.0 – 1.4 | 0.95 – 1.45 | 1 | 1 | 2.86 |
1.5 – 1.9 | 1.45 – 1.95 | 4 | 5 | 14.29 |
2.0 – 2.4 | 1.95 – 2.45 | 7 | 12 | 34.29 |
2.5 – 2.9 | 2.45 – 2.95 | 7 | 19 | 54.29 |
3.0 – 3.4 | 2.95 – 3.45 | 10 | 29 | 82.86 |
3.5 – 3.9 | 3.45 – 3.95 | 3 | 32 | 91.43 |
4.0 – 4.4 | 3.95 – 4.45 | 1 | 33 | 94.29 |
4.5 – 4.9 | 4.45 – 4.95 | 2 | 35 | 100.00 |
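The cumulative columns are running totals of the frequency column, with the cumulative percentage being each running total divided by the overall total (35). A short Python sketch reproducing both columns:

```python
from itertools import accumulate

freqs = [1, 4, 7, 7, 10, 3, 1, 2]   # class frequencies from the table
n = sum(freqs)                       # 35 babies in total

cum_freq = list(accumulate(freqs))   # running total of the frequencies
cum_pct = [round(100 * c / n, 2) for c in cum_freq]

for f, cf, cp in zip(freqs, cum_freq, cum_pct):
    print(f"{f} | {cf} | {cp:.2f}")
```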
Ogive
An ogive is a line graph of cumulative frequency (or cumulative percentage) plotted against the upper class boundaries, starting from zero at the lowest class boundary. It is useful for estimating medians, quartiles, and percentiles graphically.
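Reading the median or a quartile off an ogive means finding where the curve reaches the required cumulative frequency (n/2 for the median, n/4 for Q1, 3n/4 for Q3). This sketch uses the birth-weight cumulative table and assumes the ogive is drawn as straight lines between plotted points (linear interpolation within a class). The results are grouped-data estimates, not exact values from the raw weights, which are not given; the helper `read_ogive` is hypothetical and expects 0 < target ≤ n.

```python
from bisect import bisect_left

# Ogive points: (upper class boundary, cumulative frequency), starting at
# the lowest boundary where the cumulative frequency is 0.
xs = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
cf = [0, 1, 5, 12, 19, 29, 32, 33, 35]
n = cf[-1]

def read_ogive(target):
    """Boundary value at which the straight-line ogive reaches `target`."""
    i = bisect_left(cf, target)        # first point with cf >= target
    if cf[i] == target:
        return xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = cf[i - 1], cf[i]          # y0 < target < y1, so y1 > y0
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)

q1 = read_ogive(0.25 * n)
median = read_ogive(0.50 * n)
q3 = read_ogive(0.75 * n)
print(f"Q1 ~ {q1:.3f} kg, median ~ {median:.3f} kg, Q3 ~ {q3:.3f} kg")
```

For the median, the target 17.5 falls in the 2.45–2.95 kg class, giving 2.45 + (17.5 − 12)/7 × 0.5 ≈ 2.84 kg.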
Histogram
Pictorial representation of a frequency table for continuous data using contiguous bars.
Frequencies are represented by the area (and height, if intervals are equal) of each bar.
The total area of all bars represents 100% of the frequency.
Example Calculation:
Total surface area of histogram: … units²
Frequency for the 5th bar: … units
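The area rule can be checked numerically. This sketch uses the birth-weight class boundaries and frequencies and assumes the common convention of plotting frequency density (frequency ÷ class width) as bar height, so that width × height equals the class frequency even when widths differ. With the equal 0.5 kg widths here, heights are simply the frequencies rescaled by a constant.

```python
# Birth-weight class boundaries (kg) and frequencies from the tables above.
boundaries = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
freqs = [1, 4, 7, 7, 10, 3, 1, 2]

bars = []
for (left, right), f in zip(zip(boundaries, boundaries[1:]), freqs):
    width = right - left
    height = f / width           # frequency density, so width * height == f
    bars.append((left, right, height))

areas = [(right - left) * h for left, right, h in bars]
total_area = sum(areas)
print(f"Total area = {total_area:.1f}")      # equals the total frequency, 35
print(f"Area of 5th bar = {areas[4]:.1f}")   # equals that class's frequency, 10
```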
Frequency Polygon
Line graph connecting the midpoints of class intervals at the heights corresponding to their frequencies.
Helps compare distributions and visualize the shape of the data.
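The polygon's vertices can be computed from the birth-weight classes: each class midpoint is (lower limit + upper limit)/2, paired with that class's frequency. This minimal sketch also assumes the common convention of closing the polygon on the horizontal axis by adding an empty class (frequency 0) one class width (0.5 kg) beyond each end; plotting `xs` against `ys` with any line-plotting tool would draw the polygon.

```python
limits = [(1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9),
          (3.0, 3.4), (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]
freqs = [1, 4, 7, 7, 10, 3, 1, 2]
width = 0.5   # class width in kg, i.e. the distance between successive midpoints

# Class midpoint = (lower limit + upper limit) / 2
mids = [round((lo + hi) / 2, 2) for lo, hi in limits]

# Close the polygon by adding an empty class at each end.
xs = [round(mids[0] - width, 2)] + mids + [round(mids[-1] + width, 2)]
ys = [0] + freqs + [0]

for x, y in zip(xs, ys):
    print(f"({x:.1f}, {y})")
```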
Stem-and-Leaf Plot
Displays the data so that individual values are retained while the shape of the distribution remains visible.
Stem: all digits except the right-most digit; Leaf: right-most digit.
Useful for small to moderate-sized data sets.
Stems can be split across two lines (for example, leaves 0–4 on one line and 5–9 on the next), and a cumulative frequency column can be added to help locate the median.
Example:
Stem | Leaf |
|---|---|
0 | 1 4 6 8 9 |
1 | 1 2 3 5 7 9 9 |
2 | 0 2 6 |
3 | 2 |
Stem unit: 1.0, Leaf unit: 0.1
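The plot can be generated from the values it encodes. This sketch uses the 16 values read from the plot (0.1, 0.4, …, 3.2) and scales each value by the leaf unit so that the stem and leaf come from integer division, which avoids floating-point surprises. It assumes non-negative values; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# The 16 values shown in the plot (stem unit 1.0, leaf unit 0.1).
data = [0.1, 0.4, 0.6, 0.8, 0.9, 1.1, 1.2, 1.3, 1.5, 1.7,
        1.9, 1.9, 2.0, 2.2, 2.6, 3.2]

def stem_and_leaf(values, leaf_unit=0.1):
    """Split each value into a stem (all but the last digit) and a leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)      # e.g. 1.7 -> 17 leaf units
        stem, leaf = divmod(scaled, 10)    # 17 -> stem 1, leaf 7
        plot[stem].append(leaf)
    # Include every stem in the range, even empty ones, to keep gaps visible.
    return [f"{stem} | {' '.join(map(str, plot[stem]))}"
            for stem in range(min(plot), max(plot) + 1)]

for line in stem_and_leaf(data):
    print(line)
```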
Pie Chart
Used for categorical (mainly nominal) data.
Each sector's angle, and therefore its area, is proportional to the category's frequency.
Common in presentations and published reports.
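Each sector's angle is (category frequency ÷ total frequency) × 360°. This sketch borrows the frequencies from the visits table and treats each visit count as a category purely to illustrate the arithmetic; a real pie chart would normally show nominal categories.

```python
# Frequencies from the visits table, each visit count treated as a category.
counts = {"1 visit": 2, "2 visits": 4, "3 visits": 6, "4 visits": 3, "5 visits": 1}
total = sum(counts.values())   # 16

# Sector angle = (category frequency / total frequency) * 360 degrees
angles = {label: 360 * f / total for label, f in counts.items()}

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

For example, 3 visits has frequency 6, so its sector spans 6/16 × 360° = 135°.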
Bar Chart
Represents categories along the horizontal axis; bar heights show frequencies.
Bars are separated to emphasize discrete categories.
Vertical axis should start at zero to avoid misleading impressions.
Describing Data Distributions
Patterns of Data Distributions
Symmetrical (bell-shaped): Data are spread evenly on both sides of the center, so the left and right halves mirror each other.
Bimodal: Two peaks in the distribution.
Rectangular (Uniform): All intervals are equally represented.
Positively skewed (skewed right): Tail extends to higher values.
Negatively skewed (skewed left): Tail extends to lower values.
Descriptive Statistics
Measures of Central Tendency
Central tendency measures describe the center or typical value of a data set.
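Mean: The arithmetic average. For a sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$; for a population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$.
Median: The middle value in an ordered data set. For odd $n$: median $= x_{\left(\frac{n+1}{2}\right)}$; for even $n$: median $= \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$, where $x_{(k)}$ denotes the $k$th ordered value.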
Mode: The value that occurs most frequently. Data can be unimodal, bimodal, or have no mode.
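The three measures can be computed directly from their definitions. This sketch reuses the 16 values from the percentile example later in these notes; the helper names `mean`, `median`, and `modes` are hypothetical, and `modes` returns an empty list when no value repeats ("no mode") or several values when the data are multimodal.

```python
from collections import Counter

data = [3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5,
        4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                     # odd n: the ((n+1)/2)th ordered value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even n: mean of the (n/2)th and (n/2+1)th values

def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:                       # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(mean(data), median(data), modes(data))
```

Here n = 16 is even, so the median averages the 8th and 9th ordered values, (4.5 + 4.6)/2 = 4.55, matching the value given in the percentile example.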
Comparison Table: Mean, Median, Mode
Measure | Advantages | Disadvantages |
|---|---|---|
Mean | Uses all data; suitable for further analysis | Affected by outliers |
Median | Not affected by outliers; easy to understand | Requires ordering the data; less useful than the mean for further statistical analysis |
Mode | Simple; not affected by outliers | Not always unique; not useful for further analysis |
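The "affected by outliers" entry is easy to demonstrate. This sketch takes the six values from the standard deviation example and, as a purely hypothetical error, replaces the largest value 12.3 with 123.0: the mean jumps while the median does not move, because the two middle values are unchanged.

```python
from statistics import mean, median

data = [3.5, 4.2, 5.8, 7.1, 9.6, 12.3]    # values from the SD example
with_outlier = data[:-1] + [123.0]         # hypothetical: 12.3 mistyped as 123.0

for label, xs in [("original", data), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(xs):.2f}, median = {median(xs):.2f}")
```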
Measures of Variability (Dispersion)
Variability measures describe the spread or dispersion of data values.
Range: Difference between the highest and lowest values.
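Standard Deviation (SD): Measures the typical distance of observations from the mean (roughly, the square root of the average squared deviation). For a population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$; for a sample: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$.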
Percentile: Indicates the value below which a given percentage of observations fall. Quartiles (Q1, Q2, Q3) are special percentiles (25th, 50th, 75th).
Empirical Rule (for Bell-Shaped, Approximately Normal Distributions)
~68% of data within 1 SD of the mean
~95% within 2 SDs
~99.7% within 3 SDs
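The rule's percentages come from the normal distribution: the probability of lying within k SDs of the mean is erf(k/√2). This sketch assumes an exactly normal distribution; for real data that are only roughly bell-shaped, the percentages hold only approximately.

```python
import math

# For a normal distribution, P(|X - mu| < k * sigma) = erf(k / sqrt(2)).
coverage = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {100 * p:.1f}%")
```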
Example: Standard Deviation Calculation
Given data: 3.5, 4.2, 5.8, 7.1, 9.6, 12.3
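Mean: $\bar{x} = \frac{42.5}{6} \approx 7.08$
Sample SD: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{56.35}{5}} \approx 3.36$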
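The hand calculation can be checked with a short Python sketch that applies the formulas directly and compares them with the standard library's `statistics.stdev` (sample, divisor n − 1) and `statistics.pstdev` (population, divisor n). The population figure is included only for comparison, as if the six values were the whole population.

```python
import math
import statistics

data = [3.5, 4.2, 5.8, 7.1, 9.6, 12.3]
n = len(data)

x_bar = sum(data) / n                        # sample mean
ss = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
s = math.sqrt(ss / (n - 1))                  # sample SD: divide by n - 1
sigma = math.sqrt(ss / n)                    # population formula: divide by n
data_range = max(data) - min(data)           # highest minus lowest value

print(f"mean = {x_bar:.2f}, sample SD = {s:.2f}, range = {data_range:.1f}")

# The standard library agrees with the hand formulas.
assert math.isclose(s, statistics.stdev(data))
assert math.isclose(sigma, statistics.pstdev(data))
```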
Percentiles and Interquartile Range (IQR)
Percentile: The value below which a certain percent of observations fall.
Quartiles: Q1 (25th percentile), Q2 (median, 50th), Q3 (75th percentile).
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$; measures the spread of the middle 50% of the data.
Example:
Data: 3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5, 4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5
Median = 4.55, Q1 = 3.85, Q3 = 5.45, IQR = 1.6
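The quartiles in this example match the "median of each half" convention: Q1 is the median of the lower 8 values and Q3 the median of the upper 8. This sketch implements that convention (for odd n it leaves the middle value out of both halves, one common choice) and assumes at least two observations. Other conventions, such as the default method of Python's `statistics.quantiles`, can give slightly different quartiles for the same data.

```python
def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half."""
    s = sorted(xs)
    lower = s[: len(s) // 2]           # values below the median position
    upper = s[(len(s) + 1) // 2 :]     # values above the median position
    return median(lower), median(s), median(upper)

data = [3.5, 3.5, 3.6, 3.7, 4.0, 4.1, 4.3, 4.5,
        4.6, 4.7, 4.8, 5.2, 5.7, 6.1, 6.3, 6.5]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```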
Choosing the Appropriate Measure
For symmetric distributions: use mean and standard deviation.
For skewed distributions: use median and percentiles.
Summary Table: When to Use Mean/Median and SD/Percentile
Distribution Type | Central Tendency | Dispersion |
|---|---|---|
Symmetric | Mean | Standard Deviation |
Skewed | Median | Percentile/IQR |