Descriptive Statistics: Data Organization, Visualization, and Measures of Central Tendency & Variation
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Qualitative and Quantitative Data
Qualitative Data: Software Survey Example
Qualitative data refers to non-numeric information that describes categories or qualities. In the example, a survey of 35 businesses records the primary software used in their offices.
Definition: Qualitative data categorizes or describes attributes (e.g., software brand).
Frequency Table: Counts the number of occurrences for each category.
| Software | Frequency |
|---|---|
| Microsoft | 20 |
| Aldus | 3 |
| WordPerfect | 6 |
| Lotus | 6 |
Pareto Chart: A bar graph that displays frequencies in descending order to highlight the most common categories.
Application: Useful for identifying the most popular software among businesses.
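A minimal Python sketch of the Pareto ordering, assuming the survey counts above are already tallied. The helper name `pareto_order` is hypothetical. It sorts the categories by descending frequency and adds a running cumulative percentage, which some Pareto charts overlay as a line.

```python
# Frequencies from the software survey of 35 businesses
software_counts = {"Microsoft": 20, "Aldus": 3, "WordPerfect": 6, "Lotus": 6}

def pareto_order(counts):
    """Return (category, frequency, cumulative percent) rows in descending frequency order."""
    total = sum(counts.values())
    rows = []
    running = 0
    # sorted() is stable (also with reverse=True), so tied categories keep their table order
    for category, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((category, freq, round(100 * running / total, 1)))
    return rows

for category, freq, cum_pct in pareto_order(software_counts):
    print(f"{category:<12}{freq:>4}{cum_pct:>8}%")
```

The first row (Microsoft, about 57% of responses) is the bar a Pareto chart places leftmost.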
Frequency Distributions and Histograms
Quantitative Data: Ages of Presidents at Inauguration
Quantitative data consists of numerical values. The ages of U.S. presidents at inauguration are grouped into classes to create a frequency distribution.
Frequency Distribution: Organizes data into intervals (classes) and counts the number of observations in each.
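Class Boundaries: The values that separate adjacent classes, halfway between the upper limit of one class and the lower limit of the next (e.g., 44.5 separates the 42-44 and 45-48 classes). Unlike class limits, boundaries leave no gaps between classes.
Cumulative Frequency: The running total of frequencies through each class, i.e., the number of observations below that class's upper boundary.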
| Class (Age, years) | Frequency | Class Boundaries | Cumulative Frequency |
|---|---|---|---|
| 42-44 | 6 | 41.5-44.5 | 6 |
| 45-48 | 11 | 44.5-48.5 | 17 |
| 49-53 | 17 | 48.5-53.5 | 34 |
| 54-58 | 8 | 53.5-58.5 | 42 |
| 59-63 | 3 | 58.5-63.5 | 45 |
| 64-68 | 3 | 63.5-68.5 | 48 |
| 69-83 | 3 | 68.5-83.5 | 51 |
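The boundary and cumulative-frequency columns can be derived mechanically from the class limits and frequencies. This is a sketch, assuming whole-number ages so that adjacent classes are one unit apart (the hypothetical `gap` parameter); `build_distribution` is an illustrative name, not a library function.

```python
# Class limits and frequencies from the presidents' age table
classes = [(42, 44), (45, 48), (49, 53), (54, 58), (59, 63), (64, 68), (69, 83)]
freqs = [6, 11, 17, 8, 3, 3, 3]

def build_distribution(limits, frequencies, gap=1):
    """Return (limits, frequency, boundaries, cumulative frequency) for each class.

    Boundaries sit half a gap outside each class limit; gap=1 assumes whole-number data.
    """
    half = gap / 2
    rows = []
    running = 0
    for (lower, upper), f in zip(limits, frequencies):
        running += f  # cumulative frequency through this class
        rows.append(((lower, upper), f, (lower - half, upper + half), running))
    return rows

for limits, f, bounds, cum in build_distribution(classes, freqs):
    print(f"{limits[0]}-{limits[1]}  f={f:<3} boundaries={bounds[0]}-{bounds[1]}  cum={cum}")
```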
Histogram: A bar graph representing the frequency distribution of quantitative data. Each bar's height corresponds to the frequency of the class interval.
Ogive: A line graph of cumulative frequency, useful for determining percentiles and medians.
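One common way to read a percentile off an ogive is linear interpolation within the class that contains the target cumulative count. The Python sketch below assumes observations are spread evenly inside each class; the function name `ogive_percentile` is hypothetical. Because the classes here have unequal widths, it uses each class's own width. With this table it estimates the median age at 51.0 years (cumulative count 25.5 falls halfway through the 48.5-53.5 class).

```python
# Lower boundary of the first class followed by each upper boundary,
# plus the class frequencies, from the presidents' age table
edges = [41.5, 44.5, 48.5, 53.5, 58.5, 63.5, 68.5, 83.5]
freqs = [6, 11, 17, 8, 3, 3, 3]

def ogive_percentile(p, boundaries, frequencies):
    """Estimate the p-th percentile (0-100) by linear interpolation along the ogive."""
    n = sum(frequencies)
    target = p / 100 * n  # cumulative count that the percentile corresponds to
    cum = 0
    for i, f in enumerate(frequencies):
        if f > 0 and cum + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            # Assumes the f observations are spread evenly across the class width
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    return boundaries[-1]

print(ogive_percentile(50, edges, freqs))  # estimated median -> 51.0
```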
Graph Analysis
Concentration: Most of the data (36 of 51 ages) fall in three classes, from 44.5 to 58.5 years.
Shape: The histogram is unimodal (one peak), not symmetric, and not uniform.
Outliers: Two outliers are present; apart from them, most of the data cluster around the single central peak.
Measures of Central Tendency and Spread
Home Runs: Babe Ruth Example
Measures of central tendency and spread summarize data sets. The number of home runs Babe Ruth hit each year is analyzed using these measures.
Mean: The arithmetic average.
Median: The middle value when data is ordered.
Quartiles: Values that divide the ordered data into four equal parts (Q1, Q2, Q3); Q2 is the median.
Percentiles: Indicate the relative standing of a value within the data set.
Box Plot: A graphical summary showing minimum, Q1, median, Q3, and maximum.
| Statistic | Value |
|---|---|
| Mean | 43.9 |
| Median | 46 |
| Q1 | 35 |
| Q3 | 54 |
| Min | 22 |
| Max | 60 |
Interpretation: The distribution is roughly symmetric about the center; the mean (43.9) is only slightly below the median (46), which hints at a mild skew to the left.
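The notes do not list the raw counts. As an assumption, the sketch below uses Babe Ruth's home-run totals for his 1920-1934 seasons with the New York Yankees, which reproduce every value in the table. Quartiles are taken as medians of the lower and upper halves, leaving out the overall median when n is odd; this is one textbook convention, and tools such as `numpy.percentile` may interpolate differently. `five_number_summary` is a hypothetical helper name.

```python
from statistics import mean, median

# ASSUMED data (not listed in these notes): Babe Ruth's home runs per season, 1920-1934
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def five_number_summary(values):
    """Min, Q1, median, Q3, max; Q1/Q3 are medians of the lower/upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle position
    upper = data[half + n % 2:]    # skip the middle value when n is odd
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }

summary = five_number_summary(home_runs)
summary["mean"] = round(mean(home_runs), 1)
print(summary)
```

The five values other than the mean are exactly what a box plot draws.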
Estimating Mean and Modal Class from Grouped Data
Grouped Data: Age Classes Example
When data is grouped into intervals, the mean can be estimated using class midpoints and frequencies.
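Estimated Mean Formula: $\bar{x} \approx \dfrac{\sum x f}{n}$, where $x$ is the class midpoint, $f$ is the class frequency, and $n = \sum f$.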
Modal Class: The class interval with the highest frequency.
| Class (Age) | Frequency ($f$) | Midpoint ($x$) | $x f$ |
|---|---|---|---|
| 48-52 | 12 | 50.0 | 600.0 |
| 53-57 | 20 | 55.0 | 1100.0 |
| 58-62 | 16 | 60.0 | 960.0 |
| 63-67 | 30 | 65.0 | 1950.0 |
| 68-72 | 25 | 70.0 | 1750.0 |
| 73-77 | 17 | 75.0 | 1275.0 |
| Total | 120 | | 7635.0 |
Estimated Mean: $\bar{x} \approx \dfrac{7635}{120} \approx 63.6$ years
Modal Class: 63-67 (highest frequency, 30)
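A short Python check of the grouped-data calculation, using the class limits and frequencies from the table. The helper names are hypothetical. Each midpoint is taken as the average of the two class limits, and the modal class is simply the class with the largest frequency (the first one if there is a tie).

```python
# Class limits and frequencies from the grouped age table
classes = [(48, 52), (53, 57), (58, 62), (63, 67), (68, 72), (73, 77)]
freqs = [12, 20, 16, 30, 25, 17]

def grouped_mean(limits, frequencies):
    """Estimate the mean as sum(x * f) / n, where x is the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in limits]
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    return total_xf / sum(frequencies)

def modal_class(limits, frequencies):
    """Return the class with the largest frequency (max() keeps the first on ties)."""
    best = max(range(len(frequencies)), key=lambda i: frequencies[i])
    return limits[best]

print(round(grouped_mean(classes, freqs), 1))  # -> 63.6
print(modal_class(classes, freqs))             # -> (63, 67)
```

This is only an estimate: it treats every observation as if it sat at its class midpoint.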
Measures of Variation: Coefficient of Variation
Comparing Product Consistency
The coefficient of variation (CVar) is a standardized measure of dispersion, useful for comparing variability between data sets with different units or means.
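Formula: $CV = \dfrac{s}{\bar{x}} \times 100\%$, where $s$ is the standard deviation and $\bar{x}$ is the mean.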
Application: Used to compare the consistency of product weights between two companies.
| Company | Mean ($\bar{x}$) | Standard Deviation ($s$) | Coefficient of Variation |
|---|---|---|---|
| American | 5.0 lb | 0.2 lb | 4% |
| Canadian | 305 g | 7 g | 2.3% |
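Interpretation: Relative to its mean, the American company's product weights vary more, since its coefficient of variation (4%) is larger than the Canadian company's (2.3%). Because the CV has no units, the comparison is valid even though one company reports pounds and the other grams.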
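A minimal Python sketch of the CV comparison; `coefficient_of_variation` is a hypothetical helper name. Each company's mean and standard deviation stay in that company's own units, because the ratio $s/\bar{x}$ cancels them.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: CV = s / x_bar * 100."""
    return std_dev / mean * 100

# (mean, standard deviation) for each company, in its own units (lb or g)
companies = {"American": (5.0, 0.2), "Canadian": (305, 7)}

for name, (x_bar, s) in companies.items():
    print(f"{name}: CV = {coefficient_of_variation(s, x_bar):.1f}%")
```

It prints 4.0% for the American company and 2.3% for the Canadian one, matching the table.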
Summary Table: Key Statistical Concepts
| Concept | Definition | Formula |
|---|---|---|
| Mean | Arithmetic average | $\bar{x} = \frac{\sum x}{n}$ |
| Median | Middle value | - |
| Mode | Most frequent value | - |
| Modal Class | Class with highest frequency | - |
| Coefficient of Variation | Relative measure of spread | $CV = \frac{s}{\bar{x}} \times 100\%$ |
| Frequency Distribution | Table of counts per interval | - |
| Histogram | Bar graph of frequency distribution | - |
| Ogive | Line graph of cumulative frequency | - |
| Box Plot | Graphical display of the five-number summary | - |
Additional info: Some explanations and table entries have been expanded for clarity and completeness. All formulas are provided in LaTeX format for academic reference.