Descriptive and Inferential Statistics: Data Summarization and Measures
Descriptive Statistics
Introduction to Descriptive Statistics
Descriptive statistics involve summarizing and describing the main features of a data set. This is typically achieved through graphical methods and numerical measures.
Graphical methods: Bar graphs, pie charts, histograms, stem-and-leaf displays, dot plots, etc.
Numerical methods: Means (averages), medians, variances, standard deviations, etc.
Inferential Statistics
Introduction to Inferential Statistics
Inferential statistics use data from a sample to make generalizations or predictions about a population. This involves estimation, hypothesis testing, and calculation of margins of error.
Example: Estimating the average (mean) annual family income from a sample.
Example: An unemployment rate reported from a survey is an inferential statistic, since it estimates a population rate from a sample.
Five Elements of Inferential Statistics
Population: The entire set of units (e.g., people, objects, or events) under study.
Example: All families in BC.
Variable of Interest: A characteristic measured on population units.
Example: Annual family income.
Sample: A subset of the population units, selected to represent the population.
Statistical Inference: An estimate or prediction about a population based on sample data.
Example: Using the sample proportion to estimate the population proportion.
Measure of Reliability: A statement about the uncertainty of a statistical inference (e.g., margin of error).
Types of Data
Qualitative (Categorical) Data
Qualitative data are non-numeric and classify items into categories.
Example: Eye color, blood type.
Quantitative Data
Quantitative data are numeric and can be measured on a scale.
Example: Height (cm), temperature (°C).
Graphical Methods for Describing Data
Bar Graphs and Pie Charts
Bar graphs and pie charts are used to display categorical data.
| Marital Status | Canada (in millions) | US (in millions) |
|---|---|---|
| Single | 13.3 | 71.4 |
| Married | 15.0 | 125.5 |
| Widowed | 1.5 | 14.6 |
| Divorced | 1.5 | 28.8 |
Histograms
Histograms display the distribution of quantitative data by grouping values into intervals (bins).
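To make the binning step concrete, here is a minimal Python sketch that counts how many values fall into equal-width intervals. The data values, the function name `bin_counts`, and the use of half-open intervals [lower, upper) are illustrative assumptions, not taken from the notes.

```python
def bin_counts(data, start, width, n_bins):
    """Count values in n_bins equal-width, half-open intervals [lower, upper)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)  # which bin x falls into
        if 0 <= index < n_bins:            # values outside the bins are ignored
            counts[index] += 1
    return counts


# Hypothetical data, chosen only for illustration.
values = [1, 2, 2, 3, 5, 7, 8, 8, 9, 12]
histogram = bin_counts(values, start=0, width=5, n_bins=3)

# Print a rough text histogram, one row per bin.
for i, count in enumerate(histogram):
    lower = i * 5
    print(f"[{lower}, {lower + 5}): {'*' * count}")
```

The bin width controls how much detail the histogram shows: wider bins smooth the shape, narrower bins reveal more of the individual values.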
Stem-and-Leaf Displays
Stem-and-leaf displays split each data value into a "stem" and a "leaf" to show the distribution while preserving the actual data values.
Useful for small data sets (typically fewer than 100 observations).
Example: For the data set 41, 41, 115, 116, 118, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, a stem-and-leaf display can be constructed.
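One way to build the display for this data set is sketched below in Python. It assumes the leaf is the last digit and the stem is everything before it; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group integer values by stem (all but the last digit) and leaf (last digit)."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)


data = [41, 41, 115, 116, 118, 120, 121, 122, 123, 124,
        125, 126, 127, 128, 129, 130]
display = stem_and_leaf(data)

# One row per stem that actually occurs in the data.
for stem, leaves in display.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch prints only the stems that occur. A hand-drawn display would usually also list the empty stems 5 through 10, which makes the gap between the two values of 41 and the rest of the data visible.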
Dot Plots
Each observation is represented as a dot on a number line, useful for small data sets.
Numerical Descriptive Measures
Measures of Central Location
Central location measures indicate the "center" of a data set.
Sample Mean (Arithmetic Mean): The average value of a data set.
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example: For data set {3, 2, 1}, $\bar{x} = (3 + 2 + 1)/3 = 2$.
Median: The middle value when data are ordered.
Less sensitive to outliers than the mean.
Example: For data set {3, 26, 4, 15, 0.8}, median is 4.
Mode: The value that occurs most frequently in the data set.
Example: For quiz scores {8, 6, 7, 8, 10, 9, 8, 5, 7}, mode is 8.
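The three examples above can be checked with Python's standard `statistics` module. This is only a quick sketch; the variable names are illustrative.

```python
import statistics

# Data sets taken from the examples in the notes.
mean_value = statistics.mean([3, 2, 1])
median_value = statistics.median([3, 26, 4, 15, 0.8])
mode_value = statistics.mode([8, 6, 7, 8, 10, 9, 8, 5, 7])

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

In the median example, the value 26 pulls the mean of that data set up to 9.76 while the median stays at 4, which illustrates why the median is less sensitive to outliers.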
Measures of Variability (Dispersion)
Variability measures describe the spread or dispersion of a data set.
Sample Range: Largest value minus smallest value.
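Sample Variance: The sum of squared deviations from the mean, divided by $n - 1$ (roughly the average squared deviation).
Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample Standard Deviation: The positive square root of the sample variance.
Formula: $s = \sqrt{s^2}$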
| Measure | Population | Sample |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $s^2$ |
| Standard Deviation | $\sigma$ | $s$ |
The denominator in the sample variance formula, $n - 1$, is known as the degrees of freedom.
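As a worked illustration of the variability measures, the following Python sketch computes the range, sample variance, and sample standard deviation by hand for the small data set {3, 2, 1} used earlier, and compares the result against the standard `statistics` module. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


data = [3, 2, 1]
data_range = max(data) - min(data)    # largest minus smallest
s2 = sample_variance(data)            # divides by n - 1
s = math.sqrt(s2)                     # sample standard deviation
pop_var = statistics.pvariance(data)  # population version divides by n

print("range:", data_range)
print("sample variance:", s2)
print("sample standard deviation:", s)
print("population variance:", pop_var)
```

Dividing by $n - 1$ rather than $n$ makes the sample variance slightly larger than the population-style calculation on the same numbers; the difference shrinks as $n$ grows.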
Interpreting the Standard Deviation
Empirical Rule (Rule of Thumb)
For data with a mound-shaped (bell-shaped) distribution:
Approximately 68% of data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% within 2 standard deviations: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% within 3 standard deviations: $(\bar{x} - 3s, \bar{x} + 3s)$
For mound-shaped data, a rough approximation for the range is 4 times the sample standard deviation.
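The Empirical Rule percentages can be checked numerically. The sketch below simulates mound-shaped data with `random.gauss` and measures the fraction of values within 1, 2, and 3 sample standard deviations of the sample mean. The simulated mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions for illustration.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of values within k sample standard deviations of the sample mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if mean - k * s <= x <= mean + k * s]
    return len(inside) / len(data)


random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(100, 15) for _ in range(10_000)]  # simulated bell-shaped data

fractions = {k: fraction_within(sample, k) for k in (1, 2, 3)}
for k, fraction in fractions.items():
    print(f"within {k} standard deviation(s): {fraction:.3f}")
```

The printed fractions should land close to 0.68, 0.95, and 0.997, though not exactly on them, since the rule is an approximation.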
Chebyshev's Rule
Chebyshev's Rule applies to any data set, regardless of shape:
At least $1 - 1/k^2$ of the data fall within $k$ standard deviations of the mean, for any $k > 1$.
For $k = 2$, at least 75% of data fall within 2 standard deviations.
For $k = 3$, at least 89% of data fall within 3 standard deviations.
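The bound in Chebyshev's Rule is the standard expression 1 - 1/k², which a short Python sketch can evaluate for several values of k. The function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k**2


bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4)}
for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.1%} of the data")
```

These are guaranteed minimums for any distribution shape, which is why they are lower than the Empirical Rule percentages for mound-shaped data.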
Percentiles and Quartiles
Percentiles divide the data into 100 equal parts; quartiles divide the data into four equal parts.
25th percentile = lower quartile
50th percentile = median
75th percentile = upper quartile
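Quartiles can be computed with `statistics.quantiles` from Python's standard library (Python 3.8 or later). The sketch below reuses the quiz scores from the mode example. Textbooks and software use slightly different interpolation conventions, so the default method used here is an assumption and hand calculations may differ a little.

```python
import statistics

scores = [8, 6, 7, 8, 10, 9, 8, 5, 7]  # quiz scores from the mode example

# n=4 splits the data into four parts and returns the three cut points.
lower_q, median, upper_q = statistics.quantiles(scores, n=4)

print("25th percentile (lower quartile):", lower_q)
print("50th percentile (median):", median)
print("75th percentile (upper quartile):", upper_q)
```

Passing `n=100` to the same function returns the 99 cut points that separate the percentiles.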
Z-scores
The z-score measures how many standard deviations a value is from the mean.
Formula: $z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population)
Example: For , , ,
Interpretation (for mound-shaped distributions, following the Empirical Rule):
Approximately 68% of data have z-scores between -1 and 1.
Approximately 95% between -2 and 2.
Approximately 99.7% between -3 and 3.
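A minimal Python sketch of the z-score calculation follows. The numbers (a value of 70 with mean 60 and standard deviation 5) are hypothetical values chosen for illustration and are not the ones from the original example; the function name `z_score` is also an assumption.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical values: x = 70, mean = 60, standard deviation = 5.
z_above = z_score(70, mean=60, sd=5)
z_below = z_score(45, mean=60, sd=5)

print("z for 70:", z_above)  # positive: above the mean
print("z for 45:", z_below)  # negative: below the mean
```

A positive z-score means the value lies above the mean and a negative one means it lies below. Under the interpretation above, a z-score near 3 in absolute value would be unusual for mound-shaped data.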
Box Plots
Box plots graphically display the distribution of a data set using the five-number summary:
Minimum
Lower quartile ($Q_1$)
Median ($Q_2$)
Upper quartile ($Q_3$)
Maximum
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$; covers the middle 50% of the data.
Interpretation:
The length of the box (IQR) can be used to compare variability.
If one whisker is longer, the distribution is skewed in that direction.
Outliers are extreme values outside the whiskers.
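The quantities behind a box plot can be computed directly. The Python sketch below finds the five-number summary and the IQR for the data set from the stem-and-leaf example, then flags outliers using fences at 1.5 × IQR beyond the quartiles. The 1.5 × IQR fence is a common convention assumed here, not something stated in these notes, and the quartile method is the default of `statistics.quantiles`.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)


data = [41, 41, 115, 116, 118, 120, 121, 122, 123, 124,
        125, 126, 127, 128, 129, 130]

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr)
print("outliers:", outliers)
```

With this convention the two values of 41 fall below the lower fence, matching the visual impression that they sit far from the rest of the data.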