Descriptive and Inferential Statistics: Data Summarization and Measures

Descriptive Statistics

Introduction to Descriptive Statistics

Descriptive statistics involve summarizing and describing the main features of a data set. This is typically achieved through graphical methods and numerical measures.

Graphical methods: Bar graphs, pie charts, histograms, stem-and-leaf displays, dot plots, etc.
Numerical methods: Means (averages), medians, variances, standard deviations, etc.

Inferential Statistics

Introduction to Inferential Statistics

Inferential statistics use data from a sample to make generalizations or predictions about a population. This involves estimation, hypothesis testing, and calculation of margins of error.

Example: Estimating the average (mean) annual family income from a sample.
Example: The unemployment rate reported from a survey is an inferential statistic, because it estimates the population rate from a sample.

Five Elements of Inferential Statistics

Population: The entire set of units of interest (e.g., people, objects, or events).
- Example: All families in BC.
Variable of Interest: A characteristic measured on population units.
- Example: Annual family income.
Sample: A subset of the population units, selected to represent the population.
Statistical Inference: An estimate or prediction about a population based on sample data.
- Example: Using the sample proportion to estimate the population proportion.
Measure of Reliability: A statement about the uncertainty of a statistical inference (e.g., margin of error).
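
The five elements can be traced through a small simulation. The sketch below is illustrative only: the population is simulated (the log-normal shape, its parameters, and the sample size of 200 are assumptions, not data from these notes), and the 95% margin of error uses the common large-sample approximation 1.96 * s / sqrt(n).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Population (hypothetical): 10,000 simulated annual family incomes.
population = [random.lognormvariate(11, 0.5) for _ in range(10_000)]

# Sample: a simple random sample of 200 population units.
sample = random.sample(population, 200)

# Statistical inference: estimate the population mean with the sample mean.
estimate = statistics.mean(sample)

# Measure of reliability: approximate 95% margin of error.
margin = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))

print(f"estimated mean income: {estimate:,.0f} +/- {margin:,.0f}")
print(f"population mean:       {statistics.mean(population):,.0f}")
```

In practice the population mean is unknown; it is printed here only because the population was simulated.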

Types of Data

Qualitative (Categorical) Data

Qualitative data are non-numeric and classify items into categories.

Example: Eye color, blood type.

Quantitative Data

Quantitative data are numeric and can be measured on a scale.

Example: Height (cm), temperature (°C).

Graphical Methods for Describing Data

Bar Graphs and Pie Charts

Bar graphs and pie charts are used to display categorical data.

Marital Status	Canada (in millions)	US (in millions)
Single	13.3	71.4
Married	15.0	125.5
Widowed	1.5	14.6
Divorced	1.5	28.8
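
Bar graphs and pie charts usually show relative frequencies rather than raw counts. As a rough sketch, the code below converts the counts in the table above to shares of each country's total and prints a text bar for each category. The function name `relative_frequencies` and the 40-character bar scale are illustrative choices.

```python
# Counts in millions, copied from the table above.
marital_status = {
    "Single":   {"Canada": 13.3, "US": 71.4},
    "Married":  {"Canada": 15.0, "US": 125.5},
    "Widowed":  {"Canada": 1.5,  "US": 14.6},
    "Divorced": {"Canada": 1.5,  "US": 28.8},
}

def relative_frequencies(table, country):
    """Return each category's share of the country's total."""
    total = sum(row[country] for row in table.values())
    return {category: row[country] / total for category, row in table.items()}

for country in ("Canada", "US"):
    print(country)
    for category, share in relative_frequencies(marital_status, country).items():
        bar = "#" * round(share * 40)  # text bar: 40 characters = 100%
        print(f"  {category:<9}{share:6.1%} {bar}")
```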

Histograms

Histograms display the distribution of quantitative data by grouping values into intervals (bins).
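
The binning step behind a histogram can be sketched as follows. The heights are hypothetical values made up for illustration, and the bin start (150), width (10), and number of bins (4) are arbitrary choices; each bin includes its lower edge and excludes its upper edge.

```python
def histogram_counts(values, start, width, num_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < num_bins:  # values outside all bins are ignored
            counts[index] += 1
    return counts

# Hypothetical heights in cm (not data from these notes).
heights_cm = [152, 158, 161, 163, 165, 167, 168, 170, 171, 174, 178, 183]
counts = histogram_counts(heights_cm, start=150, width=10, num_bins=4)

for i, count in enumerate(counts):
    low = 150 + 10 * i
    print(f"[{low}, {low + 10}): {'*' * count}")
```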

Stem-and-Leaf Displays

Stem-and-leaf displays split each data value into a "stem" and a "leaf" to show the distribution while preserving the actual data values.

Useful for small data sets (typically fewer than 100 observations).
Example: For the data set 41, 41, 115, 116, 118, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, a stem-and-leaf display can be constructed.
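
One way to build the display for the data set above is sketched below, assuming non-negative whole numbers with the last digit as the leaf and the remaining digits as the stem. The helper name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (all digits but the last), with the last digit as leaf."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return stems

data = [41, 41, 115, 116, 118, 120, 121, 122, 123, 124,
        125, 126, 127, 128, 129, 130]
display = stem_and_leaf(data)

# Print every stem from smallest to largest, including empty ones,
# so gaps in the data stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

The empty stems between 4 and 11 make the two values of 41 stand out as far from the rest of the data.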

Dot Plots

Each observation is represented as a dot on a number line, useful for small data sets.
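
A text version of a dot plot can be sketched as below, with the number line running down the page and one dot per observation. The values are hypothetical integers chosen for illustration, and `dot_plot_rows` is an illustrative helper name.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return one text row per integer on the number line, with a dot per observation."""
    counts = Counter(values)
    return [f"{v:>3} | {'.' * counts[v]}"
            for v in range(min(values), max(values) + 1)]

# Hypothetical observations (not data from these notes).
rows = dot_plot_rows([2, 3, 3, 5])
for row in rows:
    print(row)
```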

Numerical Descriptive Measures

Measures of Central Location

Central location measures indicate the "center" of a data set.

Sample Mean (Arithmetic Mean): The average value of a data set.
- Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Example: For data set {3, 2, 1}, $\bar{x} = (3 + 2 + 1)/3 = 2$.
Median: The middle value when data are ordered.
- Less sensitive to outliers than the mean.
- Example: For data set {3, 26, 4, 15, 0.8}, median is 4.
Mode: The value that occurs most frequently in the data set.
- Example: For quiz scores {8, 6, 7, 8, 10, 9, 8, 5, 7}, mode is 8.
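
The three examples above can be checked with Python's standard `statistics` module. The last two lines are an added illustration of why the median is less sensitive to outliers: replacing 26 with 2600 (a made-up extreme value) changes the mean but leaves the median unchanged.

```python
import statistics

mean_value = statistics.mean([3, 2, 1])
median_value = statistics.median([3, 26, 4, 15, 0.8])
mode_value = statistics.mode([8, 6, 7, 8, 10, 9, 8, 5, 7])
print(mean_value, median_value, mode_value)

# Outlier sensitivity: swap 26 for a hypothetical extreme value of 2600.
original = [3, 26, 4, 15, 0.8]
with_outlier = [3, 2600, 4, 15, 0.8]
print(statistics.mean(original), statistics.mean(with_outlier))      # means differ
print(statistics.median(original), statistics.median(with_outlier))  # medians match
```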

Measures of Variability (Dispersion)

Variability measures describe the spread or dispersion of a data set.

Sample Range: Largest value minus smallest value.
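Sample Variance: The sum of squared deviations from the mean, divided by $n - 1$ (roughly the average squared deviation).
- Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample Standard Deviation: Square root of the variance.
- Formula: $s = \sqrt{s^2}$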

Measure	Population	Sample
Mean	$\mu$	$\bar{x}$
Variance	$\sigma^2$	$s^2$
Standard Deviation	$\sigma$	$s$

The denominator in the sample variance formula, $n - 1$, is known as the degrees of freedom.
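
As a worked check of these measures, the sketch below computes the range, variance, and standard deviation step by step for the small data set {3, 2, 1} used in the mean example. The population variance is included only to show the effect of dividing by n instead of n - 1.

```python
import math
import statistics

data = [3, 2, 1]
n = len(data)
x_bar = sum(data) / n

sample_range = max(data) - min(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1
population_variance = sum(squared_deviations) / n    # divide by n
sample_sd = math.sqrt(sample_variance)

print(sample_range, sample_variance, sample_sd, population_variance)

# The standard library gives the same sample variance.
print(statistics.variance(data))
```

Here the deviations from the mean are 1, 0, and -1, so the sum of squared deviations is 2 and the sample variance is 2 / (3 - 1) = 1.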

Interpreting the Standard Deviation

Empirical Rule (Rule of Thumb)

For data with a mound-shaped (bell-shaped) distribution:

Approximately 68% of data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% within 2 standard deviations: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% within 3 standard deviations: $(\bar{x} - 3s, \bar{x} + 3s)$

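For mound-shaped data, the range is roughly 4 times the sample standard deviation ($\text{range} \approx 4s$), because about 95% of the data lie within 2 standard deviations on either side of the mean.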
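
The Empirical Rule can be checked on simulated bell-shaped data. This is a sketch under stated assumptions: the data are 10,000 draws from a normal distribution with mean 100 and standard deviation 15 (arbitrary illustrative values), so the observed fractions will be close to, but not exactly, 68%, 95%, and 99.7%.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Simulated mound-shaped data (hypothetical, not from these notes).
data = [random.gauss(100, 15) for _ in range(10_000)]
x_bar = statistics.mean(data)
s = statistics.stdev(data)

def fraction_within(values, center, spread, k):
    """Fraction of values within k * spread of center."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= v <= high for v in values) / len(values)

fractions = {k: fraction_within(data, x_bar, s, k) for k in (1, 2, 3)}
for k, fraction in fractions.items():
    print(f"within {k} standard deviation(s): {fraction:.1%}")
```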

Chebyshev's Rule

Chebyshev's Rule applies to any data set, regardless of shape:

At least $1 - \frac{1}{k^2}$ of the data fall within $k$ standard deviations of the mean for $k > 1$.
For $k = 2$, at least 75% of data fall within 2 standard deviations.
For $k = 3$, at least 89% of data fall within 3 standard deviations.
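
Because Chebyshev's Rule holds for any shape, it can be tried on clearly non-mound-shaped data. The sketch below computes the bound and compares it with the observed fraction for the skewed data set from the stem-and-leaf example; the helper names are illustrative.

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k sample standard deviations of the mean."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(abs(x - x_bar) <= k * s for x in values) / len(values)

data = [41, 41, 115, 116, 118, 120, 121, 122, 123, 124,
        125, 126, 127, 128, 129, 130]

for k in (2, 3):
    print(f"k={k}: at least {chebyshev_bound(k):.3f}, "
          f"observed {fraction_within(data, k):.3f}")
```

The observed fractions are never below the bound; for most data sets they are well above it, which is why the rule is described as conservative.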

Percentiles and Quartiles

Percentiles divide the ordered data into 100 equal parts; quartiles divide it into four equal parts. The pth percentile is a value such that about p% of the observations fall at or below it.

25th percentile = lower quartile
50th percentile = median
75th percentile = upper quartile
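
Quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). Note that several interpolation conventions exist, so software and textbooks can give slightly different quartiles for the same data; the `method="inclusive"` setting used here is one such convention, not necessarily the one used in the course. The data are the quiz scores from the mode example.

```python
import statistics

quiz_scores = [8, 6, 7, 8, 10, 9, 8, 5, 7]

# n=4 splits the ordered data into four parts and returns the three cut points.
lower_quartile, median, upper_quartile = statistics.quantiles(
    quiz_scores, n=4, method="inclusive"
)

print("25th percentile (lower quartile):", lower_quartile)
print("50th percentile (median):", median)
print("75th percentile (upper quartile):", upper_quartile)
```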

Z-scores

The z-score measures how many standard deviations a value is from the mean.

Formula: $z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population)
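Example: For a value two standard deviations above the sample mean, $x = \bar{x} + 2s$, $z = \frac{(\bar{x} + 2s) - \bar{x}}{s} = 2$.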

Interpretation:

Approximately 68% of data have z-scores between -1 and 1.
Approximately 95% between -2 and 2.
Approximately 99.7% between -3 and 3.
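
A minimal sketch of the sample z-score calculation, applied to the small data set {3, 2, 1} from the mean example (sample mean 2, sample standard deviation 1). The function name `z_scores` is illustrative.

```python
import statistics

def z_scores(values):
    """Return the sample z-score of each value: (x - mean) / standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]

scores = z_scores([3, 2, 1])
print(scores)
```

A positive z-score means the value lies above the mean, a negative z-score means it lies below, and z-scores always average to zero within the data set they were computed from.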

Box Plots

Box plots graphically display the distribution of a data set using the five-number summary:

Minimum
Lower quartile ($Q_1$)
Median ($Q_2$)
Upper quartile ($Q_3$)
Maximum

Interquartile Range (IQR): $IQR = Q_3 - Q_1$; covers the middle 50% of the data.

Interpretation:

The length of the box (IQR) can be used to compare variability.
If one whisker is noticeably longer than the other, the distribution is skewed in the direction of the longer whisker.
Outliers are extreme values outside the whiskers.
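
The quantities behind a box plot can be computed as sketched below for the data set from the stem-and-leaf example. Two assumptions are made that these notes do not specify: quartiles use the `method="inclusive"` convention of `statistics.quantiles`, and outliers are flagged with the common rule of thumb of more than 1.5 times the IQR beyond the quartiles.

```python
import statistics

data = [41, 41, 115, 116, 118, 120, 121, 122, 123, 124,
        125, 126, 127, 128, 129, 130]

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the lower and upper quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
print("outliers:", outliers)
```

Under these assumptions the two values of 41 fall below the lower fence and are flagged as outliers, which matches the gap visible in the stem-and-leaf display.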
