Descriptive and Inferential Statistics: Data Summarization and Measures

Study Guide - Smart Notes

Descriptive Statistics

Introduction to Descriptive Statistics

Descriptive statistics involve summarizing and describing the main features of a data set. This is typically achieved through graphical methods and numerical measures.

Graphical methods: Bar graphs, pie charts, histograms, stem-and-leaf displays, dot plots, etc.
Numerical methods: Means (averages), medians, variances, standard deviations, etc.

Inferential Statistics

Introduction to Inferential Statistics

Inferential statistics use data from a sample to make inferences about a population. This includes estimating population parameters and testing hypotheses.

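Example: Estimating the average (mean) family income of a population from a sample of families.
Example: Estimating the unemployment rate of a population from a sample of people.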

Five Elements of Inferential Statistics

Population: The entire set of units (e.g., people, objects, or events) under study.
- Example: People in the working class of BC.
- Example: Families in BC.
Variable of Interest: A characteristic of a population unit.
- Example: Employment status.
- Example: Annual family income.
Sample: A subset of the population units.
Sampling: The procedure for selecting the sample, typically at random so that each unit has an equal chance of being selected.
Statistical Inference: An estimate, prediction, or generalization about a population based on sample information.
- Example: Use the sample proportion to estimate the population proportion.
Measure of Reliability: A statement about the degree of uncertainty associated with a statistical inference.
- Example: The unemployment rate is estimated as 5% ± 3%, 19 times out of 20. The margin of error is ±3%.
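As a rough illustration of the last two elements, the sketch below estimates a population proportion from a sample proportion and attaches a margin of error. The sample counts are hypothetical, and the margin of error uses the common normal approximation with the 95% critical value 1.96 ("19 times out of 20"); the notes themselves do not specify how the ±3% was obtained.

```python
import math

# Hypothetical sample: 200 people surveyed, 10 of them unemployed.
n = 200
unemployed = 10

# Sample proportion, used to estimate the population proportion.
p_hat = unemployed / n

# Normal-approximation margin of error at about 95% confidence
# (an assumption of this sketch; 1.96 is the standard normal critical value).
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"Estimate: {p_hat:.1%} ± {margin:.1%}")  # Estimate: 5.0% ± 3.0%
```

With these made-up counts the result happens to match the 5% ± 3% statement in the example above.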

Types of Data

Qualitative (Categorical) Data

Qualitative data are non-numeric and can be classified into categories.

Example: Eye color of a person.
Example: Blood type of a patient.

Quantitative Data

Quantitative data are numeric and can be measured on a numerical scale with a unit.

Example: Height (in cm).
Example: Temperature (°C).

Chapter 2 — Methods for Describing Sets of Data

Bar Graphs and Pie Charts

Bar graphs and pie charts are used to display categorical data.

Marital Status	Canada (in millions)	US (in millions)
Single	13.3	71.4
Married	15.0	125.5
Widowed	1.5	14.6
Divorced	1.5	28.8

Pareto diagram: A bar graph with categories arranged by height in descending order.
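The ordering used by a Pareto diagram can be sketched in a few lines. The example below takes the Canada column of the table above, sorts the categories from tallest to shortest bar, and prints a simple text bar chart; the one-character-per-million scale is just a choice made for this illustration.

```python
# Canada column of the marital-status table (in millions).
canada = {"Single": 13.3, "Married": 15.0, "Widowed": 1.5, "Divorced": 1.5}

# A Pareto diagram orders the bars from tallest to shortest.
pareto = sorted(canada.items(), key=lambda item: item[1], reverse=True)

for status, millions in pareto:
    bar = "#" * round(millions)  # roughly one '#' per million
    print(f"{status:<9}{millions:>5.1f}  {bar}")
```

Because Python's sort is stable, the tied categories (Widowed and Divorced) keep their original relative order.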

Histograms

Histograms are used to display the distribution of quantitative data.

Each bar represents the frequency of data within a specific interval.
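The counting step behind a histogram can be sketched as follows. The data values, the interval width of 10, and the starting point of 150 are all hypothetical choices for this illustration; each observation is assigned to the interval [lower, lower + width) that contains it.

```python
# Hypothetical quantitative data (e.g., heights in cm).
data = [152, 158, 161, 163, 165, 166, 168, 170, 171, 174, 177, 183]

width = 10   # class-interval width (an assumption for this sketch)
start = 150  # lower bound of the first interval

# Count how many observations fall in each interval [lower, lower + width).
counts = {}
for x in data:
    lower = start + ((x - start) // width) * width
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    print(f"[{lower}, {lower + width}): {'*' * counts[lower]} ({counts[lower]})")
```

Each printed row corresponds to one bar of the histogram, with the number of stars equal to the bar's height.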

Stem-and-Leaf Displays (Stemplots)

Stem-and-leaf displays split each data value into a "stem" and a "leaf" to show the distribution of data.

Useful for small data sets.
Number of stems should be between 5 and 20.
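A minimal sketch of how a stem-and-leaf display is built, assuming two-digit data so that the tens digit serves as the stem and the units digit as the leaf. The scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit data (e.g., test scores).
scores = [41, 47, 52, 58, 61, 63, 65, 66, 68, 70, 71, 74, 77, 83, 95]

# Stem = tens digit, leaf = units digit.
stems = defaultdict(list)
for value in sorted(scores):
    stems[value // 10].append(value % 10)

for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

This data set produces six stems (4 through 9), which sits inside the suggested range of 5 to 20.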

Dot Plots

Each observation is represented as a dot along a numerical axis. Dot plots are useful for small data sets.

Numerical Descriptive Measures

Measures of Central Location

Measures of central location describe the "center" of a data set.

Sample Mean (Arithmetic Mean): The average value of a data set.

Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.

Median: The middle value when data are ordered.
Mode: The value that occurs most frequently in the data set.
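The three measures of central location can be computed with Python's standard `statistics` module. The data set below is hypothetical and was chosen so that the three measures differ.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = sum(data) / len(data)      # sum of the observations divided by n
median = statistics.median(data)  # average of the two middle values when n is even
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 5.0 4.0 3
```

The mean (5.0) is pulled above the median (4.0) by the large value 10, which is typical of right-skewed data.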

Measures of Variability

Measures of variability describe the spread or dispersion of a data set.

Sample Range: Largest observation minus smallest observation.
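Sample Variance: The sum of squared deviations from the mean divided by $n - 1$ (roughly the average squared deviation).

Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$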

Sample Standard Deviation: Square root of the sample variance.

Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Degrees of Freedom: The term in the denominator of the sample variance, $n - 1$.
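A short worked example of the measures of variability, using a hypothetical sample. The variance is computed by hand with the $n - 1$ denominator and then cross-checked against `statistics.variance`, which uses the same denominator.

```python
import math
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample
n = len(data)
mean = sum(data) / n

# Sum of squared deviations divided by n - 1 (the degrees of freedom).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = math.sqrt(variance)
sample_range = max(data) - min(data)

print(sample_range, variance, round(std_dev, 3))  # 8 9.2 3.033

# Cross-check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(data)))  # True
```

Here the squared deviations sum to 46, so the sample variance is 46 / 5 = 9.2.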

Interpreting the Standard Deviation

Empirical Rule (Rule of Thumb)

For mound-shaped (bell-shaped) distributions:

Approximately 68% of data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% within 2 standard deviations: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% within 3 standard deviations: $(\bar{x} - 3s, \bar{x} + 3s)$

For mound-shaped data, a rough approximation for the standard deviation is sample range/4, because nearly all of the data fall within 2 standard deviations of the mean.
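The Empirical Rule can be checked on simulated data. The sketch below draws values from a normal distribution (the mean of 170, the standard deviation of 8, and the sample size are arbitrary assumptions) and counts the fraction of observations within 1, 2, and 3 sample standard deviations of the sample mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(170, 8) for _ in range(10_000)]  # simulated mound-shaped data

mean = statistics.mean(data)
s = statistics.stdev(data)

within = {}
for k in (1, 2, 3):
    inside = sum(1 for x in data if mean - k * s <= x <= mean + k * s)
    within[k] = inside / len(data)
    print(f"within {k} standard deviation(s): {within[k]:.1%}")
```

The printed fractions should land close to 68%, 95%, and 99.7%, though not exactly on them, since the rule is an approximation.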

Chebyshev's Rule

Chebyshev's Rule applies to any data set, regardless of shape:

At least $1 - \frac{1}{k^2}$ of the data fall within $k$ standard deviations of the mean, for any $k > 1$.
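Chebyshev's Rule guarantees that at least $1 - 1/k^2$ of the data lie within $k$ standard deviations of the mean for $k > 1$. The helper below (its name is made up for this illustration) evaluates that lower bound for a couple of values of $k$.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

These guarantees are weaker than the Empirical Rule's 95% and 99.7%, which is the price of holding for data of any shape.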

Percentiles and Quartiles

25th percentile = lower quartile
50th percentile = median
75th percentile = upper quartile
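Quartiles can be computed with `statistics.quantiles` from Python's standard library. Textbooks and software use slightly different conventions for locating quartiles, so results may differ a little from hand calculations; the hypothetical sample below has 11 values, for which the library's default method lands exactly on the 3rd, 6th, and 9th ordered observations.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 21]  # hypothetical sample, n = 11

# n=4 splits the ordered data into four equal parts, returning three cut points.
lower_q, median, upper_q = statistics.quantiles(data, n=4)

print(lower_q, median, upper_q)  # 5.0 10.0 15.0
```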

Z-scores

Definition and Interpretation

The z-score represents the distance between a value and the mean in terms of standard deviations.

Formula (sample): $z = \frac{x - \bar{x}}{s}$

For population: $z = \frac{x - \mu}{\sigma}$

For mound-shaped distributions:

Approximately 68% of data have z-scores between -1 and 1.
Approximately 95% between -2 and 2.
Approximately 99.7% between -3 and 3.
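A brief sketch of computing sample z-scores, where each value's distance from the sample mean is divided by the sample standard deviation. The data set is hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

# z = (x - mean) / s for each observation
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:>2}: z = {z:+.2f}")
```

A negative z-score marks a value below the mean and a positive one a value above it; the z-scores of a sample always sum to zero.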

Box Plots

Summary and Interpretation

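5-number summary: Minimum, $Q_L$ (lower quartile), Median, $Q_U$ (upper quartile), Maximum.
Interquartile Range (IQR): $\text{IQR} = Q_U - Q_L$; covers the middle 50% of the data.
Whiskers: Extend from $Q_L$ or $Q_U$ to the most extreme measurement inside the inner fences, which lie $1.5 \times \text{IQR}$ below $Q_L$ and above $Q_U$.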

Interpretation:

The length of the box (IQR) can be used to compare variability.
If one whisker is longer, the distribution is skewed in that direction.
Outliers are extreme measurements outside the inner fences.
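The quantities behind a box plot can be sketched as follows. The data are hypothetical, the quartiles come from `statistics.quantiles` (other quartile conventions may give slightly different values), and the inner fences are placed 1.5 × IQR beyond the quartiles, which is the usual convention but is an assumption here.

```python
import statistics

# Hypothetical sample with one extreme value.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 40]

q_l, median, q_u = statistics.quantiles(data, n=4)
iqr = q_u - q_l

# Inner fences: 1.5 * IQR beyond the quartiles (assumed convention).
lower_fence = q_l - 1.5 * iqr
upper_fence = q_u + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("5-number summary:", min(data), q_l, median, q_u, max(data))
print("IQR:", iqr)
print("whiskers end at:", min(inside), max(inside))
print("outliers:", outliers)
```

For this sample the fences are at -10 and 30, so the value 40 is flagged as an outlier and the upper whisker stops at 18.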
