Descriptive and Inferential Statistics: Data Summarization and Measures
Study Guide - Smart Notes
Descriptive Statistics
Introduction to Descriptive Statistics
Descriptive statistics involve summarizing and describing the main features of a data set. This is typically achieved through graphical methods and numerical measures.
Graphical methods: Bar graphs, pie charts, histograms, stem-and-leaf displays, dot plots, etc.
Numerical methods: Means (averages), medians, variances, standard deviations, etc.
Inferential Statistics
Introduction to Inferential Statistics
Inferential statistics use data from a sample to make inferences about a population. This includes estimating population parameters and testing hypotheses.
Example: Estimating the population's average (mean) family income from a sample of families.
Example: Estimating the population's unemployment rate from a sample of people.
Five Elements of Inferential Statistics
Population: The entire set of units (e.g., people, objects, events, etc.)
Example: People in the working class of BC.
Example: Families in BC.
Variable of Interest: A characteristic of a population unit.
Example: Employment status.
Example: Annual family income.
Sample: A subset of the population units.
Sampling: The procedure for selecting the sample. In simple random sampling, every possible sample of a given size, and therefore every unit, has an equal chance of being selected.
Statistical Inference: An estimate, prediction, or generalization about a population based on sample information.
Example: Use the sample proportion to estimate the population proportion.
Measure of Reliability: A statement about the degree of uncertainty associated with a statistical inference.
Example: The unemployment rate is estimated as 5% ± 3%, 19 times out of 20. The margin of error is ±3%.
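As a rough illustration of where a margin of error such as ±3% can come from, the sketch below uses the large-sample normal approximation for a sample proportion. The formula choice, the survey counts, and the function name `margin_of_error` are assumptions made for illustration; the notes do not say how the 5% ± 3% figure was obtained.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the large-sample normal approximation z * sqrt(p_hat * (1 - p_hat) / n).
    z = 1.96 corresponds to roughly 95% confidence ("19 times out of 20").
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 10 unemployed people in a sample of 200.
n = 200
p_hat = 10 / n  # sample proportion = 0.05
moe = margin_of_error(p_hat, n)

print(f"Estimate: {p_hat:.1%} ± {moe:.1%}")
```

With only 10 unemployed people in the sample, the normal approximation is borderline, so the result should be read as a ballpark figure.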
Types of Data
Qualitative (Categorical) Data
Qualitative data are non-numeric and can be classified into categories.
Example: Eye color of a person.
Example: Blood type of a patient.
Quantitative Data
Quantitative data are numeric and can be measured on a numerical scale with a unit.
Example: Height (in cm).
Example: Temperature (°C).
Chapter 2 — Methods for Describing Sets of Data
Bar Graphs and Pie Charts
Bar graphs and pie charts are used to display categorical data.
| Marital Status | Canada (millions) | US (millions) |
|---|---|---|
| Single | 13.3 | 71.4 |
| Married | 15.0 | 125.5 |
| Widowed | 1.5 | 14.6 |
| Divorced | 1.5 | 28.8 |
Pareto diagram: A bar graph with categories arranged by height in descending order.
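The Pareto ordering can be sketched in a few lines, using the Canada column of the marital-status table above. The text-based bars are only a stand-in for a real chart.

```python
# Canada column of the marital-status table (in millions).
canada = {"Single": 13.3, "Married": 15.0, "Widowed": 1.5, "Divorced": 1.5}

# A Pareto diagram orders the bars from tallest to shortest.
# sorted() is stable, so tied categories keep their original order.
pareto_order = sorted(canada.items(), key=lambda item: item[1], reverse=True)

for category, millions in pareto_order:
    bar = "#" * round(millions)  # roughly one '#' per million
    print(f"{category:<9}{millions:>5.1f}  {bar}")
```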
Histograms
Histograms are used to display the distribution of quantitative data.
Each bar represents the frequency of data within a specific interval.
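A minimal sketch of how the bar heights of a histogram are obtained: each value is assigned to a class interval and the values per interval are counted. The data, the interval width, and the starting point are hypothetical choices, not values from the notes.

```python
from collections import Counter

# Hypothetical quantitative data (heights in cm); not from the notes.
heights = [152, 158, 161, 163, 165, 167, 168, 170, 171, 174, 176, 181]

width = 10   # assumed class-interval width
start = 150  # assumed lower limit of the first interval

# Map each value to the lower limit of the interval [low, low + width) containing it.
counts = Counter(start + ((h - start) // width) * width for h in heights)

for low in sorted(counts):
    print(f"[{low}, {low + width}): {'*' * counts[low]} ({counts[low]})")
```

Each row of asterisks corresponds to one bar of the histogram, drawn sideways.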
Stem-and-Leaf Displays (Stemplots)
Stem-and-leaf displays split each data value into a "stem" and a "leaf" to show the distribution of data.
Useful for small data sets.
Number of stems should be between 5 and 20.
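The stem/leaf split can be sketched as follows for two-digit values, taking the tens digit as the stem and the units digit as the leaf. The scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit scores; not from the notes.
scores = [52, 67, 61, 75, 78, 83, 70, 88, 91, 64, 73, 85]

# Stem = tens digit, leaf = units digit.
stems = defaultdict(list)
for value in sorted(scores):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# Print every stem from smallest to largest, including empty ones,
# so that gaps in the distribution remain visible.
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```

This example produces five stems (5 through 9), which sits at the lower end of the suggested 5 to 20.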
Dot Plots
Each observation is represented as a dot above its value on a horizontal scale. Useful for small data sets.
Numerical Descriptive Measures
Measures of Central Location
Measures of central location describe the "center" of a data set.
Sample Mean (Arithmetic Mean): The average value of a data set.
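Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: The middle value when data are ordered. If the number of observations is even, the median is the average of the two middle values.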
Mode: The value that occurs most frequently in the data set.
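The three measures can be computed with Python's standard `statistics` module. The sample below is hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical sample; not from the notes.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # even n: average of the two middle values, (3 + 5) / 2 = 4
mode = statistics.mode(data)      # 3 occurs twice, more often than any other value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

Here the mean is larger than the median because the single large value (10) pulls the mean upward.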
Measures of Variability
Measures of variability describe the spread or dispersion of a data set.
Sample Range: Largest observation minus smallest observation.
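Sample Variance: The sum of squared deviations from the sample mean, divided by $n - 1$ (roughly the average squared deviation).
Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$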
Sample Standard Deviation: Square root of the sample variance.
Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Degrees of Freedom: The term in the denominator of the sample variance, $n - 1$.
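A short sketch of these measures, computed by hand from the definitions (sum of squared deviations divided by n − 1) and cross-checked against the standard `statistics` module. The sample is hypothetical.

```python
import math
import statistics

# Hypothetical sample; not from the notes.
data = [2, 3, 3, 5, 7, 10]
n = len(data)
x_bar = sum(data) / n  # sample mean = 5.0

sample_range = max(data) - min(data)      # 10 - 2 = 8
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations = 46
s2 = ss / (n - 1)                         # sample variance = 46 / 5 = 9.2
s = math.sqrt(s2)                         # sample standard deviation

print(f"range = {sample_range}, s^2 = {s2:.2f}, s = {s:.2f}")

# statistics.variance and statistics.stdev also divide by n - 1.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))
```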
Interpreting the Standard Deviation
Empirical Rule (Rule of Thumb)
For mound-shaped (bell-shaped) distributions:
Approximately 68% of data fall within 1 standard deviation of the mean: $(\bar{x} - s,\ \bar{x} + s)$
Approximately 95% within 2 standard deviations: $(\bar{x} - 2s,\ \bar{x} + 2s)$
Approximately 99.7% within 3 standard deviations: $(\bar{x} - 3s,\ \bar{x} + 3s)$
For mound-shaped data, a rough approximation for the standard deviation is the sample range divided by 4: $s \approx \text{range}/4$.
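The Empirical Rule can be checked on simulated mound-shaped data. The sketch below draws values from a normal distribution and counts the fraction within 1, 2, and 3 standard deviations of the mean. The mean of 100, the standard deviation of 15, and the sample size are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Simulated mound-shaped data; the parameters are arbitrary choices.
data = [random.gauss(100, 15) for _ in range(10_000)]

x_bar = statistics.mean(data)
s = statistics.stdev(data)

def fraction_within(k: int) -> float:
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.3f}")
```

The printed fractions should land close to 0.68, 0.95, and 0.997, though not exactly on them, since the rule is an approximation.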
Chebyshev's Rule
Chebyshev's Rule applies to any data set, regardless of shape:
At least $1 - \frac{1}{k^2}$ of the data fall within $k$ standard deviations of the mean for any $k > 1$.
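Chebyshev's bound, at least 1 − 1/k² of the data within k standard deviations for k > 1, can be compared against a deliberately skewed data set. The data and the helper name `chebyshev_bound` are hypothetical.

```python
import statistics

def chebyshev_bound(k: float) -> float:
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k**2

# Hypothetical right-skewed sample; not from the notes.
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 20]
x_bar = statistics.mean(data)
s = statistics.stdev(data)

def fraction_within(k: float) -> float:
    """Observed fraction of the sample within k standard deviations of the mean."""
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)

for k in (2, 3):
    print(f"k = {k}: bound = {chebyshev_bound(k):.3f}, observed = {fraction_within(k):.3f}")
```

For k = 2 the bound is 0.75 and for k = 3 it is about 0.889. The observed fractions are at least that large even though the data are far from mound-shaped.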
Percentiles and Quartiles
25th percentile = lower quartile
50th percentile = median
75th percentile = upper quartile
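Quartiles can be obtained with `statistics.quantiles` from the Python standard library. Textbooks and software use slightly different conventions for locating quartiles, so the values below reflect that function's default ("exclusive") method and may differ a little from a hand calculation by another rule. The sample is hypothetical.

```python
import statistics

# Hypothetical sample of 7 observations; not from the notes.
data = [12, 15, 17, 20, 22, 25, 30]

# With n=4, quantiles() returns the three cut points:
# lower quartile (25th), median (50th), upper quartile (75th percentile).
q_l, median, q_u = statistics.quantiles(data, n=4)

print(f"lower quartile = {q_l}, median = {median}, upper quartile = {q_u}")
```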
Z-scores
Definition and Interpretation
The z-score represents the distance between a value and the mean in terms of standard deviations.
Formula (for a sample): $z = \frac{x - \bar{x}}{s}$
For population: $z = \frac{x - \mu}{\sigma}$
For mound-shaped distributions:
Approximately 68% of data have z-scores between -1 and 1.
Approximately 95% between -2 and 2.
Approximately 99.7% between -3 and 3.
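A small sketch of sample z-scores, computed as (x − mean) / s for each observation. The sample is hypothetical.

```python
import statistics

# Hypothetical sample; not from the notes.
data = [2, 3, 3, 5, 7, 10]
x_bar = statistics.mean(data)  # 5
s = statistics.stdev(data)     # sample standard deviation, about 3.03

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

A value equal to the mean (here x = 5) has a z-score of 0. Values below the mean get negative z-scores and values above it get positive ones.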
Box Plots
Summary and Interpretation
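5-number summary: Minimum, lower quartile ($Q_L$), Median, upper quartile ($Q_U$), Maximum.
Interquartile Range (IQR): $\text{IQR} = Q_U - Q_L$; covers the middle 50% of the data.
Whiskers: Extend from $Q_L$ or $Q_U$ to the most extreme measurement inside the inner fences, which are commonly placed at $Q_L - 1.5 \times \text{IQR}$ and $Q_U + 1.5 \times \text{IQR}$.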
Interpretation:
The length of the box (IQR) can be used to compare variability.
If one whisker is longer, the distribution is skewed in that direction.
Outliers are extreme measurements outside the inner fences.
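The pieces of a box plot can be computed directly. The sketch below assumes the common convention of placing the inner fences 1.5 × IQR beyond the quartiles, and uses the default quartile method of `statistics.quantiles`; both are assumptions, and the data are hypothetical.

```python
import statistics

# Hypothetical sample with one unusually large value; not from the notes.
data = [12, 15, 17, 20, 22, 25, 30, 58]

q_l, median, q_u = statistics.quantiles(data, n=4)
iqr = q_u - q_l

# Inner fences: 1.5 * IQR below the lower quartile and above the upper quartile.
lower_fence = q_l - 1.5 * iqr
upper_fence = q_u + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers end at the most extreme observations that are still inside the fences.
lower_whisker, upper_whisker = min(inside), max(inside)

print(f"five-number summary: {min(data)}, {q_l}, {median}, {q_u}, {max(data)}")
print(f"IQR = {iqr}, inner fences = ({lower_fence}, {upper_fence})")
print(f"whiskers end at {lower_whisker} and {upper_whisker}; outliers: {outliers}")
```

In this sample the value 58 lies above the upper inner fence, so it is flagged as an outlier and the upper whisker stops at 30.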
Additional info:
These notes cover foundational concepts in descriptive and inferential statistics, including graphical and numerical methods for summarizing data, measures of central tendency and variability, and interpretation of statistical measures.
Examples and formulas are provided for key concepts to aid understanding and application.