Unit 1 Study Guide: Sampling Designs and Describing Data in Elementary Statistics

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Statistics Basics

Descriptive Statistics vs. Inferential Statistics

Statistics is the science of collecting, organizing, analyzing, and interpreting data. It is divided into two main branches: descriptive statistics and inferential statistics. Descriptive statistics summarize and organize data from a population or sample, while inferential statistics use sample data to make generalizations or predictions about a population.

Descriptive Statistics: Methods for organizing and summarizing information. Examples include the number of home runs in a season, average salary, or frequency of grades.
Inferential Statistics: Methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample. For example, testing if the thickness of baseball laces affects home runs using a sample of games.

Baseball with red laces

Key Measures in Descriptive Statistics:

Frequency: The number of times a particular value occurs.
Position: Location compared to others (e.g., percentile).
Central Tendency: Mean, median, mode, and trimmed mean.
Variation: Range, interquartile range, variance, standard deviation.

Various statistical graphs and charts

Variables and Data Types

Qualitative vs. Quantitative Data

A variable is a characteristic that varies from one person or thing to another. Data can be classified as qualitative or quantitative:

Qualitative (Categorical) Data: Non-numerical, sorts into categories (e.g., hair color, model of car).
Quantitative Data: Numerical, can be discrete or continuous.
- Discrete: Possible values can be listed, usually counts (e.g., number of siblings).
- Continuous: Possible values form an interval (e.g., foot length, weight).

Organizing Qualitative Data

Frequency and Relative Frequency Tables

Qualitative data can be organized using frequency tables, bar charts, and pie charts. Frequency is the count of occurrences, while relative frequency is the proportion of occurrences relative to the total.

Bar Charts: Visual representation of frequencies for categories.
Pie Charts: Show proportions of categories as slices of a circle.

Bar chart showing frequency of grades Bar chart showing relative frequency of grades Pie chart representation

Misleading Graphs

Graphs can be misleading due to truncated axes or improper scaling. Always check the axes and scaling to ensure accurate interpretation.

Truncated Graphs: Chopped off bottom of the axis, exaggerating differences.
Improper Scaling: Visual exaggeration of differences by manipulating scale.

Misleading trash can graph Misleading food bar charts Misleading welfare vs. jobs graph

Organizing Quantitative Data

Stem-and-Leaf Diagrams and Dot Plots

Quantitative data can be organized using stem-and-leaf diagrams and dot plots, which help visualize the distribution and identify measures of central tendency.

Stem-and-Leaf Diagram: Splits data into stems (leading digits) and leaves (trailing digits).
Dot Plot: Dots represent individual data points along a number line.

Stem-and-leaf diagram for MPG Dot plot for MPG

Histograms

Histograms are used for quantitative data, grouping values into bins or classes. Guidelines for creating histograms:

Number of classes should be between 5 and 15.
Each observation must belong to one class.
Classes should have equal width when feasible.

Distribution Shapes

Common Distribution Shapes

Data distributions can take several shapes:

Uniform: All values have similar frequency.
Symmetric: Mirror image frequencies on both sides.
Positively Skewed (Right): Long tail to the right; mean is right of median.
Negatively Skewed (Left): Long tail to the left; mean is left of median.
Unimodal, Bimodal, Multimodal: One, two, or multiple peaks.

Right and left skewed distributions Multimodal histogram of NFL player weights

Sampling Designs

Simple Random Sampling

Simple random sampling ensures each possible sample of a given size is equally likely to be chosen. Random number generators and tables are commonly used tools.

Random Number Generator: Technology or tables to select random samples.
Sampling Bias: Avoid samples that are not representative of the population.

Calculator for random number generation Listing all possible samples Random number table sampling

Other Sampling Methods

Stratified Sampling: Divide population into groups (strata) and sample from each.
Cluster Sampling: Divide population into clusters and randomly select clusters.
Systematic Sampling: Select every k-th member from a list after a random start.

Measures of Center and Variation

Measures of Central Tendency

Measures of central tendency describe the center of a data set:

Sample Mean (\bar{x}):
Population Mean (\mu):
Median: Middle value when data is ordered.
Mode: Most frequent value.

Measures of Spread

Measures of spread describe variability in the data:

Range: Difference between maximum and minimum values.
Interquartile Range (IQR): Difference between the first and third quartiles.
Variance: Average squared deviation from the mean.
Standard Deviation: Square root of variance.

Sample Variance:

Sample Standard Deviation:

Population Variance:

Population Standard Deviation:

Empirical Rule (Normal Distribution)

For normally distributed data:

~68% of data within 1 standard deviation of the mean
~95% within 2 standard deviations
~99.7% within 3 standard deviations

Normal distribution with percentiles

Box Plots and Five-Number Summary

Box Plots

Box plots visually display the five-number summary: minimum, Q1, median, Q3, and maximum. The interquartile range (IQR) is used to identify outliers.

Outliers: Values outside 1.5*IQR from the box.
Extreme Outliers: Values outside 3*IQR from the box.

Box plot with five-number summary and IQR

Trimmed Mean

A trimmed mean is calculated by removing a certain percentage of the smallest and largest values before computing the mean. This is useful when outliers are present.

Summary Table: Types of Data and Graphs

Type of Data	Graphical Representation
Qualitative (Categorical)	Bar Chart, Pie Chart
Quantitative (Discrete)	Stem-and-Leaf, Dot Plot, Histogram
Quantitative (Continuous)	Histogram, Box Plot

Assignments and Practice

Complete MyLab HW #1 and #2.
Practice drawing bar charts, pie charts, histograms, stem-and-leaf diagrams, and dot plots.
Review common distribution shapes and examples for each.
Understand sampling methods and their impact on bias and variability.
Calculate measures of center and spread by hand and using technology.