Statistics Practice Problems: Variables, Sampling, Data Analysis, and Graphs

Study Guide - Smart Notes

Chapter 1: Types of Variables and Sampling Methods

Qualitative vs. Quantitative Variables

Variables in statistics are classified based on the type of data they represent. Understanding the distinction between qualitative and quantitative variables is fundamental for data analysis.

Qualitative Variables: Describe qualities or categories. Their values are labels rather than measurements, even when written as numerals. Examples: shirt numbers (as labels), favorite sport.
Quantitative Variables: Represent measurable quantities; numeric. Examples: temperature, weight, number of bottles.

Example: The numbers on the shirts of a football team are qualitative if used as labels, while the temperature of coffee is quantitative.

Discrete vs. Continuous Quantitative Variables

Quantitative variables can be further classified as discrete or continuous, depending on the nature of their possible values.

Discrete Variables: Take on countable values (often integers). Example: number of bottles of juice.
Continuous Variables: Can take any value within a range, including fractions and decimals. Example: weight of a player.

Example: The number of bottles is discrete; the weight of a player is continuous.

Levels of Measurement

Variables are measured at different levels, which determine the types of statistical analyses that are appropriate.

Nominal: Categories without order (e.g., favorite sport).
Ordinal: Categories with a meaningful order (e.g., medal received: gold, silver, bronze).
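Interval: Numeric scales on which differences are meaningful but there is no true zero, so ratios are not meaningful (e.g., temperature in Celsius).
Ratio: Numeric scales with a true zero, so both differences and ratios are meaningful (e.g., weight capacity of a backpack).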

Sampling Techniques

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

Simple Random Sampling: Every member has an equal chance of being selected, and every possible sample of a given size is equally likely.
Systematic Sampling: Every nth member is selected (e.g., every fifth adult at an airport).
Stratified Sampling: Population divided into subgroups (strata), and random samples taken from each.
Cluster Sampling: Population divided into clusters, some clusters are randomly selected, and all members of chosen clusters are included.
Convenience Sampling: Samples are taken from a group that is easy to access.

Example: Interviewing everyone in a randomly selected apartment building is cluster sampling; picking names from a bag is simple random sampling.
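The four probability-based techniques above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 40 numbered residents, the four buildings, and all function names are assumptions made for the example, not part of the original problems.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members so that every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)       # fixed seed so the sketch is repeatable
people = list(range(1, 41))  # hypothetical population: residents numbered 1..40
# Hypothetical grouping: four buildings with ten residents each.
buildings = {f"building_{b}": people[b * 10:(b + 1) * 10] for b in range(4)}

srs = simple_random_sample(people, 8, rng)
every_fifth = systematic_sample(people, 5, rng)
per_building = stratified_sample(buildings, 2, rng)   # buildings used as strata
whole_buildings = cluster_sample(buildings, 1, rng)   # buildings used as clusters
```

The same grouping (buildings) serves as strata in one call and as clusters in the other: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.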

Experimental Design

Experiments are designed to test hypotheses under controlled conditions. Blinding is used to reduce bias.

Single-Blind Experiment: The experimental unit does not know which treatment is received.
Double-Blind Experiment: Neither the experimental unit nor the researcher knows which treatment is received.
Randomized Block Design: Experimental units are grouped into blocks, and treatments are randomly assigned within each block.
Matched-Pairs Design: Pairs of similar units are given different treatments.
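As a rough illustration of a randomized block design, the sketch below shuffles the units inside each block and then deals the treatments out in rotation, so each block receives a balanced mix. The block names, unit labels, and treatment names are hypothetical, and this is one simple way to randomize within blocks rather than a prescribed procedure.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Within each block, shuffle the units and deal treatments out in rotation."""
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)   # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical blocks: subjects grouped by age before treatments are assigned.
blocks = {"under_40": ["A", "B", "C", "D"], "40_and_over": ["E", "F", "G", "H"]}
assignment = randomized_block_assignment(blocks, ["treatment", "placebo"], rng)
```

Because randomization happens separately inside each block, the blocking variable (age, in this hypothetical case) cannot end up unevenly split between the treatment groups.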

Chapter 2: Frequency Distributions and Graphs

Frequency Distribution

A frequency distribution is a table that displays the number of occurrences of each value or category in a dataset.

Frequency: The count of how often a value appears.
Relative Frequency: The proportion of the total represented by each value.

Example: Preschool children’s favorite colors can be summarized in a frequency table and a pie chart.

Sample Frequency Distribution Table

Color	Frequency	Relative Frequency
Purple	3	0.12
Green	4	0.16
Blue	5	0.20
Red	4	0.16
Yellow	2	0.08
Total	25	1.00
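Additional info: Frequencies inferred for illustration. The five colors listed account for 18 of the 25 responses (relative frequency 0.72); the remaining 7 responses (0.28) belong to colors not shown in the table.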
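The relative frequencies in the table can be reproduced by dividing each frequency by the total. The sketch below uses the counts and the total of 25 exactly as they appear in the table; the variable and function names are assumptions for illustration. It also shows that the five listed colors sum to 18, so part of the total belongs to colors that are not listed.

```python
def relative_frequencies(counts, total):
    """Divide each frequency by the total number of observations."""
    return {category: count / total for category, count in counts.items()}

counts = {"Purple": 3, "Green": 4, "Blue": 5, "Red": 4, "Yellow": 2}
total_responses = 25  # total stated in the table

relative = relative_frequencies(counts, total_responses)
listed = sum(counts.values())                        # responses for listed colors
unlisted_share = 1 - listed / total_responses        # share for colors not listed

for color, share in relative.items():
    print(f"{color}\t{counts[color]}\t{share:.2f}")
```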

Histograms

A histogram is a graphical representation of the distribution of numerical data, typically using bars to show frequency.

Each bar represents the frequency of scores within a range.
Useful for visualizing the shape of the data distribution.

Example: Scores of community service projects can be plotted as a frequency histogram.
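Before a histogram can be drawn, the data have to be counted into classes of equal width. The sketch below does that counting and prints a simple text histogram. The scores, the class width of 10, and the function name are hypothetical values chosen for illustration.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count the values in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < n_classes:   # ignore values outside the covered range
            counts[index] += 1
    return counts

# Hypothetical project scores.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 84, 88, 93]
counts = class_frequencies(scores, lower=50, width=10, n_classes=5)

for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")
```

Each printed row plays the role of one bar: its length is the frequency of scores in that class.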

Chapter 3: Measures of Central Tendency and Dispersion

Mean, Median, and Mode

Measures of central tendency summarize a dataset with a single value.

Mean: The arithmetic average.
Median: The middle value when data are ordered.
Mode: The value that appears most frequently.

Example: For annual profits, the mean and median can be calculated to determine which best represents the data, especially if the data are skewed.
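To see why skew matters when choosing between the mean and the median, the sketch below uses Python's standard statistics module on a small set of hypothetical annual profits (in thousands of dollars) containing one unusually large value. The data are invented for illustration only.

```python
import statistics

# Hypothetical annual profits, in thousands of dollars; 210 is an extreme value.
profits = [42, 45, 45, 48, 51, 53, 210]

mean_profit = statistics.mean(profits)      # pulled upward by the extreme value
median_profit = statistics.median(profits)  # middle value of the ordered data
mode_profit = statistics.mode(profits)      # most frequent value

print(f"mean = {mean_profit:.1f}, median = {median_profit}, mode = {mode_profit}")
```

Here the single large value drags the mean well above most of the data, while the median stays near the typical profit, which is why the median is usually the better summary for skewed data.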

Standard Deviation, Variance, and Range

Measures of dispersion describe the spread of data.

Range: Difference between the highest and lowest values.
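Variance: Average squared deviation from the mean (divide by N for a population, or by n - 1 for a sample).
Standard Deviation: Square root of the variance, expressed in the same units as the data.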
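The three measures above can be computed with Python's standard statistics module, which offers separate functions for the population versions (dividing by N) and the sample versions (dividing by n - 1). The five data values below are hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical data; the mean is 6

data_range = max(data) - min(data)

# Population versions: sum of squared deviations divided by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample versions: sum of squared deviations divided by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)
```

The squared deviations here are 4, 4, 0, 1, and 1, which sum to 10, so the population variance is 10 / 5 = 2 and the sample variance is 10 / 4 = 2.5.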

Empirical Rule and Bell-Shaped Distributions

The Empirical Rule applies to bell-shaped (normal) distributions and describes how data are distributed around the mean.

Approximately 68% of data fall within 1 standard deviation of the mean.
Approximately 95% within 2 standard deviations.
Approximately 99.7% within 3 standard deviations.

Example: If the mean monthly utility bill is $132 and the standard deviation is $10, about 68% of bills are between $122 and $142.
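The utility bill example can be extended to all three bands of the Empirical Rule. The sketch below computes the intervals for a mean of $132 and a standard deviation of $10, then checks the 68% figure against an exact normal model using statistics.NormalDist (available in Python 3.8 and later). The function name is an assumption for illustration, and the exact check assumes the bills are perfectly normal.

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, sd):
    """Return the 1-, 2-, and 3-standard-deviation intervals keyed by percentage."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68.0), (2, 95.0), (3, 99.7))}

intervals = empirical_rule_intervals(132, 10)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of bills fall between ${low} and ${high}")

# Exact proportion within 1 standard deviation under a normal model.
bills = NormalDist(mu=132, sigma=10)
exact_within_one_sd = bills.cdf(142) - bills.cdf(122)
```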

z-Score

The z-score measures how many standard deviations a value is from the mean.

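Formula: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.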
Used to compare values from different distributions.

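Example: If the average TV advertising time is 13 minutes (SD = 2.2) and a show has 16 minutes, then $z = \frac{16 - 13}{2.2} \approx 1.36$, so the show is about 1.36 standard deviations above the mean.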
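The z-score calculation for the TV advertising example takes one line of arithmetic: subtract the mean from the value and divide by the standard deviation. The function name below is an assumption for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Values from the TV advertising example: mean 13 minutes, SD 2.2, show has 16.
z = z_score(16, 13, 2.2)
print(f"z = {z:.2f}")
```

A positive result means the value is above the mean; a value exactly at the mean gives z = 0.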

Five-Number Summary and Boxplots

The five-number summary provides a quick overview of the distribution of a dataset.

Minimum
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Maximum

Boxplots graphically display the five-number summary and help identify outliers.

Sample Five-Number Summary Table

Statistic	Value
Minimum	154
Q1	189
Median	205
Q3	238
Maximum	275
Additional info: Quartiles inferred for illustration.
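A boxplot flags outliers by comparing the data to fences built from the quartiles. The sketch below uses the values from the table above and the common convention of placing fences 1.5 times the interquartile range (IQR) beyond Q1 and Q3. That 1.5 multiplier is an assumption here, since the notes do not state which rule is used, and the function name is illustrative.

```python
# Five-number summary taken from the table above.
summary = {"min": 154, "q1": 189, "median": 205, "q3": 238, "max": 275}

def outlier_fences(q1, q3, multiplier=1.5):
    """Return (lower, upper) fences; values beyond them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

iqr = summary["q3"] - summary["q1"]
lower_fence, upper_fence = outlier_fences(summary["q1"], summary["q3"])

# If even the minimum and maximum lie inside the fences, no value is an outlier.
has_possible_outliers = (summary["min"] < lower_fence
                         or summary["max"] > upper_fence)
```

With these values the IQR is 49, so the fences sit at 115.5 and 311.5. Both the minimum (154) and the maximum (275) fall inside them, so under this convention the boxplot would show no outliers.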

Applications and Problem Solving

Calculate averages and costs (e.g., average cost of pens).
Construct and interpret frequency tables, histograms, and boxplots.
Apply the Empirical Rule to estimate proportions in normal distributions.
Compute z-scores to compare individual data points to the mean.

Additional info: Some tables and values have been inferred for completeness and clarity.