
Statistics Study Guide: Variables, Data Displays, Probability, and Regression


Variables in Statistics

Types of Variables

In statistics, variables are characteristics or properties that can take on different values among subjects in a study. They are classified as quantitative (numerical) or qualitative (categorical).

  • Quantitative Variables: Variables that are measured numerically and can be used in arithmetic operations. Examples: annual income, undergraduate GPA. A zip code is written with digits but is a label rather than a measurement, so it is qualitative.

  • Qualitative Variables: Variables that describe qualities or categories. Examples: employment status, living with parents, sorority/fraternity membership.

Example: In a survey of college graduates, annual income and undergraduate GPA are quantitative, while employment status and living with parents are qualitative.

Data Displays: Histograms and Boxplots

Histograms

A histogram is a graphical representation of the distribution of a quantitative variable. It shows the frequency of data within specified intervals (bins).

  • Shape: Can be symmetric, skewed left, or skewed right.

  • Median and Mean: The relative position of the mean and median can indicate skewness. In a right-skewed distribution, the mean is typically greater than the median; in a left-skewed distribution, it is typically smaller.

Example: A histogram of calcium concentration in water shows most locations have concentrations below 100 ppm, with a few high values causing right skewness.

Boxplots

Boxplots (box-and-whisker plots) summarize data using five-number summaries: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

  • Outliers: Values that fall outside the "fences" (calculated using the IQR) are considered outliers.

  • Comparisons: Side-by-side boxplots can compare distributions between groups (e.g., actors vs. actresses).

Example: The maximum age of actors (76) may be an outlier if it exceeds the upper fence.
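The five-number summary behind a boxplot can be sketched in a few lines of Python. This is a minimal illustration, not the only way to compute it: the function name `five_number_summary` and the sample data are hypothetical, and the "inclusive" quartile convention is an assumption, since textbooks and software differ slightly in how they locate Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data, chosen only to illustrate the function.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

A different quartile convention may shift Q1 and Q3 slightly, which in turn shifts the fences used to flag outliers.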

Summary Statistics Table Example

| Statistic | Value |
|-----------|-------|
| Median    | 120.6 |
| Range     | 478.8 |
| Min       | 34.2  |
| Max       | 513.5 |
| Q1        | 65.4  |
| Q3        | 205.2 |

Measures of Central Tendency

Mean and Median

The mean is the arithmetic average, while the median is the middle value when data are ordered. The relationship between mean and median helps identify skewness:

  • Mean > Median: Right-skewed distribution

  • Mean < Median: Left-skewed distribution

  • Mean ≈ Median: Symmetric distribution
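The mean-versus-median comparison above can be sketched as a small Python check. The function name `describe_skew`, the `tolerance` cutoff for "approximately equal", and the sample data are all hypothetical choices made for illustration; the notes do not define how close "≈" has to be.

```python
import statistics

def describe_skew(values, tolerance=0.05):
    """Classify a data set by comparing its mean and median.

    tolerance is a hypothetical cutoff: the mean and median count as
    "approximately equal" if they differ by at most this fraction of the range.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"

# Hypothetical data: one very large value pulls the mean above the median.
incomes = [30, 32, 35, 38, 40, 45, 200]
print(statistics.mean(incomes), statistics.median(incomes))
print(describe_skew(incomes))
```

This is a rule of thumb only; a histogram remains the more reliable way to judge shape.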

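Formula for Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_1, \dots, x_n$ are the observed values and $n$ is the sample size.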

Interquartile Range (IQR) and Outliers

Calculating IQR and Fences

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1): $\mathrm{IQR} = Q_3 - Q_1$

Outliers are identified using fences:

  • Lower Fence: $Q_1 - 1.5 \times \mathrm{IQR}$

  • Upper Fence: $Q_3 + 1.5 \times \mathrm{IQR}$

Values outside these fences are considered outliers.
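As a worked check, the sketch below applies the fences to the Q1, Q3, Min, and Max values from the summary statistics table earlier in these notes. It assumes the conventional 1.5 × IQR rule; the function name `iqr_fences` and the parameter `k` are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence).

    k=1.5 is the conventional multiplier, assumed here.
    """
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Values taken from the summary statistics table in these notes.
q1, q3 = 65.4, 205.2
minimum, maximum = 34.2, 513.5

iqr, lower, upper = iqr_fences(q1, q3)
print(round(iqr, 1), round(lower, 1), round(upper, 1))
print("min is an outlier:", minimum < lower)
print("max is an outlier:", maximum > upper)
```

With these values the IQR is 139.8 and the fences are -144.3 and 414.9, so the maximum (513.5) lies above the upper fence while the minimum (34.2) does not fall below the lower fence.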

Probability and Random Variables

Basic Probability

Probability quantifies the likelihood of an event occurring. For a fair six-sided die:

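  • Probability of rolling a 5: $P(5) = \frac{1}{6}$

  • Probability of rolling a 5 then a 6 (two independent rolls): $P(5 \text{ then } 6) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$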
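The two die probabilities can be verified with exact fractions. This sketch assumes a fair die and independent rolls; the variable names are illustrative.

```python
from fractions import Fraction

sides = 6  # fair six-sided die

# One favorable face out of six equally likely faces.
p_five = Fraction(1, sides)

# Independent rolls: multiply the probabilities of each roll.
p_five_then_six = p_five * Fraction(1, sides)

print(p_five, p_five_then_six)
```

Using `Fraction` keeps the results as 1/6 and 1/36 instead of rounded decimals.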

Discrete Probability Distributions

A discrete random variable takes on a countable number of values. Its probability distribution lists the probabilities for each possible value.

| Y      | 0    | 1    | 2    | 3    | 4    |
|--------|------|------|------|------|------|
| P(Y=y) | 0.05 | 0.05 | 0.10 | 0.75 | 0.05 |

Mean of a Discrete Random Variable: $\mu = E(Y) = \sum_{y} y \cdot P(Y = y)$
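The mean (expected value) of Y can be computed directly from the distribution table above by weighting each value by its probability. The dictionary layout and variable names below are illustrative choices.

```python
import math

# Probability distribution of Y, copied from the table in these notes.
distribution = {0: 0.05, 1: 0.05, 2: 0.10, 3: 0.75, 4: 0.05}

# A valid distribution must have probabilities that sum to 1.
total_probability = sum(distribution.values())
print(math.isclose(total_probability, 1.0))

# Mean: sum of each value times its probability.
mean_y = sum(y * p for y, p in distribution.items())
print(round(mean_y, 2))
```

Here the sum is 0(0.05) + 1(0.05) + 2(0.10) + 3(0.75) + 4(0.05) = 2.70.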

Regression and Correlation

Linear Regression

Linear regression models the relationship between two quantitative variables using a straight line, $y = mx + b$:

  • Slope (m): Indicates the change in the response variable for a one-unit change in the explanatory variable.

  • Intercept (b): The value of y when x = 0.

Example: If a regression equation has slope -491, the predicted response decreases by 491 units for each one-unit increase in the explanatory variable.
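To make the slope and intercept concrete, the sketch below fits a line to a small data set and uses it for prediction. It uses the standard least-squares formulas, which these notes do not spell out, and the data and function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
predicted = m * 6 + b  # predicted y when x = 6
print(round(m, 2), round(b, 2), round(predicted, 2))
```

For this data the slope is 0.6 and the intercept is 2.2, so each one-unit increase in x raises the predicted y by 0.6.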

Correlation Coefficient (r)

The correlation coefficient measures the strength and direction of the linear relationship between two variables:

  • Range: -1 to 1

  • Sign: Positive for direct relationship, negative for inverse relationship

  • Magnitude: Closer to 1 or -1 indicates stronger relationship

Interpretation: A negative r of moderate magnitude, neither close to 0 nor close to -1, indicates a moderate negative linear relationship.
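The correlation coefficient can be computed from the same sums of deviations used in regression. The sketch below uses the standard Pearson formula; the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data in which y tends to fall as x rises.
xs = [1, 2, 3, 4, 5]
ys = [5, 3, 4, 2, 1]

r = pearson_r(xs, ys)
print(round(r, 2))
```

For this data r = -0.9: the negative sign shows an inverse relationship, and the magnitude close to 1 shows that it is strong.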

Applications and Examples

Using Histograms and Tables

  • Estimate sample size by summing frequencies in a histogram or table.

  • Find the median interval by identifying where the cumulative frequency reaches half the sample size.

Frequency Table Example

| Number of calls made | Frequency |
|----------------------|-----------|
| 1 - 4                | 16        |
| 5 - 8                | 11        |
| 9 - 12               | 5         |
| 13 - 16              | 3         |
| 17 - 20              | 2         |

Example: To find how many people made more than 8 calls, sum the frequencies for the intervals above 8: 5 + 3 + 2 = 10 people.
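The three table tasks described in this section (sample size, median interval, and a count above a cutoff) can be sketched with the call-frequency data above. The tuple layout and the function name `median_interval` are illustrative choices.

```python
# (lower bound, upper bound, frequency) for each interval in the table.
call_table = [(1, 4, 16), (5, 8, 11), (9, 12, 5), (13, 16, 3), (17, 20, 2)]

# Sample size: sum of all frequencies.
sample_size = sum(count for _, _, count in call_table)

def median_interval(table):
    """Return the first interval where cumulative frequency reaches half the sample size."""
    half = sum(count for _, _, count in table) / 2
    cumulative = 0
    for low, high, count in table:
        cumulative += count
        if cumulative >= half:
            return low, high

# People who made more than 8 calls: intervals whose lower bound exceeds 8.
more_than_8 = sum(count for low, _, count in call_table if low > 8)

print(sample_size, median_interval(call_table), more_than_8)
```

The sample size is 37, so half is 18.5; the cumulative frequency first reaches that in the 5 - 8 interval (16, then 27), which makes it the median interval.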

Summary of Key Concepts

  • Identify variable types: quantitative vs. qualitative

  • Interpret histograms and boxplots for data distribution and outliers

  • Calculate mean, median, and IQR

  • Apply probability rules for single and compound events

  • Use regression equations for prediction

  • Interpret correlation coefficients

Additional info: These study notes expand upon the original questions by providing definitions, formulas, and context for each statistical concept, ensuring a self-contained guide for exam preparation.
