Statistics Study Guide: Variables, Data Displays, Outliers, and Correlation

Understanding Variables in Statistics

Types of Variables

In statistics, variables are characteristics or properties that can take on different values. They are classified as either qualitative (categorical) or quantitative (numerical):

  • Qualitative Variables: Describe qualities or categories. Examples: zip code, living with parents, employment status, fraternity/sorority membership.

  • Quantitative Variables: Represent measurable quantities. Examples: annual income, undergraduate GPA.

Note: Some variables, like zip code, are coded numerically but are still qualitative because the numbers do not represent meaningful quantities.

Graphical Displays and Summary Statistics

Histograms and Boxplots

Histograms and boxplots are graphical tools used to summarize and visualize the distribution of quantitative data.

  • Histogram: Shows the frequency of data within specified intervals (bins).

  • Boxplot: Summarizes data using the five-number summary: minimum, Q1, median, Q3, and maximum. Outliers may be indicated as points beyond the whiskers.
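The five-number summary behind a boxplot can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard library's default quartile convention; the `sample` data and the function name `five_number_summary` are hypothetical and not taken from the guide. Textbooks and calculators use several quartile conventions, so Q1 and Q3 may differ slightly from hand calculations.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # Its default "exclusive" method is only one of several conventions.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data, chosen only for illustration.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(five_number_summary(sample))
```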

Measures of Center: Mean and Median

  • Mean: The arithmetic average of a data set.

  • Median: The middle value when data are ordered from smallest to largest.

  • Skewness: If the mean is greater than the median, the distribution is typically right-skewed (positively skewed), because a long right tail pulls the mean upward. If the mean is less than the median, the distribution is typically left-skewed (negatively skewed).
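To see how a single large value separates the mean from the median, here is a small sketch. The `incomes` list is hypothetical data invented for illustration.

```python
import statistics

# Hypothetical incomes in thousands of dollars.
# One large value stretches the right tail.
incomes = [32, 35, 38, 40, 41, 44, 150]

mean_income = statistics.mean(incomes)      # pulled upward by the value 150
median_income = statistics.median(incomes)  # middle of the ordered values

print(f"mean = {mean_income:.1f}, median = {median_income}")
if mean_income > median_income:
    print("Mean exceeds median: consistent with a right-skewed distribution.")
```

The median stays at the middle value while the mean moves toward the extreme observation, which is why comparing the two gives a quick hint about skewness.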

Example: Interpreting a Histogram

Given a histogram of calcium concentrations in water, you can estimate the percentage of locations within a certain range by summing the frequencies in the relevant bins and dividing by the total number of observations.

Calculating Percentages from Histograms

To find the percentage of observations in a certain range:

  • Add the frequencies for all bins in the range.

  • Divide by the total number of observations.

  • Multiply by 100 to get a percentage.

Example: If 65 out of 105 patients had fewer than 180 days of depression, the percentage is $\frac{65}{105} \times 100 \approx 61.9\%$.
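The three steps can be sketched as a short function. The name `percentage_in_range` is a hypothetical helper; the counts 65 and 105 come from the example in the guide.

```python
def percentage_in_range(bin_counts, total):
    """Percentage of observations that fall in the selected bins."""
    # Step 1: add the frequencies; step 2: divide by the total; step 3: times 100.
    return 100 * sum(bin_counts) / total

# 65 of 105 patients had fewer than 180 days of depression.
pct = percentage_in_range([65], 105)
print(f"{pct:.1f}%")
```

When the range spans several bins, pass each bin's frequency in the list and the function sums them before dividing.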

Frequency Tables and the Median

Using Frequency Tables

Frequency tables summarize data by grouping values into intervals and counting occurrences.

| Number of calls made | Frequency |
|----------------------|-----------|
| 1 – 4                | 16        |
| 5 – 8                | 11        |
| 9 – 12               | 5         |
| 13 – 16              | 3         |
| 17 – 20              | 2         |

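  • To find the number of people making more than 8 calls: sum the frequencies for the intervals above 8 (9 – 12, 13 – 16, and 17 – 20), giving 5 + 3 + 2 = 10.

  • To find the median interval, determine the position of the median (middle value) and see which interval contains it. With 37 observations in total, the median is the 19th value; the cumulative frequencies are 16 and then 27, so the median falls in the 5 – 8 interval.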
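Both calculations can be sketched in Python using the frequencies from the table. The function names `count_above` and `median_interval` are hypothetical helpers, and the handling of an even total is a simplifying assumption noted in the comments.

```python
# Frequency table from the guide: (lower bound, upper bound, frequency).
call_table = [(1, 4, 16), (5, 8, 11), (9, 12, 5), (13, 16, 3), (17, 20, 2)]

def count_above(table, threshold):
    """Count observations in intervals that lie entirely above threshold."""
    return sum(freq for low, _high, freq in table if low > threshold)

def median_interval(table):
    """Return the (low, high) interval that contains the median observation."""
    total = sum(freq for _low, _high, freq in table)
    # For an odd total the median is observation (n + 1) / 2. For an even
    # total this sketch uses the lower of the two middle observations.
    position = (total + 1) // 2
    cumulative = 0
    for low, high, freq in table:
        cumulative += freq
        if cumulative >= position:
            return (low, high)

print(count_above(call_table, 8))
print(median_interval(call_table))
```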

Outliers and the 1.5*IQR Criterion

Identifying Outliers

An outlier is a value that lies far outside the range of the rest of the data. The 1.5*IQR rule is commonly used:

  • IQR (Interquartile Range): $\text{IQR} = Q_3 - Q_1$

  • Lower Fence: $Q_1 - 1.5 \times \text{IQR}$

  • Upper Fence: $Q_3 + 1.5 \times \text{IQR}$

  • Values outside these fences are considered outliers.

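Example: Given $Q_1$ and $Q_3$, compute $\text{IQR} = Q_3 - Q_1$ and then the upper fence $Q_3 + 1.5 \times \text{IQR}$. If the upper fence is 67.5, any value above 67.5 is an outlier.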
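The 1.5*IQR rule can be sketched as follows. The quartiles 30 and 45 are hypothetical values, chosen only because they produce an upper fence of 67.5; the quartiles from the original example are not given here. The function names are also illustrative.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) under the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles (assumption): Q1 = 30, Q3 = 45, so IQR = 15.
lower, upper = iqr_fences(30, 45)
print(lower, upper)

# A value exactly on a fence is not flagged; only values beyond it are.
print(find_outliers([5, 20, 40, 67.5, 70], 30, 45))
```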

Scatterplots, Correlation, and Regression

Scatterplots and Correlation

A scatterplot displays the relationship between two quantitative variables. The correlation coefficient ($r$) measures the strength and direction of a linear relationship:

  • $r$ ranges from -1 (perfect negative) to +1 (perfect positive).

  • The sign indicates direction; the magnitude indicates strength.

  • $r$ is unitless and does not change with changes in measurement units.
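As a rough check of these properties, the correlation coefficient can be computed directly from its definition. The height and weight values below are hypothetical data invented for illustration, and `pearson_r` is an illustrative helper rather than a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in inches and weights in pounds.
heights_in = [60, 62, 65, 68, 70]
weights_lb = [115, 120, 140, 155, 160]

# Converting inches to centimeters rescales x but should leave r unchanged.
heights_cm = [h * 2.54 for h in heights_in]

r_inches = pearson_r(heights_in, weights_lb)
r_cm = pearson_r(heights_cm, weights_lb)
print(round(r_inches, 3), round(r_cm, 3))
```

The two printed values agree up to floating-point rounding, which illustrates that a change of measurement units does not affect the correlation.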

Linear Regression

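Linear regression models the relationship between an explanatory variable ($X$) and a response variable ($Y$) using the equation:

$\hat{Y} = b_0 + b_1 X$

  • Slope ($b_1$): Change in the predicted $Y$ for a one-unit increase in $X$.

  • Intercept ($b_0$): Predicted value of $Y$ when $X = 0$.

Example: If the fitted equation has a slope of -491, each one-unit increase in $X$ lowers the predicted $Y$ by 491, indicating a negative relationship.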

Making Predictions

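To predict $Y$ for a given $X$, substitute that value of $X$ into the regression equation.

Example: For $X = x_0$, the predicted value is $\hat{Y} = b_0 + b_1 x_0$, where $b_0$ is the intercept and $b_1$ is the slope.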
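Substituting into a regression equation can be sketched as below. The slope of -491 comes from the guide's earlier example; the intercept of 12000 and the input value 10 are hypothetical numbers chosen only to make the arithmetic concrete.

```python
def predict(intercept, slope, x):
    """Predicted response from a fitted line: intercept + slope * x."""
    return intercept + slope * x

slope = -491       # slope from the guide's example
intercept = 12000  # hypothetical intercept (assumption, not from the guide)

# Prediction at a hypothetical x of 10: 12000 + (-491)(10).
y_hat = predict(intercept, slope, 10)
print(y_hat)
```

Predictions are most trustworthy for values of the explanatory variable inside the range of the data used to fit the line.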

Interpreting Correlation in Context

  • A moderate to strong negative correlation ($r$ well below 0) indicates that as one variable increases, the other tends to decrease.

  • Correlation does not imply causation.

Summary Table: Key Statistical Concepts

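| Concept | Definition | Example |
|---------|------------|---------|
| Qualitative Variable | Describes a category or quality | Zip code, Employment status |
| Quantitative Variable | Describes a measurable quantity | Annual income, GPA |
| Mean | Arithmetic average | Sum of values / Number of values |
| Median | Middle value in ordered data | Value at position $(n + 1)/2$ |
| Outlier | Value below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ | Value above 67.5 when the upper fence is 67.5 |
| Correlation ($r$) | Strength and direction of linear relationship | A positive $r$ of moderate size (moderate positive) |
| Regression Slope | Change in $Y$ per unit change in $X$ | |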

Additional info:

  • When interpreting histograms, always check the scale and bin widths.

  • Boxplots are useful for comparing distributions and identifying outliers.

  • Correlation is only appropriate for linear relationships between quantitative variables.
