Statistics Study Guide: Correlation, Regression, Measures of Central Tendency, and Z-Scores

Correlation and Regression

Understanding Correlation Coefficient

The correlation coefficient measures the strength and direction of a linear relationship between two quantitative variables. It is denoted by r and ranges from -1 to 1.

  • Definition: The correlation coefficient r is a statistic that quantifies how closely data points fit a straight line.

  • Interpretation: An r value close to 1 or -1 indicates a strong linear relationship, while a value near 0 suggests little to no linear relationship.

  • Line of Best Fit: The regression line is the straight line that best represents the data on a scatterplot. It can be described by the equation $y = mx + b$, where $m$ is the slope and $b$ is the intercept.

Example: If the correlation coefficient between serving size and saturated fat is 0.95, this suggests a strong positive linear relationship.
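The guide does not show how r is calculated, so the following is a minimal sketch of the standard Pearson formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the five data pairs are hypothetical, chosen only so the points fall exactly on a downward-sloping line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired quantitative data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: every point lies on a line with negative slope
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
r_perfect_negative = pearson_r(xs, ys)  # at the lower end of the range, -1
```

A perfectly linear, decreasing pattern gives r at the bottom of its range, while noisier data would give a value closer to 0. The sketch does not guard against a variable with no variation, which would make the denominator zero.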

Regression Equation

The regression equation predicts the value of one variable based on another. In the context of predicting tablet computer prices from RAM:

  • General Form: $y = mx + b$, where $y$ is the predicted price and $x$ is the amount of RAM in GB.

  • Example: If the regression equation has a slope of 85 and an intercept of -188, then $y = 85x - 188$.

  • Interpretation of Slope: The slope (m) represents the expected change in price for each additional GB of RAM.

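  • Interpretation of Intercept: The intercept (b) represents the predicted price when RAM is 0 GB. With an intercept of -188 that prediction would be a negative price, so the intercept has no practical interpretation here.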

Additional info: Regression analysis is widely used for prediction and forecasting in statistics.
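As a small illustration of using the regression equation for prediction, the sketch below plugs the slope (85) and intercept (-188) from the example into a function. The name `predict_price` and the RAM values 4 and 8 are hypothetical inputs, not figures from the guide.

```python
SLOPE = 85        # from the example: change in predicted price per additional GB
INTERCEPT = -188  # from the example: predicted price at 0 GB of RAM

def predict_price(ram_gb):
    """Predicted price from the example regression line: price = 85 * RAM - 188."""
    return SLOPE * ram_gb + INTERCEPT

price_4gb = predict_price(4)   # 85 * 4 - 188 = 152
price_8gb = predict_price(8)   # 85 * 8 - 188 = 492

# Each additional GB changes the prediction by exactly the slope
per_gb_change = predict_price(5) - predict_price(4)  # 85
```

Predictions are only trustworthy for RAM values similar to those used to fit the line; the guide does not say what that range was.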

Scatterplots

A scatterplot is a graphical representation of the relationship between two quantitative variables.

  • Purpose: To visually assess the form, direction, and strength of a relationship.

  • Types of Data: Scatterplots are used for paired quantitative (numerical) data.

  • Patterns: Points that cluster around a straight line indicate a linear relationship; the tighter the cluster, the stronger the relationship.

Example: A scatterplot of serving size (oz) vs. saturated fat (grams) can show whether larger servings tend to have more saturated fat.

Analyzing Tabular Data

Relationship Between Variables

Tables can be used to compare and analyze relationships between variables. For example, the table below shows serving size and saturated fat content:

Size (Oz.) | Fat (grams) | Size (Oz.) | Fat (grams)
9 | 8 | 11 | 14
12 | 10 | 14 | 15
13 | 11 | 16 | 18
13 | 15 | 13 | 17

Application: By plotting these values on a scatterplot, one can determine if there is a positive or negative relationship between serving size and fat content.
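The direction of the relationship can also be checked numerically by fitting a least-squares line to the table. This is a sketch under one assumption: the eight (size, fat) pairs are read left to right, row by row, from the table. The function name `least_squares_line` is hypothetical.

```python
# Pairs read left to right, row by row (an assumption about the table layout)
sizes = [9, 11, 12, 14, 13, 16, 13, 13]   # serving size in ounces
fats = [8, 14, 10, 15, 11, 18, 15, 17]    # saturated fat in grams

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    return slope, intercept

slope, intercept = least_squares_line(sizes, fats)

# The sign of the slope gives the direction of the linear relationship
if slope > 0:
    direction = "positive"
elif slope < 0:
    direction = "negative"
else:
    direction = "none"
```

With this pairing the slope comes out positive (about 1.32 grams of fat per additional ounce), which matches the upward trend a scatterplot of the same points would show.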

Critical Values and Statistical Significance

Critical Values for Correlation Coefficient

Critical values are used to determine if a correlation coefficient is statistically significant.

Number of Pairs of Data | Critical Value of r
6 | 0.811
7 | 0.754
8 | 0.707
9 | 0.666
10 | 0.632
11 | 0.602
12 | 0.576

  • If the absolute value of the calculated r, $|r|$, is greater than the critical value, the correlation is statistically significant.

  • If $|r|$ is less than or equal to the critical value, the evidence is insufficient to claim a linear relationship.

Example: For 10 pairs of data, the critical value is 0.632, so the correlation is statistically significant only if $|r|$ is greater than 0.632.
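The lookup-and-compare step can be sketched as follows. The dictionary holds the critical values from the table; the function name `is_significant` and the sample r values are hypothetical. The comparison uses the absolute value of r so that strong negative correlations are detected as well.

```python
# Critical values of r, keyed by number of data pairs (from the table above)
CRITICAL_R = {6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576}

def is_significant(r, n_pairs):
    """True if |r| exceeds the tabulated critical value for n_pairs."""
    if n_pairs not in CRITICAL_R:
        raise ValueError(f"no critical value tabulated for n = {n_pairs}")
    return abs(r) > CRITICAL_R[n_pairs]

strong_positive = is_significant(0.95, 10)   # 0.95 > 0.632
strong_negative = is_significant(-0.70, 10)  # |-0.70| > 0.632
weak = is_significant(0.50, 10)              # 0.50 is not greater than 0.632
```

Sample sizes outside the table raise an error rather than guessing a value, since the guide lists critical values only for 6 to 12 pairs.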

Measures of Central Tendency and Spread

Mean, Median, and Mode

Measures of central tendency summarize a set of data with a single value.

  • Mean: The average of all data points.

  • Median: The middle value when data are ordered.

  • Mode: The value that appears most frequently.

Example: For the data set [36, 42, 16, 50, 6, 24, 61, 35, 8, 44, 19, 23, 67, 20]:

  • Mean: Add all values and divide by 14: 451 / 14 ≈ 32.2.

  • Median: Arrange the values in order and average the two middle values (24 and 35), giving 29.5.

  • Mode: No value repeats, so this data set has no mode.

Midrange: The average of the highest and lowest values. Here, (6 + 67) / 2 = 36.5.
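These four measures can be checked with a short script. This is a sketch using Python's standard `statistics` module for the mean and median; the mode is computed with a counter so that "no mode" is reported as an empty list when no value repeats, which is a convention chosen here for illustration.

```python
import statistics
from collections import Counter

data = [36, 42, 16, 50, 6, 24, 61, 35, 8, 44, 19, 23, 67, 20]

mean = statistics.mean(data)      # sum of the values divided by 14
median = statistics.median(data)  # average of the two middle values after sorting

# Mode: values with the highest count, or an empty list if nothing repeats
counts = Counter(data)
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest] if highest > 1 else []

midrange = (min(data) + max(data)) / 2  # average of the lowest and highest values
```

Because the data set has an even number of values, the median is the average of the 7th and 8th values in sorted order rather than a value that appears in the data.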

Interpretation of Results

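  • The mean and median can give different pictures of the data's center: the mean is pulled toward extreme values, while the median is not, so the two differ most when the data are skewed or contain outliers.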

  • If data are nominal (e.g., ID numbers), measures of central tendency may not be meaningful.

Significance and Z-Scores

Significantly Low and High Values

Values are considered significantly low or high if they fall outside certain boundaries based on the mean and standard deviation.

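  • Significantly Low: Values below $\mu - 2\sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation

  • Significantly High: Values above $\mu + 2\sigma$

  • Values Not Significant: Values between $\mu - 2\sigma$ and $\mu + 2\sigma$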

Example: For a mean of 20 and standard deviation of 2:

  • Significantly low: Values below $20 - 2(2) = 16$

  • Significantly high: Values above $20 + 2(2) = 24$
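The boundaries for this example can be computed directly. This sketch assumes the boundaries are the mean plus or minus two standard deviations, matching the z-score cutoffs of -2 and 2 used in this guide; the function name `significance_bounds` is hypothetical.

```python
def significance_bounds(mean, sd):
    """Return (lower, upper) boundaries: mean - 2*sd and mean + 2*sd."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return mean - 2 * sd, mean + 2 * sd

# Example from the guide: mean of 20, standard deviation of 2
lower, upper = significance_bounds(20, 2)  # (16, 24)
```

Values below the lower boundary count as significantly low, and values above the upper boundary count as significantly high.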

Z-Score Calculation

The z-score measures how many standard deviations a value is from the mean.

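  • Formula: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.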

  • Interpretation: Z-scores between -2 and 2 are typically not significant.

Example: If the mean body length is 46.8 inches and the standard deviation is 2 inches, the z-score for a body length of 46.6 inches is: $z = \frac{46.6 - 46.8}{2} = -0.1$

Comparing Values Using Z-Scores

  • Values with z-scores less than -2 are significantly low.

  • Values with z-scores greater than 2 are significantly high.

  • Values between -2 and 2 are not significant.
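The three rules above translate into a short classification routine. This is a sketch: the function names `z_score` and `classify` are hypothetical, the body-length figures come from the example in this guide, and the inputs 25 and 15 (with a mean of 20 and standard deviation of 2) are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

def classify(z):
    """Apply the cutoffs of -2 and 2 to a z-score."""
    if z < -2:
        return "significantly low"
    if z > 2:
        return "significantly high"
    return "not significant"

# Body-length example: mean 46.8 in, standard deviation 2 in, value 46.6 in
z_body = z_score(46.6, 46.8, 2)   # about -0.1
label_body = classify(z_body)     # well inside the -2 to 2 band

# Hypothetical values measured against a mean of 20 and standard deviation of 2
label_high = classify(z_score(25, 20, 2))  # z = 2.5
label_low = classify(z_score(15, 20, 2))   # z = -2.5
```

Because z-scores are unit-free, the same cutoffs apply to any variable, which is what makes them useful for comparing values from different data sets.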

Summary Table: Key Statistical Concepts

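Concept | Definition | Formula
Mean | Average value | $\bar{x} = \frac{\sum x}{n}$
Median | Middle value | --
Mode | Most frequent value | --
Midrange | Average of max and min | $\frac{\max + \min}{2}$
Correlation Coefficient | Strength and direction of linear relationship | $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Regression Equation | Predicts value of y from x | $y = mx + b$
Z-score | Standardized value | $z = \frac{x - \mu}{\sigma}$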

Additional info: These concepts form the foundation of descriptive and inferential statistics, essential for analyzing and interpreting data in various fields.
