(Lecture 7) Association Between Two Quantitative Variables: Correlation and Regression

Section 3.2: The Association Between Two Quantitative Variables

Introduction to Association

In statistics, understanding the relationship between two quantitative variables is essential for data analysis and prediction. This section explores how to describe, visualize, and quantify associations between variables such as Internet and Facebook usage rates across countries.

Example: Internet and Facebook Penetration Rates

Consider the following data on the percentage of the population using the Internet and Facebook. The full data set covers 32 countries (the N reported in the summary statistics); the table lists 28 of them:

| Country | Internet Penetration | Facebook Penetration |
|---|---|---|
| Brazil | 49.9% | 29.5% |
| Canada | 86.8% | 51.9% |
| China | 42.3% | 0.1% |
| France | 83.0% | 39.0% |
| India | 12.6% | 5.6% |
| United States | 81.0% | 52.9% |
| Thailand | 26.5% | 26.5% |
| United Kingdom | 87.0% | 52.1% |
| Sweden | 94.0% | 52.0% |
| Philippines | 36.2% | 30.9% |
| Saudi Arabia | 54.0% | 20.7% |
| Spain | 72.0% | 38.1% |
| Turkey | 45.1% | 43.4% |
| Russia | 53.3% | 5.6% |
| Netherlands | 93.0% | 45.1% |
| Peru | 38.2% | 31.2% |
| Poland | 65.0% | 25.6% |
| South Africa | 41.0% | 12.3% |
| Japan | 65.0% | 13.5% |
| Malaysia | 65.0% | 41.5% |
| Mexico | 38.4% | 38.4% |
| Colombia | 49.0% | 30.3% |
| Egypt | 44.1% | 15.1% |
| Germany | 72.8% | 56.4% |
| Hong Kong | 72.8% | 56.4% |
| Indonesia | 15.4% | 20.7% |
| Italy | 58.0% | 38.1% |
| Venezuela | 44.1% | 32.6% |

Additional info: Table entries inferred and grouped for clarity.

Measures of Center and Spread

To summarize the data, we use measures of center (mean, median) and spread (standard deviation, quartiles, minimum, maximum):

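| Variable | N | Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|
| Internet Use (%) | 32 | 59.2 | 22.4 | 12.6 | 43.6 | 56.9 | 81.3 | 94.0 |
| Facebook Use (%) | 32 | 33.9 | 16.0 | 0.0 | 24.4 | 34.5 | 47.1 | 56.4 |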
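As a rough illustration of how such summaries are computed, the sketch below applies Python's standard `statistics` module to the Internet penetration values of the 28 countries listed in the table earlier in this section. Because the summary statistics are based on N = 32 countries, and because software packages use different quartile conventions, the output is not expected to reproduce the tabulated values exactly. The helper name `five_number_summary` is an illustrative choice, not something from the lecture.

```python
import statistics

# Internet penetration (%) for the 28 countries listed in the data table.
internet = [49.9, 86.8, 42.3, 83.0, 12.6, 81.0, 26.5, 87.0, 94.0, 36.2,
            54.0, 72.0, 45.1, 53.3, 93.0, 38.2, 65.0, 41.0, 65.0, 65.0,
            38.4, 49.0, 44.1, 72.8, 72.8, 15.4, 58.0, 44.1]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

mean = statistics.mean(internet)
stdev = statistics.stdev(internet)  # sample standard deviation (divides by n - 1)
summary = five_number_summary(internet)

print(f"n = {len(internet)}, mean = {mean:.1f}, stdev = {stdev:.1f}")
print("min, Q1, median, Q3, max =", summary)
```

The mean and standard deviation describe center and spread together, while the five-number summary is less affected by extreme values.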

Graphical Displays: Histograms and Scatterplots

  • Histograms show the distribution of each variable, helping to identify outliers and the shape of the data.

  • Scatterplots display the relationship between two quantitative variables. The horizontal axis (x) represents the explanatory variable, and the vertical axis (y) represents the response variable.

Example: Scatterplot Interpretation

In the scatterplot of Facebook use (y) against Internet use (x), each point represents a country. Outliers, such as Japan (x = 79%, y = 13%), are points that deviate from the overall pattern: Japan's Facebook use is far lower than its Internet use would suggest.

How to Examine a Scatterplot

  • Describe the trend: Is the pattern linear, curved, clustered, or random?

  • Identify the direction: Is the association positive, negative, or none?

  • Assess the strength: How closely do the points follow the trend?

  • Look for outliers: Points that do not fit the overall pattern.

Interpreting Scatterplots: Direction and Association

  • Positive association: High values of x tend to occur with high values of y; low values of x with low values of y.

  • Negative association: High values of one variable tend to pair with low values of the other.

Section 3.2: Summarizing the Strength of Association: The Correlation Coefficient

Definition of Correlation

The correlation coefficient (r) measures the strength and direction of the linear association between two quantitative variables.

  • A positive r indicates a positive association.

  • A negative r indicates a negative association.

  • r close to +1 or -1 indicates a strong linear association.

  • r close to 0 indicates a weak linear association (a strong curved relationship can still be present).

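Formula for the correlation coefficient:

$$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$

Here n is the number of observations, $\bar{x}$ and $\bar{y}$ are the sample means, and $s_x$ and $s_y$ are the sample standard deviations of x and y. Each factor is a z-score, so r is essentially the average product of the z-scores for x and y.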
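A minimal sketch of the calculation, assuming the usual z-score form of the correlation, r = (1 / (n - 1)) × the sum of (z-score of x) × (z-score of y). The six (Internet, Facebook) pairs are taken from the data table for illustration only, so the result will differ from the correlation for the full data set. The function name `correlation` is an illustrative choice.

```python
import statistics

# (Internet %, Facebook %) for six countries from the data table:
# India, Brazil, Japan, Spain, Canada, Sweden.
internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]

def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample std devs
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

r = correlation(internet, facebook)
print(f"r = {r:.3f}")
```

A point contributes positively to r when its x and y values are both above or both below their means, which is why a positive association produces a positive r.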

Properties of Correlation

  • Always falls between -1 and +1.

  • The sign of r denotes direction: negative for negative association, positive for positive association.

  • Unitless measure: does not depend on the units of the variables.

  • Correlation is not resistant to outliers.

  • Measures only the strength of a linear relationship.

  • Correlation is the same regardless of which variable is treated as the response or explanatory variable.
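Several of these properties can be checked numerically. The sketch below uses six (Internet, Facebook) pairs from the data table plus one hypothetical outlying point, and the variable names are illustrative assumptions rather than anything defined in the lecture.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation r computed from z-scores."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# India, Brazil, Japan, Spain, Canada, Sweden (values from the data table).
internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]

r_xy = correlation(internet, facebook)
r_yx = correlation(facebook, internet)               # roles swapped
r_rescaled = correlation([v / 100 for v in internet], facebook)  # % -> proportion

# Hypothetical outlier: very high Internet use, almost no Facebook use.
r_with_outlier = correlation(internet + [95.0], facebook + [1.0])

print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
print(f"r after rescaling x = {r_rescaled:.3f}")
print(f"r with one outlier added = {r_with_outlier:.3f}")
```

Swapping the variables or changing units leaves r unchanged, while a single unusual point can shift it substantially.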

Examples and Applications

  • Scatterplots with points close to a straight line have stronger correlation.

  • Example: Internet and Facebook use for 32 countries yields $r \approx 0.614$, consistent with the $r^2$ of 37.7% reported for these data.

Section 3.3: Predicting the Outcome of a Variable: Regression Analysis

Regression Line

A regression line is used to predict the value of the response variable (y) as a function of the explanatory variable (x). The equation of the regression line is $\hat{y} = a + bx$, where $\hat{y}$ denotes the predicted value of y and:

  • a: y-intercept (predicted value of y when x = 0)

  • b: slope (change in y for a one-unit increase in x)

Example: Predicting Height from Femur Length

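Regression equation: $\hat{y} = a + bx$, where x is the femur length (cm) and $\hat{y}$ is the predicted height (cm). For a femur length of 50 cm, the predicted height is $\hat{y} = a + b(50)$ cm.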

Interpreting the y-Intercept and Slope

  • y-intercept: Predicted value for y when x = 0. May not always have practical meaning.

  • Slope: Amount that y changes for each one-unit increase in x. Positive slope indicates positive association; negative slope indicates negative association.
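The interpretation of the two coefficients can be sketched in a few lines. The coefficient values below are hypothetical and chosen only for illustration; they are not the fitted values from any example in these notes, and `predict` is an illustrative helper name.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b * x for intercept a and slope b."""
    return a + b * x

# Hypothetical coefficients, used only to illustrate the interpretation.
a, b = 10.0, 0.5

at_zero = predict(a, b, 0)                     # prediction at x = 0 is the intercept
step = predict(a, b, 41) - predict(a, b, 40)   # one-unit increase in x changes y-hat by b

print("prediction at x = 0:", at_zero)
print("change in prediction for a one-unit increase in x:", step)
```

If x = 0 lies far outside the observed range of x, the intercept is an extrapolation and may have no practical meaning.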

Residuals: Measuring Prediction Errors

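Residuals measure the difference between observed and predicted values: residual $= y - \hat{y}$. A positive residual means the observed value lies above the regression line; a negative residual means it lies below.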

  • Large residuals indicate unusual observations.

  • Smaller absolute residuals mean better predictions.
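A short sketch of computing residuals as observed minus predicted values. The six countries are taken from the data table, but the line used here (intercept 0, slope 0.5) is a hypothetical choice for illustration and is not the fitted regression line.

```python
countries = ["India", "Brazil", "Japan", "Spain", "Canada", "Sweden"]
internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]   # x, from the data table
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]    # y, from the data table

# Hypothetical line for illustration only (not the least squares fit).
a, b = 0.0, 0.5

# residual = observed y - predicted y
residuals = {country: y - (a + b * x)
             for country, x, y in zip(countries, internet, facebook)}

# The observation with the largest absolute residual is the most unusual one.
most_unusual = max(residuals, key=lambda c: abs(residuals[c]))

for country, res in residuals.items():
    print(f"{country:8s} residual = {res:6.2f}")
print("largest absolute residual:", most_unusual)
```

With these values, Japan sits well below the line, which matches its description as an outlier in the scatterplot discussion.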

Method of Least Squares

The least squares regression line minimizes the sum of squared residuals, $\sum (y - \hat{y})^2$:

  • The line passes through the point $(\bar{x}, \bar{y})$.

  • The sum (and mean) of the residuals equals zero.
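These properties can be verified numerically. The sketch below fits a line to six (Internet, Facebook) pairs from the data table using the standard closed-form least squares solution; the helper names `least_squares` and `sse` are illustrative assumptions.

```python
import statistics

internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]   # x
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]    # y

def least_squares(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared residuals."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

a, b = least_squares(internet, facebook)
residuals = [y - (a + b * x) for x, y in zip(internet, facebook)]
best = sse(a, b, internet, facebook)

print(f"sum of residuals = {sum(residuals):.10f}")
print(f"line at x-bar = {a + b * statistics.mean(internet):.3f}, "
      f"y-bar = {statistics.mean(facebook):.3f}")
# Nudging either coefficient away from the fit increases the squared error.
print(sse(a, b + 0.05, internet, facebook) > best,
      sse(a + 1.0, b, internet, facebook) > best)
```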

Formulas for Slope and Intercept

  • Slope: $b = r \left( \dfrac{s_y}{s_x} \right)$

  • Intercept: $a = \bar{y} - b\bar{x}$
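As a worked illustration, the standard formulas b = r(s_y / s_x) and a = ȳ - b·x̄ can be applied to the summary statistics reported earlier in this guide (Internet use as x, Facebook use as y). The value of r is taken as the positive square root of the reported r² of 37.7%, which is an assumption about the sign; the inputs are rounded, so the resulting coefficients are approximate.

```python
import math

# Summary statistics reported in this guide (x = Internet use, y = Facebook use).
x_bar, s_x = 59.2, 22.4
y_bar, s_y = 33.9, 16.0

# Assumption: r is the positive square root of the reported r^2 = 0.377.
r = math.sqrt(0.377)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

print(f"r = {r:.3f}")
print(f"predicted Facebook use = {a:.2f} + {b:.3f} * (Internet use)")
```

Under these assumptions, each additional percentage point of Internet use corresponds to a predicted increase of a bit under half a percentage point in Facebook use.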

Relationship Between Slope and Correlation

  • Correlation describes the strength of the linear association and is unitless.

  • Slope depends on the units of measurement and requires identification of response and explanatory variables.
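The contrast can be seen numerically. In the sketch below (six pairs from the data table, illustrative helper names), rescaling y changes the slope but not r, and swapping the roles of the variables changes the slope but not r.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation r computed from z-scores."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def slope(xs, ys):
    """Least squares slope for predicting ys from xs: b = r * s_y / s_x."""
    return correlation(xs, ys) * statistics.stdev(ys) / statistics.stdev(xs)

internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]
facebook_prop = [v / 100 for v in facebook]   # same data as proportions

r = correlation(internet, facebook)
r_prop = correlation(internet, facebook_prop)
b_percent = slope(internet, facebook)         # Facebook % per Internet %
b_prop = slope(internet, facebook_prop)       # 100 times smaller
b_reversed = slope(facebook, internet)        # predicting Internet from Facebook

print(f"r = {r:.3f} (percent), r = {r_prop:.3f} (proportion)")
print(f"slope: {b_percent:.4f} (percent), {b_prop:.6f} (proportion)")
print(f"slope with roles reversed: {b_reversed:.4f}")
```

The two slopes are linked through the correlation: their product equals r².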

Coefficient of Determination ($r^2$)

The squared correlation ($r^2$) measures the proportion of the variation in the response variable explained by the linear relationship with the explanatory variable.

  • Example: For Internet and Facebook use, $r \approx 0.614$, so $r^2 \approx 0.377$ (37.7%).

  • This means 37.7% of the variability in Facebook use is explained by Internet use.
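The "proportion of variation explained" reading can be checked directly: for a least squares line, 1 - SSE/SST equals r², where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean. The sketch below uses six pairs from the data table, so its r² differs from the 37.7% reported for the full data set; variable names are illustrative.

```python
import math
import statistics

internet = [12.6, 49.9, 65.0, 72.0, 86.8, 94.0]   # x
facebook = [5.6, 29.5, 13.5, 38.1, 51.9, 52.0]    # y

x_bar, y_bar = statistics.mean(internet), statistics.mean(facebook)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(internet, facebook))
s_xx = sum((x - x_bar) ** 2 for x in internet)
s_yy = sum((y - y_bar) ** 2 for y in facebook)    # total variation in y (SST)

# Least squares line and correlation.
b = s_xy / s_xx
a = y_bar - b * x_bar
r = s_xy / math.sqrt(s_xx * s_yy)

# Variation left unexplained by the line (SSE).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(internet, facebook))
explained = 1 - sse / s_yy

print(f"r^2 = {r ** 2:.4f}")
print(f"1 - SSE/SST = {explained:.4f}")
```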

Summary Table: Correlation vs. Regression

| Feature | Correlation | Regression |
|---|---|---|
| Measures | Strength & direction of linear association | Predicts response variable from explanatory variable |
| Unit | Unitless | Depends on variable units |
| Symmetry | Same regardless of variable roles | Requires response/explanatory distinction |
| Interpretation | Strong/weak, positive/negative | Change in y per unit change in x |

Additional info: Table synthesized for comparison.
