Scatter Diagrams, Correlation, and Linear Regression in Statistics
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Scatter Diagrams and Correlation
Introduction to Bivariate Data
Bivariate data involves measurements of two variables for each individual in a study. Analyzing bivariate data allows us to explore relationships between variables, often using graphical and numerical methods.
Response (Dependent) Variable: The variable whose value is explained by the explanatory variable.
Explanatory (Independent) Variable: The variable that explains or influences changes in the response variable.
Scatter Diagrams
A scatter diagram (or scatter plot) is a graph that displays the relationship between two quantitative variables. Each point represents an individual, with the explanatory variable on the horizontal axis and the response variable on the vertical axis.
Purpose: To visually assess the type and strength of the relationship between two variables.
Types of Relationships: Linear, nonlinear, or no relation.
Example: Predicting the selling price of a home using data from Zillow, where the Zestimate is the explanatory variable and Sale Price is the response variable.
Sample Table: Drilling Data
Depth at Which Drilling Begins (in feet), x | Time to Drill Five Feet (in minutes), y |
|---|---|
54 | 5.98 |
75 | 6.41 |
93 | 5.90 |
110 | 6.74 |
130 | 6.27 |
145 | 7.47 |
155 | 6.82 |
165 | 7.42 |
178 | 7.89 |
190 | 7.90 |
Additional info: This table is used to illustrate how scatter diagrams can reveal relationships between depth and drilling time.
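The sketch below draws a rough text-mode scatter diagram of the drilling table, so the upward trend is visible without a plotting library. Depth is on the horizontal axis and time on the vertical axis. The helper name `text_scatter` and the 40 × 12 grid are illustrative assumptions. In practice a charting tool would normally be used.

```python
# Drilling data from the table above.
depth = [54, 75, 93, 110, 130, 145, 155, 165, 178, 190]                  # x, explanatory
time_min = [5.98, 6.41, 5.90, 6.74, 6.27, 7.47, 6.82, 7.42, 7.89, 7.90]  # y, response


def text_scatter(xs, ys, width=40, height=12):
    """Return the rows of a character grid with one '*' per (x, y) point (hypothetical helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Explanatory variable -> column position.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        # Response variable -> row position, flipped so larger y is printed higher up.
        row = height - 1 - round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Left border on each row, plus a bottom axis line.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


for line in text_scatter(depth, time_min):
    print(line)
```

The stars rise from lower left to upper right, which is the pattern of a positive association.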
Interpreting Scatter Diagrams
Positive Association: Higher values of one variable are associated with higher values of the other.
Negative Association: Higher values of one variable are associated with lower values of the other.
No Association: No apparent relationship between the variables.
Linear Correlation Coefficient
Definition and Properties
The linear correlation coefficient (Pearson product moment correlation coefficient), denoted by $r$ for a sample and $\rho$ for a population, measures the strength and direction of the linear relationship between two quantitative variables.
Range: $-1 \le r \le 1$
Interpretation:
$r = 1$: Perfect positive linear relationship
$r = -1$: Perfect negative linear relationship
$r = 0$: No linear relationship
The closer $|r|$ is to 1, the stronger the linear association.
Not Resistant: Outliers can greatly affect $r$.
Only Measures Linear Association: $r$ does not detect nonlinear relationships.
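The last two properties are easy to see numerically. This sketch computes $r$ with the z-score formula given in the next subsection. It uses small hypothetical data sets invented for illustration: an exact curved pattern $y = x^2$, a perfect straight line, and the same line with one outlier added. The function name `pearson_r` is an assumption.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample linear correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)


# Only linear association: y = x**2 is an exact (curved) relationship,
# yet r is 0 (up to rounding) because the x-values are symmetric about 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# Not resistant: a perfect straight-line pattern, then the same data plus one outlier.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_line = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [7], line_y + [0])

print(f"curve: {r_curve:.3f}, line: {r_line:.3f}, line + outlier: {r_outlier:.3f}")
```

The results are as follows:
- The parabola gives $r = 0$ even though $y$ is completely determined by $x$.
- The straight line gives $r = 1$.
- Adding the single point (7, 0) to that line drops $r$ to 0.25.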
Formula for Sample Linear Correlation Coefficient
The formula for the sample linear correlation coefficient is:
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}$$
$x_i$: $i$th observation of the explanatory variable
$y_i$: $i$th observation of the response variable
$\bar{x}$: Mean of the explanatory variable
$\bar{y}$: Mean of the response variable
$s_x$: Sample standard deviation of the explanatory variable
$s_y$: Sample standard deviation of the response variable
$n$: Number of individuals in the sample
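Writing $s_x$ and $s_y$ out in full gives an equivalent form that avoids computing z-scores, because the $n - 1$ factors cancel. This is standard algebra added here for clarity, not a formula stated elsewhere in these notes.

```latex
\begin{aligned}
r &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y}, \qquad
s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}, \quad
s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}} \\
(n-1)\,s_x s_y &= \sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2} \\
r &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}
\end{aligned}
```

For the drilling data, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 269.24$, $\sum (x_i - \bar{x})^2 = 18546.5$, and $\sum (y_i - \bar{y})^2 = 5.0868$. This gives $r = 269.24 / \sqrt{18546.5 \times 5.0868} \approx 0.877$, which matches the z-score method.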
Example: Computing $r$ by Hand
Depth, $x$ | Time, $y$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $\frac{x_i - \bar{x}}{s_x}$ | $\frac{y_i - \bar{y}}{s_y}$ | Product |
|---|---|---|---|---|---|---|
54 | 5.98 | -75.5 | -0.90 | -1.6632 | -1.1971 | 1.9910 |
75 | 6.41 | -54.5 | -0.47 | -1.2006 | -0.6252 | 0.7506 |
93 | 5.90 | -36.5 | -0.98 | -0.8041 | -1.3035 | 1.0481 |
110 | 6.74 | -19.5 | -0.14 | -0.4296 | -0.1862 | 0.0800 |
130 | 6.27 | 0.5 | -0.61 | 0.0110 | -0.8114 | -0.0089 |
145 | 7.47 | 15.5 | 0.59 | 0.3414 | 0.7848 | 0.2680 |
155 | 6.82 | 25.5 | -0.06 | 0.5617 | -0.0798 | -0.0448 |
165 | 7.42 | 35.5 | 0.54 | 0.7820 | 0.7183 | 0.5617 |
178 | 7.89 | 48.5 | 1.01 | 1.0684 | 1.3434 | 1.4353 |
190 | 7.90 | 60.5 | 1.02 | 1.3327 | 1.3567 | 1.8082 |
Additional info: The table above shows the step-by-step calculation of $r$ for the drilling data. It uses $\bar{x} = 129.5$, $\bar{y} = 6.88$, $s_x \approx 45.3952$, and $s_y \approx 0.7518$. Z-scores and products are rounded to four decimal places for display.
Final calculation:
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} \approx \frac{7.889}{9} \approx 0.877$$
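To check the hand calculation, this sketch recomputes every column of the table from the raw drilling data. It uses Python's standard `statistics` module: `mean` for the averages and `stdev` for the sample standard deviation. The function name `correlation_table` is hypothetical.

```python
from statistics import mean, stdev

depth = [54, 75, 93, 110, 130, 145, 155, 165, 178, 190]                  # x
time_min = [5.98, 6.41, 5.90, 6.74, 6.27, 7.47, 6.82, 7.42, 7.89, 7.90]  # y


def correlation_table(xs, ys):
    """Rebuild the hand-calculation rows and return (rows, r)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    rows = []
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x  # z-score of the explanatory value
        z_y = (y - y_bar) / s_y  # z-score of the response value
        rows.append((x, y, x - x_bar, y - y_bar, z_x, z_y, z_x * z_y))
    # r = (sum of z-score products) / (n - 1)
    r = sum(row[-1] for row in rows) / (len(xs) - 1)
    return rows, r


rows, r = correlation_table(depth, time_min)
for row in rows:
    print("  ".join(f"{v:9.4f}" for v in row))
print(f"r = {r:.3f}")
```

The last line prints `r = 0.877`, matching the final calculation above.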
Using Technology to Compute $r$
Statistical software such as StatCrunch or online applets can be used to quickly compute the linear correlation coefficient for large datasets.
Testing for a Linear Relation
Steps to Test for Linearity
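Determine the absolute value of the correlation coefficient, $|r|$.
Find the critical value for the sample size $n$ in a table of critical values for the correlation coefficient.
If $|r|$ is greater than the critical value, conclude that a linear relation exists between the two variables; otherwise, conclude that no linear relation exists.
Example: Testing whether a linear relation exists between drilling depth and time to drill five feet. For the drilling data, $n = 10$ and $r \approx 0.877$. The critical value for $n = 10$ at the 0.05 level is 0.632. Because $|r| = 0.877 > 0.632$, a linear relation exists. Since $r > 0$, the association is positive: drilling that begins deeper tends to take longer.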
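The three steps can be sketched as a small decision function. The critical values below are the usual two-tailed 0.05-level values for $n = 3$ through $12$ only; for other sample sizes, consult a full table. The names `CRITICAL_R`, `sample_r`, and `linear_relation_exists` are hypothetical.

```python
from statistics import mean, stdev

# Critical values of r at the 0.05 level, keyed by sample size n (n = 3..12 only).
CRITICAL_R = {3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
              8: 0.707, 9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576}


def sample_r(xs, ys):
    """Sample correlation: sum of cross-deviations over (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)


def linear_relation_exists(xs, ys):
    """Step 1: |r|. Step 2: critical value for n. Step 3: compare the two."""
    r = sample_r(xs, ys)
    critical = CRITICAL_R[len(xs)]  # KeyError if n is outside the 3..12 table
    return abs(r) > critical, r, critical


depth = [54, 75, 93, 110, 130, 145, 155, 165, 178, 190]
time_min = [5.98, 6.41, 5.90, 6.74, 6.27, 7.47, 6.82, 7.42, 7.89, 7.90]
exists, r, critical = linear_relation_exists(depth, time_min)
print(f"|r| = {abs(r):.3f}, critical value = {critical}, linear relation: {exists}")
```

For the drilling data this reports $|r| = 0.877$ against a critical value of 0.632, so a linear relation exists.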
Correlation vs. Causation
Key Differences
Correlation: Measures the strength and direction of a linear relationship between two variables, but does not imply that changes in one variable cause changes in the other.
Causation: Implies that one variable directly affects another.
Lurking Variable: A variable not included in the analysis that may influence both variables being studied.
Example: Ice cream sales and drowning rates may be correlated due to a lurking variable (temperature), not because one causes the other.
Least-Squares Regression Line
Objectives
Find the least-squares regression line and use it for predictions.
Interpret the slope and y-intercept.
Compute the sum of squared residuals.
Finding the Regression Line
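Given $n$ data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, the least-squares regression line is the line that minimizes the sum of squared residuals. A residual is the difference between an observed value $y_i$ and its predicted value $\hat{y}_i$.
Equation of the Regression Line: $\hat{y} = b_1 x + b_0$
$b_1 = r \cdot \frac{s_y}{s_x}$: Slope of the line
$b_0 = \bar{y} - b_1 \bar{x}$: y-intercept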
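The slope and intercept formulas translate directly into code. This sketch applies them to the five $(x, y)$ pairs in the example that follows. The function name `least_squares_line` is hypothetical.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (b1, b0) for y-hat = b1 * x + b0, using b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x          # slope
    b0 = y_bar - b1 * x_bar     # y-intercept: the line passes through (x-bar, y-bar)
    return b1, b0


xs = [0, 2, 3, 5, 6]
ys = [5.3, 5.7, 5.2, 2.8, 1.9]
b1, b0 = least_squares_line(xs, ys)
print(f"y-hat = {b1:.4f}x + {b0:.4f}")
```

It prints `y-hat = -0.6351x + 6.2123` for the example data.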
Example: Using sample data to find the regression line and make predictions.
x | y |
|---|---|
0 | 5.3 |
2 | 5.7 |
3 | 5.2 |
5 | 2.8 |
6 | 1.9 |
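Choose points (2, 5.7) and (6, 1.9) to find the equation of a line through them. The slope is $m = \frac{1.9 - 5.7}{6 - 2} = -0.95$, so $y - 5.7 = -0.95(x - 2)$, which gives $y = -0.95x + 7.6$. This line fits two of the points exactly, but it is not the least-squares regression line for the data.
Prediction: Use the regression equation to predict $\hat{y}$ for a given value of $x$.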
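One listed objective is computing the sum of squared residuals. This sketch does that for both candidate lines on the five example points: the least-squares line and the line through (2, 5.7) and (6, 1.9). It then makes a prediction at $x = 4$. The helper name `sum_squared_residuals` and the choice of $x = 4$ are illustrative assumptions.

```python
xs = [0, 2, 3, 5, 6]
ys = [5.3, 5.7, 5.2, 2.8, 1.9]


def sum_squared_residuals(b1, b0):
    """Sum of (observed y - predicted y)^2 for the line y-hat = b1 * x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))


# Least-squares coefficients from centered sums (equivalent to b1 = r * s_y / s_x).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1_ls = s_xy / s_xx
b0_ls = y_bar - b1_ls * x_bar

# Line through the two chosen points (2, 5.7) and (6, 1.9).
b1_pts = (1.9 - 5.7) / (6 - 2)
b0_pts = 5.7 - b1_pts * 2

print(f"least squares:  sum of squared residuals = {sum_squared_residuals(b1_ls, b0_ls):.4f}")
print(f"two-point line: sum of squared residuals = {sum_squared_residuals(b1_pts, b0_pts):.4f}")
# Prediction at a hypothetical x = 4 using the least-squares line.
print(f"predicted y at x = 4: {b1_ls * 4 + b0_ls:.4f}")
```

The least-squares line gives a sum of about 2.512, compared with 5.495 for the two-point line. This is consistent with the least-squares line minimizing that sum.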
Summary Table: Properties of the Linear Correlation Coefficient
Property | Description |
|---|---|
Range | $-1 \le r \le 1$ |
Strength | The closer $\lvert r \rvert$ is to 1, the stronger the linear association |
Direction | Positive $r$ indicates positive association; negative $r$ indicates negative association |
Interpretation | $r = 0$ means no linear association |
Resistant? | Not resistant to outliers |
Type of Relation | Measures only linear relationships |
Cautions and Limitations
A correlation coefficient close to 0 does not imply no relationship, only no linear relationship.
Always examine scatter diagrams to detect nonlinear associations.
Do not infer causation from correlation without further investigation.
Practice and Application
Use scatter diagrams and correlation coefficients to analyze real-world data (e.g., SAT scores vs. teacher salaries).
Consider lurking variables when interpreting results.
Apply regression analysis to make predictions and interpret relationships.