
Association: Chi-Square Tests, Correlation, and Linear Regression

Study Guide - Smart Notes


Learning Outcomes

By the end of this section, students should be able to select appropriate statistical tools to analyze the association between variables. This includes understanding when to use chi-square tests, correlation, and linear regression, and how to interpret their results.

Chi-Square Test

Introduction to Chi-Square Test

  • Chi-square tests are used for hypothesis testing in qualitative (categorical) and non-parametric data.

  • They are based on counts representing the number of items in each category.

  • The test evaluates whether the difference between actual counts and expected counts (under the null hypothesis) is due to chance.

Types of Chi-Square Tests:

  • Goodness of Fit: Used when there is one nominal variable to test if observed distribution fits an expected distribution.

  • Test for Independence: Used when there are two or more nominal variables to test if they are independent.

Chi-Square Test Formula

The formula for the chi-square statistic is:

χ² = Σ (O − E)² / E

where:

  • O = Observed frequency

  • E = Expected frequency

  • Σ = Summation over all categories

Chi-Square Test: Goodness of Fit

The goodness of fit test determines if sample data matches an expected distribution. For example, a student repeats Mendel's genetics experiment by crossing two purple hyacinth plants (each carrying a dominant purple allele and a recessive white allele). Mendel's model predicts a 3:1 purple-to-white ratio among the offspring. The observed results are 30 purple and 14 white hyacinth plants.

  • Task:

    • Set hypothesis

    • Collect data

    • Perform test

    • Draw conclusion

Hypotheses:

  • Null hypothesis (H0): The observed counts fit Mendel's predicted 3:1 ratio.

  • Alternative hypothesis (HA): The observed counts do not fit Mendel's predicted 3:1 ratio.

Degrees of Freedom: df = (number of categories) − 1 = 2 − 1 = 1

Example Table: Chi-Square Goodness of Fit Calculation

Phenotypes | Observed (O) | Expected (E) | O − E | (O − E)² | (O − E)²/E
---------- | ------------ | ------------ | ----- | -------- | ----------
Purple     | 30           | 33           | −3    | 9        | 0.27272
White      | 14           | 11           | 3     | 9        | 0.81818
Total      | 44           | 44           |       |          | 1.0909

Interpreting the Result

  • Chi-Square Score = 1.0909

  • Critical value at α = 0.05 and df = 1 is 3.84

  • Since 1.0909 < 3.84, we fail to reject the null hypothesis.

  • The result supports Mendel's prediction.
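The calculation above can be reproduced in a few lines. This is a minimal sketch in Python; the observed counts and the 3:1 expected ratio come from the example, and the critical value is read from the table below.

```python
# Chi-square goodness-of-fit for the hyacinth cross (expected 3:1 purple:white)
observed = {"purple": 30, "white": 14}
total = sum(observed.values())  # 44 plants

# Expected counts under Mendel's 3:1 ratio
expected = {"purple": total * 3 / 4, "white": total * 1 / 4}  # 33 and 11

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

critical_value = 3.84  # alpha = 0.05, df = 1 (from the critical-values table)
print(round(chi_square, 4))         # 1.0909
print(chi_square < critical_value)  # True -> fail to reject H0
```

Since the statistic (1.0909) is below the critical value (3.84), the script reaches the same conclusion as the worked example: fail to reject the null hypothesis.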

Chi-Square Table (Critical Values)

Degrees of Freedom (df) | α = 0.05 | α = 0.01 | α = 0.001
----------------------- | -------- | -------- | ---------
1                       | 3.84     | 6.64     | 10.83
2                       | 5.99     | 9.21     | 13.82
3                       | 7.81     | 11.34    | 16.27

Chi-Square Test: Test for Independence

The test for independence determines if two categorical variables are independent. If the observed counts differ significantly from expected counts (assuming independence), the variables are likely associated.

  • Null hypothesis (H0): The variables are independent.

  • Alternative hypothesis (HA): The variables are dependent (associated).

Example: Is the preference for the shape of roti canai related to gender?
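The expected-count and chi-square computation for a contingency table can be sketched as follows. The counts here are hypothetical (the source states the roti canai question but gives no data), assuming a 2×2 table of shape preference by gender.

```python
# Chi-square test for independence on a 2x2 contingency table
# (hypothetical counts: shape preference (round / square) by gender)
observed = [
    [20, 30],  # male:   round, square
    [40, 10],  # female: round, square
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2) for j in range(2)
)

# df = (rows - 1) * (cols - 1) = 1; critical value at alpha = 0.05 is 3.84
print(round(chi_square, 4))  # 16.6667
print(chi_square > 3.84)     # True -> reject H0: preference depends on gender
```

With these illustrative counts the statistic exceeds the critical value, so the null hypothesis of independence would be rejected.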

Correlation and Linear Regression

Correlation

Correlation investigates the relationship between two interval (continuous) variables, such as x and y. The scatter plot is commonly used to visualize this relationship. In correlation analysis, there is no distinction between dependent or independent variables.

  • Pearson's Correlation Coefficient (r): Measures the strength and direction of a linear relationship between two variables.

  • Range: −1 ≤ r ≤ +1

  • r = +1: Perfect positive linear correlation

  • r = −1: Perfect negative (inverse) linear correlation

  • r = 0: No linear correlation

Visual Representation of Correlation

  • No correlation: Points are scattered randomly (r ≈ 0).

  • Imperfect positive correlation: Points trend upward but with scatter (0 < r < 1).

  • Perfect positive correlation: All points lie on an upward-sloping line (r = +1).

  • Imperfect negative correlation: Points trend downward but with scatter (−1 < r < 0).

  • Perfect negative correlation: All points lie on a downward-sloping line (r = −1).

Example: COVID-19 Data

  • Research question: On a global scale, is the number of daily new COVID-19 cases correlated with daily new deaths?

  • Pearson's product-moment correlation was calculated (R output):

    data: covid$new_cases and covid$new_deaths
    t = 712.55, df = 82704, p-value < 2.2e-16
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval: 0.9263614 0.9282707
    sample estimates: cor = 0.9273221

  • Interpretation: There is a very strong positive linear correlation (r ≈ 0.927) between daily new cases and daily new deaths.
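Pearson's r can be computed from first principles as the covariance divided by the product of the standard deviations. This sketch uses small made-up data for illustration, not the COVID-19 dataset.

```python
import math

# Pearson's correlation coefficient from first principles
# (toy data, assumed for illustration -- not the COVID-19 dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)²) * sum((y - ȳ)²))
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))

r = cov / (sx * sy)
print(round(r, 3))  # close to +1: strong positive linear correlation
```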

Linear Regression

Linear regression uses a linear function to predict the value of a dependent variable based on an independent variable. The independent variable is usually plotted on the x-axis, and the dependent variable on the y-axis.

  • Simple linear regression equation: y = a + bx

  • y = dependent variable

  • x = independent variable

  • a = intercept

  • b = slope (regression coefficient)

Example: COVID-19 Data

  • Regression equation for daily new deaths (y) and daily new cases (x): ŷ = 21.22561 + 0.01859x

  • Interpretation: For each additional new case, the model predicts an increase of approximately 0.01859 new deaths, with a baseline (intercept) of 21.22561 deaths when cases are zero.

Assessing the Regression Model

  • Key statistics to assess include:

    • R-squared: Proportion of variance in the dependent variable explained by the independent variable.

    • p-value: Tests the null hypothesis that the slope is zero (no relationship).

    • Residuals: Differences between observed and predicted values; should be randomly distributed.

    Call: lm(formula = covid$new_deaths ~ covid$new_cases)
    Coefficients: (Intercept) = 21.22561, covid$new_cases = 0.01859
    Residual standard error: 381.3 on 82704 degrees of freedom
    Multiple R-squared: 0.8599, Adjusted R-squared: 0.8599
    F-statistic: 5.077e+05 on 1 and 82704 DF, p-value: < 2.2e-16

  • Interpretation: The model explains about 86% of the variance in daily new deaths, and the relationship is statistically significant (p-value < 0.05).
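Residuals and R-squared follow directly from their definitions: R² = 1 − SS_res/SS_tot. A minimal sketch, using toy data and an assumed pre-fitted line (a = 0, b = 2), not the COVID-19 model:

```python
# Assessing a fitted line: residuals and R-squared
# (toy data and a hypothetical fitted line, assumed for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = 0.0, 2.0  # assumed fitted intercept and slope

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed - predicted

mean_y = sum(y) / len(y)
ss_res = sum(e ** 2 for e in residuals)       # residual sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.997 -> the line explains most of the variance
```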

Summary Table: When to Use Each Test

Test                             | Type of Variables                                             | Main Purpose
-------------------------------- | ------------------------------------------------------------- | ------------
Chi-Square Goodness of Fit       | One categorical variable                                      | Test if observed distribution matches expected distribution
Chi-Square Test for Independence | Two categorical variables                                     | Test if variables are independent
Correlation (Pearson's r)        | Two continuous variables                                      | Measure strength and direction of linear relationship
Linear Regression                | One continuous dependent, one continuous independent variable | Predict dependent variable from independent variable

Additional info: This guide covers core concepts from Chi-Square tests (goodness of fit and independence), correlation, and simple linear regression, with examples and interpretation relevant for college-level statistics.
