Hypothesis Testing with Two Samples: Independent and Dependent Samples, z-Tests, and t-Tests

Chapter 8: Hypothesis Testing with Two Samples

Understanding the Sampling Distribution of the Difference Between Two Means

When comparing two population means, it is essential to understand the sampling distribution of their difference. This distribution allows us to make inferences about whether observed differences are statistically significant or likely due to random variation.

Sampling Distribution: The probability distribution of the difference between two sample means, assuming random sampling from each population.
Standard Error: The standard deviation of the sampling distribution of the difference between means, which quantifies the expected variability in mean differences due to sampling error.
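For two independent samples, the standard error of the difference between means is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The simulation below is a minimal sketch of that idea. The population parameters, sample sizes, and the function name `simulate_mean_differences` are all made up for illustration. It draws many pairs of samples and compares the spread of the simulated mean differences with the formula.

```python
import math
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=4000, seed=0):
    """Draw `reps` pairs of independent normal samples; return mean1 - mean2 for each pair."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        mean2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(mean1 - mean2)
    return diffs

# Hypothetical populations, chosen only for illustration.
diffs = simulate_mean_differences(mu1=100, sigma1=15, n1=40, mu2=100, sigma2=10, n2=30)
empirical_se = statistics.stdev(diffs)               # spread of the simulated differences
theoretical_se = math.sqrt(15**2 / 40 + 10**2 / 30)  # formula for the standard error
print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

With enough replications the two printed values should be close, which is what justifies using the formula in place of a simulation.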

Independent vs. Dependent Samples

It is crucial to distinguish between independent and dependent samples, as the statistical tests and formulas differ for each scenario.

Independent Samples

Samples are independent if knowing information about one sample provides no information about the other.
Examples: Different groups assigned to treatment or control, samples from different regions, or different grade levels.

Dependent Samples

Samples are dependent if there is a relationship between them, such as repeated measurements on the same subjects or matched pairs.
Examples: Pre- and post-treatment measurements on the same individuals, siblings, or matched pairs in experiments.

Applying the correct formula is essential to avoid decision errors in hypothesis testing.

Classifying Sample Pairs

Example 1: Triglyceride levels before and after treatment in the same patients → Dependent
Example 2: Scores for males vs. females on a test → Independent

Two-Sample z-Test for the Difference Between Means

The two-sample z-test is used when comparing the means of two independent samples, provided the population standard deviations are known and other assumptions are met.

Population standard deviations (σ) are known.
Samples are randomly selected and independent.
Populations are normally distributed or sample sizes are large (n ≥ 30).

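Formula for the z-test:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Where:

$\bar{x}_1, \bar{x}_2$ = sample means
$\mu_1, \mu_2$ = population means (often assumed equal under H0)
$\sigma_1, \sigma_2$ = population standard deviations
$n_1, n_2$ = sample sizes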
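
The two-sample z statistic divides the observed difference in sample means (minus the difference hypothesized under H0) by the standard error $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A minimal sketch in Python follows. The function name `two_sample_z` and the summary statistics are hypothetical and do not come from these notes.

```python
import math

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Return the z statistic for the difference between two independent means.

    sigma1 and sigma2 are the known *population* standard deviations.
    hypothesized_diff is mu1 - mu2 under H0 (0 when H0 states the means are equal).
    """
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical summary statistics, for illustration only.
z = two_sample_z(mean1=52.0, mean2=50.0, sigma1=6.0, sigma2=8.0, n1=36, n2=64)
print(f"z = {z:.3f}")  # z = 1.414
```

Here each variance term equals 1, so the standard error is $\sqrt{2}$ and the statistic is $2/\sqrt{2} \approx 1.414$.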

Example: Comparing Credit Card Debts

A watchdog group claims a difference in mean credit card debts between Oklahoma and North Carolina. Data from random samples (n = 250 each) are:

Oklahoma	North Carolina
n_1 = 250	n_2 = 250

Table of sample means and sizes for Oklahoma and North Carolina

Assume the population standard deviations $\sigma_1$ and $\sigma_2$ are known. Using $\alpha = 0.05$, the z-test is performed. The critical values are $\pm 1.96$ for a two-tailed test.

[Figure: normal distribution with rejection regions and test statistic]

Since the calculated z is not in the rejection region, we fail to reject the null hypothesis. There is not enough evidence at the 5% significance level to support the claim of a difference in mean debts.
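
The decision rule used here compares the test statistic with the two-tailed critical values at the chosen significance level. It can be sketched with Python's standard library as shown below. The helper name `two_tailed_z_decision` and the test statistic `z = -0.8` are hypothetical, chosen only to show the mechanics. They are not the values from the credit card example.

```python
from statistics import NormalDist

def two_tailed_z_decision(z, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-tailed z-test."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    critical = standard_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return critical, p_value, abs(z) > critical

# Hypothetical test statistic, used only to illustrate the decision rule.
critical, p_value, reject = two_tailed_z_decision(z=-0.8, alpha=0.05)
print(f"critical values = +/-{critical:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```

Rejecting when $|z|$ exceeds the critical value is equivalent to rejecting when the p-value is smaller than $\alpha$, so either comparison gives the same decision.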

Switching to t: The t-Test for the Difference Between Means

When population standard deviations are unknown, the t-test is used. This test was developed by William Sealy Gosset ("Student") while working at Guinness Brewery, leading to the famous Student's t-distribution.

Used when $\sigma_1$ and $\sigma_2$ are unknown.
Samples are random and independent.
Populations are normally distributed or both sample sizes are at least 30.

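Formula for the t-test (equal variances):

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$

Where $s_p^2$ is the pooled variance estimate:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Degrees of freedom: $df = n_1 + n_2 - 2$

If variances are not equal, use:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Degrees of freedom: smaller of $n_1 - 1$ or $n_2 - 1$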
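
The two versions of the t statistic differ only in how the standard error is built: the equal-variance version pools the two sample variances, while the unequal-variance version keeps them separate. The sketch below computes both under H0: $\mu_1 = \mu_2$. The function names and summary statistics are hypothetical and not taken from these notes.

```python
import math

def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """t statistic and df assuming equal population variances (H0: mu1 = mu2)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

def unpooled_t(mean1, mean2, s1, s2, n1, n2):
    """t statistic without pooling, using the conservative df = min(n1, n2) - 1."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error, min(n1, n2) - 1

# Hypothetical summary statistics, for illustration only.
t_pooled, df_pooled = pooled_t(20.0, 17.0, 3.0, 4.0, 10, 12)
t_unpooled, df_unpooled = unpooled_t(20.0, 17.0, 3.0, 4.0, 10, 12)
print(f"pooled:   t = {t_pooled:.3f}, df = {df_pooled}")
print(f"unpooled: t = {t_unpooled:.3f}, df = {df_unpooled}")
```

The "smaller of" rule for degrees of freedom is a conservative hand-calculation shortcut. Statistical software commonly uses the Welch-Satterthwaite approximation instead, which usually gives a larger, non-integer df.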

Example: Comparing Teaching Methods

Method 1: Traditional
Method 2: New (with technology)
Right-tailed test (claim: new method is superior)
Calculate pooled variance, standard error, and t-statistic

The logic is to compare the observed mean difference to what would be expected under the null hypothesis, using the t-distribution to determine significance.
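
The steps of a right-tailed pooled t-test (pooled variance, standard error, t statistic, comparison with a critical value) can be traced in a short script. This is only a sketch: the scores are invented for illustration and are not the data from the teaching-methods example. The critical value 1.812 is the one-tailed t-table entry for $\alpha = 0.05$ with 10 degrees of freedom.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration (not data from these notes).
traditional = [70, 72, 68, 75, 71, 70]   # Method 1
new_method = [74, 78, 73, 77, 75, 79]    # Method 2

n1, n2 = len(traditional), len(new_method)
mean1, mean2 = statistics.fmean(traditional), statistics.fmean(new_method)
var1, var2 = statistics.variance(traditional), statistics.variance(new_method)

# Step 1: pooled variance (each sample variance weighted by its degrees of freedom).
df = n1 + n2 - 2
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Step 2: standard error of the difference between means.
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Step 3: t statistic for H0: mu2 <= mu1 versus Ha: mu2 > mu1 (right-tailed).
t_stat = (mean2 - mean1) / standard_error

# Step 4: compare with the one-tailed critical value (alpha = 0.05, df = 10) from a t-table.
T_CRITICAL = 1.812
reject_h0 = t_stat > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up scores the statistic is about 3.66, well inside the right-tail rejection region, so H0 would be rejected.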

t vs. z

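z-test: Used when the population standard deviations are known, or as an approximation when sample sizes are very large.
t-test: Used when the population standard deviations are unknown and are estimated from sample data; the t-distribution is flatter with thicker tails than the standard normal, and the difference is most pronounced for small samples.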

t-Table

The t-table provides critical values for various degrees of freedom and significance levels.
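
As a small illustration of how table entries behave, the sketch below stores a few two-tailed critical values for $\alpha = 0.05$ as they appear in standard t-tables and prints them next to the corresponding z critical value. The dictionary name is hypothetical and only a handful of rows are shown.

```python
from statistics import NormalDist

# Two-tailed critical values for alpha = 0.05, copied from a standard t-table.
# Only a few rows are shown; a full table covers many more degrees of freedom.
T_CRITICAL_TWO_TAILED_05 = {1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042}

z_critical = NormalDist().inv_cdf(0.975)  # two-tailed z critical value, about 1.960

for df, t_critical in T_CRITICAL_TWO_TAILED_05.items():
    print(f"df = {df:>2}: t = {t_critical:.3f}  (z = {z_critical:.3f})")
```

Every t critical value is larger than the z value, and the gap shrinks as the degrees of freedom increase, which reflects the thicker tails of the t-distribution for small samples.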

[Figure: t-distribution table with critical values]

Dependent Samples t-Test (Paired Samples t-Test)

When samples are related (e.g., pre-post designs, matched pairs), the paired samples t-test is used. This test analyzes the differences within pairs rather than treating the samples as independent.

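Calculate the difference for each pair: $d = x_1 - x_2$
Compute the mean and standard deviation of the differences
Test statistic:

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

Where:

$\bar{d}$ = mean of the differences
$\mu_d$ = hypothesized mean of the differences (often 0 under H0)
$s_d$ = standard deviation of the differences
$n$ = number of pairs
Degrees of freedom: $df = n - 1$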
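
The paired procedure reduces the two samples to a single list of differences and then runs a one-sample t-test on that list. A minimal sketch follows. The function name `paired_t` and the before/after measurements are hypothetical.

```python
import math
import statistics

def paired_t(before, after, hypothesized_mean_diff=0.0):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [b - a for b, a in zip(before, after)]  # d = x1 - x2 for each pair
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation (divides by n - 1)
    t_stat = (mean_diff - hypothesized_mean_diff) / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical before/after measurements on the same five subjects.
before = [200, 190, 210, 205, 195]
after = [190, 185, 200, 200, 185]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

For these made-up values the differences are 10, 5, 10, 5, 10, so the mean difference is 8 and the statistic is about 6.53 with 4 degrees of freedom.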

[Figure: t-distribution for paired differences]

Applications of Dependent Samples t-Test

Pre- and post-test designs
Matched pairs (e.g., siblings, matched controls)
Longitudinal studies (same individuals measured over time)

Summary Table: Independent vs. Dependent Samples

Independent Samples	Dependent Samples
Different individuals in each group (e.g., treatment vs. control)	Same or matched individuals (e.g., pre-post, matched pairs)
Use two-sample z or t-test	Use paired samples t-test
Assume no relationship between samples	Assume relationship between samples

The choice of test depends on the study design and on whether the samples are independent or related. Always check the assumptions of a test before applying it.