Z-Scores, Normal Models, and Sampling in Statistics

Study Guide - Smart Notes

Z-Scores and Standardization

Definition and Calculation of Z-Scores

Z-scores are used to standardize data, allowing comparison across different distributions by measuring how many standard deviations an observation is from the mean. The formula for calculating a z-score is:

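Formula: z = \frac{y - \bar{y}}{s}, where y is the observation, \bar{y} is the mean, and s is the standard deviation of the data set. When a model supplies the parameters, the same calculation uses them instead: z = \frac{y - \mu}{\sigma}.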
Steps:
1. Calculate the mean of the data set.
2. Calculate the standard deviation of the data set.
3. Subtract the mean from the observation of interest.
4. Divide this deviation by the standard deviation.
Interpretation: A larger absolute z-score indicates a more uncommon value in the data set. Z-scores can be positive or negative, depending on whether the value is above or below the mean.
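
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the data values and the helper name z_score are invented for the example, and the sample standard deviation (n - 1 divisor) is assumed.

```python
from statistics import mean, stdev

# Hypothetical data set, invented for illustration only.
scores = [62, 70, 74, 78, 86]

def z_score(value, data):
    """Return how many sample standard deviations `value` lies from the mean of `data`."""
    center = mean(data)          # step 1: mean of the data set
    spread = stdev(data)         # step 2: sample standard deviation (n - 1 divisor)
    deviation = value - center   # step 3: subtract the mean from the observation
    return deviation / spread    # step 4: divide by the standard deviation

z_high = z_score(86, scores)     # positive: 86 is above the mean of 74
z_low = z_score(62, scores)      # negative: 62 is below the mean of 74
print(round(z_high, 2), round(z_low, 2))
```

Here 86 and 62 sit the same distance from the mean on opposite sides, so their z-scores (about 1.34 and -1.34) have equal size and opposite sign.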

Properties of Z-Scores

Standardizing data into z-scores does not change the shape of the distribution, but it shifts and rescales the data:

Shape: Remains unchanged.
Center: The mean becomes 0.
Spread: The standard deviation becomes 1.
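
These properties can be checked numerically. The sketch below standardizes a small, made-up data set (an assumption for illustration) and confirms that the z-scores have mean 0 and standard deviation 1 while keeping the same ordering as the original values.

```python
from statistics import mean, stdev

# Hypothetical right-skewed data, invented for illustration only.
data = [3, 4, 4, 5, 9]

center, spread = mean(data), stdev(data)
z = [(y - center) / spread for y in data]   # standardize every value

z_mean = mean(z)   # center of the standardized data
z_sd = stdev(z)    # spread of the standardized data

# The ordering of values is preserved, so the shape is the same:
# the largest value is still the farthest from the rest.
same_order = sorted(range(len(data)), key=data.__getitem__) == \
             sorted(range(len(z)), key=z.__getitem__)
print(round(z_mean, 10), round(z_sd, 10), same_order)
```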

Density Curves and the Normal Model

Density Curves

Density curves are smooth curves drawn over histograms to represent the distribution of data. They must satisfy two main conditions:

Always positive or zero: The curve never dips below the x-axis.
Total area under the curve: Equals 1, representing 100% of the data.
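
As a rough numerical check of both conditions, the sketch below evaluates one particular density curve on a fine grid and adds up the area of thin rectangles under it. The choice of curve (a standard normal from Python's statistics.NormalDist), the grid limits, and the step size are all assumptions made for the example.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)   # one example of a density curve

# Evaluate the curve on a fine grid from -6 to 6; almost all of the area lies here.
step = 0.001
grid = [-6 + i * step for i in range(12001)]
heights = [curve.pdf(y) for y in grid]

never_negative = all(h >= 0 for h in heights)   # condition 1: never below the x-axis
area = sum(h * step for h in heights)           # condition 2: area as a Riemann sum
print(never_negative, round(area, 4))
```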

The Normal Model

The normal model is a specific type of density curve that is bell-shaped, symmetric, and unimodal. It is defined by two parameters:

Mean (\mu): Determines the center of the curve.
Standard deviation (\sigma): Controls the spread of the curve.

68-95-99.7 Rule

The normal model follows the empirical rule, which describes the proportion of data within certain standard deviations from the mean:

About 68% of values fall within ±1 standard deviation.
About 95% fall within ±2 standard deviations.
About 99.7% fall within ±3 standard deviations.
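
The three percentages are rounded areas under the normal curve. A minimal sketch using Python's statistics.NormalDist recovers them; the helper name share_within is invented for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within = {k: share_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within ±{k} SD: {share:.4f}")
```

Because z-scores put every normal model on the same scale, the same proportions hold for any mean and standard deviation.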

Sketching the Normal Curve

To sketch a normal curve accurately, remember:

The curve is symmetric and bell-shaped around the mean.
Draw only up to ±3 standard deviations; beyond this, the curve approaches zero.
The inflection points occur one standard deviation on either side of the mean.

Working with Normal Models

Steps for Analysis

When working with normal models, follow these steps:

Plot a histogram to check that the data are roughly unimodal and symmetric before relying on a normal model.
Calculate z-scores for observations.
Look up normal percentiles using statistical tables.
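
Software can stand in for the table lookup in the last step. The sketch below assumes a hypothetical normal model with mean 500 and standard deviation 100 (values chosen only for illustration) and uses Python's statistics.NormalDist to go from a value to a percentile and back.

```python
from statistics import NormalDist

mu, sigma = 500, 100        # hypothetical model parameters, for illustration only
observation = 650

# From a value to a percentile: standardize, then find the area to the left.
z = (observation - mu) / sigma
percentile = NormalDist().cdf(z)

# From a percentile back to a value: find the z-score, then unstandardize.
cutoff_z = NormalDist().inv_cdf(0.90)
cutoff = mu + cutoff_z * sigma

print(z, round(percentile, 4), round(cutoff, 1))
```

With these assumed parameters, 650 has a z-score of 1.5 and sits at roughly the 93rd percentile, and the 90th percentile falls near 628.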

Normal Probability Plots

Normal probability plots are used to check whether data follow a normal distribution. If the points fall roughly along a straight line, the distribution is approximately normal. Systematic curvature away from the line indicates skewness:

Curving up and to the left: right-skewed (long tail to the right).
Curving down and to the right: left-skewed (long tail to the left).
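
The points on such a plot pair each sorted data value with the z-score a normal model would expect at that rank. The sketch below builds those pairs and summarizes straightness with a correlation. It rests on several assumptions: the plotting position (i - 0.5)/n is one common convention among several, the helper names are invented, and both data sets are made up.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_scores(n):
    """Expected z-scores for n ordered values, using the (i - 0.5)/n convention."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def plot_points(data):
    """(normal score, data value) pairs that would be drawn on the plot."""
    return list(zip(normal_scores(len(data)), sorted(data)))

def straightness(data):
    """Correlation between normal scores and sorted data; near 1 means a straight line."""
    xs, ys = zip(*plot_points(data))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

symmetric = [10, 20, 30, 40, 50]   # hypothetical symmetric data
skewed = [1, 2, 3, 5, 40]          # hypothetical data with a long right tail

r_symmetric = straightness(symmetric)
r_skewed = straightness(skewed)
print(round(r_symmetric, 3), round(r_skewed, 3))
```

The symmetric data give a correlation very close to 1, while the long right tail pulls the skewed data well away from a straight line.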

Populations, Parameters, and Sampling

Populations and Parameters

Statistical models use parameters to represent key characteristics of populations. Parameters are estimated using sample statistics:

Population parameter: A characteristic of the entire population (e.g., mean, standard deviation).
Sample statistic: A characteristic calculated from a sample, used to estimate the population parameter.
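
The distinction can be illustrated with a simulated population. Everything in this sketch is hypothetical: the population values are generated at random, and in a real study the population mean would usually be unknown.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 values; in practice it is rarely fully observed.
population = [rng.gauss(100, 15) for _ in range(1000)]
mu = mean(population)                    # parameter: describes the whole population

sample = rng.sample(population, k=50)    # simple random sample of 50 individuals
y_bar = mean(sample)                     # statistic: computed from the sample only

print(round(mu, 1), round(y_bar, 1))
```

The statistic will not match the parameter exactly, but for a random sample it tends to land close to it.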

Notation for Statistics and Parameters

Statistics and parameters are denoted using specific symbols. Greek letters are used for parameters, Latin letters for statistics:

Name	Statistic	Parameter
Mean	\bar{y}	\mu
Standard deviation	s	\sigma
Correlation	r	\rho
Regression coefficient	b	\beta
Proportion	\hat{p}	p

Sampling Methods and Survey Design

Cluster and Multistage Sampling

Cluster sampling involves dividing the population into clusters and then randomly selecting whole clusters to sample. Multistage sampling combines several methods and is often used in large surveys:

Cluster sampling: Each cluster should resemble the full population, so only some randomly chosen clusters need to be sampled.
Multistage sampling: Combines stratified, cluster, and simple random sampling.
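
The difference between the two designs can be sketched with Python's random module. The population here (six dorm floors of ten people each) and all names are invented for illustration, and the multistage example shows just one possible combination: a random choice of clusters followed by a simple random sample within each.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical population: 6 clusters (say, dorm floors) of 10 people each.
clusters = {
    f"floor_{c}": [f"floor_{c}_person_{i}" for i in range(10)]
    for c in range(6)
}

# Cluster sampling: randomly choose whole clusters, then survey everyone in them.
chosen = rng.sample(sorted(clusters), k=2)
cluster_sample = [person for name in chosen for person in clusters[name]]

# Multistage sampling: choose clusters, then take a simple random sample within each.
multistage_sample = [
    person for name in chosen for person in rng.sample(clusters[name], k=3)
]

print(chosen, len(cluster_sample), len(multistage_sample))
```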

From Population to Sample: Sampling Frame and Target Sample

The process of moving from the population to the sample involves several steps, each introducing potential biases:

The sampling frame is the list from which the sample is drawn.
The target sample is the group intended to be studied.
The actual sample consists of respondents.
Ambiguity in these groups can affect the success and validity of a study.

Bias and Constraints in Sampling

Each step in sampling can constrain the group studied and introduce biases. Simple random sampling helps maintain representativeness:

Constraints may limit which groups can be studied.
Biases can arise if the sample does not match the population of interest.
Simple random sampling preserves the sense of "who's who" in the population.

Designing a Valid Survey

To ensure a valid survey, follow these guidelines:

Know: Define what you want to know.
Sampling Frame: Use the correct sampling frame.
Tune: Tune your instrument for data collection.
Specific: Ask specific questions.
Quantitative: Prefer quantitative results.
Phrasing: Phrase questions carefully.
Pilot: Perform a pilot test.

Common Sampling Mistakes and Biases

Types of Bias

Several biases can affect the validity of survey results:

Undercoverage: Certain groups are not included in the sample.
Nonresponse bias: Those who do not respond may differ from those who do.
Response bias: Survey design influences responses.
Volunteer bias: Only volunteers are surveyed, which may not represent the population.
Convenience sampling: Only convenient individuals are sampled, leading to unrepresentative data.
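
A small simulation can show how undercoverage distorts an estimate even when the sample itself is drawn at random. The population below is entirely hypothetical: 1,000 people, 600 of whom appear in the sampling frame, with "yes" answers more common among those in the frame.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 people, stored as (in_frame, answers_yes).
population = (
    [(True, True)] * 420 + [(True, False)] * 180      # reachable through the frame
    + [(False, True)] * 80 + [(False, False)] * 320   # missing from the frame
)

# Parameter: proportion of "yes" in the whole population.
p = sum(yes for _, yes in population) / len(population)

# The frame leaves out 400 people, so a random sample from it cannot reach them.
frame = [person for person in population if person[0]]
frame_share = sum(yes for _, yes in frame) / len(frame)

sample = rng.sample(frame, k=200)
p_hat = sum(yes for _, yes in sample) / len(sample)   # statistic from the frame

print(p, round(frame_share, 2), round(p_hat, 2))
```

The true proportion is 0.5, but the frame contains 70% "yes" answers, so the sample proportion settles near 0.7. A larger sample from the same frame would not fix this.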

Key Principle: To minimize bias, every individual in the population should have an equal chance of being selected for the survey.

Summary Table: Sampling Biases

Bias Type	Description
Undercoverage	Excludes certain groups from the sample
Nonresponse	Differences between respondents and nonrespondents
Response	Survey design influences answers
Volunteer	Only volunteers are surveyed
Convenience	Sample is chosen for convenience

Additional info: These notes cover core concepts from Chapters 5 (The Standard Deviation as a Ruler and the Normal Model) and 10 (Sample Surveys) of a college statistics course, including z-scores, normal models, survey design, and sampling biases.