Standardization, Normal Models, and Sampling in Statistics

Study Guide - Smart Notes


Z-Scores and Standardization

Definition and Calculation of Z-Scores

Z-scores are used to standardize data, allowing comparison across different distributions by expressing values in terms of their distance from the mean, measured in standard deviations. The z-score formula is:

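Formula: z = \frac{y - \bar{y}}{s}, where y is the observed value, \bar{y} is the mean of the data set, and s is its standard deviation.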
Steps:
1. Calculate the mean of the data set.
2. Calculate the standard deviation of the data set.
3. Subtract the mean from the value of interest.
4. Divide this deviation by the standard deviation.
Interpretation: Larger absolute z-scores indicate more unusual values; a z-score is positive when the value is above the mean and negative when it is below.
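The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the data values are hypothetical, the function name z_scores is made up for this example, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z-score of every value in the list."""
    center = mean(values)    # step 1: mean of the data set
    spread = stdev(values)   # step 2: sample standard deviation (assumed)
    # steps 3 and 4: subtract the mean, then divide by the standard deviation
    return [(v - center) / spread for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean is 5
z = z_scores(scores)

for value, score in zip(scores, z):
    print(f"value {value}: z = {score:+.2f}")
```

A value equal to the mean gets z = 0, and the largest value (9) gets the largest positive z-score.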

Calculating Z-scores formula and steps

Properties of Z-Scores

Standardizing data into z-scores does not change the shape of the distribution, but it shifts and rescales the data:

Shape: Remains unchanged.
Center: Mean becomes 0.
Spread: Standard deviation becomes 1.
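These three properties can be checked numerically. The sketch below uses a small hypothetical data set and the sample standard deviation; the ratio of gaps between values is used here as a simple stand-in for "shape", which is an assumption of this example.

```python
from statistics import mean, stdev

data = [3.0, 7.0, 8.0, 12.0, 20.0]   # hypothetical data
m, s = mean(data), stdev(data)
z = [(y - m) / s for y in data]

z_mean = mean(z)   # center: should be 0 (up to rounding)
z_sd = stdev(z)    # spread: should be 1 (up to rounding)

# Shape: the relative spacing between values is unchanged by standardizing.
gap_ratio_raw = (data[4] - data[3]) / (data[1] - data[0])
gap_ratio_z = (z[4] - z[3]) / (z[1] - z[0])

print(round(z_mean, 10), round(z_sd, 10))
print(round(gap_ratio_raw, 10), round(gap_ratio_z, 10))
```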

Z-score properties and effects on distribution

Density Curves and the Normal Model

Density Curves

Density curves are smooth curves drawn over histograms to model the distribution of data. They must satisfy:

Always positive or zero: The curve never dips below the x-axis.
Total area under the curve: Must equal 1, representing 100% of the data.
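As a rough check of both conditions, the sketch below uses a hypothetical triangle-shaped density (not one from the notes) and approximates the area under it with the trapezoid rule. The function names are made up for this illustration.

```python
def triangle_density(x):
    """Hypothetical density: a triangle on [0, 2] that peaks at x = 1."""
    return max(0.0, 1.0 - abs(x - 1.0))

def area_under(f, lo, hi, steps=1000):
    """Approximate the area under f between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

total_area = area_under(triangle_density, 0.0, 2.0)    # whole curve
middle_share = area_under(triangle_density, 0.5, 1.5)  # part of the curve

print(round(total_area, 6))    # close to 1, i.e. 100% of the data
print(round(middle_share, 6))  # close to 0.75, i.e. 75% of the data
```

The area over an interval is read as the proportion of the data falling in that interval.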

Density curve with area under curve

The Normal Model

The normal model is a specific type of density curve that is bell-shaped, symmetric, and unimodal. It is defined by two parameters:

Mean (\mu): Determines the center of the curve.
Standard deviation (\sigma): Controls the spread; smaller \sigma results in a narrower curve.

Normal model notation and parameters

68-95-99.7 Rule

The normal model follows the empirical rule:

About 68% of values fall within ±1 standard deviation of the mean.
About 95% fall within ±2 standard deviations.
About 99.7% fall within ±3 standard deviations.
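The percentages in the rule can be reproduced from the standard normal model. The sketch below uses NormalDist from Python's standard statistics module; the helper name share_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within ±{k} SD: {share:.4f}")
# roughly 0.6827, 0.9545, 0.9973
```

The rule's figures are rounded versions of these values, which is why the notes say "about".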

Normal curve with 68-95-99.7 rule

Sketching the Normal Curve

To sketch a normal curve:

Start at the mean, which is the center.
Draw the curve symmetrically for three standard deviations on either side.
The inflection points, where the curve changes from bending downward to bending upward, lie one standard deviation on either side of the mean.
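The landmarks for a sketch can be computed before drawing. The example below assumes a hypothetical model with mean 50 and standard deviation 10, and uses a finite-difference estimate of curvature to show that the curve bends downward inside one standard deviation and upward outside it. Function names are made up for this illustration.

```python
from statistics import NormalDist

def sketch_landmarks(mu, sigma):
    """Tick marks from mu - 3*sigma to mu + 3*sigma, one per standard deviation."""
    return [mu + k * sigma for k in range(-3, 4)]

def curvature(model, x, h=0.01):
    """Finite-difference estimate of the second derivative of the density at x."""
    return (model.pdf(x + h) - 2 * model.pdf(x) + model.pdf(x - h)) / h ** 2

model = NormalDist(mu=50, sigma=10)   # hypothetical model
ticks = sketch_landmarks(50, 10)      # [20, 30, 40, 50, 60, 70, 80]

inside = curvature(model, 55)    # negative: bends downward near the center
outside = curvature(model, 65)   # positive: bends upward in the tails

print(ticks)
print(inside < 0, outside > 0)
```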

How to sketch a normal curve

Working with Normal Models

Steps for Analysis

When working with normal models:

Plot a histogram to check for normality (unimodal, symmetrical).
Calculate z-scores for observations.
Look up normal percentiles using statistical tables.
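Software can stand in for the table lookup in the last step. The sketch below assumes a hypothetical normal model with mean 500 and standard deviation 100 and a hypothetical observation of 650; it uses NormalDist from Python's standard statistics module.

```python
from statistics import NormalDist

model = NormalDist(mu=500, sigma=100)   # hypothetical model
observation = 650                       # hypothetical observation

# z-score of the observation
z = (observation - model.mean) / model.stdev

# Percentile: proportion of the model below the observation
percentile = NormalDist().cdf(z)

# Reverse lookup: the value that cuts off the lowest 90% of the model
cutoff_90 = model.inv_cdf(0.90)

print(z)                      # 1.5
print(round(percentile, 4))   # about 0.9332
print(round(cutoff_90, 1))    # about 628.2
```

This step is only meaningful if the histogram check supports using a normal model in the first place.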

Working with normal models steps

Normal Probability Plots

Normal probability plots are used to check the assumption of normality:

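Plot the z-scores expected under a normal model on the horizontal axis and the data values on the vertical axis.
If the points fall close to a straight line, the data are approximately normal.
Curved plots indicate skewness: a plot that bends upward (points rising above the line at the ends) suggests right skew, and a plot that bends downward suggests left skew.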
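The coordinates of a normal probability plot can be computed without any plotting library. The sketch below is one possible construction: it assumes the plotting positions (i - 0.5)/n, which is one common convention among several, and uses the correlation between the two axes as a rough measure of straightness. Both data sets and all function names are hypothetical.

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """z-scores expected for n ordered values from a normal model."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def probability_plot_points(data):
    """(normal score, data value) pairs: horizontal and vertical coordinates."""
    return list(zip(normal_scores(len(data)), sorted(data)))

def straightness(points):
    """Correlation between the coordinates; closer to 1 means a straighter plot."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

symmetric = [42, 45, 47, 48, 50, 52, 53, 55, 58]   # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 4, 6, 10, 25]       # hypothetical data

r_symmetric = straightness(probability_plot_points(symmetric))
r_skewed = straightness(probability_plot_points(right_skewed))
print(round(r_symmetric, 3), round(r_skewed, 3))
```

The symmetric data give a correlation much closer to 1 than the skewed data, matching the straight-line criterion. A number like this supplements the plot but does not replace looking at it.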

Normal probability plot with skewness examples

Sampling and Surveys

Sample Surveys and Sampling Principles

Sample surveys are used to gather information about a population by examining a subset. The three key principles are:

Examine a part of the whole: A properly selected sample can represent the entire population.
Randomize: Random selection protects against bias and, on average, makes the sample resemble the population.
Sample size: The size of the sample is more important than the fraction of the population sampled.
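The sample-size principle can be illustrated with a small simulation. Everything here is hypothetical: the populations are made up so that exactly half of the members answer "yes", and the function names are invented for this sketch. It compares how much sample proportions vary when the same sample size is drawn from populations of very different sizes.

```python
import random
from statistics import pstdev

rng = random.Random(0)   # fixed seed so the sketch is repeatable

def says_yes(person_id):
    """Hypothetical population in which exactly half the members say yes."""
    return person_id % 2 == 0

def spread_of_sample_proportions(population_size, sample_size, repeats=500):
    """Standard deviation of the sample proportion over many random samples."""
    proportions = []
    for _ in range(repeats):
        sample = rng.sample(range(population_size), sample_size)
        proportions.append(sum(says_yes(i) for i in sample) / sample_size)
    return pstdev(proportions)

small_pop = spread_of_sample_proportions(10_000, 100)         # 1% of population
large_pop = spread_of_sample_proportions(1_000_000, 100)      # 0.01% of population
bigger_sample = spread_of_sample_proportions(1_000_000, 400)  # larger sample

print(round(small_pop, 3), round(large_pop, 3), round(bigger_sample, 3))
```

The first two spreads come out similar even though the sampled fractions differ by a factor of 100, while increasing the sample size to 400 roughly halves the spread.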

Census and Population Parameters

A census attempts to survey every individual in a population, but is often impractical. Parameters are characteristics of the population, while statistics are characteristics of the sample.

Populations and parameters explanation

Statistical Notation

Statistics are usually denoted with Latin letters and parameters with Greek letters (the proportion, p, is a common exception). The table below summarizes common notation:

Name	Statistic	Parameter
Mean	\bar{y}	\mu
Standard deviation	s	\sigma
Correlation	r	\rho
Regression coefficient	b	\beta
Proportion	\hat{p}	p

Statistics and parameter notation table

Sampling Strategies

The method of sampling affects the representativeness of the data:

Simple random sample: Every possible sample of the chosen size has an equal chance of being selected, which also gives each individual an equal chance.
Stratified sampling: Population is divided into homogeneous groups (strata), and random samples are taken from each.
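Cluster/multistage sampling: Population is split into clusters that each resemble the population, some clusters are chosen at random, and the individuals in the chosen clusters are surveyed. Multistage sampling combines several methods.
Systematic sampling: Individuals are selected at regular intervals from a list, starting from a randomly chosen individual.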
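The four strategies above can be sketched with Python's standard random module. The population below is entirely hypothetical (120 students, four class years as strata, twelve dorms as clusters), and the function names are invented for this illustration.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable

YEARS = ("first", "second", "third", "fourth")
# Hypothetical population: 120 students, 30 per class year, 10 per dorm.
population = [{"id": i, "year": YEARS[i % 4], "dorm": i // 10} for i in range(120)]

def simple_random_sample(pop, n):
    """Every set of n individuals is equally likely."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Random sample of fixed size from each stratum."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Choose whole clusters at random and take everyone in them."""
    clusters = sorted({person[key] for person in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in chosen]

def systematic_sample(pop, step):
    """Every step-th individual, starting from a random position."""
    start = rng.randrange(step)
    return pop[start::step]

srs = simple_random_sample(population, 12)
stratified = stratified_sample(population, "year", 3)
clustered = cluster_sample(population, "dorm", 2)
systematic = systematic_sample(population, 10)
print(len(srs), len(stratified), len(clustered), len(systematic))
```

Only the stratified sample guarantees exactly three students from each class year; the other methods leave the mix to chance.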

Cluster and multistage sampling explanation

Survey Validity and Common Sampling Mistakes

From Population to Sample

Defining the 'who' of a survey is crucial. The sampling frame, target sample, and actual respondents must be specified to avoid ambiguity and bias.

Population to sample explanation
Population to sample constraints and bias

Ensuring Valid Surveys

To ensure a valid survey:

Know what you want to know.
Use the right sampling frame.
Tune your instrument.
Ask specific and quantitative questions.
Phrase questions carefully.
Perform a pilot test.

Checklist for valid survey design

Common Sampling Mistakes

Common mistakes include:

Sampling only volunteers or conveniently available people.
Using a bad sampling frame (e.g., missing certain groups).
Undercoverage (excluding segments of the population).
Nonresponse bias (differences between respondents and nonrespondents).
Response bias (survey design influences responses).
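A short arithmetic example shows how nonresponse can distort a result. All numbers below are hypothetical and chosen only to illustrate the mechanism: two groups with different opinions respond at different rates, so the respondents no longer resemble the population.

```python
# Hypothetical population and response rates (illustration only).
population = {"satisfied": 400, "dissatisfied": 600}
response_rate = {"satisfied": 0.20, "dissatisfied": 0.60}

# True share of satisfied people in the population
true_share = population["satisfied"] / sum(population.values())

# Expected number of respondents from each group
respondents = {group: population[group] * response_rate[group]
               for group in population}

# Share of satisfied people among those who actually respond
observed_share = respondents["satisfied"] / sum(respondents.values())

print(round(true_share, 3))       # 0.4
print(round(observed_share, 3))   # about 0.182
```

In this scenario the survey would report about 18% satisfied when the true figure is 40%, and collecting more responses with the same response pattern would not fix the gap.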

Additional info: Always consider who could have been missed in your study and strive for equal participation opportunity.