Foundations of Statistics: Data Collection, Summarizing, and Descriptive Measures


Study Guide - Smart Notes


Data Collection and Study Design

Types of Variables

Understanding the types of variables is essential for designing statistical studies and interpreting results.

Confounding Variable: An explanatory variable whose effect on the response variable cannot be separated from the effect of another explanatory variable, making it difficult to determine which variable is responsible for the observed effect.
Lurking Variable: An explanatory variable that was not considered in a study but that affects the value of the response variable.
Explanatory Variable: The variable that is manipulated or categorized to explain changes in the response variable.
Response Variable: The outcome or result being measured in a study.

Types of Studies

Statistical studies can be classified based on how data is collected and the time frame considered.

Cross-Sectional Study: Observational studies that collect information at a specific point in time.
Case-Control Study: Observational studies that are retrospective, looking back in time.
Cohort Study: Observational studies that follow subjects forward in time.
Choosing a Design: No study type is always superior; the best choice depends on the situation.

Sampling Methods

Sampling Terminology

Sampling is the process of selecting individuals from a population to estimate characteristics of the whole group.

Frame: The list of individuals in the population being studied.
Simple Random Sampling: Every possible sample of size n from a population of size N is equally likely to be selected.
Sampling Without Replacement: Once selected, the individual cannot be selected again.
Sampling Methods Not Requiring a Frame: Cluster and systematic sampling.
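A simple random sample drawn without replacement can be sketched with Python's standard `random` module. This is a minimal illustration, assuming the frame is a list of hypothetical individual IDs; `simple_random_sample` is an illustrative helper name, not a standard function.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n, without replacement, from a frame."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # Random.sample draws without replacement: no individual can appear twice
    return rng.sample(frame, n)

# Hypothetical frame of 20 individuals labelled 1..20
frame = list(range(1, 21))
sample = simple_random_sample(frame, 5, seed=42)
print(sample)
```

Because every subset of 5 labels is equally likely under `Random.sample`, this matches the definition of simple random sampling given above.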

Cluster and Stratified Sampling

Cluster and stratified sampling are methods used to improve representativeness and efficiency.

Cluster Sampling: Dividing the population into groups (clusters), randomly selecting some of the clusters, and including every individual in the selected clusters.
Stratified Sampling: Dividing the population into homogeneous, nonoverlapping groups (strata) and taking a simple random sample from each stratum, typically with sample sizes proportional to the stratum sizes.
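The contrast between the two methods can be sketched in Python: stratified sampling samples *within* every group, while cluster sampling picks *whole* groups. This is a hedged sketch; the helper names, the strata (freshmen and seniors), and the dormitory clusters are all hypothetical, and the proportional allocation uses simple rounding.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Take a simple random sample from every stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding can make the total differ slightly from total_n
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select some clusters and keep every individual inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 60 freshmen and 40 seniors, identified by number
strata = {"Freshman": list(range(60)), "Senior": list(range(60, 100))}
strat_sample = stratified_sample(strata, 10, seed=7)   # 6 freshmen, 4 seniors

# Hypothetical clusters: four dormitories and their residents
clusters = {
    "Dorm A": ["a1", "a2", "a3"],
    "Dorm B": ["b1", "b2"],
    "Dorm C": ["c1", "c2", "c3", "c4"],
    "Dorm D": ["d1", "d2"],
}
clus_sample = cluster_sample(clusters, 2, seed=7)      # every resident of 2 dorms
print(strat_sample)
print(clus_sample)
```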

Descriptive Statistics

Descriptive vs. Inferential Statistics

Statistics is divided into descriptive and inferential branches.

Descriptive Statistics: Organizing and summarizing information collected.
Inferential Statistics: Methods that generalize results from a sample to the population and measure reliability.

Key Terms

Statistic: A numerical summary of a sample.
Parameter: A numerical summary of a population.
Individual: A person or object that is a member of the population being studied.

Observational vs. Experimental Studies

Observational studies measure variables without influencing them, while experimental studies assign treatments to measure their effects.

Observational Study: Measures the value of the response variable without influencing it.
Experimental Study: Assigns treatments to individuals to measure the effect on the response variable.

Frequency Distributions and Tables

Frequency Distribution

Frequency distributions summarize the occurrence of each category of data.

Frequency Distribution: Lists the number of occurrences of each category of data.
Relative Frequency Distribution: Lists the proportion of occurrences of each category of data.
Relative frequencies must add up to 1.
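A frequency and relative frequency distribution for categorical data can be computed with `collections.Counter`. This sketch assumes raw category labels as input; the counts are chosen to match the illustrative age-group table in this section, and `frequency_distributions` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distributions(data):
    """Return (frequency, relative frequency) dictionaries for categorical data."""
    counts = Counter(data)                      # occurrences of each category
    n = len(data)
    relative = {category: count / n for category, count in counts.items()}
    return dict(counts), relative

# Counts chosen to match the illustrative age-group table in this section
ages = ["Under 18"] * 10 + ["18-24"] * 15 + ["25-34"] * 25
freq, rel = frequency_distributions(ages)
for category in freq:
    print(f"{category}\t{freq[category]}\t{rel[category]:.2f}")
print("Sum of relative frequencies:", round(sum(rel.values()), 10))
```

Each relative frequency is the category count divided by the total number of observations, which is why the relative frequencies add up to 1.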

Class and Class Width

Class: Categories by which data are grouped.
Class Width: The distance between consecutive lower class limits.
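For quantitative data, classes are built from a first lower class limit and a class width. The sketch below assumes equal-width classes of the form [lower, lower + width) and uses hypothetical exam scores; `group_into_classes` is an illustrative name.

```python
def group_into_classes(data, first_lower_limit, class_width, num_classes):
    """Count how many values fall in each class [lower, lower + class_width)."""
    table = []
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width  # consecutive lower limits differ by the width
        upper = lower + class_width                  # lower limit of the next class
        count = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, count))
    return table

# Hypothetical exam scores
scores = [52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 85, 88, 91, 96]
table = group_into_classes(scores, first_lower_limit=50, class_width=10, num_classes=5)
for lower, upper, count in table:
    print(f"{lower}-{upper - 1}\t{count}")   # integer data, so the upper class limit is upper - 1

# Recover the class width as the distance between consecutive lower class limits
lowers = [lower for lower, _, _ in table]
widths = {b - a for a, b in zip(lowers, lowers[1:])}
print(widths)
```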

Example Table: Relative Frequency Distribution

The following table shows an example of a relative frequency distribution:

Category	Frequency	Relative Frequency
Under 18	10	0.20
18-24	15	0.30
25-34	25	0.50

Note: The table values are hypothetical and included for illustration.

Numerical Summaries of Data

Measures of Central Tendency

Central tendency measures describe the center of a data set.

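Mean (\bar{x} for a sample, \mu for a population): The sum of the data values divided by the number of values.
Median: The middle value when the data are in order; with an even number of values, it is the mean of the two middle values.
Mode: The value that occurs most frequently.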
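Python's standard `statistics` module computes all three measures of central tendency directly. The small data set below is hypothetical and has an even number of values, so the median is the average of the two middle values.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]        # hypothetical, already in order

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 30 / 6 = 5
median = statistics.median(data)  # even n: (3 + 5) / 2 = 4
mode = statistics.mode(data)      # 3 occurs most often
print(mean, median, mode)
```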

Measures of Dispersion

Dispersion measures describe the spread of data.

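Standard Deviation (\sigma, s): Measures the typical distance of data values from the mean; \sigma denotes the population standard deviation and s the sample standard deviation.
Variance (\sigma^2, s^2): The square of the standard deviation.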
Interquartile Range (IQR): The difference between the upper and lower quartiles.

Formulas:

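Sample Mean: \bar{x} = \frac{\sum x_i}{n}
Population Standard Deviation: \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}
Sample Standard Deviation: s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}
Interquartile Range: IQR = Q_3 - Q_1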
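The population standard deviation divides the sum of squared deviations by N, while the sample standard deviation divides by n - 1. The sketch below implements both directly from those definitions and cross-checks them against `statistics.pstdev` and `statistics.stdev`; the data set and helper names are hypothetical.

```python
import math
import statistics

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s = sqrt( sum((x - xbar)^2) / (n - 1) )"""
    xbar = sum(values) / len(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
sigma = population_sd(data)       # sqrt(32 / 8) = 2.0
s = sample_sd(data)               # sqrt(32 / 7), about 2.138
print(sigma, s, s ** 2)           # s ** 2 is the sample variance

# Cross-check against the standard library
assert math.isclose(sigma, statistics.pstdev(data))
assert math.isclose(s, statistics.stdev(data))
```

Dividing by n - 1 makes the sample standard deviation slightly larger than the population version for the same numbers.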

Resistant Statistics

A statistic is resistant if extreme values (outliers) do not affect its value substantially. The median is resistant, while the mean is not.
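Resistance is easy to see by replacing one value with an extreme one. In this hypothetical example (incomes in thousands of dollars), the mean triples while the median does not move.

```python
import statistics

# Hypothetical incomes in thousands of dollars
incomes = [30, 32, 35, 38, 40]
with_outlier = [30, 32, 35, 38, 400]   # the largest value replaced by an extreme one

print(statistics.mean(incomes), statistics.median(incomes))            # mean 35, median 35
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean 107, median 35
```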

Standard Deviation and Its Uses

Standard Deviation in Distribution Analysis

The standard deviation is used with the mean to numerically describe distributions. The mean measures the center, while the standard deviation measures the spread.

Whiskers: Less precise than statistical rules; only works for bell-shaped distributions.
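Empirical Rule: For bell-shaped distributions, approximately 68% of the data lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. It only works for bell-shaped distributions.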
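The Empirical Rule says that, for bell-shaped data, roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean. The sketch below only computes those intervals; it assumes the data really are bell-shaped, and the mean of 100 and standard deviation of 15 are hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Approximate Empirical Rule intervals (meaningful only for bell-shaped data)."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

intervals = empirical_rule_intervals(100, 15)
for share, (low, high) in intervals.items():
    print(f"About {share} of values between {low} and {high}")
```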

Z-Scores and Quartiles

Z-Score Calculation

A z-score measures how many standard deviations a data value is from the mean.

Formula: z = \frac{x - \mu}{\sigma} for a population, or z = \frac{x - \bar{x}}{s} for a sample.
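A z-score is the data value minus the mean, divided by the standard deviation. In this sketch the mean of 100 and standard deviation of 15 are hypothetical; a positive z-score means the value is above the mean and a negative one means it is below.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical scores with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # 2.0: two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0: one standard deviation below the mean
```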

Quartiles and Interquartile Range

Lower Quartile (Q1): The 25th percentile.
Upper Quartile (Q3): The 75th percentile.
Interquartile Range (IQR): IQR = Q_3 - Q_1
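Quartiles can be found by taking the median of the lower and upper halves of the ordered data. This sketch assumes that common textbook convention (the overall median is excluded from both halves when n is odd); other conventions, including the interpolation used by `statistics.quantiles`, can give slightly different values. The data and helper name are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                                  # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]    # values above the median
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # hypothetical, n = 9
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```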

Relationships Between Variables

Types of Relationships

Relationships between two quantitative variables are described by their direction (positive or negative) and their strength (strong or weak).

Positively Associated: As one variable increases, the other tends to increase.
Negatively Associated: As one variable increases, the other tends to decrease.
Strong Association: Points are close to a line or curve.
Weak Association: Points are spread out.

Contingency Tables

Contingency tables are used to examine relationships between categorical variables.

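Marginal Distribution: The frequency or relative frequency distribution of one variable alone, regardless of the other; it is found in the row or column totals of the table.
Conditional Distribution: The relative frequency distribution of one variable for a specific value of the other variable.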
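Marginal and conditional distributions can both be computed from the cell counts of a contingency table. In this sketch the table (class level by preferred study time) and its counts are hypothetical, stored as a dictionary of rows; the helper names are illustrative.

```python
# Hypothetical counts: rows = class level, columns = preferred study time
table = {
    "Freshman": {"Morning": 20, "Evening": 30},
    "Senior":   {"Morning": 35, "Evening": 15},
}

def marginal_distribution(table, by="row"):
    """Relative frequencies of the row (or column) variable, ignoring the other variable."""
    totals = {}
    for row, cols in table.items():
        for col, count in cols.items():
            key = row if by == "row" else col
            totals[key] = totals.get(key, 0) + count
    grand_total = sum(totals.values())
    return {key: count / grand_total for key, count in totals.items()}

def conditional_distribution(table, given_row):
    """Relative frequencies of the column variable for one fixed row value."""
    row = table[given_row]
    row_total = sum(row.values())
    return {col: count / row_total for col, count in row.items()}

print(marginal_distribution(table, by="column"))    # Morning 55/100, Evening 45/100
print(conditional_distribution(table, "Freshman"))  # Morning 20/50, Evening 30/50
print(conditional_distribution(table, "Senior"))    # Morning 35/50, Evening 15/50
```

Comparing the two conditional distributions (40% vs. 70% preferring mornings) is what suggests an association between the variables.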

Experimental Design Terminology

Key Terms

Underrepresented: Proportionally smaller in a sample than in its population.
Experimental Unit: A person or object to which a treatment is applied.
Placebo: Fake treatment that mimics the real treatment.
Treatment: Combination of factor levels administered to experimental units.
Response Variable: The outcome being measured.
Blocking: Subjects are grouped into blocks based on a variable that is expected to affect the response.
Matched Pairs: A design in which experimental units are paired, either the same subject measured before and after treatment (pre- and post-tests) or two subjects matched on similar characteristics.
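Blocking is usually combined with randomization: within each block, the experimental units are randomly assigned to treatments. The sketch below assumes balanced assignment within each block; the blocks (by sex), unit IDs, treatment names, and helper function are all hypothetical.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, shuffle the units and assign treatments as evenly as possible."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                   # randomize order inside the block only
        for i, unit in enumerate(shuffled):
            # Cycle through treatments so each appears equally often in the block
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: sex is expected to affect the response
blocks = {"Men": ["M1", "M2", "M3", "M4"], "Women": ["W1", "W2", "W3", "W4"]}
result = randomized_block_assignment(blocks, ["Drug", "Placebo"], seed=1)
for unit, (block, treatment) in sorted(result.items()):
    print(unit, block, treatment)
```

Randomizing separately inside each block keeps the blocking variable balanced across treatments, so its effect is not confounded with the treatment effect.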