Statistics Chapter 1: Data Collection and the Foundations of Statistical Thinking

Study Guide - Smart Notes


1.1 Introduction to the Practice of Statistics

1.1.1 Define Statistics and Statistical Thinking

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It also involves providing a measure of confidence in any conclusions. The information used in statistics is called data. Data describe characteristics of individuals and exhibit variability: differences among individuals, or within the same individual over time.

Data: Observations that describe characteristics of individuals.
Variability: The tendency for data to differ among individuals or over time.
Goal of Statistics: To describe and understand sources of variability.

1.1.2 Explain the Process of Statistics

The process of statistics is built on a few key terms, each needed to draw reliable conclusions from data:

Population: The entire group of individuals to be studied.
Individual: A single member of the population.
Sample: A subset of the population selected for study.

Diagram showing the relationship between population, sample, and individual

Parameter: A numerical summary of a population.
Statistic: A numerical summary based on a sample.
Descriptive Statistics: Methods for organizing and summarizing data (e.g., tables, graphs, numerical summaries).
Inferential Statistics: Methods that use sample results to make generalizations about a population and measure the reliability of these results.
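
To make the parameter/statistic distinction concrete, the sketch below builds a small hypothetical population of exam scores (the values are invented for illustration), computes the population mean as a parameter, and then computes the mean of one random sample as a statistic. The names `population` and `sample`, the sample size, and the seed are assumptions, not part of the source material.

```python
import random
import statistics

# Hypothetical population: exam scores for all 12 students in a class (made-up values).
population = [72, 85, 90, 66, 78, 94, 81, 59, 88, 75, 69, 83]

# Parameter: a numerical summary of the entire population.
population_mean = statistics.mean(population)

# Statistic: a numerical summary of a sample drawn from the population.
rng = random.Random(1)              # fixed seed so the run is repeatable
sample = rng.sample(population, 4)  # 4 individuals chosen without replacement
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (mean of sample {sample}): {sample_mean:.2f}")
```

A different seed gives a different sample and usually a different statistic, while the parameter stays fixed. Inferential statistics deals with exactly this sample-to-sample variability.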

Steps in the Statistical Process:

Identify the research objective and the population to be studied.
Collect the data needed to answer the research question (often from a sample).
Describe the data using descriptive statistics.
Perform inference to extend results from the sample to the population, reporting the reliability of the results.

1.1.3 Distinguish between Qualitative and Quantitative Variables

Variables are characteristics of individuals within a population. They can be classified as:

Qualitative (Categorical) Variables: Classify individuals based on attributes or characteristics (e.g., gender, zip code).
Quantitative Variables: Provide numerical measures of individuals, allowing meaningful arithmetic operations (e.g., temperature, number of study days).

Example:

Gender: Qualitative
Temperature: Quantitative
Number of study days: Quantitative
Zip code: Qualitative
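
One practical consequence of this classification shows up when data are stored and summarized. The sketch below, using made-up records, keeps zip codes as text because arithmetic on them is not meaningful, while study days are kept as integers so they can be averaged. The records and field names are hypothetical.

```python
import statistics

# Hypothetical survey records (values invented for illustration).
records = [
    {"zip_code": "02139", "study_days": 3},
    {"zip_code": "60614", "study_days": 5},
    {"zip_code": "02139", "study_days": 4},
]

# Quantitative variable: arithmetic is meaningful, so a mean makes sense.
mean_study_days = statistics.mean(r["study_days"] for r in records)

# Qualitative variable: values are labels, so count categories instead of averaging.
zip_counts = {}
for r in records:
    zip_counts[r["zip_code"]] = zip_counts.get(r["zip_code"], 0) + 1

print("Mean study days:", mean_study_days)
print("Individuals per zip code:", zip_counts)
```

Storing the zip code as a string also preserves its leading zero, which would be lost if the label were treated as a number.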

1.1.4 Distinguish between Discrete and Continuous Variables

Quantitative variables can be further classified as:

Discrete Variables: Have a finite or countable number of possible values (e.g., number of heads in coin flips).
Continuous Variables: Have an infinite number of possible values that are not countable; they can be measured to any desired level of accuracy (e.g., distance traveled by a car).

Data Types:

Qualitative Data: Observations from qualitative variables.
Quantitative Data: Observations from quantitative variables.
Discrete Data: Observations from discrete variables.
Continuous Data: Observations from continuous variables.

1.1.5 Determine the Level of Measurement of a Variable

Variables can be measured at different levels, which determine the types of statistical analyses that are appropriate:

Nominal Level: Values name, label, or categorize; no inherent order (e.g., gender).
Ordinal Level: Values can be ranked or ordered (e.g., letter grades).
Interval Level: Has the properties of the ordinal level, and differences between values have meaning. A value of zero does not mean the absence of the quantity, so ratios are not meaningful (e.g., temperature in Celsius).
Ratio Level: Ratios of values have meaning, and zero indicates absence (e.g., number of study days).

Example:

Gender: Nominal
Temperature: Interval
Number of study days: Ratio
Letter grade: Ordinal
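
The difference between the interval and ratio levels can be checked with a little arithmetic. The sketch below converts two Celsius temperatures to Fahrenheit: the ratio of the two readings changes with the scale, so it carries no real meaning, whereas a ratio of study-day counts is the same whether time is expressed in days or hours. The specific values are made up for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval level: the zero point is arbitrary, so ratios depend on the scale.
low_c, high_c = 10, 20
low_f = celsius_to_fahrenheit(low_c)     # 50.0
high_f = celsius_to_fahrenheit(high_c)   # 68.0
ratio_in_celsius = high_c / low_c        # 2.0
ratio_in_fahrenheit = high_f / low_f     # 68 / 50 = 1.36

# Ratio level: zero means "none", so ratios survive a change of units.
days_a, days_b = 2, 4
ratio_in_days = days_b / days_a                   # 2.0
ratio_in_hours = (days_b * 24) / (days_a * 24)    # still 2.0

print("Temperature ratio:", ratio_in_celsius, "vs", round(ratio_in_fahrenheit, 2))
print("Study-time ratio:", ratio_in_days, "vs", ratio_in_hours)
```

So 20 degrees Celsius is 10 degrees warmer than 10 degrees Celsius, but it is not "twice as warm".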

1.2 Observational Studies Versus Designed Experiments

1.2.1 Distinguish between an Observational Study and an Experiment

In statistics, it is important to distinguish between how data are collected:

Observational Study: Measures the value of the response variable without influencing the study's individuals or variables. The researcher simply observes.
Designed Experiment: The researcher assigns individuals to groups, manipulates the explanatory variable, and records the response variable.

Key Terms:

Response Variable: The outcome of interest.
Explanatory Variable: The variable that is manipulated or categorized to observe its effect on the response variable.
Confounding: When the effects of two or more explanatory variables are not separated, making it unclear which variable is causing an effect.
Lurking Variable: An unmeasured variable that affects the response variable and is related to the explanatory variable.
Association vs. Causation: Observational studies can show association but not causation; experiments are needed to establish causation.
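
A small simulation can illustrate why an association seen in an observational study does not establish causation. In the sketch below, a hypothetical lurking variable `z` drives both `x` and `y`; `x` never influences `y`, yet the two end up clearly correlated. The variable names, noise model, sample size, and seed are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 5000

# Lurking variable: not recorded by the (hypothetical) researcher.
z = [rng.gauss(0, 1) for _ in range(n)]

# Both observed variables depend on z plus independent noise.
# Note that y is computed without any reference to x.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"Correlation between x and y: {r_xy:.2f}")
```

Under this noise model the correlation comes out near 0.5 even though changing `x` would do nothing to `y`. A designed experiment that assigns `x` at random would break its link to `z`.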

1.2.2 Explain the Various Types of Observational Studies

Cross-sectional Studies: Collect information at a specific point in time or over a short period.
Case-control Studies: Retrospective; compare individuals with a characteristic (cases) to those without (controls).
Cohort Studies: Prospective; follow a group (cohort) over time, recording characteristics and outcomes.

1.2.3 Census Data

A census is a list of all individuals in a population, along with certain characteristics. In the U.S., a census is conducted every 10 years and is used for political representation and allocation of government resources.

1.2.4 Obtaining Data through Web Scraping

Web scraping (sometimes loosely called data mining) is the process of automatically extracting data from web pages. It is widely used in data science but raises ethical concerns, especially when data are collected without permission.

1.3 Simple Random Sampling

1.3.1 Obtain a Simple Random Sample

Random sampling uses chance to select individuals from a population. In simple random sampling, every possible sample of a given size has an equal chance of being chosen, which guards against selection bias.

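Simple Random Sample: A sample of size n from a population of size N, obtained so that every possible sample of size n is equally likely to be selected.
Convenience Sampling: Using individuals who are easy to reach. Because chance plays no role in the selection, the results are prone to bias and should not be generalized to the population.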
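
The definition of a simple random sample can be made concrete by counting. For a population of size N and a sample size n there are C(N, n) possible samples, so under simple random sampling each one is selected with probability 1 / C(N, n). The sketch below lists every sample for a tiny hypothetical population and then evaluates the count for N = 30 and n = 5, values chosen only for illustration.

```python
import itertools
import math

# Tiny hypothetical population so every possible sample can be listed.
tiny_population = ["A", "B", "C", "D"]
all_samples = list(itertools.combinations(tiny_population, 2))
print(all_samples)                 # the 6 possible samples of size 2
print(1 / len(all_samples))        # each has probability 1/6

# Larger illustrative case: count the samples instead of listing them.
N = 30  # population size
n = 5   # sample size
num_possible_samples = math.comb(N, n)
prob_each_sample = 1 / num_possible_samples
print(num_possible_samples)        # 142506
```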

Steps for Obtaining a Simple Random Sample:

Obtain a frame (list) of all individuals in the population and number them from 1 to N.
Use a random number generator or a table of random numbers to select n distinct numbers between 1 and N, where n is the sample size.
Match the selected numbers to the individuals in the frame to form the sample.

Example: If you have 30 clients and want to select 5 at random, number them 01 to 30, generate 5 distinct random numbers between 01 and 30 (skipping any repeats), and select the corresponding clients.
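
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical frame of 30 placeholder client names; the seed is fixed only so the run is repeatable and would normally be omitted.

```python
import random

# Step 1: a frame listing every individual, numbered 1 to N (placeholder names).
N = 30
frame = {i: f"Client {i:02d}" for i in range(1, N + 1)}

# Step 2: select n distinct numbers between 1 and N.
n = 5
rng = random.Random(2024)  # fixed seed for repeatability (an assumption)
selected_numbers = sorted(rng.sample(range(1, N + 1), n))

# Step 3: match the selected numbers to individuals in the frame.
sample = [frame[k] for k in selected_numbers]

print(selected_numbers)
print(sample)
```

`random.sample` draws without replacement, so no number can repeat. With a printed random number table, repeats would have to be skipped by hand.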