
Chapter 1: Data Collection and Introduction to Statistics

Statistics: Informed Decisions Using Data

1.1 Introduction to the Practice of Statistics

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It also provides a measure of confidence in any conclusions. The information used in statistics is called data, which describes characteristics of individuals and exhibits variability.

  • Statistics: The science of data collection, analysis, and interpretation.

  • Statistical Thinking: Understanding that data vary and seeking to describe and understand sources of variability.

  • Data: Facts or propositions used to draw conclusions or make decisions.

  • Variability: The tendency of data to differ among individuals or over time.

Process of Statistics

The process of statistics involves four main steps:

  1. Identify the research objective: Define the question and population to be studied.

  2. Collect the data: Gather data from a sample or population using appropriate methods.

  3. Describe the data: Use descriptive statistics to summarize and visualize data.

  4. Perform inference: Apply inferential statistics to generalize results from the sample to the population and report reliability.

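Population, Sample, Individual

  • Population: The entire group of individuals to be studied.

  • Sample: A subset of the population that is being studied.

  • Individual: A person or object that is a member of the population being studied.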

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Organizing and summarizing data using numerical summaries, tables, and graphs.

  • Inferential Statistics: Methods that take results from a sample, extend them to the population, and measure reliability.

  • Parameter: Numerical summary of a population.

  • Statistic: Numerical summary of a sample.

Example: Parameter vs. Statistic

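  • If 48.2% of all students on a campus own a car, this value is a parameter because it summarizes the entire population.

  • If 46% of a sample of 100 students from that campus own a car, this value is a statistic because it summarizes a sample.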
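The parameter/statistic distinction can be sketched in Python under illustrative assumptions. The campus is hypothetical: 10,000 students, exactly 48.2% of whom own a car. A simple random sample of 100 of them is drawn with a fixed seed. The sample proportion usually differs somewhat from 48.2%, just as 46% does in the example.

```python
import random

# Hypothetical campus: 10,000 students, 1 = owns a car, 0 = does not.
# Exactly 4,820 car owners, so the population proportion is 48.2%.
population = [1] * 4_820 + [0] * 5_180

# Parameter: numerical summary of the entire population.
parameter = sum(population) / len(population)

# Statistic: the same summary computed from a simple random sample of 100 students.
rng = random.Random(1)  # fixed seed only to make the run reproducible
sample = rng.sample(population, 100)
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.2f}")
```

The parameter is fixed because it describes the whole population. The statistic changes if a different seed, and therefore a different sample, is used.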

Variables and Types

Variables are characteristics of individuals within a population. They vary and are classified as qualitative or quantitative.

  • Qualitative (Categorical) Variables: Classify individuals based on attributes or characteristics.

  • Quantitative Variables: Provide numerical measures; values can be added or subtracted meaningfully.

Discrete vs. Continuous Variables

  • Discrete Variable: Quantitative variable with a finite or countable number of values (e.g., number of cars).

  • Continuous Variable: Quantitative variable with an infinite number of possible values that are not countable; it can take on every value between any two of its values (e.g., distance, time).

Data Types

  • Qualitative Data: Observations from qualitative variables.

  • Quantitative Data: Observations from quantitative variables.

  • Discrete Data: Observations from discrete variables.

  • Continuous Data: Observations from continuous variables.

Example: Parking Meter Data

Table 1 presents data from a sample of on-street parking meters in Seattle. Individuals are cars; variables include payment method, amount paid, duration, side of street, and parking space number.

Table 1. Parking meter data

Car | Payment Method | Amount Paid | Duration (min) | Side of Street | Parking Space Number
1   | Credit Card    | $3.75       | 30             | W              | 458
2   | Credit Card    | $2.00       | 240            | E              | 37
3   | Credit Card    | $2.00       | 240            | NE             | 18
4   | Phone          | $1.38       | 225            | SW             | 382
5   | Phone          | $0.50       | 60             | S              | 770
6   | Credit Card    | $0.25       | 10             | S              | 136
7   | Credit Card    | $0.50       | 120            | S              | 59
8   | Credit Card    | $0.50       | 20             | S              | 69
9   | Credit Card    | $0.50       | 20             | S              | 69
10  | Phone          | $0.75       | 30             | S              | 15
11  | Phone          | $1.71       | 204            | SW             | 382
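The following Python sketch shows how the variable types in Table 1 determine which summaries make sense. The tuple layout, the variable names, and the classification dictionary are illustrative assumptions. Parking space number is written with digits, but it only labels a space, so it is treated as qualitative. Averaging it would produce a meaningless number.

```python
from collections import Counter
from statistics import mean, median

# Table 1 rows: one tuple per individual (car).
# Columns: car, payment method, amount paid ($), duration (min), side of street, space number
rows = [
    (1, "Credit Card", 3.75, 30, "W", 458),
    (2, "Credit Card", 2.00, 240, "E", 37),
    (3, "Credit Card", 2.00, 240, "NE", 18),
    (4, "Phone", 1.38, 225, "SW", 382),
    (5, "Phone", 0.50, 60, "S", 770),
    (6, "Credit Card", 0.25, 10, "S", 136),
    (7, "Credit Card", 0.50, 120, "S", 59),
    (8, "Credit Card", 0.50, 20, "S", 69),
    (9, "Credit Card", 0.50, 20, "S", 69),
    (10, "Phone", 0.75, 30, "S", 15),
    (11, "Phone", 1.71, 204, "SW", 382),
]

# Illustrative classification. Space number uses digits but only labels a space,
# so it is qualitative: its "average" would have no meaning.
variable_types = {
    "payment method": "qualitative",
    "amount paid": "quantitative",
    "duration": "quantitative",
    "side of street": "qualitative",
    "parking space number": "qualitative",
}

# Qualitative variable: summarize by counting how often each category occurs.
payment_counts = Counter(row[1] for row in rows)

# Quantitative variable: arithmetic summaries such as the mean and median make sense.
durations = [row[3] for row in rows]

print(payment_counts)
print(mean(durations), median(durations))
```

Run as written, it counts 7 credit-card payments and 4 phone payments. The mean and median durations are 109 and 60 minutes.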

Levels of Measurement

  • Nominal: Values name, label, or categorize; no order (e.g., race).

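  • Ordinal: Has the properties of the nominal level, and values can also be ranked or ordered (e.g., letter grades).

  • Interval: Has the properties of the ordinal level, and differences between values have meaning; a value of zero does not mean the absence of the quantity (e.g., temperature in degrees Fahrenheit or Celsius).

  • Ratio: Has the properties of the interval level, and ratios of values also have meaning; a value of zero means the absence of the quantity (e.g., number of days studied).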
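The difference between the interval and ratio levels can be checked numerically. This sketch uses illustrative values: temperatures of 10 and 20 degrees Celsius, and study times of 2 and 4 days. Converting to another unit changes the temperature ratio but not the study-time ratio, which is why only the latter is meaningful.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval level: temperature. Zero degrees is not "no temperature".
low_c, high_c = 10.0, 20.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

print(high_c / low_c)   # 2.0  -> "twice as hot" in Celsius...
print(high_f / low_f)   # 1.36 -> ...but not in Fahrenheit (68 / 50), so the ratio is meaningless
print(high_f - low_f)   # 18.0 -> differences stay consistent: 10 C apart is always 18 F apart

# Ratio level: days studied. Zero means no studying at all.
days_a, days_b = 2, 4
hours_a, hours_b = days_a * 24, days_b * 24
print(days_b / days_a, hours_b / hours_a)  # 2.0 2.0 -> the ratio survives a change of units
```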

1.2 Observational Studies Versus Designed Experiments

Observational Study vs. Experiment

  • Observational Study: Measures the value of the response variable without influencing explanatory or response variables.

  • Designed Experiment: Researcher assigns individuals to groups, manipulates explanatory variables, and records response variable.

Confounding and Lurking Variables

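  • Confounding Variable: An explanatory variable considered in a study whose effect on the response variable cannot be distinguished from that of a second explanatory variable in the study.

  • Lurking Variable: An explanatory variable that was not considered in a study but that affects the response variable; lurking variables are typically related to explanatory variables that were considered.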

Types of Observational Studies

  • Cross-sectional Studies: Collect information at a specific point in time.

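  • Case-control Studies: Retrospective (researchers look back in time or at existing records); individuals who have a certain characteristic are compared with, and may be matched to, individuals who do not.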

  • Cohort Studies: Prospective; follow a group (cohort) over time and record characteristics.

1.3 Simple Random Sampling

Simple Random Sample

A simple random sample is obtained when every possible sample of size n from a population of size N has an equally likely chance of occurring.

  • Random Sampling: Using chance to select individuals for a sample.

  • Sample Without Replacement: Once an individual is selected, it is removed from the population and cannot be chosen again.

  • Sample With Replacement: A selected individual is returned to the population and could be chosen again.

Obtaining a Simple Random Sample

  1. Obtain a frame: a list of all individuals in the population.

  2. Assign each individual in the frame a unique number from 1 to N.

  3. Randomly select n distinct numbers; the individuals assigned those numbers form the sample.
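The steps for obtaining a simple random sample can be sketched with Python's standard random module. The 30-client frame, the sample size n = 5, and the seed are hypothetical. random.sample draws without replacement, while random.choices draws with replacement.

```python
import random

# Step 1: a hypothetical frame listing every individual in a population of N = 30.
frame = [f"client_{i:02d}" for i in range(1, 31)]
N, n = len(frame), 5

rng = random.Random(7)  # fixed seed only for reproducibility

# Step 2: individual number k (1..N) is frame[k - 1].
# Step 3: draw n distinct numbers. random.sample samples without replacement,
# so no individual can appear twice and every set of n numbers is equally likely.
chosen = rng.sample(range(1, N + 1), n)
srs = [frame[k - 1] for k in sorted(chosen)]
print(srs)

# For contrast, random.choices samples WITH replacement: repeats are possible.
with_replacement = rng.choices(range(1, N + 1), k=n)
```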

1.4 Other Effective Sampling Methods

Stratified Sampling

Divide the population into nonoverlapping groups (strata) whose members are homogeneous in some way, then obtain a simple random sample from each stratum.
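The following is a minimal sketch of stratified sampling. It assumes a hypothetical population of 100 students grouped by class standing and takes 10% of each stratum. This proportional allocation is a common choice, but it is an assumption here, not part of the definition. The function name stratified_sample is likewise hypothetical.

```python
import random

def stratified_sample(units, stratum_of, sizes, rng):
    """Split units into nonoverlapping strata, then draw an SRS from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical population of 100 students, each tagged with a class standing.
population = (
    [("freshman", i) for i in range(40)]
    + [("sophomore", i) for i in range(30)]
    + [("junior", i) for i in range(20)]
    + [("senior", i) for i in range(10)]
)

# Assumed allocation: 10% of each stratum (proportional allocation).
sizes = {"freshman": 4, "sophomore": 3, "junior": 2, "senior": 1}
sample = stratified_sample(population, lambda unit: unit[0], sizes, random.Random(3))
print(sample)
```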

Systematic Sampling

Select every kth individual from the population, starting at a random number between 1 and k.
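The following sketch applies systematic sampling to a hypothetical frame of 100 numbered individuals with a target sample size of 10. When N is known, k is commonly chosen by dividing N by n and rounding down. That convention is assumed here rather than required by the definition.

```python
import random

def systematic_sample(frame, k, rng):
    """Choose a random start p between 1 and k, then take every kth individual."""
    p = rng.randint(1, k)        # randint includes both endpoints
    return frame[p - 1::k]       # individual p is at 0-based index p - 1

# Hypothetical frame of N = 100 individuals; for a sample of n = 10, k = N // n = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=len(frame) // 10, rng=random.Random(5))
print(sample)
```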

Cluster Sampling

Select all individuals within randomly chosen groups (clusters).
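The following sketch illustrates cluster sampling. It assumes a hypothetical population of 20 dormitory floors (the clusters) with 8 residents each. The random selection happens at the cluster level, and every resident of each chosen floor is included.

```python
import random

# Hypothetical population: 20 dormitory floors (clusters) with 8 residents each.
clusters = {
    f"floor_{c:02d}": [f"floor_{c:02d}_resident_{r}" for r in range(8)]
    for c in range(1, 21)
}

rng = random.Random(11)
# Randomly select 3 clusters (an SRS of floors, not of people) ...
chosen_floors = rng.sample(sorted(clusters), 3)
# ... then include every individual in each selected cluster.
sample = [resident for floor in chosen_floors for resident in clusters[floor]]
print(chosen_floors, len(sample))
```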

Convenience and Voluntary Response Sampling

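  • Convenience Sample: Individuals are selected because they are easy to obtain, not at random; results are suspect.

  • Voluntary Response Sample: Individuals decide for themselves whether to participate. This is a type of convenience sample, so its results are also suspect.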

Multistage Sampling

Combines several sampling methods, often used in large-scale surveys.

1.5 Bias in Sampling

Sources of Bias

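  • Sampling Bias: The technique used to obtain the sample tends to favor one part of the population over another. Undercoverage, in which a segment of the population is underrepresented in the sample (often because the frame is incomplete), is one example.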

  • Nonresponse Bias: Selected individuals who do not respond differ from those who do.

  • Response Bias: Answers do not reflect true feelings due to interviewer error, misrepresented answers, wording, ordering, question type, or data-entry error.

Sampling vs. Nonsampling Error

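  • Nonsampling Error: Results from undercoverage, nonresponse bias, response bias, or data-entry error; it can occur even in a complete census of the population.

  • Sampling Error: Results from using a sample to estimate information about a population; it occurs because a sample gives incomplete information about that population.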
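Repeated sampling makes sampling error visible. This sketch assumes a hypothetical campus of 10,000 students, exactly 48.2% of whom own a car. It draws 1,000 simple random samples of 100 students and records how far each sample proportion falls from the population proportion. The population is listed completely and no one refuses or misreports, so the only error present is sampling error.

```python
import random
from statistics import mean

# Hypothetical campus: 10,000 students, exactly 48.2% of whom own a car (1 = owns).
population = [1] * 4_820 + [0] * 5_180
p = mean(population)  # the parameter, known here because the population is simulated

rng = random.Random(2024)  # fixed seed only for reproducibility
# 1,000 independent simple random samples of 100 students each; record each p-hat.
p_hats = [mean(rng.sample(population, 100)) for _ in range(1_000)]

# The frame is complete and everyone "answers" truthfully, so the only error left
# is sampling error: p-hat differs from p simply because a sample is incomplete.
errors = [p_hat - p for p_hat in p_hats]
print(f"smallest error {min(errors):+.3f}, largest error {max(errors):+.3f}")
print(f"average of the 1,000 sample proportions: {mean(p_hats):.3f}")
```

Individual sample proportions miss 0.482 by varying amounts, while their average lands close to it.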

1.6 The Design of Experiments

Characteristics of an Experiment

  • Experiment: A controlled study conducted to determine the effect that varying one or more explanatory variables (factors) has on a response variable.

  • Treatment: Combination of factor values applied to experimental units.

  • Experimental Unit: Person, object, or item receiving treatment.

  • Control Group: Baseline treatment for comparison.

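  • Placebo: An innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication; commonly used as the control treatment.

  • Blinding: Nondisclosure of the treatment an experimental unit is receiving. In a single-blind experiment, the subject does not know which treatment he or she is receiving; in a double-blind experiment, neither the subject nor the researcher in contact with the subject knows.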

Steps in Designing an Experiment

  1. Identify the problem and response variable.

  2. Determine factors affecting the response variable.

  3. Determine the number of experimental units.

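  4. Determine the level of each factor: control factors by fixing their levels, and randomly assign experimental units to treatments to minimize the effect of factors whose levels cannot be controlled.

  5. Conduct the experiment: replicate (apply each treatment to more than one experimental unit), then collect and process the data.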

  6. Test the claim using inferential statistics.

Completely Randomized Design

Each experimental unit is randomly assigned to a treatment.
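A completely randomized design can be sketched by shuffling the experimental units and dealing them out to the treatments. The 12 subjects, the three treatment labels, and the function name completely_randomized are hypothetical.

```python
import random

def completely_randomized(units, treatments, rng):
    """Randomly assign every unit to a treatment; group sizes differ by at most one."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin into the treatment groups.
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

# Hypothetical experiment: 12 subjects, a placebo control and two doses.
units = [f"subject_{i:02d}" for i in range(1, 13)]
groups = completely_randomized(units, ["placebo", "low dose", "high dose"], random.Random(8))
for treatment, members in groups.items():
    print(treatment, members)
```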

Matched-Pairs Design

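Experimental units are paired so that the members of each pair are related in some way (e.g., twins, or the same person measured before and after treatment). There are only two treatments, and the two members of each pair receive different treatments.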
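This sketch shows randomization in a matched-pairs design. It assumes three hypothetical pairs of twins and two treatments labeled A and B. A fair coin flip within each pair decides which member receives which treatment. When each pair is one person measured twice, a similar flip could instead decide the order of the two treatments.

```python
import random

# Hypothetical matched pairs (e.g., sets of twins); exactly two treatments.
pairs = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
treatment_a, treatment_b = "treatment A", "treatment B"

rng = random.Random(4)  # fixed seed only for reproducibility
assignment = {}
for first, second in pairs:
    # A fair coin flip decides which member of the pair receives treatment A;
    # the other member receives treatment B.
    if rng.random() < 0.5:
        first, second = second, first
    assignment[first] = treatment_a
    assignment[second] = treatment_b

print(assignment)
```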

Randomized Block Design

Experimental units are divided into homogeneous blocks; within each block, units are randomly assigned to treatments.
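The following is a sketch of a randomized block design. It assumes hypothetical blocks defined by sex, two treatments (placebo and drug), and a function name, randomized_block, chosen for illustration. Randomization happens separately inside each block. As long as every block has at least as many units as there are treatments, each treatment appears in every block.

```python
import random

def randomized_block(blocks, treatments, rng):
    """Within each block separately, shuffle the units and rotate through the treatments."""
    plan = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            plan[unit] = (block, treatments[i % len(treatments)])
    return plan

# Hypothetical blocks of similar units (here grouped by sex) and two treatments.
blocks = {
    "female": ["f1", "f2", "f3", "f4"],
    "male": ["m1", "m2", "m3", "m4", "m5", "m6"],
}
plan = randomized_block(blocks, ["placebo", "drug"], random.Random(9))
for unit, (block, treatment) in sorted(plan.items()):
    print(block, unit, treatment)
```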
