Chapter 1: Data Collection – Foundations of Statistics

Study Guide - Smart Notes

Section 1.1 – Statistics Basics

Introduction to Statistics

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It also involves providing a measure of confidence in any conclusion.

  • Statistic: A numerical summary of a sample, such as a percentage or a mean. More loosely, facts or data organized and summarized to provide useful and accessible information about a particular subject.

  • Statistics: The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.

  • Key Steps in Statistics:

    • Collection of information

    • Organization and summarization of information

    • Analysis of information to draw conclusions or answer questions

    • Reporting results with measures that represent how confident we are that our conclusions reflect reality

Example: "50% of all marriages end in divorce," "67% of second marriages end in divorce," and "74% of third marriages end in divorce" are statistics that summarize data about marriage outcomes.

Key Definitions

  • Descriptive Statistics: Methods for organizing and summarizing data.

  • Inferential Statistics: Methods for drawing and measuring the reliability of conclusions about a population based on a sample.

  • Population: The entire group to be studied.

  • Sample: A subset of the population that is actually observed or analyzed.

  • Observational Studies: Studies where the researcher observes characteristics of a sample without trying to influence the outcome.

  • Designed Experiments: Studies where the researcher assigns treatments to the subjects, controls other variables, and observes the effect of the treatments on the outcome.
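
The definitions above can be illustrated with a minimal Python sketch. The population of ages is made-up data, and the standard error is used here only as one simple, assumed measure of reliability; neither comes from the course materials.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 individuals (made-up data).
rng = random.Random(0)
population = [rng.randint(18, 90) for _ in range(1000)]

# Sample: a subset of the population that is actually observed.
sample = rng.sample(population, 50)

# Descriptive statistics: organize and summarize the observed data.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to estimate a population value
# and attach a rough measure of reliability (standard error of the mean).
standard_error = sample_sd / len(sample) ** 0.5

print(f"sample mean = {sample_mean:.1f} (standard error {standard_error:.1f})")
print(f"population mean = {statistics.mean(population):.1f}")
```

In practice the population mean is usually unknown; it is printed here only so the estimate can be compared with the value it is trying to recover.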

Section 1.2 – Observational Studies vs. Designed Experiments

Experimental Design Principles

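Understanding the difference between observational studies and designed experiments is crucial for interpreting statistical results: a well-designed experiment can support cause-and-effect conclusions, whereas an observational study can only show an association. Three principles guide a designed experiment: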

  • Control: Keeping other variables constant to isolate the effect of the treatment.

  • Randomization: Assigning subjects to treatments using a random process to avoid bias.

  • Replication: Repeating the experiment on many subjects to ensure results are reliable.

Key Terminology

  • Experimental Units: The objects or individuals on which the experiment is performed.

  • Subject: A human experimental unit.

  • Treatments: The conditions applied to the experimental units.

  • Response Variable: The outcome measured in the experiment.

  • Factor: An explanatory variable manipulated by the researcher.

  • Levels: The different values of a factor.

  • Completely Randomized Design: All experimental units are assigned to treatments completely at random.

  • Randomized Block Design: Experimental units are divided into blocks, and within each block, units are randomly assigned to treatments.
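
As a rough sketch of the two designs just defined, the Python code below assigns subjects to treatments at random. The subject labels, the treatment names, and the age-based blocks are hypothetical and chosen only for illustration.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

def randomized_block(blocks, treatments, rng):
    """Run a separate completely randomized assignment inside each block."""
    return {name: completely_randomized(units, treatments, rng)
            for name, units in blocks.items()}

rng = random.Random(42)
treatments = ["drug", "placebo"]             # two levels of one factor
subjects = [f"S{i}" for i in range(1, 13)]   # 12 hypothetical subjects

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocks: subjects grouped by age before randomizing.
blocks = {"under_40": subjects[:6], "40_and_over": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)

print(crd)
print(rbd)
```

Blocking guarantees that each treatment appears equally often within every age group, which a completely randomized design only achieves on average.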

Section 1.3 – Simple Random Sampling

Sampling Methods and Terms

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

  • Census: A study that attempts to include the entire population.

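  • Representative Sample: A sample that accurately reflects the characteristics of the population. Probability sampling, in which every individual has a known, nonzero chance of being selected, is the usual way to obtain one.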

  • Simple Random Sampling with Replacement: Each member of the population can be selected more than once.

  • Simple Random Sampling without Replacement: Each member can be selected only once.

  • Simple Random Sample: Every possible sample of a given size has the same chance of being selected.

  • Random-Number Tables: Tables of randomly generated numbers used to select samples objectively.
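
In software, a pseudorandom number generator usually stands in for a printed table. The sketch below uses Python's standard `random` module to contrast sampling without and with replacement; the population of 100 numbered individuals is an assumption made for the example.

```python
import random

rng = random.Random(7)
population = list(range(1, 101))  # hypothetical individuals numbered 1..100

# Without replacement: no individual can appear more than once.
without_replacement = rng.sample(population, k=10)

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```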

Example Table: Random-Number Table

Random-number tables are used to ensure unbiased selection in simple random sampling. Below is a simplified example:

Line Number    Column 1    Column 2    Column 3
1              83421       92741       62431
2              12753       42197       31562
3              98214       56321       78912

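Additional info: Actual random-number tables are much larger. To select a sample, number each individual in the population, then read digits from the table in groups and choose the individuals whose numbers appear, skipping out-of-range values and repeats.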
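
One common way to read such a table can be sketched in Python as follows. The population size of 50, the sample size of 5, and the choice to read two-digit groups left to right are assumptions made for this example; only the digits come from the simplified table above.

```python
def select_from_table(rows, population_size, sample_size):
    """Read fixed-width labels from the table digits, left to right.

    Labels outside 1..population_size and repeated labels are skipped.
    """
    digits = "".join("".join(row) for row in rows)
    width = len(str(population_size))   # e.g. 2 digits for a population of 50
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Digits copied from the simplified random-number table.
table = [
    ["83421", "92741", "62431"],
    ["12753", "42197", "31562"],
    ["98214", "56321", "78912"],
]

sample_labels = select_from_table(table, population_size=50, sample_size=5)
print(sample_labels)
```

Reading pairs from the first line gives 83, 42, 19, 27, 41, 62, 43; dropping 83 and 62 as out of range leaves individuals 42, 19, 27, 41, and 43.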

Section 1.4 – Other Effective Sampling Methods

Alternative Sampling Techniques

Besides simple random sampling, several other methods are used to obtain representative samples, especially in large or complex populations.

  • Systematic Random Sampling: Selecting every kth individual from a list after a random start.

  • Cluster Sampling: Dividing the population into clusters (often geographically), then randomly selecting entire clusters for study. Useful in large, spread-out populations.

  • Stratified Random Sampling with Proportional Allocation: Dividing the population into strata (groups) based on a characteristic (e.g., age, income), then drawing a simple random sample from each stratum, with each stratum's sample size proportional to its share of the population.

  • Convenience Sampling: Selecting individuals who are easiest to reach. Note: This method is generally not recommended due to potential bias.
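
The three probability-based methods above can be sketched in Python as shown below. The function names, the numbered population, and the cluster and stratum labels are hypothetical; this is an illustration under those assumptions, not a reference implementation.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def stratified_sample(strata, total_n, rng):
    """Simple random sample from each stratum, sized by proportional allocation.

    Stratum sizes are rounded, so the total can differ slightly from total_n
    when the proportions do not divide evenly.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n = round(total_n * len(members) / population_size)
        sample.extend(rng.sample(members, n))
    return sample

rng = random.Random(1)
population = list(range(1, 101))  # hypothetical individuals numbered 1..100

sys_sample = systematic_sample(population, k=10, rng=rng)

# Ten hypothetical clusters of ten neighbors each.
clusters = {f"block_{b}": list(range(b * 10 + 1, b * 10 + 11)) for b in range(10)}
clu_sample = cluster_sample(clusters, n_clusters=2, rng=rng)

# Three hypothetical strata holding 60%, 30%, and 10% of the population.
strata = {"young": population[:60], "middle": population[60:90], "older": population[90:]}
str_sample = stratified_sample(strata, total_n=10, rng=rng)

print(sys_sample, clu_sample, str_sample, sep="\n")
```

With a total sample of 10, proportional allocation draws 6, 3, and 1 individuals from the three strata, matching their 60%, 30%, and 10% shares.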

Comparison of Sampling Methods

Method                  Description                                        When to Use
Simple Random Sampling  Every possible sample of size n is equally likely  Small, homogeneous populations
Systematic Sampling     Select every kth member after a random start       Ordered lists, large populations
Cluster Sampling        Randomly select entire groups (clusters)           Large, geographically dispersed populations
Stratified Sampling     Divide into strata, sample from each               Populations with distinct subgroups
Convenience Sampling    Sample the easiest-to-reach individuals            Preliminary studies, not for inference
