Chapter 1: Data Collection – Foundations of Statistics
Study Guide - Smart Notes
Section 1.1 – Statistics Basics
Introduction to Statistics
Statistics covers the whole path from gathering information to drawing conclusions from it, and it also provides a measure of confidence in those conclusions. Two similar-sounding terms are worth separating:
Statistic: Facts or data, organized and summarized to provide useful and accessible information about a particular subject.
Statistics: The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
Key Steps in Statistics:
Collection of information
Organization and summarization of information
Analysis of information to draw conclusions or answer questions
Reporting results with measures that represent how confident we are that our conclusions reflect reality
Example: Claims such as "50% of all marriages end in divorce," "67% of second marriages end in divorce," and "74% of third marriages end in divorce" are statistics: each condenses data about many marriage outcomes into a single figure.
Key Definitions
Descriptive Statistics: Methods for organizing and summarizing data.
Inferential Statistics: Methods for drawing and measuring the reliability of conclusions about a population based on a sample.
Population: The entire group to be studied.
Sample: A subset of the population that is actually observed or analyzed.
Observational Studies: Studies where the researcher observes characteristics of a sample without trying to influence the outcome.
Designed Experiments: Studies where the researcher assigns treatments to subjects under controlled conditions and observes the effect on the outcome.
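The relationship between a population, a sample, and the two branches of statistics can be sketched in a few lines of Python. The ages below are made-up illustrative data, and the population size (1,000) and sample size (50) are arbitrary assumptions, not values from the notes.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up data for illustration).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 90) for _ in range(1000)]

# Descriptive statistics: organize and summarize the data in hand.
population_mean = statistics.mean(population)

# A sample is the subset of the population that is actually observed.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Inferential statistics: use the sample mean to estimate the population mean.
print(f"Population mean (usually unknown in practice): {population_mean:.1f}")
print(f"Sample mean (the estimate):                    {sample_mean:.1f}")
```

In a real study the population mean is rarely available; the sketch computes it only to show how close the sample estimate lands.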
Section 1.2 – Observational Studies vs. Designed Experiments
Experimental Design Principles
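Understanding the difference between observational studies and designed experiments is crucial for interpreting statistical results: an observational study can reveal an association, but only a designed experiment can support a cause-and-effect conclusion. A well-designed experiment rests on three principles: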
Control: Keeping other variables constant to isolate the effect of the treatment.
Randomization: Assigning subjects to treatments using a random process to avoid bias.
Replication: Repeating the experiment on many subjects so that chance differences between individuals are less likely to drive the results.
Key Terminology
Experimental Units: The objects or individuals on which the experiment is performed.
Subject: A human experimental unit.
Treatments: The conditions applied to the experimental units.
Response Variable: The outcome measured in the experiment.
Factor: An explanatory variable manipulated by the researcher.
Levels: The different values of a factor.
Completely Randomized Design: All experimental units are assigned to treatments completely at random.
Randomized Block Design: Experimental units are divided into blocks, and within each block, units are randomly assigned to treatments.
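The two designs above differ only in where the randomization happens. The following is a minimal sketch, assuming twelve hypothetical subjects, two hypothetical treatments ("drug" and "placebo"), and age as the blocking variable; the function names are illustrative, not from any library.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

def randomized_block(blocks, treatments, rng):
    """Run a completely randomized assignment separately inside each block."""
    return {name: completely_randomized(units, treatments, rng)
            for name, units in blocks.items()}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
subjects = [f"S{i}" for i in range(1, 13)]  # hypothetical experimental units
treatments = ["drug", "placebo"]            # hypothetical levels of one factor

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocks: subjects grouped by age before randomizing.
blocks = {"under_40": subjects[:6], "40_and_over": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)

print(crd)
print(rbd)
```

Blocking guarantees that each age group contributes equally to both treatments, so age cannot be confused with the treatment effect.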
Section 1.3 – Simple Random Sampling
Sampling Methods and Terms
Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.
Census: A study that attempts to include the entire population.
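Representative Sample: A sample that accurately reflects the relevant characteristics of the population.
Probability Sampling: Sampling in which a random device (such as a random-number table or a random-number generator) decides which members of the population are selected.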
Simple Random Sampling with Replacement: Each member of the population can be selected more than once.
Simple Random Sampling without Replacement: Each member can be selected only once.
Simple Random Sample: Every possible sample of a given size has the same chance of being selected.
Random-Number Tables: Tables of randomly generated numbers used to select samples objectively.
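The difference between sampling with and without replacement shows up directly in Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. This is a minimal sketch, assuming a hypothetical population of 20 members numbered 1 to 20.

```python
import random

rng = random.Random(1)           # fixed seed so the sketch is repeatable
population = list(range(1, 21))  # hypothetical population: members numbered 1-20

# Without replacement: each member can appear at most once in the sample.
without_replacement = rng.sample(population, k=5)

# With replacement: the same member may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```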
Example Table: Random-Number Table
Random-number tables are used to ensure unbiased selection in simple random sampling. Below is a simplified example:
| Line Number | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 1 | 83421 | 92741 | 62431 |
| 2 | 12753 | 42197 | 31562 |
| 3 | 98214 | 56321 | 78912 |
Additional info: Actual random-number tables are much larger and used to select samples by matching numbers to individuals in the population.
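One common way to use such a table can be sketched as follows. The sketch assumes a hypothetical population of 50 individuals labeled 01 to 50 and a sample of 5; it reads the digits of the simplified table two at a time, row by row, skipping labels that are out of range or already chosen. Other reading conventions (for example, going down a column) are equally valid.

```python
# Digits copied from the simplified table above, one inner list per line.
table_rows = [
    ["83421", "92741", "62431"],
    ["12753", "42197", "31562"],
    ["98214", "56321", "78912"],
]

def select_from_table(rows, population_size, sample_size):
    """Read fixed-width labels from the table until the sample is full."""
    digits = "".join("".join(row) for row in rows)
    width = len(str(population_size))  # 50 individuals -> two-digit labels
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip labels outside 1..population_size and labels already selected.
        if 1 <= label <= population_size and label not in selected:
            selected.append(label)
        if len(selected) == sample_size:
            break
    return selected

# Assumption: population of 50, sample of 5 (not values from the notes).
sample = select_from_table(table_rows, population_size=50, sample_size=5)
print(sample)  # [42, 19, 27, 41, 43]
```

The first pair, 83, is skipped because no individual carries that label; 42, 19, 27, and 41 are accepted, 62 is skipped, and 43 completes the sample.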
Section 1.4 – Other Effective Sampling Methods
Alternative Sampling Techniques
Besides simple random sampling, several other methods are used to obtain representative samples, especially in large or complex populations.
Systematic Random Sampling: Selecting every kth individual from a list after a random start.
Cluster Sampling: Dividing the population into clusters (often geographically), then randomly selecting entire clusters for study. Useful in large, spread-out populations.
Stratified Random Sampling with Proportional Allocation: Dividing the population into strata (groups) based on a characteristic (e.g., age, income), then randomly sampling from each stratum proportionally.
Convenience Sampling: Selecting individuals who are easiest to reach. Note: This method is generally not recommended due to potential bias.
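Systematic and cluster sampling from the list above can each be sketched in a few lines. The roster of 100 students, the four classrooms, and the function names are hypothetical and chosen only for illustration.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return items[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical ordered list of 100 students; k = 10 yields a sample of 10.
roster = [f"student_{i:03d}" for i in range(1, 101)]
every_10th = systematic_sample(roster, k=10, rng=rng)

# Hypothetical clusters: four classrooms of five students each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 6)]
    for r in "ABCD"
}
two_rooms = cluster_sample(classrooms, n_clusters=2, rng=rng)

print(every_10th)
print(two_rooms)
```

Note the contrast: systematic sampling picks individuals spread evenly through the list, while cluster sampling keeps everyone in the chosen groups and no one from the others.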
Comparison of Sampling Methods
| Method | Description | When to Use |
|---|---|---|
| Simple Random Sampling | Every possible sample of a given size has the same chance of selection | Small, homogeneous populations |
| Systematic Sampling | Select every kth member after a random start | Ordered lists, large populations |
| Cluster Sampling | Randomly select entire groups (clusters) | Large, geographically dispersed populations |
| Stratified Sampling | Divide into strata, then randomly sample from each | Populations with distinct subgroups |
| Convenience Sampling | Sample the individuals who are easiest to reach | Preliminary studies, not for inference |
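For stratified sampling with proportional allocation, each stratum's share of the sample matches its share of the population. The sketch below assumes a hypothetical school of 1,000 students split into four class years and a total sample of 50; the stratum names and sizes are illustrative only.

```python
import random

def stratified_sample(strata, total_sample_size, rng):
    """Draw a simple random sample from each stratum, sized proportionally."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: stratum share of the sample equals
        # stratum share of the population (rounded to a whole number).
        n_stratum = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical strata: 400 + 300 + 200 + 100 = 1,000 students.
strata = {
    "freshmen": [f"F{i}" for i in range(400)],
    "sophomores": [f"So{i}" for i in range(300)],
    "juniors": [f"J{i}" for i in range(200)],
    "seniors": [f"Se{i}" for i in range(100)],
}

sample = stratified_sample(strata, total_sample_size=50, rng=rng)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)  # {'freshmen': 20, 'sophomores': 15, 'juniors': 10, 'seniors': 5}
```

Freshmen make up 40% of the population, so they receive 40% of the 50 sample slots, which is 20. When the proportions do not divide evenly, rounding can make the stratum sizes sum to slightly more or less than the intended total, so a real design would adjust one stratum by hand.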