
Introduction to Data Collection in Statistics

Chapter 1: Data Collection

1.1 Introduction to the Practice of Statistics

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. It also involves providing a measure of confidence in any conclusions. The process of statistics is foundational for making informed decisions in various fields such as science, engineering, and business.

  • Statistics: The discipline concerned with data collection, analysis, and interpretation.

  • Information: In statistics, information refers to data that has been processed and organized to be meaningful.

Example: In August 2008, a news service surveys 1006 adults aged 18 years or older in a certain country and asks whether they favor or oppose increasing the tax on gasoline to reduce dependence on foreign oil. The survey finds that 60% oppose the increase. The goal is to use the sample to draw conclusions about the entire population of adults in the country.

  • Population: The entire group of individuals to be studied. Example: All adults aged 18 or older in the country.

  • Individual: A person or object that is a member of the population being studied. Example: One adult surveyed.

  • Sample: A subset of the population being studied. Example: The 1006 adults surveyed.

  • Descriptive Statistics: Consists of organizing and summarizing data, often through numerical summaries, tables, and graphs.

  • Statistic: A numerical measurement describing some characteristic of a sample.

Additional info: The distinction between population and sample is crucial for understanding how statistical inference works.
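The population/sample distinction can be made concrete with a small simulation. The sketch below is illustrative only: the population size and the 62% opposition rate are invented assumptions (in a real survey the population value is unknown), and only the sample size of 1006 comes from the example above.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = opposes the tax increase, 0 = favors it.
# The 62% figure is invented for illustration; in practice it is unknown.
POPULATION_SIZE = 100_000
population = [1] * 62_000 + [0] * 38_000

# Parameter: a number that describes the whole population.
parameter = sum(population) / POPULATION_SIZE

# Sample: 1006 individuals drawn at random, without replacement.
sample = random.sample(population, 1006)

# Statistic: a number that describes only the sample.
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion) = {parameter:.3f}")
print(f"statistic (sample proportion)     = {statistic:.3f}")
```

The statistic will usually land near the parameter without matching it exactly, which is why conclusions drawn from a sample come with a measure of confidence.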

Types of Studies in Statistics

Designed Experiments

In a designed experiment, a researcher assigns individuals in a study to certain groups, intentionally changes the value of the explanatory variable, and records the value of the response variable for each group.

  • Pros: Can show cause-and-effect relationships; gives the researcher control over variables.

  • Cons: May be costly, time-consuming, or ethically challenging.
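The group-assignment step of a designed experiment can be sketched in a few lines. The text above does not say how individuals are assigned, so random assignment is an assumption here (it is a common choice), and the function name `assign_groups` and the participant labels are hypothetical.

```python
import random


def assign_groups(individuals, seed=None):
    """Randomly split individuals into a treatment group and a control group."""
    rng = random.Random(seed)      # local generator, so the global state is untouched
    shuffled = list(individuals)   # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participants P01..P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

The researcher would then apply the changed value of the explanatory variable to the treatment group only and record the response variable for both groups.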

Confounding and Lurking Variables

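  • Confounding: Occurs when the effects of two or more explanatory variables are not separated, making it unclear which variable is responsible for changes in the response variable. A confounding variable is an explanatory variable that was considered in the study but whose effect cannot be distinguished from that of another explanatory variable.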

  • Lurking Variable: An explanatory variable that was not considered in a study but affects the value of the response variable. Lurking variables are typically related to both the explanatory and response variables.

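Example: In a study examining the effect of exercise on weight loss, diet may be a lurking variable if it is not measured or controlled, because diet can influence both how much people exercise and how much weight they lose.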
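A toy simulation shows how a lurking variable can create an association on its own. Every number below is an invented assumption: diet drives both the chance of exercising and the weight lost, while exercise is given no effect at all, purely to isolate the lurking variable.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Each record is (healthy_diet, exercises, weight_loss_kg).
records = []
for _ in range(10_000):
    healthy_diet = rng.random() < 0.5
    # Assumption: people with a healthy diet exercise more often (80% vs 20%).
    exercises = rng.random() < (0.8 if healthy_diet else 0.2)
    # Assumption: weight loss depends on diet only; exercise has no effect here.
    weight_loss = (5.0 if healthy_diet else 1.0) + rng.gauss(0, 1)
    records.append((healthy_diet, exercises, weight_loss))


def gap(rows):
    """Mean weight loss of exercisers minus mean weight loss of non-exercisers."""
    exercisers = [r[2] for r in rows if r[1]]
    others = [r[2] for r in rows if not r[1]]
    return mean(exercisers) - mean(others)


naive_gap = gap(records)                                  # diet ignored
gap_healthy = gap([r for r in records if r[0]])           # healthy diet only
gap_unhealthy = gap([r for r in records if not r[0]])     # unhealthy diet only

print(f"gap ignoring diet:        {naive_gap:.2f} kg")
print(f"gap within healthy diet:  {gap_healthy:.2f} kg")
print(f"gap within unhealthy diet: {gap_unhealthy:.2f} kg")
```

Ignoring diet, exercisers appear to lose noticeably more weight; once the comparison is made within each diet group, the gap shrinks to roughly zero, because diet was responsible for the difference all along.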

Types of Observational Studies

Observational studies involve collecting data without influencing the variables being measured. They are useful for identifying associations but cannot establish causation.

  • Cross-Sectional Study: Collects information about individuals at a specific point in time or over a very short period. Pros: Quick, inexpensive. Cons: Cannot establish causality; may not reflect changes over time.

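  • Case-Control Study (also called Case-Controlled or Retrospective Study): These studies are retrospective: individuals are asked to look back in time, or researchers examine existing records. Individuals are grouped by outcome (those who have a certain characteristic and those who do not), and the groups' past exposures are compared. Pros: Useful for rare conditions; relatively quick. Cons: May rely on memory or existing records, which can be incomplete or biased.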
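The "grouped by outcome" idea can be illustrated with a few hand-made records. This is a minimal sketch: the eight records and the field meanings are hypothetical, not data from any study.

```python
# Hypothetical existing records: (has_condition, was_exposed).
records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

# Group by outcome first: cases have the condition, controls do not.
cases = [exposed for has_condition, exposed in records if has_condition]
controls = [exposed for has_condition, exposed in records if not has_condition]


def proportion_exposed(group):
    """Fraction of a group whose records show past exposure."""
    return sum(group) / len(group)


print(proportion_exposed(cases))     # 0.75
print(proportion_exposed(controls))  # 0.25
```

A higher exposure rate among cases suggests an association with the condition, but, as with any observational study, it does not establish causation.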

Summary Table: Types of Studies

Type of Study         | Description                                         | Pros                                       | Cons
----------------------|-----------------------------------------------------|--------------------------------------------|--------------------------------
Designed Experiment   | Researcher manipulates variables and assigns groups | Can show causation; control over variables | Costly; ethical issues
Cross-Sectional Study | Data collected at one point in time                 | Quick; inexpensive                         | No causality; snapshot only
Case-Control Study    | Retrospective; groups by outcome                    | Good for rare events; quick                | Recall bias; incomplete records

Key Formulas and Definitions

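  • Statistic: A numerical measurement describing some characteristic of a sample.

  • Population Parameter: A numerical measurement describing some characteristic of a population. Additional info: Parameters are typically unknown and estimated using statistics from samples.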
