
Foundations of Statistics: Data Collection, Organization, and Summarization


Introduction to Statistics

Key Definitions and Concepts

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions. Understanding the foundational vocabulary and processes is essential for further study in statistics.

  • Statistics: The science of learning from data, including collection, analysis, interpretation, and presentation.

  • Data: Observations or measurements collected for analysis.

  • Population: The entire group of individuals or items of interest.

  • Parameter: A numerical summary describing a characteristic of a population.

  • Sample: A subset of the population selected for study.

  • Statistic: A numerical summary describing a characteristic of a sample.

  • Descriptive Statistics: Methods for summarizing and organizing data (e.g., graphs, averages).

  • Inferential Statistics: Methods for making predictions or inferences about a population based on sample data.
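The difference between a parameter and a statistic can be sketched in a few lines. This is a minimal illustration under an assumption: the "population" below is 10,000 made-up heights generated at random, not real data. The population mean is a parameter because it uses every individual. The mean of a 50-person sample is a statistic that estimates that parameter.

```python
import random
import statistics

# Hypothetical population: 10,000 heights (cm), randomly generated for illustration only.
random.seed(1)
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of 50 individuals.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

Inferential statistics uses values like `x_bar`, which we can actually compute, to draw conclusions about values like `mu`, which are usually unknown.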

The Process of Statistics

Four-Step Process

The statistical process involves a systematic approach to answering questions using data.

  1. Identify the research objective.

  2. Collect the data needed to answer the question.

  3. Describe the data (organize and summarize).

  4. Draw inferences from the data and make decisions or predictions.

Types of Variables

  • Quantitative Variables: Variables that take numerical values and can be measured or counted (e.g., height, weight).

  • Categorical (Qualitative) Variables: Variables that describe categories or groups (e.g., gender, color).

  • Discrete Variables: Quantitative variables with a finite or countable number of values (e.g., number of students).

  • Continuous Variables: Quantitative variables that can take any value within an interval, so they have infinitely many (uncountably many) possible values (e.g., temperature).

Data Collection: Observational Studies and Experiments

Types of Studies

  • Observational Study: Observes individuals and measures variables without influencing responses.

  • Experiment: Deliberately imposes treatments to observe responses and determine cause-and-effect relationships.

Sampling Methods

  • Random Sampling: Every member of the population has an equal chance of being selected.

  • Simple Random Sampling: Every possible sample of a given size has the same chance of being chosen.

  • Cluster Sampling: The population is divided into clusters, some clusters are randomly selected, and all members of chosen clusters are surveyed.

  • Stratified Sampling: The population is divided into strata (groups), and random samples are taken from each stratum.

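  • Voluntary Response Sampling: Individuals decide for themselves whether to participate. Such samples are usually biased because people with strong opinions are more likely to respond.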
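Simple random, stratified, and cluster sampling differ only in how chance is applied. The sketch below uses Python's `random` module on a hypothetical population of 100 students. The class-year strata and dorm clusters are invented labels, and the function names are illustrative, not from any particular library.

```python
import random

random.seed(42)

# Hypothetical population of 100 students. "year" is the stratum and "dorm"
# is the cluster; both labels are made up for illustration.
population = [
    {"id": i, "year": ["Fr", "So", "Jr", "Sr"][i % 4], "dorm": f"Dorm{i % 10}"}
    for i in range(100)
]

def simple_random_sample(pop, n):
    # Every possible subset of size n is equally likely to be chosen.
    return random.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    # Split the population into strata by `key`, then take an SRS within each stratum.
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    # Randomly choose whole clusters, then keep every member of each chosen cluster.
    clusters = sorted({person[key] for person in pop})
    chosen = set(random.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in chosen]

srs = simple_random_sample(population, 12)
strat = stratified_sample(population, "year", 3)
clus = cluster_sample(population, "dorm", 2)
print(len(srs), len(strat), len(clus))  # 12 12 20
```

Stratified sampling guarantees that every class year is represented. Cluster sampling surveys everyone in a few randomly chosen dorms, which is often cheaper but can be less representative if the dorms differ from one another.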

Bias and Confounding

  • Bias: Systematic error that leads to incorrect conclusions.

  • Blinding: Subjects do not know which treatment they receive.

  • Double Blinding: Neither subjects nor experimenters know which treatment is given.

  • Confounding Variables: Variables that affect both the explanatory and response variables, potentially distorting results.

  • Placebo Effect: Improvement due to the belief in the treatment rather than the treatment itself.

Organizing and Summarizing Data

Graphical Methods for Categorical Data

  • Bar Graph: Displays frequencies or proportions for categories.

  • Pie Chart: Shows proportions of categories as sectors of a circle.

  • Side-by-Side Bar Graph: Compares two or more groups across categories.

Graphical Methods for Quantitative Data

  • Histogram: Shows the distribution of quantitative data using bars.

  • Dot Plot: Displays individual data points along a number line.

  • Stem-and-Leaf Plot: Organizes data to show shape and individual values.
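A stem-and-leaf plot can be built by splitting each value into a stem (all but the last digit) and a leaf (the last digit). This sketch assumes non-negative whole numbers, with the tens digit as the stem. The scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return a stem-and-leaf plot as text for non-negative integers.
    Stem = value // 10 (tens), leaf = value % 10 (ones digit)."""
    stems = defaultdict(list)
    for value in sorted(data):          # sorting keeps each row's leaves in order
        stems[value // 10].append(value % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible in the shape.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

scores = [62, 67, 71, 74, 74, 78, 81, 83, 85, 85, 88, 90, 94, 99]
print(stem_and_leaf(scores))
```

Turned on its side, the lengths of the rows trace out the same shape a histogram would show. Unlike a histogram, every original value can still be read from the plot.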

Frequency Tables

Frequency tables summarize data by showing the number of observations in each category or interval.

Category            Frequency
Never                     118
Rarely                    249
Most of the time          145
Always                    168
Don't Drive               249
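A relative frequency is a category's frequency divided by the total number of observations. The sketch below applies this to the counts in this section's frequency table. The notes do not state the underlying survey question, so only the category labels are used.

```python
# Frequencies from the frequency table in this section
# (the survey question itself is not given in the notes).
freq = {
    "Never": 118,
    "Rarely": 249,
    "Most of the time": 145,
    "Always": 168,
    "Don't Drive": 249,
}

total = sum(freq.values())  # 929
print(f"{'Category':<18}{'Freq':>5}{'Rel':>8}")
for category, count in freq.items():
    rel = count / total  # relative frequency = frequency / total
    print(f"{category:<18}{count:>5}{rel:>8.3f}")
print(f"{'Total':<18}{total:>5}{sum(freq.values()) / total:>8.3f}")
```

Relative frequencies always add up to 1, apart from rounding. They let you compare groups of different sizes, for example in a side-by-side bar graph.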

Numerically Summarizing Data

Measures of Center

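  • Mean (Average): The sum of all data values divided by the number of values. Formula: \( \bar{x} = \frac{\sum x_i}{n} \) for a sample of size \( n \); \( \mu = \frac{\sum x_i}{N} \) for a population of size \( N \).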

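  • Median: The middle value when the data are ordered from smallest to largest. When the number of values is even, it is the mean of the two middle values.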

  • Mode: The value(s) that occur most frequently.
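The three measures of center can be computed directly or with Python's standard `statistics` module. The data below are a hypothetical sample. `statistics.multimode` returns every most-frequent value, which matches "value(s)" in the definition of the mode.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical sample

mean = sum(data) / len(data)        # x-bar = (sum of x_i) / n
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of all most-frequent values

print(mean, median, modes)

# With an even number of values, the median averages the two middle values.
print(statistics.median([1, 3, 4, 10]))  # 3.5
```

Here the sorted data are 3, 4, 5, 5, 6, 8, 8, 8, 9. The median is therefore the 5th value, 6. The value 8 appears most often, so it is the mode.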

Measures of Spread

  • Range: Difference between the maximum and minimum values.

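  • Standard Deviation: Measures the typical distance of data values from the mean. Sample formula: \( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \)

  • Variance: The square of the standard deviation. Sample formula: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \). The population variance \( \sigma^2 \) instead uses \( \mu \) and divides by \( N \).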
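The sample formulas can be checked step by step. This sketch computes the range, \( s^2 \), and \( s \) by hand for a hypothetical sample. It then confirms the results against `statistics.variance` and `statistics.stdev`, which also divide by \( n - 1 \).

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n

data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = math.sqrt(s2)  # sample standard deviation

# Cross-check against the standard library (also uses n - 1).
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))

print(data_range, round(s2, 3), round(s, 3))
```

Squaring the deviations stops the positive and negative deviations from cancelling out. Taking the square root puts \( s \) back in the original units of the data.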

Describing Distributions and Identifying Outliers

Percentiles, Z-scores, and the 5-Number Summary

  • Percentile: The value below which a given percentage of observations fall.

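  • Z-score: The number of standard deviations a value lies above (positive) or below (negative) the mean. Formula: \( z = \frac{x - \bar{x}}{s} \) for a sample, or \( z = \frac{x - \mu}{\sigma} \) for a population.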

  • 5-Number Summary: Minimum, Q1 (first quartile), Median, Q3 (third quartile), Maximum.
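A z-score and a 5-number summary for a hypothetical sample are computed below. There are several conventions for quartiles. This sketch assumes one common textbook rule: Q1 and Q3 are the medians of the lower and upper halves, and the middle value is left out of both halves when \( n \) is odd. Statistical software may use a different rule and give slightly different quartiles.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max. Q1/Q3 are medians of the lower/upper halves,
    excluding the middle value when n is odd (one textbook convention; assumes n >= 2)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / sd

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical sample
print(five_number_summary(data))

x_bar = statistics.mean(data)
s = statistics.stdev(data)
print(round(z_score(9, x_bar, s), 2))
```

A positive z-score for the largest value, 9, means it lies above the mean. Its size, about 1.3, says by how many sample standard deviations.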

Box Plots and Outliers

  • Box Plot: A graphical display of the 5-number summary.

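  • Outliers: Values that fall below the lower fence or above the upper fence. Lower Fence: \( Q_1 - 1.5(\text{IQR}) \). Upper Fence: \( Q_3 + 1.5(\text{IQR}) \), where \( \text{IQR} = Q_3 - Q_1 \) is the interquartile range.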
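The fence rule is easy to apply in code once Q1 and Q3 are known. This sketch computes quartiles as medians of the lower and upper halves, which is an assumed convention; other quartile rules can move the fences slightly. It then flags any value outside \( Q_1 - 1.5(\text{IQR}) \) and \( Q_3 + 1.5(\text{IQR}) \). The data are hypothetical.

```python
import statistics

def quartiles(data):
    # Q1 and Q3 as medians of the lower and upper halves
    # (middle value excluded when n is odd); assumed textbook convention.
    xs = sorted(data)
    n = len(xs)
    return statistics.median(xs[: n // 2]), statistics.median(xs[(n + 1) // 2 :])

def find_outliers(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1                   # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [12, 15, 15, 16, 17, 18, 19, 20, 21, 45]  # hypothetical sample
print(find_outliers(data))  # (7.5, 27.5, [45])
```

Here Q1 = 15 and Q3 = 20, so IQR = 5 and the fences are 7.5 and 27.5. Only 45 lies outside them. In a box plot it would be drawn as a separate point, and the whisker would stop at 21.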

Application Example: Can Joy Detect Parkinson’s Disease?

Statistical Reasoning in Experimental Design

This example explores hypothesis testing and the role of probability in evaluating experimental results.

  • Context: Joy Milne took part in a study testing whether she could detect Parkinson’s disease by smell. She identified 11 out of 12 T-shirts correctly.

  • Key Questions:

    • How likely is it to get 11 or more correct by random guessing?

    • Does this provide convincing evidence that Joy can detect Parkinson’s by smell?

  • Simulation: Repeating the experiment many times with random guesses, and recording how many guesses are correct each time, estimates the probability of such a result occurring by chance.

  • Dot Plot: Used to visualize the distribution of correct guesses under random guessing.

Example: If the probability of getting 11 or more correct by chance is very low, this supports the claim that Joy’s ability is not due to luck.
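The simulation can be sketched as follows. The notes do not say how random guessing is modeled, so this assumes that each of the 12 T-shirts is classified correctly with probability 0.5, independently of the others, like a coin flip. The constant names are illustrative. Under that same assumption, the exact binomial probability is computed for comparison.

```python
import random
from math import comb

# Assumed guessing model (not stated in the notes): each of 12 shirts is
# classified correctly with probability 0.5, independently of the others.
N_SHIRTS = 12
P_GUESS = 0.5
OBSERVED = 11

def simulate(trials, seed=0):
    """Estimate P(at least OBSERVED correct) by repeating the guessing experiment."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        correct = sum(rng.random() < P_GUESS for _ in range(N_SHIRTS))
        if correct >= OBSERVED:
            hits += 1
    return hits / trials

# Exact binomial probability of 11 or 12 correct under the same model.
exact = sum(
    comb(N_SHIRTS, k) * P_GUESS**k * (1 - P_GUESS) ** (N_SHIRTS - k)
    for k in range(OBSERVED, N_SHIRTS + 1)
)

print(f"Simulated P(X >= 11): {simulate(100_000):.4f}")
print(f"Exact     P(X >= 11): {exact:.4f}")  # 13/4096, about 0.0032
```

Under this model, a guesser scores 11 or more only about 3 times in 1,000 tries. Such a small probability is why the result counts as convincing evidence against luck. To draw the dot plot, you would keep each trial's `correct` count instead of only counting hits.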

Summary Table: Measures of Center and Spread

Measure               Sample          Population
Mean                  \( \bar{x} \)   \( \mu \)
Standard Deviation    \( s \)         \( \sigma \)
Variance              \( s^2 \)       \( \sigma^2 \)
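The sample and population columns use the same formula for the mean. For the spread measures they divide by different amounts: \( n - 1 \) for \( s \) and \( s^2 \), and \( N \) for \( \sigma \) and \( \sigma^2 \). Python's `statistics` module exposes both versions, as this sketch on hypothetical data shows.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical values

# Treat the values as an entire population: divide by N.
sigma = statistics.pstdev(data)
sigma2 = statistics.pvariance(data)

# Treat the same values as a sample: divide by n - 1.
s = statistics.stdev(data)
s2 = statistics.variance(data)

print(round(sigma, 4), round(s, 4))
print(round(sigma2, 4), round(s2, 4))
```

Because \( n - 1 < n \), the sample versions are always somewhat larger. Dividing by \( n - 1 \) corrects for the fact that deviations from \( \bar{x} \) tend to be smaller than deviations from the unknown \( \mu \).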

Conclusion

These notes provide a foundation in the core concepts of statistics, including data collection, organization, and summarization. Mastery of these topics is essential for understanding more advanced statistical inference and analysis.
