Foundations of Statistics: Data Collection, Organization, and Summarization

Study Guide - Smart Notes

Introduction to the Practice of Statistics

Definitions and Key Concepts

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make informed decisions. Understanding the foundational terminology is essential for further study in statistics.

  • Data: Information from observations, counts, measurements, or responses.

  • Population: The entire group being studied.

  • Sample: A subset of the population selected for analysis.

  • Parameter: A numerical description of a population characteristic.

  • Statistic: A numerical description of a sample characteristic.

  • Qualitative Variable: A variable that categorizes or describes an element of a population.

  • Quantitative Variable: A variable that quantifies an element of a population.

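  • Discrete Variable: A quantitative variable with a finite or countable number of possible values, typically the result of counting.

  • Continuous Variable: A quantitative variable that can take any value within an interval (infinitely many, not countable), typically the result of measuring.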

Example: Surveying 100 students about their favorite color (qualitative) or their height (quantitative).

Distinguishing Data Sets

  • Population Data Set: Includes all outcomes, responses, measurements, or counts of interest.

  • Sample Data Set: Includes only part of the population.

Example: Measuring the average height of all students in a school (population) vs. a selected class (sample).

Parameters vs. Statistics

  • Parameter: Describes a characteristic of a population.

  • Statistic: Describes a characteristic of a sample.

Example: The mean age of all U.S. senators (parameter) vs. the mean age of a sample of 10 senators (statistic).
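
The parameter/statistic distinction can be sketched in a few lines of Python. The ages below are randomly generated, hypothetical values (not real senator data), and the names `population_ages` and `sample_ages` are illustrative only.

```python
import random
import statistics

# Hypothetical population: 100 made-up ages (not real data).
rng = random.Random(0)  # fixed seed so the sketch is reproducible
population_ages = [rng.randint(30, 80) for _ in range(100)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population_ages)

# Statistic: computed from a sample of 10 members drawn without replacement.
sample_ages = rng.sample(population_ages, 10)
sample_mean = statistics.mean(sample_ages)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The parameter is a single fixed number for this population, while the statistic changes whenever a different sample is drawn.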

Qualitative vs. Quantitative Data

  • Qualitative: Descriptive, non-numeric (e.g., colors, labels).

  • Quantitative: Numeric, measurable (e.g., height, weight).

Discrete vs. Continuous Variables

  • Discrete: Countable values (e.g., number of students).

  • Continuous: Any value within an interval, usually measured (e.g., height, time).

Observational Studies vs. Designed Experiments

Definitions

  • Observational Study: Observes outcomes without influencing them.

  • Experiment: Applies a treatment and observes its effect.

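  • Confounding: Occurs when the effects of two or more explanatory variables on the response are mixed together and cannot be separated.

  • Lurking Variable: A variable not considered in the study that affects the response variable and may be related to the explanatory variables.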

Example: Studying the effect of a new drug (experiment) vs. observing health outcomes in a population (observational study).

Sampling Methods

Simple Random Sample

  • Every possible sample of a given size n has an equal chance of being selected, so each member of the population is equally likely to be chosen.

  • Use random number tables or generators to select samples.
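
As a minimal sketch, a random number generator can draw a simple random sample from a numbered list of the population. The frame of 50 ID numbers and the sample size of 5 are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers 1..50 for a population of 50 people.
population_ids = list(range(1, 51))

rng = random.Random(42)                        # fixed seed for reproducibility
sample_ids = rng.sample(population_ids, k=5)   # 5 distinct IDs, drawn without replacement

print(sorted(sample_ids))
```

Because `sample` draws without replacement, no individual can be selected twice.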

Other Sampling Methods

  • Stratified Sampling: Divides the population into strata and samples from each.

  • Systematic Sampling: Selects every k-th individual from a list.

  • Cluster Sampling: Divides the population into clusters, then randomly selects clusters.

  • Convenience Sampling: Uses readily available subjects (not recommended for valid inference).
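
The difference between stratified and cluster sampling can be sketched as follows. The twelve students and their class-year labels are hypothetical, and the same grouping is reused as strata and as clusters only to keep the example short.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: 12 students, each tagged with a class year.
years = ["first", "second", "third"] * 4
students = [(f"S{i:02d}", year) for i, year in enumerate(years, start=1)]

# Group the student IDs by class year.
groups = {}
for student_id, year in students:
    groups.setdefault(year, []).append(student_id)

# Stratified sampling: take a simple random sample of 2 from EVERY group.
stratified = {year: rng.sample(members, 2) for year, members in groups.items()}

# Cluster sampling: randomly choose ONE whole group and keep all of its members.
chosen_cluster = rng.choice(sorted(groups))
cluster_sample = groups[chosen_cluster]

print(stratified)
print(chosen_cluster, cluster_sample)
```

Stratified sampling guarantees that every group is represented; cluster sampling trades that guarantee for the convenience of surveying whole groups.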

How to Do a Systematic Sample

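  1. Know the population size N and the desired sample size n, and compute the interval k = N/n, rounded down to a whole number.

  2. Pick a random starting point between 1 and k.

  3. Select the starting individual and every k-th individual after it until n individuals are chosen.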
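
The steps above can be sketched as a short function. The name `systematic_sample` and the frame of 100 numbered individuals are hypothetical, and the interval is taken as N divided by n, rounded down.

```python
import random

def systematic_sample(population, n, rng=random):
    """Return a 1-in-k systematic sample of n items from an ordered list."""
    N = len(population)
    k = N // n                    # sampling interval (N / n, rounded down)
    start = rng.randint(1, k)     # random starting position between 1 and k (1-based)
    # Take the start position and every k-th position after it, capped at n items.
    return population[start - 1::k][:n]

frame = list(range(1, 101))       # hypothetical frame: individuals numbered 1..100
sample = systematic_sample(frame, n=10, rng=random.Random(7))
print(sample)
```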

Bias in Sampling

Types of Bias

  • Sampling Bias: Sample does not represent the population.

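  • Nonresponse Bias: Selected individuals who do not respond differ in opinion from those who do.

  • Response Bias: Survey answers do not reflect respondents' true feelings (e.g., because of question wording or interviewer effects).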

  • Data-Entry Error: Mistakes in recording data.

  • Funding Bias: Results influenced by the source of funding.

Design of Experiments

Characteristics of an Experiment

  • Controlled study with treatments applied to experimental units.

  • May include control groups, blinding, and randomization.

Steps in Conducting an Experiment

  1. Identify the problem.

  2. Determine factors affecting the response variable.

  3. Determine the number of experimental units.

  4. Assign treatments to units (randomized or block design).

  5. Conduct the experiment and collect data.

Randomized and Block Designs

  • Randomized Design: Units are randomly assigned to treatments.

  • Block Design: Units are grouped by similarity, then randomized within blocks.
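
A rough sketch of the two designs is shown below. The eight units, the "young"/"old" blocking characteristic, and the helper `assign` are all made up for illustration; real experiments may use more treatments or unequal group sizes.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical experimental units, each with a blocking characteristic.
units = [("u1", "young"), ("u2", "young"), ("u3", "young"), ("u4", "young"),
         ("u5", "old"), ("u6", "old"), ("u7", "old"), ("u8", "old")]

def assign(unit_ids):
    """Shuffle the units and split them evenly between treatment and control."""
    shuffled = unit_ids[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Completely randomized design: ignore the blocks and randomize all units at once.
completely_randomized = assign([uid for uid, _ in units])

# Randomized block design: group similar units, then randomize within each block.
blocks = {}
for uid, block in units:
    blocks.setdefault(block, []).append(uid)
block_design = {block: assign(ids) for block, ids in blocks.items()}

print(completely_randomized)
print(block_design)
```

In the block design each block contributes equally to both groups, so the blocking characteristic cannot be confounded with the treatment.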

Organizing Qualitative Data

Frequency Distributions

  • Lists each category and the number of occurrences (frequency).

  • Relative frequency is the proportion of observations in each category.

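Formula: $\text{relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$

Color     Tally             Frequency   Relative Frequency
Brown     ||||| ||||| ||    12          0.2667
Yellow    ||||| |||||       10          0.2222
Red       ||||| ||||        9           0.2000
Orange    ||||| |           6           0.1333
Blue      |||               3           0.0667

Each relative frequency here uses a total of 45 observations (for example, 12/45 ≈ 0.2667). The five categories listed account for 40 of those observations.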
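
A frequency and relative frequency distribution can be built from raw observations as sketched below. The list `colors` is a small hypothetical data set, not the data behind the table above.

```python
from collections import Counter

# Hypothetical raw observations (10 made-up values).
colors = ["red", "blue", "red", "brown", "brown",
          "red", "yellow", "brown", "brown", "blue"]

freq = Counter(colors)                       # frequency of each category
total = sum(freq.values())                   # sum of all frequencies
rel_freq = {color: count / total for color, count in freq.items()}

# List categories from most to least frequent.
for color, count in freq.most_common():
    print(f"{color:<8}{count:>3}{rel_freq[color]:>8.2f}")
```

The relative frequencies always sum to 1, which is a quick check on the arithmetic.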

Graphs for Qualitative Data

  • Bar Graph: Rectangles represent frequencies for each category.

  • Pareto Chart: Bars in decreasing order of frequency.

  • Pie Chart: Circle divided into sectors proportional to category frequencies.

Organizing Quantitative Data: Popular Displays

Frequency Distributions and Histograms

  • Frequency Distribution: Table showing classes or intervals and their frequencies.

  • Histogram: One rectangle per class; the height of each rectangle represents the class frequency or relative frequency.

  • Dot Plot: Dots above a number line for each data point.
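
As a sketch of how classes are formed, the code below groups hypothetical exam scores into classes of width 10 and prints a text histogram. The scores, the class width, and the first lower class limit are all assumptions chosen for illustration.

```python
# Hypothetical exam scores (made up for illustration).
scores = [52, 67, 71, 73, 78, 81, 84, 85, 88, 90, 93, 97]

class_width = 10
lower_limit = 50      # lower limit of the first class

# Count how many scores fall in each class [lower, lower + width).
class_counts = {}
for score in scores:
    lower = lower_limit + ((score - lower_limit) // class_width) * class_width
    class_counts[lower] = class_counts.get(lower, 0) + 1

# One row per class; the number of stars is the class frequency.
for lower in sorted(class_counts):
    count = class_counts[lower]
    print(f"{lower}-{lower + class_width - 1}: {'*' * count}")
```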

Shapes of Distributions

  • Uniform: Evenly spread frequencies.

  • Bell-shaped (Symmetric): Highest frequency in the middle.

  • Skewed Right: Tail on the right.

  • Skewed Left: Tail on the left.

Additional Displays of Quantitative Data

Stem-and-Leaf Plot

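  • Displays the shape of the distribution while retaining the original data values: each value is split into a stem (leading digits) and a leaf (final digit).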
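
A stem-and-leaf plot for two-digit values can be sketched as follows, using the tens digit as the stem and the ones digit as the leaf. The data are hypothetical.

```python
from collections import defaultdict

data = [52, 67, 71, 73, 78, 81, 84, 85, 88, 90, 93, 97]  # hypothetical two-digit values

stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)   # tens digit is the stem, ones digit the leaf

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Every original value can be read back from the plot, for example stem 7 with leaf 3 is 73.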

Frequency Polygon

  • Uses points connected by lines to represent class frequencies.

Ogive Graph

  • Represents cumulative frequency or cumulative relative frequency.
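
The values plotted on an ogive are running totals of the class frequencies. A minimal sketch, assuming five hypothetical classes with the frequencies shown:

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies.
class_upper_limits = [59, 69, 79, 89, 99]
frequencies = [1, 1, 3, 4, 3]

cumulative = list(accumulate(frequencies))            # running totals
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for upper, c, cr in zip(class_upper_limits, cumulative, cumulative_relative):
    print(f"<= {upper}: {c:>2}  ({cr:.3f})")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.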

Time Series Graph

  • Plots data measured at successive points in time.

Graphical Misrepresentations of Data

Guidelines for Good Graphics

  • Label axes and provide units.

  • Avoid distortion and misleading scales.

  • Use appropriate graph types for the data.

  • Avoid three-dimensional effects that distort perception.

Measures of Central Tendency

Mean

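  • Population Mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.

  • Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.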

Median

  • The middle value when the data are ordered; with an even number of observations, the mean of the two middle values.

Mode

  • The value that occurs most frequently.
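
The three measures can be computed with Python's standard `statistics` module. The six values below are a hypothetical sample.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]   # hypothetical sample

mean = statistics.mean(data)       # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)   # even count: average of the middle values 3 and 5 = 4
mode = statistics.mode(data)       # 3 occurs most often

print(mean, median, mode)
```

Here the mean exceeds the median because the largest value, 10, pulls the mean upward.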

Measures of Dispersion

Definitions

  • Range: Max value minus min value.

  • Variance: Average of the squared deviations from the mean (the sample variance divides by n - 1 instead of n).

  • Standard Deviation: Square root of the variance.

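Population Variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Sample Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Standard Deviation: $\sigma = \sqrt{\sigma^2}$ (population) or $s = \sqrt{s^2}$ (sample)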
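
The population and sample versions differ only in the divisor, which a small hypothetical data set makes visible. The sketch uses the standard `statistics` module.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values; the mean is 5

# Sum of squared deviations from the mean: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32

# Treating the values as a whole population: divide by N = 8.
pop_var = statistics.pvariance(data)   # 32 / 8 = 4
pop_sd = statistics.pstdev(data)       # sqrt(4) = 2

# Treating the values as a sample: divide by n - 1 = 7.
samp_var = statistics.variance(data)   # 32 / 7
samp_sd = statistics.stdev(data)       # sqrt(32 / 7)

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample versions are slightly larger because dividing by n - 1 compensates for estimating the mean from the same data.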

Empirical Rule

For bell-shaped distributions:

  • About 68% of the data lie within 1 standard deviation of the mean.

  • About 95% lie within 2 standard deviations.

  • About 99.7% lie within 3 standard deviations.
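
As a quick illustration, the intervals in the Empirical Rule can be computed for an assumed bell-shaped distribution with mean 100 and standard deviation 15 (hypothetical values).

```python
mean, sd = 100, 15   # hypothetical bell-shaped distribution

# Interval mean ± k standard deviations for k = 1, 2, 3.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, share in [(1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")]:
    low, high = intervals[k]
    print(f"{share} of values lie between {low} and {high}")
```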

Measures of Position

Z-scores

  • Represents the number of standard deviations a value is from the mean.

Formula: $z = \dfrac{x - \mu}{\sigma}$ (population) or $z = \dfrac{x - \bar{x}}{s}$ (sample)
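
A z-score is a one-line calculation. The function name `z_score` and the exam figures (mean 70, standard deviation 8) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 8.
print(z_score(86, 70, 8))   # 2.0, two standard deviations above the mean
print(z_score(66, 70, 8))   # -0.5, half a standard deviation below the mean
```

Because z-scores are unitless, they allow values from different distributions to be compared on the same scale.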

Percentiles and Quartiles

  • Percentile: Value below which a given percentage of observations fall.

  • Quartiles: Divide data into four equal parts (Q1, Q2, Q3).

Interquartile Range (IQR)

  • Range of the middle 50% of the data: $\text{IQR} = Q_3 - Q_1$

Outliers

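  • Values outside the fences are considered outliers: below $Q_1 - 1.5 \cdot \text{IQR}$ (lower fence) or above $Q_3 + 1.5 \cdot \text{IQR}$ (upper fence).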
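
Quartiles, the IQR, and the 1.5 × IQR outlier check can be sketched together. Several quartile conventions are in use; this sketch assumes the "median of each half" convention, and the helper `quartiles` and the data are hypothetical, so software or textbooks using another convention may give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 using the 'median of each half' convention (one of several in use)."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2:]     # values above the median position
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

data = [3, 5, 7, 8, 9, 11, 13, 15, 40]   # hypothetical data with one extreme value

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
```

For these values Q1 = 6, Q3 = 14, and IQR = 8, so the fences are -6 and 26 and only 40 is flagged.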

The Five-Number Summary and Boxplots

  • Consists of minimum, Q1, median, Q3, and maximum.

  • Boxplots visually display the five-number summary and identify outliers.

Example: Drawing a boxplot for exam scores using the five-number summary: 60, 68, 77, 89, 98.
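
Before drawing the boxplot for the exam scores above, the fences can be checked from the five-number summary alone. This is a sketch using the 1.5 × IQR rule; the variable names are illustrative.

```python
minimum, q1, median, q3, maximum = 60, 68, 77, 89, 98   # five-number summary from the example

iqr = q3 - q1                      # 89 - 68 = 21
lower_fence = q1 - 1.5 * iqr       # 68 - 31.5 = 36.5
upper_fence = q3 + 1.5 * iqr       # 89 + 31.5 = 120.5

# The box spans Q1 to Q3 with a line at the median. If the minimum and maximum
# both lie inside the fences, the whiskers extend all the way to them.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(iqr, lower_fence, upper_fence, has_outliers)
```

Since 60 and 98 both fall between 36.5 and 120.5, no exam score is flagged as an outlier and the whiskers end at 60 and 98.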

Additional info: These notes provide a comprehensive overview of the foundational concepts in statistics, including data collection, sampling, bias, experimental design, and the organization and summarization of data. They are suitable for exam preparation and as a reference for introductory statistics courses.
