Introduction to Statistics: Foundations, Data Types, and the Statistical Process

Introduction to Statistics

What is Statistics?

Statistics is the discipline concerned with collecting, organizing, analyzing, and interpreting data to make decisions or draw conclusions under uncertainty. It provides tools for describing data and making inferences about populations based on samples.

  • Definition: Statistics is the science of learning from data to describe patterns and make decisions under uncertainty.

  • Descriptive statistics: Methods for summarizing and visualizing data (e.g., means, histograms).

  • Inferential statistics: Methods for generalizing from a sample to a population and quantifying uncertainty.

  • Population: The full group or process we want to understand.

  • Sample: The subset we actually observe and measure.

  • Parameter: A (usually unknown) number that describes a population.

  • Statistic: A number computed from a sample, used to estimate a parameter.

  • Bias: Systematic (non-random) deviation of results from the truth, caused by problems in study design, comparison, or selection.

Example: Estimating the average battery life of a brand of batteries by measuring a sample of 10 batteries and using the sample mean to estimate the population mean.
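
The battery example can be sketched in a few lines of Python. The lifetimes below are hypothetical values chosen only for illustration; the point is that the sample mean is a statistic used to estimate the unknown population mean.

```python
from statistics import mean

# Hypothetical battery lifetimes in hours for a sample of 10 batteries
# (illustrative values, not data from the study guide).
battery_hours = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.6]

# The sample mean is a statistic: it is computed from the sample and
# serves as a point estimate of the unknown population mean (a parameter).
sample_mean = mean(battery_hours)

print(f"n = {len(battery_hours)}, sample mean = {sample_mean:.2f} hours")
```

A different sample of 10 batteries would generally give a slightly different sample mean, which is why inferential statistics also quantifies uncertainty.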

Populations and Samples

Defining Populations and Samples

Understanding the difference between a population and a sample is fundamental in statistics. The population is the entire group of interest, while a sample is a subset selected for measurement.

  • Population: The group you truly care about (e.g., all patients with a disease).

  • Sample: The group you can practically reach or measure (e.g., patients in a specific hospital).

  • Sampling frame: The actual list or mechanism from which the sample is drawn.

  • Census: An attempt to measure every unit in the population.

  • Parameter vs. Statistic: Parameters describe populations; statistics describe samples.

Example: Measuring the average weight of students in a school by selecting a random sample of students and calculating the sample mean.
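
As a rough illustration of parameter versus statistic, the sketch below simulates a school and draws a simple random sample from it. The population size (800), the weight distribution, and the sample size (40) are all assumptions made up for this example.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: weights (kg) of all 800 students in a school.
population = [rng.gauss(60, 8) for _ in range(800)]

# Parameter: the population mean. In practice it is unknown unless a
# census measures every student.
mu = mean(population)

# Statistic: the mean of a simple random sample of 40 students,
# drawn without replacement.
sample = rng.sample(population, k=40)
x_bar = mean(sample)

print(f"population mean (parameter): {mu:.2f} kg")
print(f"sample mean (statistic):     {x_bar:.2f} kg")
```

Here the list `population` plays the role of the sampling frame. In a real study the frame rarely matches the target population perfectly, and that mismatch is one source of selection bias.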

Variables and Types of Data

What is a Variable?

A variable is any characteristic that can take on different values among units in a dataset. Variables are classified into two broad families: categorical and quantitative.

  • Categorical variables: Place each case into a group or label (e.g., blood type, gender, home state).

  • Quantitative variables: Take numerical values for which arithmetic is meaningful (e.g., height, test scores).

Example: Blood type (A, B, AB, O) is categorical; height (in cm) is quantitative.
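
The type of a variable determines which summaries make sense. A minimal sketch, using hypothetical records: counts are the natural summary for a categorical variable, while a mean is meaningful for a quantitative one.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; each row is one observational unit (a person).
records = [
    {"blood_type": "A", "height_cm": 172.0},
    {"blood_type": "O", "height_cm": 165.5},
    {"blood_type": "O", "height_cm": 180.2},
    {"blood_type": "AB", "height_cm": 158.3},
    {"blood_type": "A", "height_cm": 169.0},
]

# Categorical variable: count how many cases fall into each group.
blood_type_counts = Counter(r["blood_type"] for r in records)

# Quantitative variable: arithmetic such as averaging is meaningful.
mean_height = mean(r["height_cm"] for r in records)

print(dict(blood_type_counts))
print(f"mean height = {mean_height:.1f} cm")
```

Averaging the blood types would make no sense, even if they were coded as numbers such as 1 to 4.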

Measurement Scales

Variables can be measured on different scales, which affect the types of analyses that are appropriate.

  • Nominal: Categories without order (e.g., gender, blood type).

  • Ordinal: Categories with a meaningful order but not equal spacing (e.g., Likert ratings).

  • Interval: Quantitative scale with an arbitrary zero; differences are meaningful, but ratios are not (e.g., temperature in Celsius).

  • Ratio: Quantitative scale with a true zero; ratios and differences are meaningful (e.g., height, weight).

Example: Temperature in Celsius is interval; weight in kilograms is ratio.
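
One way to see the interval/ratio distinction is to change units and check what survives. The sketch below is an illustration, not part of the original notes: it converts Celsius to Kelvin (an assumption introduced here to supply a second temperature scale) and kilograms to pounds.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences on an interval scale survive the shift of the zero point.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios do not: 20 °C is not "twice as hot" as 10 °C, because the
# Celsius zero is arbitrary.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Weight is on a ratio scale (true zero), so ratios survive a unit change.
KG_TO_LB = 2.20462  # approximate conversion factor
w1_kg, w2_kg = 10.0, 20.0
ratio_kg = w2_kg / w1_kg
ratio_lb = (w2_kg * KG_TO_LB) / (w1_kg * KG_TO_LB)

print(f"temperature differences: {diff_c:.2f} vs {diff_k:.2f}")
print(f"temperature ratios:      {ratio_c:.3f} vs {ratio_k:.3f}")
print(f"weight ratios:           {ratio_kg:.3f} vs {ratio_lb:.3f}")
```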

Discrete vs. Continuous Variables

Quantitative variables can be further classified as discrete (countable values, typically counts) or continuous (any value within a range, limited only by measurement precision).

  • Discrete: Countable values (e.g., number of ER visits).

  • Continuous: Measured on a smooth scale (e.g., blood pressure, time to failure).

Example: Number of missed classes is discrete; systolic blood pressure is continuous.

The Role of Statistics in Research

From Question to Estimation to Design

Statistics helps researchers move from formulating questions to designing studies and analyzing data. Every project should address:

  • Target question: What is the main question the study seeks to answer?

  • Target parameter: What is the population value of interest?

  • Design and measurement: How will data be collected and measured?

  • Analysis and communication: How will results be summarized and shared?

Example: Designing a study to compare the effectiveness of two treatments by randomly assigning patients and measuring outcomes.
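
Random assignment in the example above can be sketched as a shuffle followed by a split. The patient identifiers, the group names, and the equal group sizes are hypothetical choices for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the assignment is reproducible

# Hypothetical patient identifiers P01 ... P20.
patients = [f"P{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: every patient has the same
# chance of ending up in either treatment group.
shuffled = patients[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
groups = {
    "treatment_A": shuffled[:half],
    "treatment_B": shuffled[half:],
}

for name, members in groups.items():
    print(name, sorted(members))
```

Because chance alone decides group membership, systematic differences between the groups, and therefore confounding, become unlikely as the sample size grows.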

The Statistical Process

Steps in the Statistical Process

The statistical process provides a blueprint for conducting research and making decisions based on data.

  1. Identify the question: Define the population, observational unit, and variable.

  2. Design the study: Choose an observational or experimental approach, plan the sample, and anticipate sources of bias.

  3. Collect data: Gather measurements using reliable instruments, and record or control for potential confounders.

  4. Explore and summarize: Use descriptive statistics and visualizations to understand patterns.

  5. Infer and communicate: Draw conclusions and report uncertainty using confidence intervals and p-values.

Example: Conducting a survey to estimate average study hours among college students, summarizing results, and reporting confidence intervals.
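
Steps 4 and 5 for the survey example might look like the sketch below. The responses are hypothetical, and the interval uses a normal approximation for simplicity; with a sample this small, a t-based interval would be somewhat wider and is usually preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical weekly study hours reported by 12 surveyed students.
study_hours = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10, 18, 13]

# Step 4: explore and summarize.
n = len(study_hours)
x_bar = mean(study_hours)
s = stdev(study_hours)  # sample standard deviation (n - 1 divisor)

# Step 5: infer and report uncertainty.
se = s / sqrt(n)                 # standard error of the mean
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci_low, ci_high = x_bar - z * se, x_bar + z * se

print(f"mean = {x_bar:.1f} h, s = {s:.2f} h, n = {n}")
print(f"approximate 95% CI: ({ci_low:.1f}, {ci_high:.1f}) hours")
```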

Tables

Key Definitions Table

Keyword | Definition
Statistics | The discipline of learning from data to describe patterns and make decisions under uncertainty.
Descriptive statistics | Methods for summarizing and visualizing data.
Inferential statistics | Methods for generalizing from a sample to a population and quantifying uncertainty.
Population | The full group or process we want to understand.
Sample | The subset we actually observe and measure.
Parameter | A numerical characteristic of a population (e.g., μ, σ).
Statistic | A numerical summary from a sample (e.g., x̄, s).
Sampling frame | Actual list or mechanism from which the sample is drawn.
Bias | Systematic deviation caused by design, comparison, or selection issues.
Observational unit | The 'one case' a single row in data represents.

Measurement Scales Table

Scale | Description
Nominal | Categories without order.
Ordinal | Categories with meaningful order but not equal spacing.
Interval | Quantitative scale with arbitrary zero; differences are meaningful.
Ratio | Quantitative scale with a true zero; ratios and differences are meaningful.

Formulas and Equations

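  • Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

  • Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

  • Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

  • Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

Here $n$ is the sample size and $N$ is the population size.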
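
The four quantities listed above can be computed with Python's standard `statistics` module. This is a small sketch on made-up numbers: `variance` divides by n - 1 (sample variance), while `pvariance` divides by the number of values and is appropriate only when the data are the entire population.

```python
from statistics import mean, pvariance, variance

# Hypothetical data, treated first as a sample and then as a population.
data = [4, 8, 6, 5, 7]
n = len(data)

x_bar = mean(data)        # mean (same arithmetic for sample or population)
s2 = variance(data)       # sample variance, divisor n - 1
sigma2 = pvariance(data)  # population variance, divisor N

# The same quantities computed directly from their definitions.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)
s2_by_hand = sum_sq_dev / (n - 1)
sigma2_by_hand = sum_sq_dev / n

print(f"mean = {x_bar}")
print(f"sample variance = {s2}, by hand = {s2_by_hand}")
print(f"population variance = {sigma2}, by hand = {sigma2_by_hand}")
```

The sample variance is always at least as large as the population variance computed from the same numbers, because it divides the same sum of squared deviations by a smaller number.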
