Chapter 1: Introduction to Statistics – Study Notes

Introduction to Statistics

1-1 Statistical and Critical Thinking

Statistics is the science of collecting, analyzing, presenting, and interpreting data. It is essential for making informed decisions based on data and for drawing conclusions about populations from sample information.

Key Concept: A major use of statistics is to collect and use sample data to make conclusions about populations.

1-2 Types of Data

Understanding the types of data is fundamental in statistics, because the type of data determines which methods of analysis and interpretation are appropriate.

Parameter: A parameter is a numerical measurement describing some characteristic of a population.
Statistic: A statistic is a numerical measurement describing some characteristic of a sample.
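
The distinction can be illustrated with a small sketch. The population below is made up for illustration (the ages of a hypothetical 1,000-member club), and the sample size of 50 is an arbitrary choice. The mean of the whole population is a parameter; the mean of the sample is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of a club (made-up data).
rng = random.Random(0)
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: a numerical measurement computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: the same measurement computed from a sample of that population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values will usually differ somewhat, which is why conclusions about a population drawn from a sample carry uncertainty.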

Quantitative Data

Definition: Quantitative (or numerical) data consist of numbers representing counts or measurements.
Examples: The weights of supermodels; the ages of respondents.

Categorical Data

Definition: Categorical (or qualitative or attribute) data consist of names or labels, not numbers that represent counts or measurements.
Examples: The gender (male/female) of professional athletes; shirt numbers on professional athletes' uniforms (the numbers substitute for names, so arithmetic on them is meaningless).

Working with Quantitative Data

Quantitative data can be further classified as discrete or continuous, which affects the choice of statistical methods.

Discrete Data

Definition: Discrete data result when the data values are quantitative and the number of values is finite or "countable."
Example: The number of tosses of a coin before getting tails (0, 1, 2, ...). There is no upper limit, but the possible values can still be counted one by one.

Continuous Data

Definition: Continuous (numerical) data result from infinitely many possible quantitative values, where the collection of values is not countable.
Example: Lengths anywhere from 0 cm to 12 cm, such as 3.5 cm or 7.082 cm. Between any two such values there are infinitely many others.
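
As a rough illustration of the two examples above, the sketch below simulates both kinds of data. The fair-coin assumption, the helper name tosses_before_tails, and the use of a uniform random draw for the lengths are choices made here for illustration only.

```python
import random

rng = random.Random(1)

def tosses_before_tails(rng):
    """Count the heads tossed before the first tails appears."""
    count = 0
    while rng.random() < 0.5:  # assumption: fair coin, values below 0.5 are heads
        count += 1
    return count

# Discrete: every value is a whole-number count (0, 1, 2, ...).
discrete_values = [tosses_before_tails(rng) for _ in range(5)]

# Continuous: any real number between 0 cm and 12 cm is possible.
continuous_values = [rng.uniform(0.0, 12.0) for _ in range(5)]

print("discrete:  ", discrete_values)
print("continuous:", [round(v, 3) for v in continuous_values])
```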

1-3 Levels of Measurement

Data can also be classified by their level of measurement, which determines the types of statistical analyses that are appropriate.

Nominal Level: Data consist of names, labels, or categories only. The data cannot be arranged in any order (e.g., survey responses of yes, no, and undecided).
Ordinal Level: Data can be arranged in some order, but differences between values are either not meaningful or cannot be determined (e.g., course grades A, B, C, D, or F).
Interval Level: Data can be arranged in order, and differences between values are meaningful, but there is no natural zero starting point, so ratios are not meaningful (e.g., years 1000, 2000, 1776, and 1492).
Ratio Level: Data can be arranged in order, differences are meaningful, and there is a natural zero starting point, so ratios are meaningful (e.g., class times of 50 minutes and 100 minutes, where 100 minutes is twice as long as 50 minutes).

Summary Table: Levels of Measurement

Level	Description	Example
Nominal	Categories only	Survey responses (yes, no, undecided)
Ordinal	Categories with some order	Course grades (A, B, C, D, F)
Interval	Meaningful differences, but no natural zero point	Years (1000, 2000, 1776, 1492)
Ratio	Meaningful differences and a natural zero point	Class times (50 min, 100 min)
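
One way to see what each level permits is to try the operations on the examples from the table. The sketch below is illustrative only: the response and grade lists are made up, and the variable names are chosen here.

```python
from collections import Counter

# Nominal: categories only, so counting membership is the meaningful operation.
responses = ["yes", "no", "undecided", "yes", "yes"]
counts = Counter(responses)

# Ordinal: an order exists, but it has to be stated explicitly;
# the "distance" between two grades is not defined.
grade_order = ["F", "D", "C", "B", "A"]  # lowest to highest
grades = ["B", "A", "D", "C"]
sorted_grades = sorted(grades, key=grade_order.index)

# Interval: differences are meaningful, ratios are not
# (the year 2000 is not "twice" the year 1000 in any useful sense).
gap_in_years = 2000 - 1776

# Ratio: a natural zero makes ratios meaningful.
class_times = [50, 100]  # minutes
times_as_long = class_times[1] / class_times[0]

print(counts["yes"], sorted_grades, gap_in_years, times_as_long)
```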

Big Data and Data Science

Modern statistics often deals with extremely large and complex data sets, requiring advanced computational tools and interdisciplinary approaches.

Big Data: Refers to data sets so large and complex that their analysis is beyond the capabilities of traditional software tools. Analysis may require parallel processing on many computers.
Data Science: Involves applications of statistics, computer science, and software engineering, along with other relevant fields such as sociology or finance.

Missing Data

Missing data can affect the validity of statistical analyses. Understanding the nature of missing data is crucial for choosing appropriate methods to handle it.

Missing Completely at Random (MCAR): The likelihood of a value being missing is independent of that value and of every other value in the data set.
Missing Not at Random (MNAR): The missing value is related to the reason that it is missing (e.g., respondents with very high incomes declining to report their income).
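
The practical difference between the two can be sketched with a small simulation. Everything here is assumed for illustration: the made-up incomes, the 20% MCAR rate, and the hypothetical missing_probability rule under which higher incomes are more likely to go unreported.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical incomes for 10,000 survey respondents (made-up data).
incomes = [rng.uniform(20_000, 100_000) for _ in range(10_000)]

# MCAR: every value has the same 20% chance of being missing (None).
mcar = [x if rng.random() >= 0.2 else None for x in incomes]

def missing_probability(income):
    """Assumed MNAR rule: the higher the income, the likelier it goes unreported."""
    return 0.8 * (income - 20_000) / 80_000

# MNAR: the chance of being missing depends on the value itself.
mnar = [x if rng.random() >= missing_probability(x) else None for x in incomes]

def observed_mean(values):
    """Mean of the values that were actually recorded."""
    return statistics.mean([v for v in values if v is not None])

true_mean = statistics.mean(incomes)
print(f"true mean:          {true_mean:,.0f}")
print(f"observed (MCAR):    {observed_mean(mcar):,.0f}")
print(f"observed (MNAR):    {observed_mean(mnar):,.0f}")
```

Under these assumptions the MCAR mean stays close to the true mean, while the MNAR mean is pulled downward because high incomes are underrepresented among the recorded values.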

Correcting for Missing Data

Delete Cases: Remove all subjects with any missing values. This is common, but it can discard a large share of the data when many subjects have at least one missing value.
Impute Missing Values: Substitute missing data values with estimated or predicted values based on other available information.
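
Both approaches can be sketched on a tiny data set. The records below are made up, and replacing each missing value with its column mean is only one simple imputation choice assumed here; the notes do not prescribe a particular method.

```python
import statistics

# Hypothetical records: (age, weight_kg); None marks a missing value.
records = [(25, 70.0), (31, None), (None, 82.5), (40, 90.0), (22, 65.5)]

# Delete cases: keep only subjects with no missing values.
complete_cases = [r for r in records if None not in r]

def column_mean(rows, index):
    """Mean of the recorded (non-missing) values in one column."""
    return statistics.mean([r[index] for r in rows if r[index] is not None])

# Impute: replace each missing value with the mean of its column
# (mean imputation, chosen here purely for illustration).
means = [column_mean(records, i) for i in range(2)]
imputed = [
    tuple(v if v is not None else means[i] for i, v in enumerate(r))
    for r in records
]

print("complete cases:", complete_cases)
print("imputed:       ", imputed)
```

Deleting cases keeps three of the five subjects here, while imputation keeps all five at the cost of inserting estimated values.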
