Introduction to Statistics: Key Concepts and Data Types

Introduction to Statistics

Statistical and Critical Thinking

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions. Critical thinking in statistics involves understanding the context of data, recognizing the limitations of data collection, and making informed judgments based on statistical evidence.

Statistics: The study of methods for collecting, analyzing, interpreting, and presenting empirical data.
Parameter: A numerical measurement describing some characteristic of a population.
Statistic: A numerical measurement describing some characteristic of a sample.

Example: If you measure the average height of all students in a university, that average is a parameter. If you measure the average height of a randomly selected group of students, that average is a statistic.
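The distinction can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the population of 5,000 heights is simulated, and the names `population_heights` and `sample_heights`, the sample size of 100, and the seed are all assumptions chosen for the example.

```python
import random
from statistics import mean

# Hypothetical population: simulated heights (cm) of every student at a university.
rng = random.Random(42)  # fixed seed so the run is repeatable
population_heights = [rng.gauss(170, 10) for _ in range(5000)]

# Parameter: computed from the entire population.
population_mean = mean(population_heights)

# Statistic: computed from a randomly selected group (a sample) of that population.
sample_heights = rng.sample(population_heights, 100)
sample_mean = mean(sample_heights)

print(f"parameter (population mean): {population_mean:.2f} cm")
print(f"statistic (sample mean):     {sample_mean:.2f} cm")
```

The two numbers are usually close but not identical; the statistic changes from sample to sample, while the parameter is fixed for a given population.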

Types of Data

Data can be classified into two main types: quantitative and categorical. The type of data determines the statistical methods used for analysis.

Quantitative (Numerical) Data: Consist of numbers representing counts or measurements. Examples: The weights of supermodels, the ages of respondents.
Categorical (Qualitative) Data: Consist of names or labels that are not numbers representing counts or measurements. Examples: Gender (male/female) of athletes, shirt numbers on uniforms (as labels).

Working with Quantitative Data

Quantitative data can be further classified as discrete or continuous, depending on the nature of the values they can take.

Discrete Data: Quantitative data where the number of possible values is finite or countable. Example: The number of coin tosses before getting tails.
Continuous Data: Quantitative data with infinitely many possible values that are not countable, because the values can fall anywhere on a continuous scale. Example: Lengths anywhere between 0 cm and 12 cm.
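As a rough illustration of the two kinds of quantitative data, the sketch below simulates both examples. The function names `tosses_before_tails` and `random_length_cm` are hypothetical, and a fair coin is assumed. Note that a computer stores lengths as floating-point numbers, which only approximate a truly continuous scale.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def tosses_before_tails(rng):
    """Count the heads tossed before the first tails appears (0, 1, 2, ...)."""
    count = 0
    while rng.random() < 0.5:  # treat values below 0.5 as heads (fair coin assumed)
        count += 1
    return count

def random_length_cm(rng):
    """Draw a length anywhere between 0 cm and 12 cm."""
    return rng.uniform(0.0, 12.0)

# Discrete: every value is a whole-number count.
discrete_values = [tosses_before_tails(rng) for _ in range(10)]

# Continuous: values can land anywhere in the interval.
continuous_values = [random_length_cm(rng) for _ in range(10)]

print("discrete:  ", discrete_values)
print("continuous:", [round(x, 3) for x in continuous_values])
```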

Levels of Measurement

Data can also be classified by their level of measurement, which determines the types of statistical analyses that are appropriate.

Nominal Level: Data consist of names, labels, or categories only; cannot be ordered. Example: Survey responses: yes, no, undecided.
Ordinal Level: Data can be ordered, but differences between values are not meaningful. Example: Course grades: A, B, C, D, F.
Interval Level: Data can be ordered, differences are meaningful, but there is no natural zero point. Example: Years: 1000, 2000, 1776, 1492 (the year 0 is an arbitrary reference point, not an absence of time).
Ratio Level: Data can be ordered, differences and ratios are meaningful, and there is a natural zero point. Example: Class times: 50 minutes, 100 minutes (100 minutes is twice as long as 50 minutes).

Level	Description	Example
Nominal	Categories only	Yes/No/Undecided
Ordinal	Categories with order	Course grades
Interval	Order & meaningful differences, no true zero	Years
Ratio	Order, meaningful differences & ratios, true zero	Class times
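One way to see the four levels is to ask which operation makes sense for each example. The sketch below is only an illustration: the survey responses and the list of grades are made-up data, and the `GRADE_RANK` mapping is an assumed encoding of grade order, not something defined in the notes.

```python
from collections import Counter

# Nominal: categories only, so counting is the sensible operation.
responses = ["yes", "no", "undecided", "yes", "no", "yes"]  # made-up data
counts = Counter(responses)
most_common_response = counts.most_common(1)[0][0]

# Ordinal: categories can be ranked, but "A minus B" has no meaning.
GRADE_RANK = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}  # assumed rank encoding
grades = ["B", "A", "D", "C"]
sorted_grades = sorted(grades, key=GRADE_RANK.get)  # lowest to highest

# Interval: differences are meaningful, ratios are not.
year_gap = 2000 - 1776  # 224 years apart

# Ratio: a true zero exists, so ratios are meaningful too.
time_ratio = 100 / 50  # a 100-minute class is twice as long as a 50-minute class

print(most_common_response, sorted_grades, year_gap, time_ratio)
```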

Big Data and Data Science

Big data refers to data sets that are so large and complex that traditional software tools cannot efficiently analyze them. Data science is an interdisciplinary field that applies statistics, computer science, and software engineering to analyze and interpret big data, often incorporating knowledge from other fields such as sociology or finance.

Missing Data

Missing data can occur in any data set and must be addressed to ensure valid statistical analysis.

Missing Completely at Random (MCAR): The likelihood of a value being missing is independent of its value or any other values in the data set.
Missing Not at Random (MNAR): The likelihood of a value being missing is related to the value itself; that is, the missing value is related to the reason it is missing.
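The practical difference between the two mechanisms shows up in a small simulation. Everything here is an assumption made for illustration: the data are simulated uniform values, 30% is an arbitrary MCAR missingness rate, and the MNAR rule (every value above 80 goes missing) is a deliberately extreme case.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical complete data set of 10,000 values between 0 and 100.
true_values = [rng.uniform(0, 100) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of going missing, whatever its size.
mcar_observed = [v for v in true_values if rng.random() >= 0.30]

# MNAR: missingness depends on the value itself; here values above 80 are never recorded.
mnar_observed = [v for v in true_values if v <= 80]

true_mean = mean(true_values)
mcar_mean = mean(mcar_observed)
mnar_mean = mean(mnar_observed)

print(f"true mean: {true_mean:.2f}")
print(f"MCAR mean: {mcar_mean:.2f}")  # stays close to the true mean
print(f"MNAR mean: {mnar_mean:.2f}")  # pulled well below the true mean
```

Under MCAR the observed values remain representative of the full data set, whereas under MNAR the observed values give a distorted picture.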

Correcting for Missing Data

Delete Cases: Remove all subjects with any missing values from the analysis.
Impute Missing Values: Substitute missing values with estimated or predicted values based on other available data.
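Both corrections can be sketched with plain Python. The records, the field names `age` and `height_cm`, and the helper functions are hypothetical. Replacing a missing value with the mean of the observed values in that column is just one simple imputation choice assumed here; the notes do not prescribe a specific method.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 21, "height_cm": 170.0},
    {"age": None, "height_cm": 165.0},
    {"age": 25, "height_cm": None},
    {"age": 23, "height_cm": 180.0},
]

def delete_cases(rows):
    """Keep only the subjects that have no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_with_mean(rows):
    """Fill each missing value with the mean of the observed values in that column."""
    columns = list(rows[0].keys())
    column_means = {
        c: mean(row[c] for row in rows if row[c] is not None) for c in columns
    }
    return [
        {c: (row[c] if row[c] is not None else column_means[c]) for c in columns}
        for row in rows
    ]

complete_cases = delete_cases(records)  # drops the two incomplete subjects
imputed = impute_with_mean(records)     # keeps all four subjects

print(len(complete_cases), "complete cases")
print(imputed)
```

Deleting cases is simple but shrinks the data set; imputation keeps every subject at the cost of introducing estimated values.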
