Introduction to Statistics: Key Concepts and Data Types
Introduction to Statistics
Statistical and Critical Thinking
Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions. Critical thinking in statistics involves understanding the context of data, recognizing the limitations of data collection, and making informed judgments based on statistical evidence.
Statistics: The study of methods for collecting, analyzing, interpreting, and presenting empirical data.
Parameter: A numerical measurement describing some characteristic of a population.
Statistic: A numerical measurement describing some characteristic of a sample.
Example: If you measure the average height of all students in a university, that average is a parameter. If you measure the average height of a randomly selected group of students, that average is a statistic.
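The parameter/statistic distinction above can be sketched in Python. The heights here are synthetic values generated for illustration, not real measurements, and the names `population`, `sample`, `parameter`, and `statistic` are assumptions of this sketch.

```python
import random
import statistics

# Synthetic "population": heights (cm) of 10,000 hypothetical students.
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population)

# Statistic: computed from a simple random sample of 100 students.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:.2f} cm")
print(f"Sample mean (statistic):     {statistic:.2f} cm")
```

The two values will usually be close but not identical: the statistic varies from sample to sample, while the parameter is fixed for a given population.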
Types of Data
Data can be classified into two main types: quantitative and categorical. The type of data determines the statistical methods used for analysis.
Quantitative (Numerical) Data: Consist of numbers representing counts or measurements. Examples: The weights of supermodels, the ages of respondents.
Categorical (Qualitative) Data: Consist of names or labels that do not represent counts or measurements. Examples: Gender (male/female) of athletes; shirt numbers on uniforms, which are numerals used only as labels.
Working with Quantitative Data
Quantitative data can be further classified as discrete or continuous, depending on the nature of the values they can take.
Discrete Data: Quantitative data where the number of possible values is finite or countable. Example: The number of coin tosses before getting tails.
Continuous Data: Quantitative data with infinitely many possible values that are not countable, because the values fall on a continuous scale with no gaps. Example: Lengths anywhere between 0 cm and 12 cm.
Levels of Measurement
Data can also be classified by their level of measurement, which determines the types of statistical analyses that are appropriate.
Nominal Level: Data consist of names, labels, or categories only; cannot be ordered. Example: Survey responses: yes, no, undecided.
Ordinal Level: Data can be ordered, but differences between values are not meaningful. Example: Course grades: A, B, C, D, F.
Interval Level: Data can be ordered, differences are meaningful, but there is no natural zero point. Example: Years: 1000, 2000, 1776, 1492.
Ratio Level: Data can be ordered, differences and ratios are meaningful, and there is a natural zero point. Example: Class times: 50 minutes, 100 minutes.
| Level | Description | Example |
|---|---|---|
| Nominal | Categories only | Yes/No/Undecided |
| Ordinal | Categories with order | Course grades |
| Interval | Order and meaningful differences, no true zero | Years |
| Ratio | Order, meaningful differences and ratios, true zero | Class times |
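As a rough illustration of which operations make sense at each level, the sketch below reuses the examples from this section. The variable names and the explicit `grade_order` list are assumptions made for the example.

```python
# Ordinal: grades can be ranked, but "A minus B" has no meaning.
grade_order = ["F", "D", "C", "B", "A"]  # lowest to highest
grades = ["B", "A", "D", "C"]
ranked = sorted(grades, key=grade_order.index)

# Interval: differences between years are meaningful...
years_apart = 2000 - 1000
# ...but year 0 does not mean "no time", so 2000 is not "twice" 1000.

# Ratio: a true zero (0 minutes = no class time) makes ratios meaningful.
ratio = 100 / 50

print(ranked)       # ['D', 'C', 'B', 'A']
print(years_apart)  # 1000
print(ratio)        # 2.0
```

Nominal data such as yes/no/undecided supports none of these operations; the responses can only be counted by category.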
Big Data and Data Science
Big data refers to data sets that are so large and complex that traditional software tools cannot efficiently analyze them. Data science is an interdisciplinary field that applies statistics, computer science, and software engineering to analyze and interpret big data, often incorporating knowledge from other fields such as sociology or finance.
Missing Data
Missing data can occur in any data set and must be addressed to ensure valid statistical analysis.
Missing Completely at Random (MCAR): The likelihood of a value being missing is independent of its value or any other values in the data set.
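Missing Not at Random (MNAR): The likelihood of a value being missing is related to the missing value itself. Example: Respondents with very high incomes may be more likely to leave an income question blank.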
Correcting for Missing Data
Delete Cases: Remove all subjects with any missing values from the analysis.
Impute Missing Values: Substitute missing values with estimated or predicted values based on other available data.
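Both approaches can be sketched with the Python standard library. The records below are hypothetical, `None` marks a missing value, and mean imputation is only one simple choice among many imputation methods; the helper name `impute_mean` is an assumption of this sketch.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 23, "income": 41000},
    {"age": 35, "income": None},
    {"age": None, "income": 52000},
    {"age": 41, "income": 60000},
]

# Delete cases: keep only records with no missing values.
complete_cases = [r for r in records if None not in r.values()]

def impute_mean(rows, column):
    """Return copies of rows with missing `column` values set to the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Impute: fill each column's gaps with that column's observed mean.
imputed = impute_mean(impute_mean(records, "age"), "income")

print(len(complete_cases))   # 2 of the 4 records survive deletion
print(imputed[2]["age"])     # mean of 23, 35, 41
print(imputed[1]["income"])  # mean of 41000, 52000, 60000
```

Deleting cases shrinks the sample (here from four records to two), while imputation keeps every record at the cost of introducing estimated values. Either approach can bias results when the data are not missing completely at random.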
