Basic Statistics and Data Science: Exam Study Guide and Practice Questions
Exam Overview and Structure
This study guide covers the foundational topics in basic statistics and data science, as outlined for MATH 012. It includes exam structure, key concepts, formulas, and practice questions to help students prepare for assessments covering Chapters 1–4.
Exam Format
Duration: 90 minutes
Coverage: Chapters 1–4
Sections: Multiple choice (36 points), Short answer (64 points)
Calculator: Allowed for basic arithmetic (show detailed work for credit)
Formula Sheet: Provided during the exam
Chapter 1: Introduction to Statistics
Key Concepts
Statistics: The science of collecting, analyzing, interpreting, and presenting data.
Population: The entire group of individuals or items under study.
Sample: A subset of the population selected for analysis.
Variable: A characteristic or attribute that can assume different values.
Descriptive Statistics: Methods for summarizing and organizing data.
Inferential Statistics: Methods for making predictions or inferences about a population based on sample data.
Sampling Methods
Simple Random Sampling
Stratified Sampling
Cluster Sampling
Systematic Sampling
Additional info: Sampling methods are crucial for ensuring representative data and minimizing bias.
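The four sampling methods listed above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the population of student IDs, the strata and cluster labels, and all function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: a simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random, keep all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k, then every k-th item."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: student IDs 1..20, split into two strata and four clusters.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 21))
strata = {"first_year": population[:10], "second_year": population[10:]}
clusters = {"A": population[0:5], "B": population[5:10],
            "C": population[10:15], "D": population[15:20]}

srs = simple_random_sample(population, 5, rng)
stratified = stratified_sample(strata, 3, rng)
clustered = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(population, 4, rng)
print(srs, stratified, clustered, systematic, sep="\n")
```

Note the difference between the two grouped designs: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.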
Chapter 2: Types of Data and Data Visualization
Types of Variables
Categorical (Qualitative): Data that can be grouped into categories (e.g., gender, color).
Numerical (Quantitative): Data that can be measured or counted (e.g., height, age).
Discrete: Countable values (e.g., number of students).
Continuous: Any value within a range (e.g., weight, temperature).
Data Tables and Frequency Distributions
Construct and interpret frequency tables for categorical and numerical data.
Use bar charts, Pareto charts, and pie charts for categorical data.
Use histograms, dot plots, and stem-and-leaf plots for numerical data.
Graphs of Numerical Data
Histogram: Displays the distribution of numerical data using bars.
Dot Plot: Shows individual data points.
Boxplot: Summarizes data using quartiles and identifies outliers.
Measures of Central Tendency
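Mean (Average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sum of the observations divided by the number of observations)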
Median: The middle value when data are ordered.
Mode: The most frequently occurring value.
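As a quick check of the three measures above, the sketch below uses Python's standard `statistics` module on a small set of hypothetical exam scores (the numbers are made up for illustration).

```python
import statistics

scores = [70, 85, 85, 90, 100]  # hypothetical exam scores

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Here the sum is 430 over 5 values, so the mean is 86, while the median and mode are both 85.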
Measures of Variability
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation: $s = \sqrt{s^2}$
Range: Difference between maximum and minimum values.
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
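The measures of variability above can be computed as follows. This is a sketch on hypothetical data; it assumes the quartile convention in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data (other conventions exist and can give slightly different quartiles).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance
data_range = max(data) - min(data)

# Quartiles as medians of the lower and upper halves (assumed convention).
# For an odd number of observations the overall median is left out of both halves.
ordered = sorted(data)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1

print(variance, std_dev, data_range, iqr)
```

For this data the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7.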
Empirical Rule (68-95-99.7 Rule)
For bell-shaped distributions:
~68% of data within 1 standard deviation of the mean
~95% within 2 standard deviations
~99.7% within 3 standard deviations
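The three percentages can be checked against an exact normal distribution. The sketch below assumes a perfectly normal model (real bell-shaped data only approximate these values) and uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```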
z-score
Measures how many standard deviations a value is from the mean.
Formula: $z = \frac{x - \bar{x}}{s}$
Used to detect outliers (typically, $z < -3$ or $z > 3$ indicates a potential outlier).
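A minimal sketch of z-score outlier detection, assuming the sample mean and sample standard deviation are used and that values more than 3 standard deviations from the mean are flagged. The data are hypothetical.

```python
import statistics

def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

# Hypothetical data: fifteen values near 50 and one unusually large value.
data = [48, 49, 50, 51, 52] * 3 + [90]

mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # sample standard deviation

# Flag values whose z-score is beyond +/-3 (assumed cutoff).
outliers = [x for x in data if abs(z_score(x, mean, std_dev)) > 3]
print(mean, round(std_dev, 2), outliers)
```

In very small samples a single extreme value inflates the standard deviation so much that its own z-score cannot exceed 3, which is one reason the 1.5 × IQR rule is often preferred for small data sets.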
Linear Transformations
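Transforming data using $y = c + dx$ changes the mean and standard deviation in predictable ways.
Mean: $\bar{y} = c + d\,\bar{x}$
Standard deviation: $s_y = |d|\,s_x$ (adding the constant $c$ does not change the spread)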
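The effect of a linear transformation (new value = c + d × old value) can be verified numerically. The sketch below uses the Celsius-to-Fahrenheit conversion (c = 32, d = 1.8) on hypothetical temperatures; the function name `linear_transform` is illustrative.

```python
import math
import statistics

def linear_transform(data, c, d):
    """Apply new = c + d * old to every observation."""
    return [c + d * x for x in data]

celsius = [10.0, 20.0, 30.0]  # hypothetical temperatures
fahrenheit = linear_transform(celsius, 32.0, 1.8)

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

# The mean is shifted and scaled; the standard deviation is only scaled.
print(mean_f, 32.0 + 1.8 * mean_c)
print(sd_f, 1.8 * sd_c)

# A negative multiplier flips the data but the spread stays positive: |d| * s.
flipped = linear_transform(celsius, 0.0, -2.0)
sd_flipped = statistics.stdev(flipped)
```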
Chapters 3 & 4: Relationships Between Variables
Contingency Tables
Used to summarize the relationship between two categorical variables.
Conditional proportions help identify associations.
Scatterplots
Graphical representation of the relationship between two quantitative variables.
Can reveal patterns, trends, and potential outliers.
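Correlation Coefficient ($r$)
Measures the strength and direction of a linear relationship between two variables.
Formula: $r = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
Range: $-1 \le r \le 1$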
Correlation does not imply causation.
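The correlation coefficient can be computed as the sum of products of z-scores divided by n - 1. The sketch below follows that definition on hypothetical paired data; the function name `correlation` is illustrative.

```python
import math
import statistics

def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

r = correlation(x, y)
print(round(r, 4))

# Points that lie exactly on an increasing line have correlation 1.
r_perfect = correlation(x, [2 * v + 1 for v in x])
```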
Linear Regression
Models the relationship between a response variable $y$ and an explanatory variable $x$.
Regression line: $\hat{y} = a + bx$
Slope: $b = r\,\frac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\,\bar{x}$
Coefficient of determination ($r^2$): Proportion of variance in $y$ explained by $x$.
Residual: Difference between observed and predicted value ($y - \hat{y}$)
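A least-squares line, a prediction, and a residual can be sketched as follows. The data are hypothetical, and the slope is computed from sums of deviations, which is algebraically the same as the correlation times the ratio of the standard deviations of y and x.

```python
import math
import statistics

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx                  # equals r * s_y / s_x
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

intercept, slope = fit_line(x, y)

# Prediction and residual for the observed point (4, 4).
predicted = intercept + slope * 4
residual = 4 - predicted  # observed minus predicted

print(intercept, slope, predicted, residual)
```

A negative residual means the point lies below the regression line, so the line over-predicts that observation.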
Simpson's Paradox
When a trend appears in several groups of data but disappears or reverses when the groups are combined.
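A small numerical illustration makes the reversal concrete. The counts below are illustrative only (they are not from the course materials): treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because A was used mostly on the harder cases.

```python
# (successes, total) for each treatment within each group; illustrative counts.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

# Success rate of each treatment within each group.
within = {
    group: {t: s / n for t, (s, n) in treatments.items()}
    for group, treatments in groups.items()
}

# Success rate of each treatment after pooling the groups.
combined = {}
for t in ("A", "B"):
    successes = sum(groups[g][t][0] for g in groups)
    total = sum(groups[g][t][1] for g in groups)
    combined[t] = successes / total

print(within)
print(combined)
```

The group variable acts as a lurking variable: it is related both to which treatment was used and to the outcome.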
Experimental vs. Observational Studies
Experimental Study: Researcher manipulates variables and controls conditions.
Observational Study: Researcher observes without intervention.
Bias and Randomization
Types of bias: sampling bias, confounding bias, non-response bias, response bias.
Randomization helps ensure representative samples and valid inferences.
Double-blind: Neither participants nor experimenters know group assignments.
Practice Questions and Worked Examples
Sample Multiple Choice Questions
Definition and identification of statistics, variables, and samples.
Interpretation of bar graphs, histograms, and Pareto charts.
Understanding sample variance and its properties.
Boxplot interpretation and what can/cannot be displayed.
Classification of variables as numerical/continuous or categorical/discrete.
Properties of bivariate data and correlation.
Interpretation of histograms and measures of central tendency.
Types of bias in survey questions.
Properties of the correlation coefficient $r$.
Application of linear transformations and calculation of mean and standard deviation.
Worked Example: Contingency Table
The following table summarizes computer sales by age group and computer type:
| Age group | PC | Mac | Total |
|---|---|---|---|
| 20-30 | 70 | 30 | 100 |
| 30-40 | 50 | 50 | 100 |
| Total | 120 | 80 | 200 |
Additional info: Conditional proportions and associations can be calculated from this table to determine if age group and computer type are associated.
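The conditional proportions for this table can be computed as a sketch in Python. The counts come from the table above; the function name `conditional_proportions` is illustrative.

```python
# Counts from the computer sales table, keyed by age group and computer type.
table = {
    "20-30": {"PC": 70, "Mac": 30},
    "30-40": {"PC": 50, "Mac": 50},
}

def conditional_proportions(table):
    """Proportion of each column category within each row (row percentages)."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: count / total for col, count in counts.items()}
    return result

props = conditional_proportions(table)
for age_group, row in props.items():
    print(age_group, row)
```

The proportion buying a PC is 0.70 in the 20-30 group and 0.50 in the 30-40 group. Because the conditional proportions differ across age groups, the table suggests an association between age group and computer type.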
Worked Example: Five-Number Summary and Boxplot
Given data: min = 6, max = 27, Q1, Q2, Q3 to be calculated.
Five-number summary: Minimum, Q1, Median (Q2), Q3, Maximum
IQR: $\text{IQR} = Q_3 - Q_1$
Outliers: Any value below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$
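The full data set for this example is not reproduced here, so the sketch below uses hypothetical values chosen only to match the stated minimum (6) and maximum (27). It assumes quartiles are the medians of the lower and upper halves of the ordered data and applies the 1.5 × IQR rule for outliers.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])   # median of the lower half
    q3 = statistics.median(ordered[-half:])  # median of the upper half
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

# Hypothetical data with min = 6 and max = 27.
data = [6, 7, 8, 9, 10, 11, 12, 13, 27]

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(minimum, q1, median, q3, maximum)
print(iqr, lower_fence, upper_fence, outliers)
```

With these values the fences are 0 and 20, so 27 would be drawn as a separate point beyond the upper whisker of the boxplot.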
Worked Example: z-score and Outlier Detection
z-score formula: $z = \frac{x - \bar{x}}{s}$
Interpretation: z-scores beyond ±2 or ±3 may indicate outliers.
Worked Example: Linear Regression and Correlation
Regression equation: $\hat{y} = a + bx$
Interpret slope: Change in the predicted value $\hat{y}$ for a one-unit increase in $x$.
Calculate predicted value $\hat{y}$ and residual $y - \hat{y}$ for a given $x$.
Correlation coefficient calculation using sample data.
Key Formulas Reference
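Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
z-score: $z = \frac{x - \bar{x}}{s}$
Correlation Coefficient: $r = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
Regression Line: $\hat{y} = a + bx$
Slope: $b = r\,\frac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\,\bar{x}$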
Additional info: Mastery of these concepts and formulas is essential for success in introductory statistics and data science courses.