Statistics Midterm 1: Key Concepts and Applications

Describing Data with Tables and Graphs

Boxplots and Distribution Shape

Boxplots are graphical representations that summarize the distribution of a dataset using the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. They are useful for identifying the shape, center, and spread of the data, as well as potential outliers.

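  • Skewness: A longer whisker on one side suggests the distribution is skewed toward that side. A longer upper whisker indicates right skew; a longer lower whisker indicates left skew.

  • Outliers: Points plotted individually beyond the whiskers are potential outliers. A common convention flags values more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 − Q1.

  • Example: A boxplot of the 'tempest' variable shows that the distribution is skewed to the right, with an outlier present.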
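
The five-number summary and the 1.5 × IQR outlier convention can be sketched in a few lines of Python. This is a minimal illustration, not the course's prescribed method: the data values are made up, the helper names `five_number_summary` and `iqr_outliers` are hypothetical, and the quartiles use the "inclusive" convention of the standard library, which may differ slightly from the convention used by a textbook or calculator.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # others can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def iqr_outliers(values):
    """Return the values lying beyond the 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up illustrative data (not from the exam): one large value on the right.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(sample)
outliers = iqr_outliers(sample)
print(summary, outliers)
```

In this made-up sample the maximum sits far above Q3, so a boxplot would show a single outlier point on the right, the pattern associated with right skew.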

Scatterplots and Correlation

Scatterplots display the relationship between two quantitative variables. The pattern of points can reveal the direction, form, and strength of the association.

  • Negative Correlation: If the points slope downward from left to right, the variables are negatively correlated.

  • Strength: The closer the points fall to a straight line, the stronger the linear correlation.

  • Example: A scatterplot of car weight and mpg shows a negative correlation; as weight increases, mpg decreases.
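
Direction and strength can be quantified with the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes r from its definition. The weight and mpg values are hypothetical numbers chosen only for illustration, and `pearson_r` is a helper name introduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: car weight (thousands of pounds) and fuel economy (mpg).
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [36, 31, 27, 24, 19]

r = pearson_r(weights, mpg)
print(round(r, 3))  # negative: mpg falls as weight rises
```

A value of r close to -1 corresponds to points that slope downward and lie near a straight line.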

Describing Data Numerically

Measures of Center and Spread

Numerical summaries help describe the central tendency and variability of data.

  • Mean: The arithmetic average of the data.

  • Median: The middle value when data are ordered.

  • Standard Deviation: Measures the typical distance of data points from the mean; it is the square root of the variance.

  • Example: Calculating the mean and standard deviation for car weights and mpg values.
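
As a quick illustration, Python's standard `statistics` module computes all three summaries. The mpg values below are hypothetical, and `statistics.stdev` returns the sample standard deviation (dividing by n − 1), which is assumed here to be the version the course uses.

```python
import statistics

# Hypothetical mpg values, for illustration only.
mpg = [36, 31, 27, 24, 19]

mean_mpg = statistics.mean(mpg)      # arithmetic average
median_mpg = statistics.median(mpg)  # middle value of the sorted data
sd_mpg = statistics.stdev(mpg)       # sample standard deviation (n - 1)

print(mean_mpg, median_mpg, round(sd_mpg, 2))
```

If the population standard deviation (dividing by n) is wanted instead, `statistics.pstdev` is the corresponding function.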

Types of Variables

Variables can be classified based on their nature and measurement scale.

  • Quantitative Variable: Takes numerical values (e.g., weight, mpg).

  • Categorical Variable: Takes categories or labels (e.g., car model, state).

  • Example: 'Driver's license number' is a categorical/qualitative nominal variable.

Regression

Simple Linear Regression

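Regression analysis estimates the relationship between a dependent variable and one or more independent variables. In simple linear regression there is a single independent variable $x$, and the fitted model is:

  • Equation: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the intercept and $b_1$ is the slope.

  • Interpretation: The slope $b_1$ indicates the predicted change in $y$ for a one-unit increase in $x$.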

  • Example: For car data, the regression line predicts mpg from weight. A negative slope means heavier cars have lower mpg.
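
A least-squares line can be fitted directly from the data: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below assumes hypothetical weight and mpg values; `fit_line` is a helper name introduced for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: car weight (thousands of pounds) and mpg.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [36, 31, 27, 24, 19]

intercept, slope = fit_line(weights, mpg)
predicted = intercept + slope * 3.2  # predicted mpg for a 3,200 lb car
print(round(intercept, 2), round(slope, 2), round(predicted, 2))
```

With these made-up numbers the slope is negative, so each additional thousand pounds lowers the predicted mpg by the magnitude of the slope.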

Coefficient of Determination ($R^2$)

The $R^2$ value measures the proportion of variance in the dependent variable explained by the independent variable(s).

  • High $R^2$: Indicates the model explains a large portion of the variability.

  • Limitation: A high $R^2$ does not imply causation, nor does it show that the model is appropriate for the data.

  • Example: An $R^2$ of 75.3% means 75.3% of the variability in mpg is explained by weight.
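
One way to see what "proportion of variability explained" means is to compute it as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for a least-squares line. The data are hypothetical (so the result is not the 75.3% quoted above), and `r_squared` is a helper name introduced here.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) variation around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: car weight (thousands of pounds) and mpg.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [36, 31, 27, 24, 19]

r2 = r_squared(weights, mpg)
print(f"{r2:.1%} of the variability in mpg is explained by weight")
```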

Probability

Basic Probability Concepts

Probability quantifies the likelihood of events occurring. Key concepts include independent and dependent events, conditional probability, and the addition/multiplication rules.

  • Independent Events: The occurrence of one event does not affect the probability of another.

  • Dependent Events: The occurrence of one event affects the probability of another.

  • Conditional Probability: The probability of event A given that event B has occurred, denoted $P(A \mid B)$.

  • Example: Calculating the probability that a student chooses the correct answer without cheating, using conditional and joint probabilities.

Useful Probability Formulas

Several formulas are essential for solving probability problems:

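  • Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

  • Multiplication Rule: $P(A \cap B) = P(A)\,P(B \mid A)$

  • Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

  • Independence: If A and B are independent, $P(A \cap B) = P(A)\,P(B)$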
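
These rules can be checked by enumerating a small sample space. The sketch below uses one roll of a fair six-sided die, an example chosen here for illustration rather than taken from the exam, with A = "the roll is even" and B = "the roll is greater than 3". Exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
outcomes = set(range(1, 7))
A = {o for o in outcomes if o % 2 == 0}  # even roll: {2, 4, 6}
B = {o for o in outcomes if o > 3}       # roll above 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union = prob(A | B)              # P(A or B)
p_inter = prob(A & B)              # P(A and B)
p_a_given_b = p_inter / prob(B)    # conditional probability P(A | B)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
addition_holds = p_union == prob(A) + prob(B) - p_inter
# Multiplication rule: P(A and B) = P(B) * P(A | B)
multiplication_holds = p_inter == prob(B) * p_a_given_b
# Independence would require P(A and B) = P(A) * P(B).
independent = p_inter == prob(A) * prob(B)

print(p_union, p_inter, p_a_given_b, independent)
```

Here P(A and B) is 1/3 while P(A) × P(B) is 1/4, so the two events are dependent: knowing the roll is above 3 raises the probability that it is even from 1/2 to 2/3.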

Probability Table

The following table summarizes key probability relationships:

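| Rule | Formula |
| --- | --- |
| Addition Rule | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ |
| Multiplication Rule | $P(A \cap B) = P(A)\,P(B \mid A)$ |
| Conditional Probability | $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ |
| Independence | $P(A \cap B) = P(A)\,P(B)$ |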

Additional info:

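  • Some formulas for mean and standard deviation were provided: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$.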

  • Contextual examples are based on car weight and mpg data, as used in the exam questions.

  • Probability questions included scenarios about cheating and correct answers, illustrating conditional and joint probabilities.
