Descriptive Statistics and Probability: Measures of Variation, Probability Concepts, and Counting Rules
Measures of Variation
Range
The range is a simple measure of variation that indicates the spread between the largest and smallest values in a data set.
Definition: The difference between the maximum and minimum data entries in the set.
The data must be quantitative.
Formula: $\text{Range} = (\text{maximum data entry}) - (\text{minimum data entry})$
Variation
Variation describes how data values are spread out or clustered together. Two data sets can have the same mean but different variations.
Greater variation means data values are more spread out.
Example: Two corporations with the same mean starting salary but different spreads (see bar charts for Corporation A and B).
Deviation, Variance, and Standard Deviation
Deviation: The difference between a data entry and the mean of the data set.
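Population deviation: $x - \mu$
Sample deviation: $x - \bar{x}$
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$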
Observations:
Standard deviation measures the spread of data around the mean.
Standard deviation is always non-negative; it is zero only if all entries are identical.
Larger standard deviation means data are more spread out.
Step-by-Step Calculation: Population Variance & Standard Deviation
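| Step | In Words | In Symbols |
|---|---|---|
| 1 | Find the mean of the population data set | $\mu = \frac{\sum x}{N}$ |
| 2 | Find the deviation of each entry | $x - \mu$ |
| 3 | Square each deviation | $(x - \mu)^2$ |
| 4 | Add to get the sum of squares | $SS_x = \sum (x - \mu)^2$ |
| 5 | Divide by N to get the population variance | $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ |
| 6 | Find the square root to get the population standard deviation | $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ |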
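The six steps in the table above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names and the data set are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import math

def population_variance(data):
    """Population variance, following steps 1-5 of the table."""
    N = len(data)
    mu = sum(data) / N                       # Step 1: population mean
    deviations = [x - mu for x in data]      # Step 2: deviation of each entry
    squares = [d ** 2 for d in deviations]   # Step 3: square each deviation
    sum_of_squares = sum(squares)            # Step 4: sum of squares
    return sum_of_squares / N                # Step 5: divide by N

def population_std(data):
    """Population standard deviation (step 6)."""
    return math.sqrt(population_variance(data))

# Hypothetical population: mean is 5 and the sum of squares is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0
```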
Step-by-Step Calculation: Sample Variance & Standard Deviation
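| Step | In Words | In Symbols |
|---|---|---|
| 1 | Find the mean of the sample data set | $\bar{x} = \frac{\sum x}{n}$ |
| 2 | Find the deviation of each entry | $x - \bar{x}$ |
| 3 | Square each deviation | $(x - \bar{x})^2$ |
| 4 | Add to get the sum of squares | $SS_x = \sum (x - \bar{x})^2$ |
| 5 | Divide by n - 1 to get the sample variance | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ |
| 6 | Find the square root to get the sample standard deviation | $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ |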
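For a sample, the only change from the population procedure is the divisor in step 5: n - 1 instead of N. The sketch below assumes the same hypothetical data set is now treated as a sample; the function names are illustrative.

```python
import math

def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n                          # Step 1: sample mean
    ss = sum((x - x_bar) ** 2 for x in data)       # Steps 2-4: sum of squares
    return ss / (n - 1)                            # Step 5: divide by n - 1

def sample_std(data):
    """Sample standard deviation (step 6)."""
    return math.sqrt(sample_variance(data))

# Hypothetical sample: sum of squares is 32, so s^2 = 32 / 7.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_variance(sample), 3))
print(round(sample_std(sample), 3))
```

Python's standard library offers `statistics.variance` and `statistics.stdev` for the sample versions and `statistics.pvariance` and `statistics.pstdev` for the population versions, which can be used to cross-check hand calculations.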
Interpreting Standard Deviation
Standard deviation measures the typical amount an entry deviates from the mean.
Greater spread in data means a larger standard deviation.
Empirical Rule (68-95-99.7 Rule)
For data with a symmetric, bell-shaped distribution:
About 68% of data lie within one standard deviation of the mean.
About 95% within two standard deviations.
About 99.7% within three standard deviations.
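As a small sketch of how the rule is applied, the function below returns the intervals that hold roughly 68%, 95%, and 99.7% of the data. It assumes the distribution really is symmetric and bell-shaped; the mean of 100 and standard deviation of 15 are hypothetical values.

```python
def empirical_rule_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower bound, upper bound, approximate share of data)."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share)
            for k, share in coverage.items()}

intervals = empirical_rule_intervals(100, 15)
for k, (low, high, share) in intervals.items():
    print(f"within {k} sd: {low} to {high} (about {share:.1%})")
```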
Chebyshev's Theorem
For any data set, the proportion of values lying within k standard deviations (k > 1) of the mean is at least $1 - \frac{1}{k^2}$.
For k = 2: at least 75% of data within 2 standard deviations.
For k = 3: at least 88.9% of data within 3 standard deviations.
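The two cases above follow from the bound 1 - 1/k². A minimal sketch (the function name is illustrative) that evaluates the bound for any k > 1:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

print(chebyshev_lower_bound(2))            # 0.75
print(round(chebyshev_lower_bound(3), 3))  # 0.889
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it is weaker.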
Standard Deviation for Grouped Data
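For a frequency distribution, use the class midpoints as the x-values and weight each by its class frequency (sample formula):
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$
Where x = class midpoint, f = frequency of each class, $n = \sum f$, and $\bar{x} = \frac{\sum x f}{n}$.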
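A hedged sketch of the grouped-data calculation follows. It assumes the data are a sample (divisor n - 1, where n is the sum of the frequencies) and that each class is represented by its midpoint; the midpoints and frequencies are hypothetical.

```python
import math

def grouped_sample_std(midpoints, freqs):
    """Sample standard deviation from class midpoints and class frequencies."""
    n = sum(freqs)                                              # total frequency
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n     # weighted mean
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return math.sqrt(ss / (n - 1))

# Hypothetical classes with midpoints 5, 15, 25 and frequencies 2, 6, 2.
midpoints = [5, 15, 25]
freqs = [2, 6, 2]
print(round(grouped_sample_std(midpoints, freqs), 3))
```

Because midpoints stand in for the actual entries, the result is an approximation of the standard deviation of the raw data.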
Coefficient of Variation (CV)
Describes the standard deviation as a percent of the mean.
Population data set: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Sample data set: $CV = \frac{s}{\bar{x}} \cdot 100\%$
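The coefficient of variation is unit-free, so it can compare spread across data sets measured in different units. A minimal sketch using the standard library, with a hypothetical function name and data:

```python
import statistics

def coefficient_of_variation(data, population=False):
    """Standard deviation as a percent of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return sd / mean * 100

# Hypothetical data treated as a population: sd = 2, mean = 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(data, population=True), 1))  # 40.0
```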
Quartiles, Interquartile Range, and Boxplots
Quartiles
Quartiles divide an ordered data set into four equal parts.
Q1: About 25% of data fall on or below Q1.
Q2: Median; about 50% of data fall on or below Q2.
Q3: About 75% of data fall on or below Q3.
Interquartile Range (IQR)
Measures the range of the middle 50% of the data.
Formula: $IQR = Q_3 - Q_1$
Using IQR to Identify Outliers
Find Q1 and Q3.
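Compute IQR: $IQR = Q_3 - Q_1$
Multiply IQR by 1.5: $1.5 \cdot IQR$
Subtract $1.5 \cdot IQR$ from Q1. Any data entry less than $Q_1 - 1.5 \cdot IQR$ is an outlier.
Add $1.5 \cdot IQR$ to Q3. Any data entry greater than $Q_3 + 1.5 \cdot IQR$ is an outlier.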
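The outlier procedure above can be sketched as follows. Quartile conventions vary between textbooks and software; this sketch assumes the median-of-halves method (Q1 and Q3 are the medians of the lower and upper halves, excluding the middle entry when the count is odd). The function names and data are hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, the middle entry belongs to neither half.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

def iqr_outliers(data):
    """Entries below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    fence = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - fence or x > q3 + fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(quartiles(data))     # (3, 5.5, 8)
print(iqr_outliers(data))  # [30]
```

Here IQR = 8 - 3 = 5, so the fences are 3 - 7.5 = -4.5 and 8 + 7.5 = 15.5, and only 30 falls outside them.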
Box and Whisker Plot
Exploratory data analysis tool that highlights important features of a data set.
Requires the five-number summary:
Minimum entry
First quartile (Q1)
Median (Q2)
Third quartile (Q3)
Maximum entry
Steps to Draw a Box and Whisker Plot
Find the five-number summary.
Construct a horizontal scale for the data range.
Plot the five numbers above the scale.
Draw a box from Q1 to Q3, with a line at Q2 (median).
Draw whiskers from the box to the minimum and maximum entries.
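The first step, finding the five-number summary, can be sketched as below. It assumes the median-of-halves convention for quartiles (other conventions can give slightly different Q1 and Q3); the function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, the middle entry is excluded from both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

entries = [24, 15, 33, 18, 30, 21, 27]
print(five_number_summary(entries))  # (15, 18, 24, 30, 33)
```

The box would then span 18 to 30 with a line at 24, and the whiskers would reach 15 and 33.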
Percentiles and Other Fractiles
| Fractiles | Summary | Symbols |
|---|---|---|
| Quartiles | Divide a data set into 4 equal parts | Q1, Q2, Q3 |
| Deciles | Divide a data set into 10 equal parts | D1, D2, D3, ..., D9 |
| Percentiles | Divide a data set into 100 equal parts | P1, P2, ..., P99 |
Percentile of a Data Entry
To find the percentile that corresponds to a specific data entry x:
Percentile of x = $\frac{\text{number of data entries less than } x}{\text{total number of data entries}} \cdot 100$
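A minimal sketch of this calculation, assuming the common textbook convention of counting entries strictly less than x and rounding to the nearest whole percentile. The function name and scores are hypothetical.

```python
def percentile_of(x, data):
    """Percentile of x: share of entries strictly less than x, times 100."""
    below = sum(1 for value in data if value < x)
    return round(100 * below / len(data))

scores = [55, 60, 62, 70, 75, 80, 85, 90, 95, 100]
print(percentile_of(85, scores))  # 60, since 6 of the 10 scores are below 85
```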
The Standard Score (z-score)
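Represents the number of standard deviations a value x falls from the mean μ.
Formula: $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$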
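As a quick sketch, a z-score is the deviation from the mean divided by the standard deviation. The mean of 100 and standard deviation of 15 below are hypothetical values.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```

A negative z-score means the value is below the mean, a positive one means it is above, and zero means it equals the mean.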
Probability: Basic Concepts and Counting
Probability Experiments
Probability experiment: An action or trial with specific results (counts, measurements, or responses).
Outcome: The result of a single trial.
Sample space: The set of all possible outcomes.
Event: One or more outcomes; a subset of the sample space.
Simple and Compound Events
Simple event: Consists of a single outcome (e.g., tossing heads and rolling a 3).
Compound event: Consists of more than one outcome (e.g., tossing heads and rolling an even number).
The Fundamental Counting Principle
If one event can occur in m ways and a second event can occur in n ways, then the two events can occur in sequence in $m \cdot n$ ways.
Can be extended for more events in sequence.
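The principle can be checked by listing a sample space explicitly. The sketch below assumes a hypothetical experiment of tossing a coin and then rolling a six-sided die, and compares the enumerated outcomes with the product of the counts.

```python
from itertools import product
from math import prod

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is one outcome of the two-step experiment.
sample_space = list(product(coin, die))
print(len(sample_space))  # 12, which equals 2 * 6

def count_outcomes(*ways):
    """Number of ways a sequence of events can occur, one count per event."""
    return prod(ways)

print(count_outcomes(3, 4, 2))  # 24 for three events in sequence
```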
Types of Probability
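Classical (theoretical) probability: Used when each outcome in the sample space is equally likely.
$P(E) = \frac{\text{number of outcomes in event } E}{\text{total number of outcomes in the sample space}}$
Empirical (statistical) probability: Based on observations obtained from probability experiments.
$P(E) = \frac{f}{n}$, where f = frequency of event E, n = total frequency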
Subjective probability: Based on intuition, educated guesses, or estimates.
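The first two types can be computed directly; subjective probability cannot, since it is a judgment. The sketch below uses exact fractions and assumes a fair six-sided die for the classical case; the survey counts in the empirical case are hypothetical.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Outcomes in the event divided by outcomes in the sample space
    (valid only when every outcome is equally likely)."""
    return Fraction(len(event & sample_space), len(sample_space))

def empirical_probability(f, n):
    """Observed frequency f of the event divided by total frequency n."""
    return Fraction(f, n)

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
print(classical_probability(even, die))   # 1/2
print(empirical_probability(27, 100))     # 27/100
```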
Law of Large Numbers
As an experiment is repeated over and over, the empirical probability of an event approaches the theoretical (actual) probability of the event.
Example: Probability of tossing a head approaches 0.5 as the number of tosses increases.
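The coin example can be explored with a small simulation. This is only a sketch: it assumes a fair coin modeled by a pseudo-random number generator, and the seed and function name are arbitrary choices made for reproducibility.

```python
import random

def proportion_heads(num_tosses, seed=0):
    """Empirical probability of heads over num_tosses simulated fair tosses."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```

The printed proportions will tend to settle near 0.5 as the number of tosses grows, although any single short run can stray noticeably from it.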
Range of Probabilities Rule
Probability of any event E is between 0 and 1, inclusive: $0 \le P(E) \le 1$
Complementary Events
The complement of event E (denoted E') is the set of all outcomes in the sample space that are not in E, so $P(E) + P(E') = 1$ and $P(E') = 1 - P(E)$.
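Because an event and its complement together cover the whole sample space, their probabilities sum to 1. A minimal sketch, assuming a fair six-sided die and an illustrative function name:

```python
from fractions import Fraction

def complement_probability(p_event):
    """P(E') = 1 - P(E), for a valid probability 0 <= P(E) <= 1."""
    if not 0 <= p_event <= 1:
        raise ValueError("a probability must be between 0 and 1, inclusive")
    return 1 - p_event

p_six = Fraction(1, 6)                      # P(rolling a 6)
p_not_six = complement_probability(p_six)   # P(not rolling a 6)
print(p_not_six)  # 5/6
```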
Conditional Probability and the Multiplication Rule
Conditional Probability
The probability of event B occurring, given that event A has already occurred.
Denoted $P(B \mid A)$ (read as "probability of B, given A").
Independent and Dependent Events
Independent events: The occurrence of one does not affect the probability of the other.
$P(B \mid A) = P(B)$ or $P(A \mid B) = P(A)$
Events that are not independent are dependent.
The Multiplication Rule
For two events A and B, the probability that both occur in sequence:
General rule: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$
For independent events: $P(A \text{ and } B) = P(A) \cdot P(B)$
Can be extended for more than two events.
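Two short examples illustrate the rule, one with dependent events and one with independent events. Both are hypothetical: drawing two cards from a standard 52-card deck without replacement, and tossing a fair coin together with rolling a fair die. The function name is illustrative.

```python
from fractions import Fraction

def p_both(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

# Dependent: a king first, then a queen from the 51 cards that remain.
p_king_then_queen = p_both(Fraction(4, 52), Fraction(4, 51))
print(p_king_then_queen)  # 4/663

# Independent: P(B | A) is just P(B), so the rule reduces to P(A) * P(B).
p_head_and_six = p_both(Fraction(1, 2), Fraction(1, 6))
print(p_head_and_six)     # 1/12
```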