(Lecture 15) Binomial Distribution and Binary Outcomes in Statistics
Study Guide - Smart Notes
Section 6.3: Probabilities When Each Observation Has Two Possible Outcomes
Introduction to Binary Outcomes
Many statistical problems involve situations where each observation can result in one of two possible outcomes. These are called binary outcomes, and the binomial distribution is a key tool for analyzing such data.
Binary data means each trial results in either "success" or "failure".
Examples include:
Accepting or declining a credit card offer
Having or not having health insurance
Voting yes or no on a referendum
Binomial Distribution: Probabilities for Counts with Binary Data
Definition and Conditions
The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. The following conditions must be met:
Each of n trials has two possible outcomes: success or failure.
Each trial has the same probability of success, denoted by p. The probability of failure is 1 - p.
The n trials are independent; the outcome of one trial does not affect the others.
The binomial random variable x is the number of successes in the n trials.
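A binomial random variable can be illustrated by simulation. The sketch below is a minimal illustration that assumes Python's standard `random` module supplies the independent trials. It counts the successes in n trials, each of which succeeds with probability p. The names `count_successes` and `simulate` are hypothetical helpers chosen for this example.

```python
import random

def count_successes(n: int, p: float, rng: random.Random) -> int:
    """Run n independent trials, each a success with probability p; return x."""
    # rng.random() returns a float in [0.0, 1.0), so "< p" occurs with probability p
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate(n: int, p: float, reps: int, seed: int = 0) -> list[int]:
    """Repeat the n-trial experiment reps times and collect each count x."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [count_successes(n, p, rng) for _ in range(reps)]

counts = simulate(n=10, p=0.5, reps=10_000)
print(min(counts), max(counts))     # every count lies between 0 and n
print(sum(counts) / len(counts))    # average count, close to n * p = 5
```

Each run produces one observed value of x. Over many repetitions, the counts follow a binomial distribution with n = 10 and p = 0.5.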
Binomial Probability Formula
When the number of trials is large, listing all possible outcomes is impractical. The binomial probability formula gives the probability of observing exactly x successes in n independent trials:
$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$
where:
n! denotes the factorial of n, which is the product of all positive integers up to n.
p is the probability of success on a single trial.
x is the number of observed successes.
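The formula translates directly into code. This is a minimal sketch in Python that uses only the standard library's `math.factorial`. The function name `binomial_pmf` is a hypothetical choice for illustration.

```python
from math import factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # x must be a count between 0 and n
    # Number of ways to place x successes among n trials
    ways = factorial(n) // (factorial(x) * factorial(n - x))
    return ways * p**x * (1 - p) ** (n - x)

# Probabilities for every possible x with n = 4, p = 0.5
probs = [binomial_pmf(x, 4, 0.5) for x in range(5)]
print(probs)        # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))   # 1.0
```

The probabilities over all possible values x = 0, 1, ..., n sum to 1, as they must for any probability distribution.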
Factorials
Factorials are used in the binomial formula to count the number of ways to arrange successes and failures:
n! = 1 × 2 × 3 × ... × n
0! = 1 (by definition)
Example: 4! = 1 × 2 × 3 × 4 = 24
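The definition n! = 1 × 2 × ... × n, with 0! = 1, can be checked with a short loop. In this sketch, `fact` is a hypothetical name. The loop starts from 1, so the empty product for n = 0 gives 0! = 1. The results are compared against Python's built-in `math.factorial`.

```python
import math

def fact(n: int) -> int:
    """Compute n! as the product 1 * 2 * ... * n; the empty product gives 0! = 1."""
    if n < 0:
        raise ValueError("factorial is defined here only for n >= 0")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(fact(4))   # 24
print(fact(0))   # 1
# Agrees with the standard library for small n
print(all(fact(n) == math.factorial(n) for n in range(15)))  # True
```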
Applications of the Binomial Distribution
Example 6: Testing for Gender Bias in Promotions
Suppose a group of women employees claims that female employees are less likely than male employees of similar qualifications to be promoted. In a company of 1000 employees (50% female), none of the 10 employees chosen for management training were female. What is the probability of this outcome if selection is random?
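Let x be the number of females selected, with n = 10 and p = 0.50. The probability that none of the 10 is female is
$$P(0) = \frac{10!}{0!\,10!}(0.50)^0(0.50)^{10} = (0.50)^{10} = \frac{1}{1024} \approx 0.00098$$
This result is very unlikely (about one chance in a thousand) if selection is random, which suggests possible bias.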
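The Example 6 calculation can be reproduced numerically. This minimal sketch in Python uses n = 10 and p = 0.50, as in the example. It relies on `math.comb(n, x)` (available since Python 3.8), which computes n!/(x!(n − x)!).

```python
from math import comb

n, p = 10, 0.50   # 10 selections; probability 0.50 of choosing a female each time
x = 0             # number of females selected

prob = comb(n, x) * p**x * (1 - p) ** (n - x)
print(prob)             # 0.0009765625, i.e. 1/1024
print(round(1 / prob))  # 1024, roughly "one chance in a thousand"
```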
Checking Binomial Conditions
Before using the binomial distribution, verify that its three conditions apply:
Binary data (success or failure)
Same probability of success for each trial (denoted by p)
Independent trials
Example 7: Verifying Binomial Conditions in Gender Bias Study
The data are binary (male, female).
If employees are selected randomly, the probability of selecting a female on any trial is 0.50.
Only 10 employees are selected from a population of 1000, so one selection has very little effect on the probabilities for the others. The trials are therefore approximately independent.
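One way to see why treating the selections as independent is reasonable is to compare sampling without replacement against the binomial value. This sketch assumes exactly 500 women and 500 men among the 1000 employees, with the 10 trainees drawn one at a time without replacement. It illustrates the approximation and is not part of the original example.

```python
# Assumption: 500 women and 500 men, 10 chosen one at a time without replacement.
women, total, n = 500, 1000, 10

# After one man is chosen, the chance that the next pick is a woman barely changes.
p_first = women / total            # 0.5
p_second = women / (total - 1)     # 500/999, about 0.5005
print(p_first, round(p_second, 4))

# Exact probability that all 10 are men, multiplying the changing proportions
exact_all_men = 1.0
for i in range(n):
    exact_all_men *= (total - women - i) / (total - i)

binomial_all_men = 0.5 ** n  # binomial value, which treats every pick as p = 0.5
print(round(exact_all_men, 6), round(binomial_all_men, 6))
```

The two probabilities differ by less than 5 percent, so the binomial model is a reasonable approximation in this setting.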
Mean and Standard Deviation of the Binomial Distribution
Formulas
For a binomial distribution with n trials and probability p of success on each trial, the mean (μ) and standard deviation (σ) are:
$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$
μ is the expected number of successes.
σ measures the variability in the number of successes.
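The formulas for μ and σ can be checked against a direct calculation from the probability distribution. In this Python sketch, `binomial_mean_sd` is a hypothetical helper, and the values n = 10 and p = 0.5 serve only as an illustration.

```python
from math import comb, sqrt

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Return (mu, sigma) using mu = n*p and sigma = sqrt(n*p*(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

n, p = 10, 0.5
mu, sigma = binomial_mean_sd(n, p)

# Direct check: mean and variance computed from P(x) for every x = 0..n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_direct = sum(x * px for x, px in enumerate(pmf))
var_direct = sum((x - mean_direct) ** 2 * px for x, px in enumerate(pmf))

print(mu, sigma)                      # 5.0 and about 1.581
print(mean_direct, sqrt(var_direct))  # should match up to rounding error
```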
Example 8: Checking for Racial Profiling
Context and Data
In 2006, the NYPD confronted approximately 500,000 pedestrians for suspected criminal violations. Of these, 88.9% were non-white, while the city's population was 55.4% non-white.
Let n = 500,000 (number of confrontations).
Let p = 0.554 (probability a randomly selected resident is non-white).
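Calculating Mean and Standard Deviation
$$\mu = np = 500{,}000(0.554) = 277{,}000$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{500{,}000(0.554)(0.446)} \approx 351$$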
Using the Empirical Rule
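The empirical rule states that for a bell-shaped distribution, nearly all observations fall within three standard deviations of the mean. With n this large, the binomial distribution is approximately bell-shaped, so the rule applies:
Lower bound: μ − 3σ = 277,000 − 3(351) = 275,947
Upper bound: μ + 3σ = 277,000 + 3(351) = 278,053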
If selection were random, we would expect the number of non-white individuals confronted to fall between 275,947 and 278,053. The observed number was 0.889 × 500,000 = 444,500, far above this range, which suggests evidence of racial profiling.
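The Example 8 arithmetic, from μ and σ through the three-standard-deviation range, can be reproduced in a short Python sketch. It uses the example's figures: n = 500,000, p = 0.554, and an observed share of 0.889. Here σ is kept unrounded, so the printed bounds differ by one or two from the hand calculation, which rounds σ to 351.

```python
from math import sqrt

n = 500_000            # pedestrians confronted
p = 0.554              # proportion of the city's population that is non-white
observed = 0.889 * n   # non-white pedestrians actually confronted

mu = n * p
sigma = sqrt(n * p * (1 - p))

lower, upper = mu - 3 * sigma, mu + 3 * sigma
print(f"mu = {mu:,.0f}, sigma = {sigma:.1f}")
print(f"expected range: {lower:,.0f} to {upper:,.0f}")
print(f"observed: {observed:,.0f}")
# How many standard deviations the observed count lies above the mean
print(f"z = {(observed - mu) / sigma:.0f}")
print(lower <= observed <= upper)   # False: far outside the range
```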
Summary Table: Binomial Distribution Key Properties
| Property | Description |
|---|---|
| Number of trials (n) | Fixed number of independent trials |
| Probability of success (p) | Same for each trial |
| Random variable (x) | Number of successes in n trials |
| Mean (μ) | $\mu = np$ |
| Standard deviation (σ) | $\sigma = \sqrt{np(1-p)}$ |
| Probability formula | $P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$ |
Additional info: The empirical rule is also known as the 68-95-99.7 rule, which states that approximately 68% of data falls within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean for a normal distribution.
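The 68-95-99.7 percentages come from the standard normal distribution. For that distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2). The following Python check uses the standard library's `math.erf`:

```python
from math import erf, sqrt

# For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2))
within = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

For a binomial distribution, these percentages hold only approximately, when n is large enough that the distribution is bell-shaped.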