(Lecture 15) Binomial Distribution and Binary Outcomes in Statistics


Section 6.3: Probabilities When Each Observation Has Two Possible Outcomes

Introduction to Binary Outcomes

Many statistical problems involve situations where each observation can result in one of two possible outcomes. These are called binary outcomes, and the binomial distribution is a key tool for analyzing such data.

Binary data means each trial results in either "success" or "failure".
Examples include:
- Accepting or declining a credit card offer
- Having or not having health insurance
- Voting yes or no on a referendum

Binomial Distribution: Probabilities for Counts with Binary Data

Definition and Conditions

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. The following conditions must be met:

- Each of the n trials has two possible outcomes: success or failure.
- Each trial has the same probability of success, denoted by p. The probability of failure is 1 - p.
- The n trials are independent; the outcome of one trial does not affect the others.

When these three conditions hold, the binomial random variable x is the number of successes in the n trials, and x can take any whole-number value from 0 to n.

Binomial Probability Formula

When the number of trials is large, listing all possible outcomes is impractical. The binomial probability formula provides a way to calculate the probability of observing exactly x successes in n independent trials:

$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

- n is the number of trials.
- n! denotes the factorial of n, which is the product of all positive integers up to n.
- p is the probability of success on a single trial.
- x is the number of observed successes.
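
As a minimal sketch, the formula described above can be evaluated in Python. The function name `binomial_pmf` and the sample values (3 trials, p = 0.5) are illustrative assumptions, not part of the lecture material.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    if not 0 <= x <= n:
        return 0.0
    # comb(n, x) equals n! / (x! * (n - x)!)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Illustrative values (assumed, not from the notes): 3 trials, p = 0.5
probabilities = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(probabilities)  # [0.125, 0.375, 0.375, 0.125]
```

The four probabilities cover every possible count of successes (0 through 3), so they sum to 1.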

Factorials

Factorials are used in the binomial formula to count the number of ways to arrange successes and failures:

n! = 1 × 2 × 3 × ... × n
0! = 1 (by definition)
Example: 4! = 1 × 2 × 3 × 4 = 24
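
The factorial facts above can be checked with Python's standard library. This is a small sketch; the helper name `ways_to_arrange` is hypothetical and only illustrates how factorials count the orderings of x successes among n trials.

```python
from math import factorial

def ways_to_arrange(n: int, x: int) -> int:
    """Number of orderings of x successes and n - x failures: n! / (x! (n - x)!)."""
    return factorial(n) // (factorial(x) * factorial(n - x))

print(factorial(4))           # 24
print(factorial(0))           # 1
print(ways_to_arrange(4, 2))  # 6
```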

Applications of the Binomial Distribution

Example 6: Testing for Gender Bias in Promotions

Suppose a group of women employees claims that female employees are less likely than male employees of similar qualifications to be promoted. In a company of 1000 employees (50% female), none of the 10 employees chosen for management training were female. What is the probability of this outcome if selection is random?

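If selection is random, each of the n = 10 selections has probability p = 0.50 of being female, so the probability of selecting x = 0 females is

$$P(0) = \frac{10!}{0!\,10!}\,(0.50)^{0}(0.50)^{10} = (0.50)^{10} = \frac{1}{1024} \approx 0.001$$

This result is very unlikely (about one chance in a thousand) if selection is random, suggesting possible bias.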
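
The arithmetic for Example 6 can be reproduced in a few lines. This sketch assumes the binomial model with n = 10 and p = 0.50 taken from the example; the variable names are illustrative.

```python
from math import comb

n = 10    # employees chosen for management training
p = 0.50  # probability a randomly chosen employee is female
x = 0     # number of females selected

prob_no_females = comb(n, x) * p**x * (1 - p)**(n - x)
print(prob_no_females)      # 0.0009765625
print(1 / prob_no_females)  # 1024.0
```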

Checking Binomial Conditions

Before using the binomial distribution, verify that its three conditions apply:

- Binary data (success or failure)
- Same probability of success for each trial (denoted by p)
- Independent trials

Example 7: Verifying Binomial Conditions in Gender Bias Study

The data are binary (male, female).
If employees are selected randomly, the probability of selecting a female on any trial is 0.50.
With random sampling from a large population, the outcome for one trial does not affect another.
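
One way to see why independence is a reasonable approximation here is to compare the binomial result with the exact probability for sampling without replacement. The sketch below assumes the company has exactly 500 female and 500 male employees (the 50% figure from Example 6); the exact calculation is a counting argument added for illustration, not a method from the lecture.

```python
from math import comb

females, males, chosen = 500, 500, 10  # assumed split: 50% of 1000 employees

# Exact probability of choosing 0 females when sampling without replacement:
# (ways to pick 10 males) / (ways to pick any 10 employees)
exact = comb(males, chosen) / comb(females + males, chosen)

# Binomial approximation that treats the 10 selections as independent
approx = 0.5 ** chosen

print(f"without replacement:    {exact:.6f}")
print(f"binomial approximation: {approx:.6f}")
```

The two values are close because 10 selections remove only a small fraction of the 1000 employees, so the chance of selecting a female barely changes from one selection to the next.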

Mean and Standard Deviation of the Binomial Distribution

Formulas

For a binomial distribution with n trials and probability p of success on each trial, the mean (μ) and standard deviation (σ) are:

$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$

μ is the expected number of successes.
σ measures the variability in the number of successes.
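
A minimal Python sketch of these two quantities follows. The function name `binomial_mean_sd` and the sample values (100 trials, p = 0.5) are assumptions chosen for illustration.

```python
from math import sqrt

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Return (mean, standard deviation) of a binomial distribution."""
    mean = n * p                 # mu = n p
    sd = sqrt(n * p * (1 - p))   # sigma = sqrt(n p (1 - p))
    return mean, sd

# Illustrative values (assumed, not from the notes): 100 trials, p = 0.5
mean, sd = binomial_mean_sd(100, 0.5)
print(mean, sd)  # 50.0 5.0
```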

Example 8: Checking for Racial Profiling

Context and Data

In 2006, the NYPD confronted approximately 500,000 pedestrians for suspected criminal violations. Of these, 88.9% were non-white, while the city's population was 55.4% non-white.

Let n = 500,000 (number of confrontations).
Let p = 0.554 (probability a randomly selected resident is non-white).

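Calculating Mean and Standard Deviation

If the pedestrians confronted were a random selection of city residents, the number who are non-white would be binomial with:

$$\mu = np = 500{,}000 \times 0.554 = 277{,}000$$

$$\sigma = \sqrt{np(1-p)} = \sqrt{500{,}000 \times 0.554 \times 0.446} = \sqrt{123{,}542} \approx 351$$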

Using the Empirical Rule

The empirical rule states that for a bell-shaped distribution, nearly all observations fall within three standard deviations of the mean:

Lower bound: $\mu - 3\sigma = 277{,}000 - 3(351) = 275{,}947$
Upper bound: $\mu + 3\sigma = 277{,}000 + 3(351) = 278{,}053$

If the selection were random, we would expect the number of non-white individuals confronted to be between 275,947 and 278,053. However, the observed number was 444,500, much higher than expected, suggesting evidence of racial profiling.
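
The Example 8 figures can be reproduced with the sketch below, which uses only the values given in the example. It keeps the standard deviation unrounded, so the bounds it prints may differ by a unit or two from the figures above, which correspond to σ rounded to 351. The z-score line is an added illustration of how far the observed count lies from the mean.

```python
from math import sqrt

n = 500_000                  # confrontations
p = 0.554                    # proportion of city residents who are non-white
observed = round(0.889 * n)  # 88.9% of those confronted were non-white

mean = n * p
sd = sqrt(n * p * (1 - p))

lower = mean - 3 * sd
upper = mean + 3 * sd
z = (observed - mean) / sd   # standard deviations above the mean

print(f"mean = {mean:.0f}, sd = {sd:.1f}")
print(f"expected range: {lower:.0f} to {upper:.0f}")
print(f"observed = {observed}, z = {z:.0f}")
```

An observed count hundreds of standard deviations above the mean is far outside anything random selection would plausibly produce.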

Summary Table: Binomial Distribution Key Properties

Property	Description
Number of trials (n)	Fixed number of independent trials
Probability of success (p)	Same for each trial
Random variable (x)	Number of successes in n trials
Mean (μ)	$np$
Standard deviation (σ)	$\sqrt{np(1-p)}$
Probability formula	$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$

Additional info: The empirical rule is also known as the 68-95-99.7 rule, which states that approximately 68% of data falls within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean for a normal distribution.
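
The 68-95-99.7 percentages can be checked numerically. This sketch relies on the standard identity that, for a standard normal variable Z, P(|Z| < k) = erf(k / √2); the function name `normal_within` is hypothetical.

```python
from math import erf, sqrt

def normal_within(k: float) -> float:
    """P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {normal_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```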