Probability Distributions, Estimation, and Hypothesis Testing: Core Concepts in Business Statistics
Study Guide - Smart Notes
Probability Distributions
Types of Probability Distribution Problems
Probability distribution problems are central to statistical inference and can be categorized based on what is given and what is to be found.
Forward Problems: Given a value of x, find the probability of interest. For discrete variables this typically uses the probability mass function (PMF); for continuous variables the probability is an area under the probability density function (PDF), usually obtained from the cumulative distribution function (CDF).
Reverse Problems: Given a probability, find the value of x that corresponds to this probability. This often requires finding percentiles or critical values using the cumulative distribution function (CDF) or its inverse.
Example: In a normal distribution with mean 100 and standard deviation 15, a forward problem might ask for the probability that x is less than 120. A reverse problem might ask for the value of x such that the probability of being below x is 0.95.
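The forward and reverse problems in this example can be sketched in a few lines of Python. This is a minimal illustration using the standard library's statistics.NormalDist; the variable names are chosen here for illustration and are not part of the original notes.

```python
from statistics import NormalDist

# Normal distribution from the example: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Forward problem: given x = 120, find P(X < 120) with the CDF.
prob_below_120 = dist.cdf(120)

# Reverse problem: given a probability of 0.95, find x such that
# P(X < x) = 0.95 with the inverse CDF.
x_95 = dist.inv_cdf(0.95)

print(f"P(X < 120) = {prob_below_120:.4f}")
print(f"95th percentile = {x_95:.2f}")
```

The forward answer is roughly 0.909 and the reverse answer is roughly 124.67, which matches the hand calculation 100 + 1.645 \times 15.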
Estimation of the Mean
Sample Size Needed
Determining the appropriate sample size is crucial for reliable estimation of population parameters such as the mean.
Sample Size Formula (for estimating a mean with specified margin of error):
n = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2
Where n is the required sample size, z_{\alpha/2} is the critical value from the standard normal distribution for the desired confidence level, \sigma is the population standard deviation, and E is the desired margin of error.
Example: To estimate a mean within 2 units with 95% confidence and \sigma = 10, use z_{0.025} = 1.96:
n = \left( \frac{1.96 \times 10}{2} \right)^2 = 9.8^2 = 96.04
So, a sample of at least 97 observations is needed (always round up).
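The same calculation, n = (z_{\alpha/2} \sigma / E)^2 rounded up, can be wrapped in a small helper. This is a sketch under the assumption that \sigma is known; the function name required_sample_size is hypothetical and introduced only for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which the margin of error of the mean is at most `margin`."""
    alpha = 1 - confidence
    # Critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Always round up: rounding down would give a margin larger than requested.
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=10, margin=2, confidence=0.95)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, because E appears squared in the denominator.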
Point Estimator
A point estimator is a statistic used to estimate the value of an unknown population parameter.
The sample mean (\bar{x}) is the point estimator for the population mean (\mu).
Example: If sample values are 5, 7, and 9, then \bar{x} = (5+7+9)/3 = 7.
Confidence Interval and Its Formal Statistical Interpretation
A confidence interval provides a range of values within which the population parameter is expected to lie, with a specified level of confidence.
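Confidence Interval for the Mean (when \sigma is known):
\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
The confidence level (e.g., 95%) describes the procedure rather than any single interval: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.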
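The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 8) and samples of size 25; these values, the seed, and the number of trials are illustrative choices, not part of the original notes.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA, N = 50, 8, 25          # hypothetical population and sample size
Z = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence
TRIALS = 10_000

hits = 0
for _ in range(TRIALS):
    # Draw one sample and build its 95% interval around the sample mean.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    margin = Z * SIGMA / math.sqrt(N)
    # Count the interval as a hit if it contains the true mean.
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"coverage = {coverage:.3f}")
```

The printed coverage should land close to 0.95. Each individual interval either contains the true mean or does not; the 95% refers to the long-run share of intervals that do.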
Example: For \bar{x} = 50, \sigma = 8, n = 25, and 95% confidence:
50 \pm 1.96 \times \frac{8}{\sqrt{25}} = 50 \pm 3.136
So, the interval is (46.86, 53.14).
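The interval in this example can be reproduced with a short helper. This is a minimal sketch for the known-\sigma case only; the function name z_confidence_interval is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    # Critical value z_{alpha/2}, where alpha = 1 - confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=50, sigma=8, n=25)
print(f"({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, while a larger n narrows it.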
Hypothesis Testing
The Logic of Hypothesis Testing
Hypothesis testing is a formal procedure for making inferences about population parameters based on sample data.
Start with a null hypothesis (H0) representing no effect or status quo.
The alternative hypothesis (H1 or Ha) represents the claim to be tested.
Sample data are used to determine whether to reject H0.
Types of Tests: Two-Tailed and One-Tailed
The directionality of the test depends on the research question.
Two-Tailed Test: Tests for difference in either direction (e.g., H0: \mu = \mu_0 vs. Ha: \mu \neq \mu_0).
One-Tailed Test: Tests for difference in a specific direction (e.g., Ha: \mu > \mu_0 or Ha: \mu < \mu_0).
Error Analysis: Type I and Type II Errors
Two types of errors can occur in hypothesis testing:
Type I Error: Rejecting H0 when it is true (false positive). Its probability is \alpha, the significance level.
Type II Error: Failing to reject H0 when it is false (false negative). Its probability is \beta; the power of the test is 1 - \beta.
Example: In a drug test, a Type I error means concluding the drug works when it does not; a Type II error means missing a real effect.
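Both error rates can be estimated by simulation. The sketch below assumes a hypothetical two-tailed z test of H0: \mu = 100 with \sigma = 15, n = 25, and \alpha = 0.05, and picks a true mean of 106 for the case where H0 is false; all of these values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

MU_0, SIGMA, N, ALPHA = 100, 15, 25, 0.05      # hypothetical test setup
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed critical value
TRIALS = 5_000

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mu = MU_0 is rejected."""
    rejections = 0
    for _ in range(TRIALS):
        sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
        z = (fmean(sample) - MU_0) / (SIGMA / math.sqrt(N))
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / TRIALS

# H0 is true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=100)
# H0 is false: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=106)

print(f"Type I rate  = {type_1_rate:.3f}")
print(f"Type II rate = {type_2_rate:.3f}")
```

The Type I rate should sit near the chosen \alpha, while the Type II rate depends on how far the true mean is from the hypothesized one and on the sample size.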
Sampling Distribution and Test Statistic Tree
The sampling distribution describes the probability distribution of a statistic (e.g., sample mean) over repeated samples from the population. The choice of test statistic depends on the parameter being tested and the information available.
For means: Use z if population standard deviation is known, t if unknown.
For proportions: Use z test.
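Test Statistic for Mean (\sigma known):
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
Test Statistic for Mean (\sigma unknown):
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, where s is the sample standard deviation and t has n - 1 degrees of freedom.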
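The two test statistics for a mean differ only in which standard deviation goes in the denominator. The following sketch shows both; the function names and the input values are hypothetical and chosen for illustration.

```python
import math
from statistics import fmean, stdev

def z_statistic(x_bar, mu_0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu_0):
    """t statistic for a mean when sigma is unknown (n - 1 degrees of freedom)."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation, divides by n - 1
    return (fmean(sample) - mu_0) / (s / math.sqrt(n))

z = z_statistic(x_bar=52, mu_0=50, sigma=8, n=25)
t = t_statistic(sample=[5, 7, 9], mu_0=6)
print(f"z = {z:.3f}, t = {t:.3f}")
```

The z value is compared against the standard normal distribution, while the t value is compared against a t distribution with n - 1 degrees of freedom, which has heavier tails for small samples.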
The Steps in Hypothesis Testing
Hypothesis testing follows a structured process:
State the hypotheses: Null and alternative hypotheses.
Choose significance level (\alpha): Common values are 0.05 or 0.01.
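Select the appropriate test statistic: Based on the parameter being tested and the information available (for a mean, z if \sigma is known and t if it is unknown; for a proportion, z).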
Determine the critical value(s) or p-value: Based on the chosen test and \alpha.
Compute the test statistic from sample data.
Make a decision: Reject or fail to reject H0 based on comparison of test statistic and critical value or p-value and \alpha.
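The six steps can be traced in code for a two-tailed z test using the critical-value approach. The summary data below (\bar{x} = 53, \sigma = 8, n = 25, \mu_0 = 50) are hypothetical values chosen for illustration, and \sigma is assumed known.

```python
import math
from statistics import NormalDist

# Step 1: state the hypotheses. H0: mu = 50 versus Ha: mu != 50 (two-tailed).
mu_0 = 50

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: select the test statistic. Sigma is assumed known, so use z.
sigma, n, x_bar = 8, 25, 53  # hypothetical summary data

# Step 4: determine the critical value for a two-tailed test.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: compute the test statistic from the sample data.
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Step 6: make a decision by comparing |z| with the critical value.
reject_h0 = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Here z = 1.875 falls short of the critical value of about 1.96, so H0 is not rejected at the 5% level.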
Reporting Test Results: p-values
The p-value is the probability, under the null hypothesis, of obtaining a result as extreme or more extreme than the observed result.
If p-value \leq \alpha, reject the null hypothesis.
If p-value > \alpha, fail to reject the null hypothesis.
Example: If the p-value is 0.03 and \alpha = 0.05, the result is statistically significant.
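A p-value for a z test can be computed from the standard normal CDF, with the tail depending on the alternative hypothesis. This is a minimal sketch; the function name z_test_p_value, its tail argument, and the observed statistic z = 2.17 are hypothetical and introduced for illustration.

```python
from statistics import NormalDist

def z_test_p_value(z, tail="two"):
    """p-value for a z statistic: tail is 'two', 'right' (Ha: >), or 'left' (Ha: <)."""
    std_normal = NormalDist()
    if tail == "two":
        # Extreme results in either direction count against H0.
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

alpha = 0.05
p_value = z_test_p_value(2.17, tail="two")
significant = p_value <= alpha

print(f"p-value = {p_value:.3f}, significant at alpha = {alpha}: {significant}")
```

With z = 2.17 the two-tailed p-value is about 0.03, so the result is statistically significant at \alpha = 0.05 but would not be at \alpha = 0.01.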