Comprehensive Review of Statistics and Regression Methods
Review on Statistics
Summation Operator and Properties
The summation operator is a fundamental notation in statistics, used to denote the sum of a sequence of numbers. Several properties simplify calculations involving sums.
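Summation Operator: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Property 1 (Constant): $\sum_{i=1}^{n} c = nc$ for any constant $c$
Property 2 (Constant Multiple): $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$
Property 3 (Linearity): $\sum_{i=1}^{n} (a\,x_i + b\,y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$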
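The three properties can be checked numerically. The sketch below uses small made-up values for `x`, `y`, `a`, `b`, and `c` (none of them come from the notes) and verifies each identity directly.

```python
# Numerical check of the three summation properties on illustrative data.
x = [2.0, 4.0, 6.0]
y = [1.0, 3.0, 5.0]
a, b, c = 2.0, -1.0, 5.0
n = len(x)

# Property 1: summing the constant c over n terms gives n * c.
prop1 = sum(c for _ in range(n)) == n * c

# Property 2: a constant factor can be pulled outside the sum.
prop2 = sum(c * xi for xi in x) == c * sum(x)

# Property 3: the sum of a linear combination equals the
# linear combination of the individual sums.
prop3 = sum(a * xi + b * yi for xi, yi in zip(x, y)) == a * sum(x) + b * sum(y)

print(prop1, prop2, prop3)  # True True True
```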
Sample Statistics
Sample statistics are used to summarize and describe data from a sample.
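Sample Average (Mean): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sum of Deviations: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
Sum of Squared Deviations: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
Sum of Multiplied Deviations: $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$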
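A short numerical illustration of these quantities, assuming the usual identities: deviations from the mean sum to zero, the sum of squared deviations equals the sum of squares minus n times the squared mean, and the sum of multiplied deviations equals the sum of products minus n times the product of the means. The data are invented for the example.

```python
import math

x = [1.0, 2.0, 3.0, 6.0]   # illustrative sample
y = [2.0, 1.0, 5.0, 8.0]
n = len(x)

x_bar = sum(x) / n   # 3.0
y_bar = sum(y) / n   # 4.0

# Deviations from the sample mean always sum to zero.
sum_dev = sum(xi - x_bar for xi in x)

# Sum of squared deviations, by definition and by the shortcut.
sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)
sum_sq_short = sum(xi ** 2 for xi in x) - n * x_bar ** 2

# Sum of multiplied deviations, by definition and by the shortcut.
sum_cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_cross_short = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

print(sum_dev, sum_sq_dev, sum_cross)
```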
Sample Variance and Covariance
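Sample Variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Covariance: $s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$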
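The sketch below computes the sample variance and covariance by hand and compares them with NumPy. It assumes the common n - 1 divisor; note that `np.var` divides by n unless `ddof=1` is passed, while `np.cov` uses n - 1 by default. The data are invented for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0])   # illustrative sample
y = np.array([2.0, 1.0, 5.0, 8.0])
n = x.size

# Hand-computed sample variance and covariance with the n - 1 divisor.
s2_x = np.sum((x - x.mean()) ** 2) / (n - 1)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# NumPy equivalents: ddof=1 switches np.var to the n - 1 divisor,
# and the off-diagonal entry of np.cov is the sample covariance.
s2_x_np = np.var(x, ddof=1)
s_xy_np = np.cov(x, y)[0, 1]

print(s2_x, s2_x_np)
print(s_xy, s_xy_np)
```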
Statistics and Econometrics
Statistics: The study of methods to draw useful information from data, including descriptive statistics (summarizing data) and inferential statistics (estimators, tests, confidence intervals).
Econometrics: Application of statistical methods to economic data to extract meaningful information. Microeconometrics focuses on individual or firm-level data, while macroeconometrics deals with aggregate data.
Random Experiments and Random Variables
Random Experiment: An experiment whose outcome cannot be predicted with certainty, but all possible outcomes can be described. The experiment can be repeated under the same conditions (e.g., tossing a coin).
Random Variable (RV): A variable whose value is determined by a random experiment. It is a function mapping outcomes to real numbers.
Discrete RV: Takes at most countably many values (a finite or countably infinite set).
Continuous RV: Takes values in an uncountable set, such as an interval of the real line.
Probability Distributions
Probability Distribution: Assigns probabilities to each possible value of a random variable.
For discrete RVs: List of probabilities for each value.
For continuous RVs: Probability density function (PDF); probability at a single point is zero.
| | Discrete RV | Continuous RV |
|---|---|---|
| Range | Countable | Uncountable |
| Description | pmf | pdf |
| Probability at a point | Has mass | No mass |
| Expectation | $E[X] = \sum_{x} x\,p(x)$ | $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$ |
Population Mean and Variance
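Population Mean: $\mu_X = E[X]$
Laws of Expectation: for constants $a$, $b$, $c$: $E[c] = c$; $E[aX + b] = aE[X] + b$; $E[aX + bY] = aE[X] + bE[Y]$
Population Variance: $\sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2]$
Shortcut: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
Laws of Variance: for constants $a$, $b$, $c$: $\mathrm{Var}(c) = 0$; $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$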
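As a worked example, the sketch below evaluates the population mean and variance of a fair six-sided die (an illustrative choice, not taken from the notes) with exact fractions, and checks the variance shortcut together with the linear-transformation laws $E[aX + b] = aE[X] + b$ and $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

```python
from fractions import Fraction

# pmf of a fair die: each face has probability 1/6.
pmf = {v: Fraction(1, 6) for v in range(1, 7)}

def expect(g):
    """Expected value of g(X) under the pmf above."""
    return sum(g(v) * p for v, p in pmf.items())

mean = expect(lambda v: v)                          # 7/2
var_def = expect(lambda v: (v - mean) ** 2)         # definition
var_short = expect(lambda v: v ** 2) - mean ** 2    # shortcut, 35/12

# Linear transformation aX + b with illustrative constants.
a, b = 3, 2
mean_lin = expect(lambda v: a * v + b)
var_lin = expect(lambda v: (a * v + b - mean_lin) ** 2)

print(mean, var_def, var_short)   # 7/2 35/12 35/12
print(mean_lin, var_lin)          # 25/2 105/4
```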
Covariance and Correlation
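Population Covariance: $\sigma_{XY} = \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Shortcut: $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$
Laws of Covariance:
If $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = 0$ (the converse is not always true)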
If $E[X] = 0$ or $E[Y] = 0$, then $\mathrm{Cov}(X, Y) = E[XY]$
Population Coefficient of Correlation: $\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$, with $-1 \le \rho_{XY} \le 1$
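Laws Regarding the Sum of Two Random Variables
$E[X + Y] = E[X] + E[Y]$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$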
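The covariance shortcut, the correlation coefficient, and the standard rule $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ can be verified on a small joint distribution. The joint pmf below is hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
from fractions import Fraction as F

# Hypothetical joint pmf of two binary random variables (X, Y).
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(3, 10)}

def expect(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex = expect(lambda x, y: x)                      # 1/2
ey = expect(lambda x, y: y)                      # 2/5
var_x = expect(lambda x, y: (x - ex) ** 2)       # 1/4
var_y = expect(lambda x, y: (y - ey) ** 2)       # 6/25

# Covariance by definition and by the shortcut E[XY] - E[X]E[Y].
cov_def = expect(lambda x, y: (x - ex) * (y - ey))
cov_short = expect(lambda x, y: x * y) - ex * ey

# Correlation rescales the covariance to lie in [-1, 1].
rho = float(cov_def) / math.sqrt(float(var_x * var_y))

# Variance of the sum, computed directly from the joint pmf.
var_sum = expect(lambda x, y: (x + y - ex - ey) ** 2)

print(cov_def, cov_short, var_sum)   # 1/10 1/10 69/100
```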
Conditional Expectation
The conditional expectation of $Y$ given $X = x$ is the expected value of $Y$ in the subpopulation where $X = x$.
Notation: $E[Y \mid X = x]$
Conditional Probability: $P(Y = y \mid X = x) = \dfrac{P(X = x,\, Y = y)}{P(X = x)}$
Laws of Conditional Expectation
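For any function $g$: $E[g(X)\,Y \mid X] = g(X)\,E[Y \mid X]$
If $X$ and $Y$ are independent: $E[Y \mid X] = E[Y]$
Law of Iterated Expectation: $E[Y] = E\big[E[Y \mid X]\big]$
More generally: $E[Y \mid X] = E\big[E[Y \mid X, Z] \mid X\big]$
If $E[Y \mid X] = 0$, then $E[Y] = 0$; any function of $X$ is uncorrelated with $Y$.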
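A small discrete example makes the law of iterated expectation concrete: averaging the conditional means $E[Y \mid X = x]$ with weights $P(X = x)$ recovers $E[Y]$. The joint pmf and the helper names `marginal_x` and `cond_exp_y` are hypothetical, introduced only for this sketch.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of two binary random variables (X, Y).
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(3, 10)}

def marginal_x(x):
    """P(X = x), obtained by summing the joint pmf over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_exp_y(x):
    """E[Y | X = x] = sum over y of y * P(X = x, Y = y) / P(X = x)."""
    return sum(y * p / marginal_x(x) for (xi, y), p in joint.items() if xi == x)

x_values = sorted({x for x, _ in joint})

# Unconditional mean of Y, straight from the joint pmf.
e_y = sum(y * p for (_, y), p in joint.items())

# Iterated expectation: average the conditional means over X.
e_of_cond = sum(cond_exp_y(x) * marginal_x(x) for x in x_values)

print(cond_exp_y(0), cond_exp_y(1))   # 1/5 3/5
print(e_y, e_of_cond)                 # 2/5 2/5
```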
Review on Linear Regression
Introduction to Regression
Regression analysis studies the conditional mean function of a response variable given explanatory variables. It is widely used in economics to investigate causal relationships and to focus on the mean response of the dependent variable.
Regression: Analysis of $E[Y \mid X]$, the expected value of $Y$ given $X$.
Application: Investigate how changes in one regressor $X_j$ affect $Y$, holding other variables constant.
Linear Multiple Regression Model
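Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$ for $i = 1, \dots, n$
Assumptions:
Random sampling: $(Y_i, X_{1i}, \dots, X_{ki})$, $i = 1, \dots, n$, are i.i.d.
Conditional mean: $E[u_i \mid X_{1i}, \dots, X_{ki}] = 0$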
Nonzero finite fourth moments (no large outliers)
No perfect multicollinearity (no exact linear relationship among regressors)
Partial Effect: $\beta_j$ measures the effect of $X_j$ on $Y$, holding other variables constant.
Homoskedasticity: If $\mathrm{Var}(u_i \mid X_{1i}, \dots, X_{ki})$ does not depend on the regressors, the errors are homoskedastic; otherwise, they are heteroskedastic.
Ordinary Least Squares (OLS) Estimator
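OLS Estimator: $(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k) = \arg\min_{b_0, \dots, b_k} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2$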
The OLS estimator finds the linear combination of regressors that minimizes the sum of squared residuals.
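A minimal sketch of OLS on simulated data, assuming a model with an intercept and two regressors. The "true" coefficients, the sample size, and the noise scale are all made up for illustration. It computes the estimates with `np.linalg.lstsq`, cross-checks them against the normal equations, and confirms that the residuals are orthogonal to every regressor, which is the first-order condition for minimizing the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated regressors and error term (illustrative values only).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u   # hypothetical true coefficients

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# OLS via least squares, which minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same estimates from the normal equations (X'X) b = X'y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
ssr = float(residuals @ residuals)

print(beta_hat)
```

Solving the normal equations directly is shown only as a cross-check; `lstsq` is generally the more numerically stable choice when regressors are nearly collinear.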
Measures of Fit
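Standard Error of the Regression (SER): Measures the spread of $Y$ around the regression line. $SER = \sqrt{\frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2}$, where $\hat{u}_i$ are the OLS residuals.
R-squared ($R^2$): Fraction of variation in $Y$ explained by regressors. $R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
Adjusted $R^2$: Adjusts for the number of regressors. $\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}$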
Can be negative; penalizes adding unnecessary regressors.
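The fit measures can be computed in a few lines. This sketch assumes the degrees-of-freedom-corrected definitions, SER = sqrt(SSR / (n - k - 1)) and adjusted R-squared = 1 - (n - 1)/(n - k - 1) * SSR/TSS, where k is the number of regressors excluding the intercept. The function name `fit_measures` and the data are hypothetical.

```python
import numpy as np

def fit_measures(y, X):
    """Return (SER, R2, adjusted R2) for an OLS fit of y on X.

    X must already include a column of ones; k counts the other columns.
    """
    n, p = X.shape
    k = p - 1
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    ssr = float(resid @ resid)                   # sum of squared residuals
    tss = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    ser = (ssr / (n - k - 1)) ** 0.5
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return ser, r2, adj_r2

# Illustrative, nearly linear data with one regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
X = np.column_stack([np.ones_like(x), x])

ser, r2, adj_r2 = fit_measures(y, X)
print(ser, r2, adj_r2)
```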
Large Sample Distribution of OLS Estimator
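Under standard assumptions, as the sample size $n$ increases, the OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ are approximately jointly normally distributed.
Each $\hat{\beta}_j$ is approximately $N(\beta_j, \sigma^2_{\hat{\beta}_j})$.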
Hypothesis Testing in Regression
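Testing a Single Coefficient: $H_0: \beta_j = \beta_{j,0}$ vs. $H_1: \beta_j \neq \beta_{j,0}$
t-statistic: $t = \dfrac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}$
Standard Error: $SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_j}}$, the square root of the estimated variance of $\hat{\beta}_j$
p-value: $2\,\Phi(-|t|)$, where $\Phi$ is the standard normal CDF
Reject $H_0$ if $|t| > 1.96$ or p-value $< 0.05$ (for 5% significance level)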
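The test can be sketched with the standard library alone, using the large-sample normal approximation: the t-statistic is the estimate minus the hypothesized value divided by the standard error, and the two-sided p-value is twice the standard normal CDF evaluated at minus the absolute t-statistic. The estimate 0.5 and standard error 0.2 are made-up numbers, and `t_test` and `normal_cdf` are hypothetical helper names.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test(beta_hat, se, beta_null=0.0):
    """Two-sided large-sample test of H0: beta = beta_null."""
    t = (beta_hat - beta_null) / se
    p_value = 2.0 * normal_cdf(-abs(t))
    return t, p_value

# Illustrative numbers: estimate 0.5 with standard error 0.2.
t, p = t_test(beta_hat=0.5, se=0.2)
reject_5pct = abs(t) > 1.96

print(t, round(p, 4), reject_5pct)
```

Both decision rules agree: the absolute t-statistic exceeds 1.96 exactly when the p-value falls below 0.05, up to rounding of the critical value.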
Confidence Intervals and Joint Hypotheses
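95% Confidence Interval for $\beta_j$: $\left[\hat{\beta}_j - 1.96\,SE(\hat{\beta}_j),\ \hat{\beta}_j + 1.96\,SE(\hat{\beta}_j)\right]$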
Joint Hypothesis: Use robust F-tests for multiple coefficients.
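For a single coefficient, the large-sample 95% interval is the estimate plus or minus 1.96 standard errors. The sketch below uses made-up numbers (estimate 0.5, standard error 0.2) and a hypothetical helper `conf_int_95`; it also shows the link to the two-sided test, since a hypothesized value is rejected at the 5% level exactly when it lies outside the interval.

```python
def conf_int_95(beta_hat, se):
    """Large-sample 95% confidence interval: beta_hat +/- 1.96 * se."""
    half_width = 1.96 * se
    return beta_hat - half_width, beta_hat + half_width

beta_hat, se = 0.5, 0.2   # illustrative estimate and standard error
lower, upper = conf_int_95(beta_hat, se)

# Duality with the two-sided test of H0: beta = 0 at the 5% level.
null_value = 0.0
null_outside_interval = not (lower <= null_value <= upper)
t_rejects = abs((beta_hat - null_value) / se) > 1.96

print(round(lower, 3), round(upper, 3))   # 0.108 0.892
print(null_outside_interval, t_rejects)   # True True
```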
Consistency and Asymptotic Normality
Consistency: An estimator is consistent if it converges in probability to the true parameter value as the sample size increases.
Asymptotic Normality: As sample size grows, the distribution of the estimator approaches a normal distribution, allowing for inference using normal-based confidence intervals and hypothesis tests.
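Both properties can be illustrated by simulation with the sample mean as the estimator. The setup is an assumption made for this sketch: draws come from an exponential distribution with mean 1 and standard deviation 1, which is clearly non-normal. The first part shows the estimation error at increasing sample sizes; the second standardizes many sample means and checks how often they fall within plus or minus 1.96, which should be close to 95% if the normal approximation is working.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd = 1.0, 1.0   # exponential(1) has mean 1 and sd 1

# Consistency: absolute error of the sample mean at growing sample sizes.
errors = {
    n: abs(rng.exponential(scale=1.0, size=n).mean() - true_mean)
    for n in (100, 10_000, 1_000_000)
}

# Asymptotic normality: standardize many sample means of size n and
# measure the share that lands inside the normal 95% band.
n, reps = 500, 2000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (means - true_mean) / true_sd
coverage = float(np.mean(np.abs(z) <= 1.96))

print(errors)
print(coverage)
```

A single simulated path need not have strictly decreasing errors, since consistency is a statement about convergence in probability rather than about every sequence of draws.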