For Ordinary Least Squares (OLS) estimation to work properly, a number of assumptions have to be fulfilled. If they hold, the estimated parameters are the "best linear unbiased estimator" (BLUE).
1. Linearity in parameters
The model must be linear in its parameters (i.e. the parameters may not be squared, exponentiated, logarithmic, etc.). The variables themselves do not have to enter linearly, but each parameter must.
For example: y = β0 + β1·x² + u ✅ (only x is exponentiated); y = β0 + x^β1 + u ❌ (the parameter is exponentiated, making the model non-linear in parameters).
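A minimal sketch of this point (the model form and numbers below are illustrative assumptions, not from the text): a model that is non-linear in x but linear in the parameters can still be fit by OLS, because the transformed regressor is just another column in the design matrix.

```python
import numpy as np

# Hypothetical example: y = b0 + b1 * x^2 + u is linear in the parameters,
# so OLS handles it by regressing y on the columns [1, x^2].
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = 1.0 + 2.0 * x**2 + rng.normal(scale=0.5, size=500)

X = np.column_stack([np.ones_like(x), x**2])      # design matrix [1, x^2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # ordinary least squares

print(beta)  # estimates close to the true values (1.0, 2.0)
```

A model like y = β0 + x^β1 + u, by contrast, cannot be written as a linear combination of known columns and would need non-linear estimation.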
2. Random Sample
Observations have to be drawn randomly from the population.
3. No (multi)collinearity
The independent variables must not be highly correlated with each other. This can occur when two or more variables measure a related concept or are functions of one another. High collinearity inflates the variance of the estimates; in the case of perfect collinearity the model cannot be estimated at all.
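A small simulated sketch of both cases (variable names and numbers are assumptions for illustration): with perfect collinearity the design matrix loses full rank, so OLS is not identified; with near-perfect collinearity the model is estimable but numerically unstable, which a large condition number reveals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)

# Perfect collinearity: x2 is an exact linear function of x1, so the
# design matrix has rank 2 instead of 3 and (X'X) is not invertible.
x2_perfect = 2 * x1
X = np.column_stack([np.ones(n), x1, x2_perfect])
rank = np.linalg.matrix_rank(X)
print(rank)  # 2, not 3 -> the coefficients are not identified

# Near-perfect collinearity: OLS runs, but the huge condition number
# signals unstable and imprecise coefficient estimates.
x2_near = 2 * x1 + rng.normal(scale=1e-6, size=n)
X_near = np.column_stack([np.ones(n), x1, x2_near])
cond = np.linalg.cond(X_near)
print(cond)  # enormous condition number
```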
4. Zero conditional mean of errors
The error term has an expected value of zero given any values of the regressors: there is no systematic relationship between the errors and the independent variables. This fails, for example, when a relevant variable that is correlated with a regressor is omitted from the model.
5. Homoskedasticity
The residuals (error terms) vary by the same amount over the whole range of the independent variables. If this is not the case there is heteroskedasticity: the estimates are more precise at some points than at others, the standard errors are estimated incorrectly, and inference based on them becomes unreliable. Heteroskedasticity can be a sign of missing variables.
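One common diagnostic is a Breusch-Pagan-style check; the sketch below (simulated data, assumed coefficients) implements its basic idea with numpy: under homoskedasticity, the squared residuals should not be explainable by the regressors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
x = rng.uniform(1, 10, size=n)
# Heteroskedastic errors: the spread grows with x.
y = 2.0 + 0.5 * x + rng.normal(scale=x, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Auxiliary regression of squared residuals on the regressors.
u2 = resid**2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# LM statistic n * R^2 is approximately chi2(1) under homoskedasticity;
# a large value rejects that hypothesis.
lm = n * r2
print(lm)  # far above the 5% critical value of about 3.84
```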
6. No Autocorrelation
Errors are independent across observations. This is usually violated in time series and in panel data with several time points per unit.
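A standard diagnostic here is the Durbin-Watson statistic; this sketch (simulated AR(1) errors, assumed parameters) computes it directly. Values near 2 suggest no first-order autocorrelation, while values near 0 indicate strong positive autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
t = np.arange(n, dtype=float)

# AR(1) errors: u_t = 0.9 * u_{t-1} + e_t  (strong positive autocorrelation)
e = rng.normal(size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.9 * u[i - 1] + e[i]

y = 1.0 + 0.05 * t + u
X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Durbin-Watson statistic, roughly 2 * (1 - rho) for AR(1) residuals.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid**2)
print(dw)  # well below 2 -> positive autocorrelation
```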
7. Normality (for small samples)
Errors follow the normal distribution. This is only required for smaller samples; in larger ones the Central Limit Theorem relaxes this requirement.
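The Central Limit Theorem point can be sketched by simulation (sample sizes, coefficients, and the error distribution below are assumptions): even with heavily skewed errors, the sampling distribution of the OLS slope is approximately normal once the sample is reasonably large.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 2_000
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])

# Re-estimate the slope many times with skewed (exponential) mean-zero errors.
slopes = np.empty(reps)
for r in range(reps):
    u = rng.exponential(scale=1.0, size=n) - 1.0
    y = 1.0 + 2.0 * x + u
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    slopes[r] = beta[1]

# The slope estimates cluster symmetrically around the true value 2.0:
# their standardized skewness is near 0 despite the skewed errors.
z = (slopes - slopes.mean()) / slopes.std()
skew = np.mean(z**3)
print(slopes.mean(), skew)
```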