Ridge regression
Multiple Linear Regression is very sensitive to near-collinearity among the independent variables: when this happens, the model parameters become unstable (they have large variances) and can therefore no longer be interpreted. From a mathematical standpoint, near-collinearity makes the X'X matrix ill-conditioned (with X the data matrix): its determinant is nearly 0, and attempts to calculate the inverse of the matrix run into numerical difficulties, with unreliable final values.
Exact collinearity occurs when at least one of the independent variables is a linear combination of other independent variables. X is then no longer a full-rank matrix, the determinant of X'X is exactly 0, and inverting X'X is not just difficult but impossible, because the inverse matrix does not exist.
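The two situations described above can be reproduced numerically. The following is a minimal sketch, assuming NumPy and made-up data (the variable names, sample size and noise level are illustrative choices, not taken from the tutorial). It compares the condition number of X'X and the rank of X for independent, nearly collinear and exactly collinear predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical number of observations

x1 = rng.normal(size=n)
x2_near = x1 + 1e-4 * rng.normal(size=n)   # almost a copy of x1
x2_exact = 2.0 * x1                         # exact linear combination of x1

def diagnostics(X):
    """Return the condition number of X'X and the rank of X."""
    return np.linalg.cond(X.T @ X), np.linalg.matrix_rank(X)

X_indep = np.column_stack([x1, rng.normal(size=n)])
X_near = np.column_stack([x1, x2_near])
X_exact = np.column_stack([x1, x2_exact])

cond_indep, rank_indep = diagnostics(X_indep)   # small condition number, rank 2
cond_near, rank_near = diagnostics(X_near)      # huge condition number, rank 2
cond_exact, rank_exact = diagnostics(X_exact)   # rank 1: X'X has no inverse

print(f"independent:       cond = {cond_indep:.3g}, rank = {rank_indep}")
print(f"nearly collinear:  cond = {cond_near:.3g}, rank = {rank_near}")
print(f"exactly collinear: rank = {rank_exact}")
```

The condition number is used here instead of the determinant because it does not depend on the scale of the variables. The rank is computed on X rather than on X'X (the two ranks are equal in exact arithmetic) because this is numerically more robust.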
-----
The same happens when there are fewer observations than there are parameters to be estimated, a not uncommon situation. For example, a spectrum may be described by the list of light intensities measured at a few hundred different wavelengths (the independent variables), whereas only a few tens of spectra (the observations) describing some phenomenon are available.
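A short numerical check of this situation, assuming NumPy and hypothetical sizes (30 spectra measured at 300 wavelengths, filled with random numbers rather than real data): X'X is a 300 x 300 matrix, but its rank cannot exceed the number of observations, so it is necessarily singular.

```python
import numpy as np

rng = np.random.default_rng(1)
n_spectra, n_wavelengths = 30, 300       # hypothetical sizes, fewer rows than columns
X = rng.normal(size=(n_spectra, n_wavelengths))

rank_X = np.linalg.matrix_rank(X)        # cannot exceed n_spectra
gram_shape = (X.T @ X).shape             # (300, 300), but of rank at most 30

print(f"X'X is {gram_shape[0]} x {gram_shape[1]} with rank {rank_X}")
```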
-----
For the analyst, quasi-collinearity of some predictors causes a large variance (uncertainty) of the model predictions, therefore making these predictions highly unreliable.
Ridge Regression circumvents this problem by making a simple change to the way the model parameters are calculated. In doing so, it makes the new model parameters somewhat biased (whereas the parameters calculated by the Least Squares (LS) method are unbiased estimators of the true parameters). But the variances of these new parameters are smaller than those of the LS parameters, and in fact so much smaller that their Mean Square Errors (MSE) may also be smaller than those of the parameters of the LS model. This is an illustration of the fact that a biased estimator may outperform an unbiased estimator provided its variance is small enough.
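For reference, the usual textbook form of the two estimators and of the MSE decomposition is given below. The notation is assumed here and does not come from this page: y is the vector of responses, beta the vector of true parameters, lambda >= 0 the ridge parameter and I the identity matrix.

```latex
\hat{\beta}_{LS} = (X'X)^{-1} X'y
\qquad
\hat{\beta}_{ridge}(\lambda) = (X'X + \lambda I)^{-1} X'y

\mathrm{MSE}(\hat{\beta})
  = E\,\lVert \hat{\beta} - \beta \rVert^{2}
  = \sum_{j} \mathrm{Var}(\hat{\beta}_{j})
  + \lVert E[\hat{\beta}] - \beta \rVert^{2}
```

With lambda = 0 the ridge estimator reduces to the LS estimator. The second term of the MSE is the squared bias, which is zero for LS and positive for ridge. The first term is the total variance, which ridge reduces.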
Moreover, the predictions of the Ridge model will also turn out to be more accurate than those of the LS regression model when the independent variables exhibit near-collinearity. Ridge Regression is therefore at the heart of the "bias-variance tradeoff" issue.
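The tradeoff can be illustrated with a small simulation. This is a sketch under stated assumptions, not the tutorial's own demonstration: the data are synthetic, the true parameters, the noise level and the ridge parameter `lam = 1.0` are arbitrary choices, and ridge is computed in its usual form by solving (X'X + lam I) b = X'y. The same design matrix is kept fixed while the noise is redrawn many times, so that the bias, variance and MSE of both sets of parameters can be estimated.

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam, n_trials = 40, 1.0, 2000          # illustrative values
beta_true = np.array([1.0, 0.0])          # hypothetical true parameters

# Fixed design: two nearly collinear predictors, centered and scaled.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)

def fit(X, y, lam):
    """Solve (X'X + lam*I) b = X'y; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ls_est = np.empty((n_trials, 2))
ridge_est = np.empty((n_trials, 2))
for t in range(n_trials):
    y = X @ beta_true + rng.normal(size=n)   # new noise, same X
    ls_est[t] = fit(X, y, 0.0)
    ridge_est[t] = fit(X, y, lam)

def summarize(est):
    """Empirical squared bias, total variance and MSE of the estimates."""
    bias_sq = np.sum((est.mean(axis=0) - beta_true) ** 2)
    variance = np.sum(est.var(axis=0))
    mse = np.mean(np.sum((est - beta_true) ** 2, axis=1))
    return bias_sq, variance, mse

ls_bias_sq, ls_var, ls_mse = summarize(ls_est)
ridge_bias_sq, ridge_var, ridge_mse = summarize(ridge_est)

print(f"LS:    bias^2 = {ls_bias_sq:.3f}, variance = {ls_var:.3f}, MSE = {ls_mse:.3f}")
print(f"Ridge: bias^2 = {ridge_bias_sq:.3f}, variance = {ridge_var:.3f}, MSE = {ridge_mse:.3f}")
```

In this setup the ridge parameters show a clearly non-zero bias but a much smaller variance, and their MSE ends up below that of the LS parameters. With a different design or a poorly chosen ridge parameter the comparison could go the other way.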
These improvements do not come free.
Yet Ridge Regression is more than a "last resort" attempt to salvage LS linear regression in case of near or full collinearity of the independent variables. It should be regarded as a major linear regression technique in its own right, one that proves its usefulness whenever collinearity is a problem, an all-too-common circumstance.
_________________________________________________________________
Tutorial 1
We first go over the problem of collinearity (or "multicollinearity") of the independent variables, which is the curse of Multiple Linear Regression.
We then show how a simple but effective change in the method used for calculating the parameters can circumvent this problem.
We further study the statistical properties of the parameters of the Ridge Regression model, and discover that these parameters outperform the usual Least Squares parameters in a situation of near-collinearity of the independent variables.
We then address the difficult problem of choosing an optimal value for the "ridge parameter".
RIDGE REGRESSION
Collinearity
    Interpretation of the values of the parameters
    Geometric interpretation
    Analytic interpretation
Ridge Regression
    Standardization of the variables
    Three equivalent definitions of Ridge Regression
    Reconditioning the X'X matrix
    Penalizing the Sum of Squared Residuals (SSR)
    Constraint on the length of the vector of parameters
    Analytic solution
    Geometric interpretation
Statistical properties of the ridge estimator
    Relation with the Least Squares estimator
    Bias
    Variance
    Mean Square Error (MSE)
Choosing the value of the Ridge parameter
    Analytical solutions
    Hoerl's solution
    Ridge variant of Mallows' Cp
    Graphical solutions
    Validation
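As an illustration of the "Validation" approach to choosing the ridge parameter, here is a minimal sketch that uses K-fold cross-validation. This is a common choice swapped in for illustration. The tutorial's own procedure is not shown on this page. The data are synthetic, and the function names `ridge_fit` and `cv_error`, the number of folds and the grid of candidate values are all hypothetical. For brevity the variables are not re-standardized inside each fold.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge parameters: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, n_folds=5):
    """Mean squared prediction error estimated by K-fold cross-validation."""
    indices = np.arange(len(y))
    errors = []
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        b = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[held_out] - X[held_out] @ b) ** 2))
    return float(np.mean(errors))

# Synthetic data with two nearly collinear predictors and one independent one.
rng = np.random.default_rng(7)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([3.0, 0.0, 2.0]) + rng.normal(size=n)

candidates = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]   # arbitrary grid
cv_errors = [cv_error(X, y, lam) for lam in candidates]
best_lam = candidates[int(np.argmin(cv_errors))]

for lam, err in zip(candidates, cv_errors):
    print(f"lambda = {lam:>6}: CV error = {err:.3f}")
print("selected lambda:", best_lam)
```

The selected value depends on the data and on the grid. A very large ridge parameter shrinks all parameters towards zero and gives a clearly worse validation error.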
_________________________________________________________________
Tutorial 2
There is an unexpected and quite illuminating link between Ridge Regression and Principal Components Analysis. When the Principal Components (PC) of the data set are used as independent variables instead of the original variables, the Ridge Regression model appears to result from a simple modification of the LS model built in the same basis: every parameter of the LS regression is "shrunk" by an amount that is small for the first (large-variance) PCs and larger for the last (low-variance) PCs. So Ridge Regression allows large-variance Principal Components to have a larger influence on the final model than low-variance Principal Components.
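The shrinkage described above can be checked numerically. This is a sketch assuming NumPy, synthetic data and the usual ridge formulation with ridge parameter `lam` (the names and values are illustrative). With the singular value decomposition X = U diag(d) V', the LS parameters on the PCs are multiplied by d^2 / (d^2 + lam) to obtain the ridge parameters on the PCs. This factor is close to 1 for the first PCs and close to 0 for the last ones.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 50, 4, 5.0                            # illustrative values
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=n)     # one nearly collinear column
X = X - X.mean(axis=0)
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)

# Thin SVD: X = U diag(d) V', with d sorted from largest to smallest.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

ls_pc = (U.T @ y) / d             # LS parameters when the PCs are the predictors
shrink = d**2 / (d**2 + lam)      # multiplicative factors, each between 0 and 1
ridge_pc = shrink * ls_pc         # ridge parameters in the PC basis

# Check against ridge computed directly on the original variables.
ridge_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
assert np.allclose(Vt.T @ ridge_pc, ridge_direct)

print("singular values:  ", np.round(d, 3))
print("shrinkage factors:", np.round(shrink, 3))
```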
We finally introduce the important concept of "effective number of parameters" that is a more realistic measure of the true "flexibility" of the model than just the number of parameters.
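A common definition of the effective number of parameters, assumed here since the tutorial's own definition is not shown on this page, is the trace of the matrix that maps the observed responses to the fitted ones. For ridge this equals the sum of the shrinkage factors d^2 / (d^2 + lam), where d are the singular values of X. The sketch below uses a hypothetical helper `effective_dof` and random data to compare the two computations.

```python
import numpy as np

def effective_dof(X, lam):
    """Sum of the ridge shrinkage factors d_j^2 / (d_j^2 + lam)."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))              # synthetic data, 6 parameters
X = X - X.mean(axis=0)

lam = 10.0                                # arbitrary ridge parameter
dof_ls = effective_dof(X, 0.0)            # no shrinkage: the plain parameter count
dof_ridge = effective_dof(X, lam)         # less than the parameter count

# Same quantity computed as the trace of the ridge "hat" matrix.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)

print(f"LS:    {dof_ls:.3f} effective parameters")
print(f"Ridge: {dof_ridge:.3f} effective parameters (trace of H = {np.trace(H):.3f})")
```

The larger the ridge parameter, the smaller the effective number of parameters, which reflects the reduced flexibility of the model.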
RIDGE REGRESSION AND PRINCIPAL COMPONENTS ANALYSIS
Singular Values Decomposition (SVD)
Ridge Regression in the singular form
Least Squares model
Ridge model: shrinkage
Ridge Regression and Principal Components
MSE of the parameters
Effective number of parameters (or degrees of freedom)