Ridge regression

Multiple Linear Regression and collinearity

Multiple Linear Regression is very sensitive to independent variables being in a configuration of near-collinearity: when this happens, the model parameters become unstable (large variances) and can therefore no longer be interpreted. From a mathematical standpoint, near-collinearity makes the X'X matrix ill-conditioned (with X the data matrix): its determinant is nearly 0 relative to the scale of its entries, and attempts to calculate the inverse of the matrix run into numerical snags with uncertain final values.
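A minimal NumPy sketch of this ill-conditioning, on made-up data in which x2 is almost a copy of x1. It compares the determinant of X'X with the product of its diagonal terms, reports the condition number, and prints the "variance factors" [(X'X)^-1]_jj (the LS parameter variances divided by the noise variance), alongside the same factors for two independent variables. The data, sizes and noise levels are illustrative assumptions only.

```python
import numpy as np

# Hypothetical data: x2 is almost an exact copy of x1 (near-collinearity).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.001 * rng.normal(size=n)
X = np.column_stack([x1, x2])          # data matrix, one column per variable
XtX = X.T @ X

# The determinant is tiny compared with the product of the diagonal terms,
# and the condition number (ratio of extreme eigenvalues) is huge.
det_ratio = np.linalg.det(XtX) / np.prod(np.diag(XtX))
cond = np.linalg.cond(XtX)

# Var(beta_j) = sigma^2 * [(X'X)^-1]_jj for the LS estimator:
# these "variance factors" explode under near-collinearity.
var_factors = np.diag(np.linalg.inv(XtX))

# For comparison: two independent variables with the same n.
Z = np.column_stack([x1, rng.normal(size=n)])
var_factors_indep = np.diag(np.linalg.inv(Z.T @ Z))

print(f"det ratio = {det_ratio:.2e}, cond = {cond:.2e}")
print("variance factors, collinear   :", var_factors)
print("variance factors, independent :", var_factors_indep)
```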

Exact collinearity occurs when at least one of the independent variables is a linear combination of other independent variables. X is then no longer of full (column) rank, the determinant of X'X is exactly 0, and inverting X'X is not just difficult, it is downright impossible because the inverse matrix simply does not exist.
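The following NumPy sketch, on hypothetical data, builds a third variable as an exact linear combination of two others and checks the rank of X and the smallest eigenvalue of X'X. In floating point, X'X is only singular up to rounding error, so a rank check is a more reliable diagnostic than trying to invert the matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2                       # exact linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])
XtX = X.T @ X

rank = np.linalg.matrix_rank(X)          # 2, not 3: X is not of full column rank
smallest_eig = np.linalg.eigvalsh(XtX)[0]  # eigvalsh sorts ascending; ~0 up to rounding

# np.linalg.inv(XtX) may raise LinAlgError or return huge, meaningless values
# here, which is why the rank is checked instead.
print("rank of X:", rank, "of", X.shape[1])
print("smallest eigenvalue of X'X:", smallest_eig)
```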

-----

The same happens when there are fewer observations than there are parameters to be estimated, a not uncommon situation: the rank of X'X is then at most the number of observations, so this matrix is necessarily singular. For example, a spectrum may be described by the list of light intensities measured at a few hundred different wavelengths (the independent variables), whereas only a few tens of spectra (the observations) describing some phenomenon are available.

-----
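A short NumPy sketch of the "more parameters than observations" case, using random numbers as stand-ins for spectra (30 hypothetical observations, 300 hypothetical wavelengths). Since rank(X'X) = rank(X) <= min(n, p), the 300 x 300 matrix X'X has rank at most 30.

```python
import numpy as np

# Hypothetical "spectra": n = 30 observations of p = 300 wavelengths (n < p).
rng = np.random.default_rng(2)
n, p = 30, 300
X = rng.normal(size=(n, p))
XtX = X.T @ X                      # p x p matrix that LS would need to invert

# rank(X'X) = rank(X) <= min(n, p) = n, far below p: X'X is singular.
rank = np.linalg.matrix_rank(X)
print(f"X'X is {XtX.shape[0]} x {XtX.shape[1]} but has rank {rank}")
```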

For the analyst, quasi-collinearity of some predictors causes a large variance (uncertainty) of the model predictions, therefore making these predictions highly unreliable.

Ridge Regression

Ridge Regression is a variant of ordinary Multiple Linear Regression whose goal is to circumvent the problem of collinearity of the independent variables. It gives up Least Squares (LS) as the method for estimating the parameters of the model, and focuses instead on the X'X matrix. This matrix is artificially modified so as to make its determinant appreciably different from 0.
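The modification most commonly used, and the one assumed in this sketch, adds a positive constant k (our notation for the ridge parameter) to the diagonal of X'X, giving the estimator (X'X + kI)^-1 X'y; k = 0 gives back LS. The helper `ridge_fit` is hypothetical, the data are made up, and the variables are standardized (y centered) so that no intercept is needed. The loop compares the determinant of the modified matrix, and the resulting parameters, for k = 0 and k = 1.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimate (X'X + k I)^-1 X'y.

    Assumes standardized columns of X and a centered y (no intercept).
    k >= 0 is the ridge parameter; k = 0 gives the LS estimate.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.001 * rng.normal(size=n)])  # near-collinear pair
X = (X - X.mean(axis=0)) / X.std(axis=0)                      # standardize columns
y = X[:, 0] + X[:, 1] + rng.normal(size=n)
y = y - y.mean()

dets, betas = {}, {}
for k in (0.0, 1.0):
    dets[k] = np.linalg.det(X.T @ X + k * np.eye(2))
    betas[k] = ridge_fit(X, y, k)
    print(f"k={k}: det={dets[k]:.3g}, beta={betas[k]}")
```

Adding k to every diagonal term raises each eigenvalue of X'X by k, so the smallest one, and with it the determinant, is pushed away from 0.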

By doing so, it makes the new model parameters somewhat biased (whereas the parameters calculated by the LS method are unbiased estimators of the true parameters). But the variances of these new parameters are smaller than those of the LS parameters, and in fact so much smaller that their Mean Square Errors (MSE) may also be smaller than those of the parameters of the LS model (recall that the MSE of an estimator is the sum of its squared bias and its variance). This illustrates the fact that a biased estimator may outperform an unbiased estimator, provided its variance is small enough.
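This tradeoff can be computed exactly for a fixed design, since the ridge estimator W X'y, with W = (X'X + kI)^-1, has expectation W X'X beta and covariance sigma^2 W X'X W. The NumPy sketch below evaluates squared bias, total variance and MSE for k = 0 (LS) and k = 1 on a hypothetical near-collinear design with made-up "true" parameters; the function name `ridge_bias_var` is ours.

```python
import numpy as np

def ridge_bias_var(X, beta, sigma, k):
    """Exact squared bias, total variance and MSE of the ridge estimator
    (X'X + k I)^-1 X'y, for fixed X and y = X beta + noise of variance sigma^2."""
    XtX = X.T @ X
    W = np.linalg.inv(XtX + k * np.eye(X.shape[1]))
    bias = W @ XtX @ beta - beta          # E[beta_k] - beta
    cov = sigma**2 * W @ XtX @ W          # Cov(beta_k)
    bias2 = float(bias @ bias)
    var = float(np.trace(cov))
    return bias2, var, bias2 + var         # MSE = squared bias + variance

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])  # near-collinear design
beta_true = np.array([2.0, 0.0])   # hypothetical "true" parameters
sigma = 1.0

results = {k: ridge_bias_var(X, beta_true, sigma, k) for k in (0.0, 1.0)}
for k, (b2, v, m) in results.items():
    print(f"k={k}: bias^2={b2:.4f}  variance={v:.4f}  MSE={m:.4f}")
```

In this setup the LS estimator has (numerically) zero bias but a very large variance, while the ridge estimator accepts a clearly nonzero bias in exchange for a far smaller variance and ends up with the smaller MSE.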

Moreover, when the independent variables exhibit near-collinearity, the predictions of the Ridge model will also turn out to be more accurate than those of the LS regression model. The idea behind Ridge Regression is therefore at the heart of the "bias-variance tradeoff" issue.

Ridge parameter

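These improvements do not come for free: the modification of the X'X matrix is controlled by a "ridge parameter" whose value must be chosen by the analyst, and choosing an optimal value for this parameter is a difficult problem.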
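One common graphical aid for seeing what the ridge parameter does is the "ridge trace": the ridge parameters plotted (here just printed) against the ridge parameter, written k below. The NumPy sketch uses hypothetical standardized data with one near-collinear pair; the grid of k values is an arbitrary choice. As k grows, the parameters move away from the unstable LS values and the length of the parameter vector steadily decreases.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardized variables
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)  # made-up response
y -= y.mean()

# Ridge parameters for k = 0 (LS) and a logarithmic grid of positive values.
ks = np.concatenate([[0.0], np.logspace(-3, 2, 6)])
trace = np.array([np.linalg.solve(X.T @ X + k * np.eye(3), X.T @ y) for k in ks])

for k, b in zip(ks, trace):
    print(f"k={k:8.3f}  beta={np.round(b, 3)}  |beta|={np.linalg.norm(b):.3f}")
```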

 

Yet Ridge Regression is more than a "last resort" attempt to salvage LS linear regression in case of near or full collinearity of the independent variables. It should be considered a major linear regression technique in its own right, one that proves its usefulness whenever collinearity is a problem, an all-too-common circumstance.

_________________________________________________________________

 

 

Tutorial 1

 

We first go over the problem of collinearity (or "multicollinearity") of the independent variables, which is the curse of Multiple Linear Regression.

We then show how a simple but effective change in the method used for calculating the parameters can circumvent this problem.

We further study the statistical properties of the parameters of the Ridge Regression model, and discover that these parameters can outperform the usual Least Squares parameters (in the sense of a smaller Mean Square Error) in a situation of near-collinearity of the independent variables.

We then address the difficult problem of choosing an optimal value for the "ridge parameter".
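Among the approaches to choosing the ridge parameter, validation is the easiest to sketch. The NumPy example below uses K-fold cross-validation, one common form of validation, to pick k from a grid; the data, the 5 folds and the grid are arbitrary assumptions, and `cv_error` is a hypothetical helper. Standardization is recomputed on each training fold so that no information leaks from the held-out observations.

```python
import numpy as np

def ridge_fit(X, y, k):
    # Ridge estimator (X'X + k I)^-1 X'y on standardized X and centered y.
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, k, n_folds=5, seed=0):
    """Mean squared prediction error of ridge(k), estimated by K-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(y))  # same folds for every k
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Center/scale with training statistics only.
        mu, sd, ym = X[train].mean(axis=0), X[train].std(axis=0), y[train].mean()
        beta = ridge_fit((X[train] - mu) / sd, y[train] - ym, k)
        pred = ((X[test] - mu) / sd) @ beta + ym
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(size=n)

ks = np.logspace(-3, 3, 13)
scores = np.array([cv_error(X, y, k) for k in ks])
best_k = ks[np.argmin(scores)]
print("CV error per k:", np.round(scores, 3))
print("selected k:", best_k)
```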

 

 

RIDGE REGRESSION

Collinearity

Interpretation of the values of the parameters

Geometric interpretation

Analytic interpretation

Ridge Regression

Standardization of the variables

Three equivalent definitions of Ridge Regression

Reconditioning the X'X matrix

Penalizing the Sum of Squared Residuals (SSR)

Constraint on the length of the vector of parameters

Analytic solution

Geometric interpretation

Statistical properties of the ridge estimator

Relation with the Least Squares estimator

Bias

Variance

Mean Square Error (MSE)

Choosing the value of the Ridge parameter

Analytical solutions

Hoerl's solution

Ridge variant of Mallows' Cp

Graphical solutions

Validation

TUTORIAL

_________________________________________________________________

 

 

Tutorial 2

 

There is an unexpected and quite illuminating link between Ridge Regression and Principal Components Analysis. When the Principal Components (PC) of the data set are used as independent variables instead of the original variables, the Ridge Regression model turns out to be a simple modification of the LS model built in the same basis: every parameter of the LS regression is then "shrunk" by a factor, and this shrinkage is slight for the first PCs and much stronger for the last PCs. So Ridge Regression allows large-variance Principal Components to have a larger influence on the final model than low-variance Principal Components.
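This shrinkage can be checked numerically with the Singular Value Decomposition X = U diag(d) V'. In the PC basis the LS parameters are U'y / d, and the standard ridge result, assumed here, is that ridge multiplies each of them by d^2 / (d^2 + k), a factor close to 1 for the first PCs (large d) and close to 0 for the last ones. The NumPy sketch below, on made-up standardized data, compares this route with the direct formula (X'X + kI)^-1 X'y.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 40, 3, 2.0                       # k: ridge parameter (arbitrary value)
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)
y -= y.mean()

# X = U diag(d) V'; d is sorted in decreasing order (first PC first).
U, d, Vt = np.linalg.svd(X, full_matrices=False)

gamma_ls = (U.T @ y) / d                   # LS parameters in the PC basis
shrink = d**2 / (d**2 + k)                 # ridge shrinkage factor for each PC
gamma_ridge = shrink * gamma_ls
beta_ridge = Vt.T @ gamma_ridge            # back to the original variables

beta_direct = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
print("shrinkage factors (first PC -> last PC):", np.round(shrink, 3))
print("same as direct formula:", np.allclose(beta_ridge, beta_direct))
```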

We finally introduce the important concept of the "effective number of parameters", a more realistic measure of the true "flexibility" of the model than the mere number of parameters.
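A commonly used definition of the effective number of parameters of a ridge model, assumed here since the text does not state its formula, is the trace of the "hat" matrix X(X'X + kI)^-1 X', which equals the sum of the shrinkage factors d_j^2 / (d_j^2 + k) over the singular values d_j of X. The NumPy sketch below, on random standardized data, computes it both ways: it equals the number of variables for k = 0 and decreases toward 0 as k grows. `effective_df` is a hypothetical helper name.

```python
import numpy as np

def effective_df(X, k):
    """Effective number of parameters of ridge(k): sum of d^2 / (d^2 + k)."""
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + k))

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

ks = [0.0, 1.0, 10.0, 100.0, 1000.0]
dfs = [effective_df(X, k) for k in ks]
# Same quantity as the trace of the hat matrix X (X'X + k I)^-1 X'.
traces = [np.trace(X @ np.linalg.solve(X.T @ X + k * np.eye(5), X.T)) for k in ks]

for k, df, tr in zip(ks, dfs, traces):
    print(f"k={k:7.1f}  df={df:.3f}  trace(H)={tr:.3f}")
```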

 

 

RIDGE REGRESSION AND PRINCIPAL COMPONENTS ANALYSIS

Singular Value Decomposition (SVD)

Ridge Regression in the singular form

Least Squares model

Ridge model : shrinkage

Ridge Regression and Principal Components

MSE of the parameters

Effective number of parameters (or degrees of freedom)

TUTORIAL

 

_________________________________________________________________

 

 

 

Related readings

Multiple Linear Regression

Principal Components Analysis

Bias-variance tradeoff

 
