Regularization

We suggest that you first read the entry on the bias-variance tradeoff.

-----

When a model contains too many parameters, its bias is low (which, unfortunately, is of little use), but its variance is high. In particular:

The consequence of all this is that the model has a poor generalization capacity.

 

The first action against these instabilities is to rigorously select the independent variables that will enter the model, which mechanically reduces the number of parameters (for example, see here).

But this may prove insufficient:

 

The analyst may then resort to regularization techniques. Overparametrization translates into large variances of the parameters, and therefore a propensity for these parameters to take large absolute values. The idea of regularization is then to place constraints on the parameters of the model, so that they find it difficult (or impossible) to take large values.
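One common way to write such a constraint is to add a penalty on the size of the parameters to the criterion being minimized. The notation below is an assumption made for illustration and does not come from this entry: L is the fitting criterion (for example, the sum of squared errors), θ is the vector of parameters, and λ ≥ 0 sets the strength of the penalty.

```latex
\hat{\theta}_{\lambda} \;=\; \arg\min_{\theta} \; \Big[ \, L(\theta) \;+\; \lambda \, \lVert \theta \rVert_2^2 \, \Big], \qquad \lambda \ge 0
```

With λ = 0 the fit is unregularized. As λ grows, large parameter values become increasingly costly, and the solution is pulled towards zero.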

Two typical regularization techniques are Ridge Regression and its neural network equivalent, weight decay.
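The link between the two techniques can be illustrated on a small linear model. The following is only a sketch under stated assumptions: the data are synthetic, the penalty strength lam is an arbitrary value, and weight decay is written here as plain gradient descent with an extra shrinkage term (in that simple setting it should converge to the Ridge solution).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 30 observations, 5 predictors.
n, p = 30, 5
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

lam = 10.0  # penalty strength (hypothetical value)

# Unregularized least squares, for comparison.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge Regression, closed form: minimize ||y - X b||^2 + lam * ||b||^2.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Weight decay: gradient descent on the squared error, with an extra
# term that pulls every weight towards zero at each step.
w = np.zeros(p)
# Step size chosen from the largest curvature so that the iteration converges.
lr = 1.0 / (2.0 * (np.linalg.norm(X, 2) ** 2 + lam))
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ w - y)          # gradient of the squared error
    w = w - lr * (grad + 2.0 * lam * w)     # decay term: gradient of lam * ||w||^2

print("norm of least squares coefficients:", np.linalg.norm(beta_ols))
print("norm of Ridge coefficients:        ", np.linalg.norm(beta_ridge))
print("largest gap, weight decay vs Ridge:", np.max(np.abs(w - beta_ridge)))
```

The Ridge coefficients have a smaller norm than the least squares coefficients, which is the "difficulty in taking large values" described above.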

-----

Regularization has annoying side effects:

-----

This figure illustrates the effect of regularization.

The upper image displays a polynomial regression model (blue line) whose degree is too high. The model oscillates strongly in its effort to minimize its Mean Square Error. Clearly, it does not capture the general trend of the data. Its shape (and therefore its predictions) would change considerably as a consequence of a small change in the data (see here).

 

 

 

The lower image displays the same model, but after regularization. Although its degree is the same as that of the previous polynomial, regularization now constrains the coefficients to take only moderately large values. The model fits the general trend of the data better, it is more stable, and its predictions are more accurate.


In this particular case, a similar result would have been obtained by reducing the degree of the polynomial.
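The behaviour described for the two images can be reproduced numerically. The sketch below is not the code behind the figure: the data, the degree, the penalty strength lam and the helper fit are all assumptions chosen for illustration. It fits the same high-degree polynomial with and without an L2 penalty, then checks how far the coefficients move after a small change in the data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative only): a smooth trend plus noise.
x = np.linspace(-1.0, 1.0, 15)
y = np.sin(2.0 * x) + rng.normal(scale=0.2, size=x.size)

degree = 10  # deliberately too high for 15 points
V = np.vander(x, degree + 1)  # columns x^10, x^9, ..., x^0

def fit(V, y, lam):
    """Least squares with an L2 penalty lam on all coefficients
    (the intercept is penalized too, to keep the sketch short)."""
    if lam == 0.0:
        return np.linalg.lstsq(V, y, rcond=None)[0]
    k = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ y)

lam = 0.1  # hypothetical penalty strength
coef_plain = fit(V, y, 0.0)
coef_ridge = fit(V, y, lam)

# Small change in the data: how much do the fitted coefficients move?
y_perturbed = y + rng.normal(scale=0.05, size=y.size)
shift_plain = np.linalg.norm(fit(V, y_perturbed, 0.0) - coef_plain)
shift_ridge = np.linalg.norm(fit(V, y_perturbed, lam) - coef_ridge)

print("coefficient norm, unregularized: ", np.linalg.norm(coef_plain))
print("coefficient norm, regularized:   ", np.linalg.norm(coef_ridge))
print("coefficient shift, unregularized:", shift_plain)
print("coefficient shift, regularized:  ", shift_ridge)
```

Both the size of the coefficients and their sensitivity to the perturbation are smaller for the regularized fit, even though the degree of the polynomial is unchanged.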

Note that the "oscillations" metaphor is a bit misleading. Take Multiple Linear Regression (MLR): the model is a hyperplane that, of course, never oscillates. Yet MLR is prone, like any model, to overparametrization, and it can be regularized in more than one way (Ridge Regression, Lasso Regression, Principal Components Regression, PLS regression).

____________________________________________________________

Related readings

Bias-variance tradeoff

Ridge Regression
