PRESS
Acronym for PREdicted Sum of Squares.
PRESS is one of the many criteria used to evaluate the quality of a regression model. It is used for choosing between several candidate models built on the same data set, for example for the purpose of variable selection.
In Multiple Linear Regression, the traditional R² cannot be used for comparing models (except, possibly, for models incorporating the same number of independent variables) because its value can only increase or stay the same when new variables are added to the model. The adjusted R² partially overcomes this shortcoming, but all the observations used for calculating the adjusted R² also contributed to building the model (estimation of the parameters), and it is known that this kind of approach makes the estimated quality of the model systematically optimistic.
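PRESS is a more "honest" measure of the quality of the model than the Sum of Squared Residuals (used in the definition of R²), because each observation is predicted by a model that was built without it. It is defined as follows:
One observation, say observation i, is removed from the data set, and the model is built on the n - 1 remaining observations.
This model is used to predict the value y*(i) of the removed observation, and the leave-one-out "residual" r(i) = yi - y*(i) is calculated.
The same thing is then done for all the other available observations, so we have n leave-one-out residuals.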
PRESS is defined as the sum of these squared "residuals":
PRESS = Σi r(i)² = Σi (yi - y*(i))²
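The definition can be sketched directly in code. The following is a minimal illustration, assuming a least-squares linear model and NumPy; the function name press_loo and the synthetic data are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def press_loo(X, y):
    """PRESS by explicit leave-one-out refitting of a least-squares model.

    X is an (n, p) design matrix (with a column of ones for the intercept),
    y is an (n,) response vector.
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # remove observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        r_i = y[i] - X[i] @ beta                      # leave-one-out residual r(i)
        press += r_i ** 2
    return press

# Hypothetical synthetic data: one predictor plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)

# Ordinary Sum of Squared Residuals of the full model, for comparison.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta_full) ** 2)

print("PRESS:", press_loo(X, y))
print("SSR:  ", ssr)
```

For a least-squares fit, PRESS comes out larger than the Sum of Squared Residuals, which reflects the optimism of the latter.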
PRESS does not necessarily decrease when new variables are added to the model: for a given sequence of variables, it typically first decreases, reaches a minimum for a certain number of variables, and then increases again as more variables are added to the model. It can therefore be used as a criterion in variable selection procedures.
More generally, if this criterion is used, the "best" model is that with the lowest value of PRESS.
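As an illustration of this use, the sketch below adds predictors one at a time in a fixed order and records both the Sum of Squared Residuals and PRESS at each step. It assumes least-squares fits and NumPy; the helper press, the synthetic data (only the first two of eight predictors actually influence the response) and the variable names are hypothetical.

```python
import numpy as np

def press(X, y):
    """Leave-one-out PRESS of a least-squares fit (explicit refitting)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return total

# Hypothetical data: 8 candidate predictors, only the first 2 matter.
rng = np.random.default_rng(1)
n, p = 40, 8
Z = rng.normal(size=(n, p))
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(size=n)

ssr_path, press_path = [], []
for k in range(1, p + 1):
    X = np.column_stack([np.ones(n), Z[:, :k]])       # intercept + first k predictors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr_path.append(np.sum((y - X @ beta) ** 2))
    press_path.append(press(X, y))

best_k = int(np.argmin(press_path)) + 1               # model size with the lowest PRESS

for k, (s, q) in enumerate(zip(ssr_path, press_path), start=1):
    print(f"{k} variables: SSR = {s:8.3f}   PRESS = {q:8.3f}")
print("lowest PRESS with", best_k, "variables")
```

The Sum of Squared Residuals can only go down along the sequence, whereas PRESS is free to go back up once the added variables no longer improve prediction.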
The usefulness of PRESS extends beyond Multiple Linear Regression: it can be used with any regression model.
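The major shortcoming of PRESS is that it requires building as many models as there are observations, which can lead to very long computation times on large data sets. So, instead of insisting on calculating each and every leave-one-out residual, it is common practice to split the observations into a small number of groups, leave out one whole group at a time, build the model on the remaining observations, calculate the prediction residuals of the left-out observations, and finally sum the squared residuals over all the groups.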
The result is an approximation of the true PRESS.
This is how Cross Validation operates when used for the purpose of estimating the prediction error of a predictive model.
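A minimal sketch of such a grouped approximation, assuming a least-squares linear model and NumPy, could look as follows. The function name press_kfold, the number of groups k and the synthetic data are hypothetical choices made for the example.

```python
import numpy as np

def press_kfold(X, y, k=5, seed=0):
    """Approximate PRESS by leaving out one group of observations at a time.

    The n observations are shuffled and split into k groups; only k models
    are built instead of n. With k = n this is the exact leave-one-out PRESS.
    """
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    total = 0.0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)               # every observation not in the group
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        total += np.sum((y[test] - X[test] @ beta) ** 2)
    return total

# Hypothetical synthetic data.
rng = np.random.default_rng(2)
n = 60
Z = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), Z])
y = 0.5 + Z @ np.array([1.0, -2.0, 0.0]) + rng.normal(size=n)

print("5-group approximation:", press_kfold(X, y, k=5))
print("leave-one-out (k = n):", press_kfold(X, y, k=n))
```

The value obtained depends on how the observations happen to be assigned to groups, so different seeds give slightly different approximations.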
Things are better in Linear Regression. It can be shown that it is then sufficient to build only one model (the full model based on the complete set of data), and to use the following formula to calculate the individual leave-one-out residuals:
r(i) = ri / (1 - hi)
where ri = yi - ŷi is the ordinary residual of observation i in the full model, and hi is the ith diagonal element of the so-called "Hat matrix" X(X'X)⁻¹X'.
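The shortcut can be checked numerically. The sketch below, which assumes NumPy and a full-rank design matrix, divides each ordinary residual of the full model by 1 - hi and compares the resulting PRESS with the one obtained by explicitly refitting n models. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def press_hat(X, y):
    """PRESS from a single least-squares fit, using the Hat matrix diagonal."""
    Q, _ = np.linalg.qr(X)                # thin QR: the Hat matrix equals Q Q'
    h = np.sum(Q ** 2, axis=1)            # diagonal elements hi of the Hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta                      # ordinary residuals of the full model
    return np.sum((r / (1.0 - h)) ** 2)

def press_refit(X, y):
    """PRESS by explicitly refitting the model n times, for comparison."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return total

# Hypothetical synthetic data.
rng = np.random.default_rng(4)
n = 50
Z = rng.normal(size=(n, 4))
X = np.column_stack([np.ones(n), Z])
y = 2.0 + Z @ np.array([1.0, 0.0, -1.0, 0.5]) + rng.normal(size=n)

print("single fit:", press_hat(X, y))
print("n refits:  ", press_refit(X, y))
```

The diagonal of the Hat matrix is obtained here from a QR decomposition rather than by forming X(X'X)⁻¹X' explicitly, which avoids building an n x n matrix; this is an implementation choice, not part of the formula.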
-----
This expression is valid only when the parameters of the model have been estimated by the Least Squares method. With Ridge Regression, the same expression can be used, with hi replaced by the ith diagonal element of the ridge projection matrix X(X'X + λI)⁻¹X'. The result is then only an approximation of the true PRESS.
PRESS can then be used not only for the purpose of selecting variables, but also for estimating the optimal value of the ridge parameter λ for a given set of variables.
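A minimal sketch of this use, assuming NumPy and the usual ridge form with + λI in the projection matrix, evaluates the shortcut PRESS over a grid of candidate values of λ and keeps the value giving the lowest PRESS. The function name ridge_press, the grid and the synthetic data (with two nearly collinear predictors) are hypothetical.

```python
import numpy as np

def ridge_press(X, y, lam):
    """Shortcut PRESS for ridge regression with ridge parameter lam.

    Uses the diagonal of the ridge projection matrix X (X'X + lam I)^-1 X'
    in place of the Hat matrix diagonal. X is assumed to be standardized
    and y centred, so no intercept column is included.
    """
    p = X.shape[1]
    A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # (X'X + lam I)^-1 X'
    H = X @ A                                             # ridge projection matrix
    r = y - H @ y                                         # ridge residuals of the full model
    return np.sum((r / (1.0 - np.diag(H))) ** 2)

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(3)
n, p = 50, 6
Z = rng.normal(size=(n, p))
Z[:, 1] = Z[:, 0] + 0.05 * rng.normal(size=n)
y_raw = Z @ np.array([1.0, 1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Standardization and centring are done once, on the full data set
# (they are not redone each time an observation is left out).
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)
y = y_raw - y_raw.mean()

lambdas = np.logspace(-3, 3, 25)                          # candidate ridge parameters
scores = [ridge_press(X, y, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]

print("lambda with the lowest PRESS:", best_lam)
```

The grid is searched on a logarithmic scale because useful values of λ can span several orders of magnitude; a finer grid around the minimum can be used afterwards if needed.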