PRESS

Acronym of: PREdicted Sum of Squares.

# What is PRESS?

PRESS is one of the many criteria used to evaluate the quality of a regression model. It is used for choosing between several candidate models built on the same data set, for example for the purpose of variable selection.

In Multiple Linear Regression, the traditional R² cannot be used for comparing models (except, possibly, models incorporating the same number of independent variables) because its value necessarily increases when new variables are added to the model. The adjusted R² partially overcomes this shortcoming, but all the observations used for calculating the adjusted R² also contributed to building the model (estimation of the parameters), and it is known that this kind of approach makes the estimated quality of the model systematically optimistic.

# Definition of PRESS

PRESS is a more "honest" measure of the goodness-of-fit of the model to the data than the Sum of Squared Residuals (used in the definition of R²). It is defined as follows:

• Observation #1 is removed from the data set, and a model is built on the n - 1 remaining observations.
• The predicted value for observation #1 is then calculated; it is denoted y*(1). Indices in parentheses traditionally mean that the corresponding observation took no part in building the model.
• Let r(1) = y1 - y*(1) be the leave-one-out residual for observation #1. It is the error of the model in its attempt to predict y1.

The same thing is then done for all the other available observations, so we have n leave-one-out residuals.

PRESS is defined as the sum of these squared "residuals":

 PRESS = Σi r(i)² = Σi (yi - y*(i))²
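The definition above can be sketched directly in Python; the function name and the toy data are illustrative only. Each observation is held out in turn, the model is refitted on the remaining n - 1 rows, and the squared leave-one-out residuals are summed:

```python
import numpy as np

def press_loo(X, y):
    """PRESS by explicit leave-one-out refits: for each observation i,
    refit on the other n - 1 rows and square the prediction error on row i."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)  # fit without row i
        y_star = X[i] @ beta            # y*(i): prediction for the held-out row
        press += (y[i] - y_star) ** 2   # r(i)^2
    return press

# Toy data: intercept + 2 variables, 30 observations (purely illustrative)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)
press = press_loo(X, y)
```

Note that this sketch refits the model n times, which is exactly the computational cost discussed below.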

PRESS does not always decrease when new variables are added to the model : for a given sequence of variables, it first decreases, then reaches a minimum for a certain number of variables, and increases again as more variables are added to the model. So it can be used as a criterion in variable selection procedures.

More generally, if this criterion is used, the "best" model is that with the lowest value of PRESS.

# Calculating PRESS

The usefulness of PRESS extends beyond Multiple Linear Regression: it can be used with any regression model.

## PRESS and Cross Validation

The major shortcoming of PRESS is that it requires building as many models as there are observations, which leads to unmanageably long computation times. So, instead of insisting on calculating each and every leave-one-out residual, it is common practice to:

• Partition the data set into K groups of observations drawn randomly without replacement, with K much smaller than the number of observations.
• Build K models, each model leaving aside one of these groups of observations.
• For each model, calculate the sum of the squared prediction errors on the observations that were left aside.

The result is an approximation of the true PRESS.

This is how Cross Validation operates when used for the purpose of estimating the prediction error of a predictive model.
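The K-fold approximation described above can be sketched as follows (function name, the choice of K and the toy data are illustrative):

```python
import numpy as np

def press_kfold(X, y, K=5, seed=0):
    """Approximate PRESS with K-fold cross validation: K refits instead of n."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)   # random draw without replacement
    total = 0.0
    for fold in np.array_split(idx, K):                # K disjoint groups
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)  # fit without the group
        total += np.sum((y[fold] - X[fold] @ beta) ** 2)  # squared prediction errors
    return total

# Toy data (illustrative)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 0.0]) + rng.normal(scale=0.3, size=40)
approx = press_kfold(X, y, K=5)
```

Setting K equal to the number of observations makes each group a single observation, and the approximation becomes the exact PRESS.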

## PRESS in Linear Regression

Things are better in Linear Regression. It can be shown that it is then possible to build only one model (the full model, based on the complete set of data), and use the following formula to calculate the individual leave-one-out residuals:

 r(i) = ri / (1 - hi)

where ri = yi - ŷi is the ordinary residual of observation #i in the full model, and hi is the ith diagonal element of the so-called "Hat matrix" H = X(X'X)-1X'.
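This shortcut can be sketched as follows, assuming a design matrix X that already contains the intercept column (names and data are illustrative):

```python
import numpy as np

def press_hat(X, y):
    """PRESS from a single Least Squares fit, using r(i) = r_i / (1 - h_i)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix H = X(X'X)^{-1}X'
    residuals = y - H @ y                  # ordinary residuals r_i of the full model
    h = np.diag(H)                         # leverages h_i
    return np.sum((residuals / (1.0 - h)) ** 2)

# Toy data (illustrative)
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(25), rng.normal(size=(25, 2))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.4, size=25)
press = press_hat(X, y)
```

A single fit is enough: the n leave-one-out residuals are all read off the full model's residuals and leverages.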

-----

This expression is valid only when the parameters of the model have been estimated by the Least Squares method. With Ridge Regression, the same expression can be used, with hi replaced by the ith diagonal element of the ridge projection matrix X(X'X + λI)-1X'. The result is then only an approximation of the true PRESS.

It can be used not only for the purpose of selecting variables, but also for estimating the optimal value of the ridge parameter λ for a given set of variables.
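A sketch of this use for choosing the ridge parameter (the grid of candidate λ values, the function name and the data are illustrative; recall the result is only an approximation of the true PRESS):

```python
import numpy as np

def ridge_press(X, y, lam):
    """Approximate PRESS for Ridge Regression, with h_i taken from the
    ridge projection matrix X(X'X + lam*I)^{-1}X'."""
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    r = y - H @ y
    return np.sum((r / (1.0 - np.diag(H))) ** 2)

# Toy data and an illustrative grid of candidate ridge parameters
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=50)
lams = np.logspace(-3, 2, 20)
best_lam = min(lams, key=lambda l: ridge_press(X, y, l))
```

With λ = 0 the ridge projection matrix reduces to the ordinary hat matrix, and the expression recovers the exact Least Squares PRESS.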
