Logistic Regression (LR)

A classic classification technique.

 

There are two equivalent ways of introducing LR:

    1) In general terms, one approach to probabilistic classification is to estimate the regression functions of the class indicators. Multiple Linear Regression is poorly suited to this job, since nothing confines its predictions to the interval [0, 1]. The Linear model can be replaced, though, by a more adequate one called the Logistic Model, which provides a very powerful means of estimating class posterior probabilities.

 

    2) Discriminant Analysis relies on rather stringent assumptions about data distribution in classes. It is possible to relax these assumptions, and thus create a large family of models, Discriminant Analysis appearing then as just a particular case. Model parameter estimation is shifted from estimating means and covariance matrices of normal distributions, to maximizing a certain quantity called "Likelihood", that measures the fit of the model to sample data.
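
As a rough illustration of point 1), the sketch below regresses a class indicator on a single predictor by ordinary least squares, then compares the fitted line with a logistic curve. The sample (xs, ys) and the slope 2.0 of the logistic curve are invented for the illustration; they are not taken from the tutorials, and the curve is not fitted to the data.

```python
import math

def logistic(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy sample: one predictor x, class indicator y (1 when x > 0).
xs = [-3.0 + 0.5 * i for i in range(13)]
ys = [1.0 if x > 0 else 0.0 for x in xs]

# Ordinary least-squares line y ~ a + b*x (closed form for one predictor).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x
linear_fit = [a + b * x for x in xs]

# Logistic curve with an arbitrarily chosen (not fitted) slope.
logistic_curve = [logistic(2.0 * x) for x in xs]

# The straight line leaves [0, 1] at both ends of the sample,
# whereas the logistic curve stays strictly inside (0, 1).
print("linear fit range:  ", min(linear_fit), max(linear_fit))
print("logistic range:    ", min(logistic_curve), max(logistic_curve))
```

On this sample the least-squares line yields "probabilities" below 0 and above 1, which is the practical reason the linear model is a poor candidate for a regression function of class indicators.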

 

Logistic Regression is mostly used for 2-class problems (e.g., scoring), but can be straightforwardly generalized to multi-class situations. It accommodates numerical as well as categorical predictors.

In its basic form, Logistic Regression draws piecewise linear boundaries between classes, just as DA does. As a matter of fact, if the classes comply with the standard assumptions of DA, these boundaries are the same as those found by DA (which is to be expected, as these boundaries are then optimal).

A great advantage of LR over DA is that it calculates the class posterior probabilities without ever estimating the classes' individual density functions. It is therefore a direct estimator of posterior probabilities, whereas DA has to go through the intermediate and risky step of class density estimation before delivering class posterior probabilities.

Although theoretically more powerful, LR should not be systematically preferred to DA. Because it makes fewer assumptions about the data distribution, it also requires larger samples to reach DA's good level of stability. Confidence tests on coefficients are less sturdy than those of DA. Computer calculations are also longer.

Neural Networks, and more specifically the Multilayer Perceptron, may be perceived as a further generalization of Logistic Regression.

___________________________________________________

 

Tutorial 1

 

In this Tutorial, we introduce Logistic Regression as a generalization of Discriminant Analysis.

The simplest form of Discriminant Analysis can be expressed in terms of the "logit link function". This result is obtained by using the stringent assumptions of DA (normal classes with identical covariance matrices), but these assumptions vanish from the final result. A natural generalization of DA is then to keep the result, and replace the stringent assumptions by looser ones for which the logit result still holds. One therefore creates a larger class of models (Logistic Regression), of which Discriminant Analysis then appears to be a particular case.
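
The argument above can be sketched for two classes as follows. The notation is an assumption introduced here, not taken from the Tutorial: priors \pi_1 and \pi_2, class means \mu_1 and \mu_2, common covariance matrix \Sigma, and coefficients \beta_0, \beta. Under the DA assumptions, the quadratic terms in x cancel in the log-ratio of the posteriors, leaving a function that is linear in x:

```latex
\begin{align*}
\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)}
  &= \log \frac{\pi_1}{\pi_2}
     - \tfrac{1}{2} (\mu_1 + \mu_2)^{T} \Sigma^{-1} (\mu_1 - \mu_2)
     + x^{T} \Sigma^{-1} (\mu_1 - \mu_2) \\
  &= \beta_0 + \beta^{T} x ,
\qquad\text{hence}\qquad
P(C_1 \mid x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{T} x)\bigr)} .
\end{align*}
```

The last line no longer mentions normality or \Sigma. Logistic Regression keeps this linear form of the logit and treats \beta_0 and \beta as free parameters, which is why DA appears as a particular case.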

 

 

 

BASICS OF LOGISTIC REGRESSION

An intuitive approach to Logistic Regression

A toy problem

Regressing the class indicators

Calculating the exact regression function

The logistic function

From DA to Logistic Regression

The "logit" function

The logit beyond DA

Logistic Regression is regression

Posterior probabilities

The coefficients

TUTORIAL

_________________________________________________________________

 

Tutorial 2

 

Estimating the parameters of a Logistic Regression model is quite different from what it is in Discriminant Analysis. Here, we are not estimating means and covariance matrices, but rather posterior probabilities. This is achieved directly by tuning the parameters of the model until the likelihood of the data is largest (Maximum Likelihood estimation).

This optimization problem has no closed form solution, and therefore has to be solved numerically by iterative techniques.
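
A minimal sketch of such an iterative scheme is given below, assuming a single predictor plus an intercept and using Newton-Raphson steps on the log-likelihood. This is one common choice of optimizer and is not necessarily the method developed in the Tutorial. The function names (sigmoid, log_likelihood, fit_logistic) and the sample (xs, ys) are hypothetical, chosen for the illustration only.

```python
import math

def sigmoid(z):
    """Logistic function, used here as the model's posterior P(y=1 | x)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of the sample under the model P(y=1|x) = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood by Newton-Raphson, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the negative Hessian (2 x 2)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical sample with overlapping classes, so that a finite maximum exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope:", b1)
print("log-likelihood at start:  ", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood at optimum:", log_likelihood(b0, b1, xs, ys))
```

The sample deliberately contains overlapping classes: if the two classes were perfectly separable, the likelihood would keep increasing as the coefficients grow and the iterations would not settle on finite values.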

 

ESTIMATING THE PARAMETERS

LR is more complicated than DA

The concept of "Likelihood"

Assessing the model's output

The likelihood is a probability density

Log-likelihood

Maximizing the likelihood

Optimization

Non linear optimization

A unique solution?

TUTORIAL

 

_____________________________________________________

 

Related readings:

Discriminant analysis