|
Interactive animation |
Fisher's linear discriminant
This is a slight misnomer, as Fisher's linear discriminant is not a classifier per se, but rather a dimensionality reduction technique. Yet, a genuine classifier can easily be derived from it.
Dimensionality reduction is one of the central issues of statistical Data Modeling. Recall that the problem is to describe the data set with as few variables as possible, and yet discard as little information as possible in doing so.
* The "brute force" approach to dimensionality reduction is variable selection : of all the variables in the data table, only a handful are retained, that are hoped to capture most of the data structure. This approach is most common in Multiple Linear Regression.
* A more subtle approach is
depicted in the following illustration. Suppose we want to build a classifier
for the purpose of discriminating between two classes C1 and
C2. Of the two variables x1 and x2 describing
the data, which one should we retain in the final 1-variable-only model
(top image)
?
Suppose we decide to retain x1. This means that all the data points are projected on the x1 axis, or, equivalently, that their x2 coordinate is ignored. The classifier is just :
Clearly this classifier is very poor because of the considerable overlap of the projections of C1 and C2 on x1. Any threshold T0 will be conducive to a high misclassification rate.
Unfortunately, retaining x2 and
discarding x1 is
no better for the same reason.
So we are in a situation where any variable selection is worse than no variable selection at all, and dimensionality reduction seems impossible because too much information is lost in the process of eliminating any of the two original variables.
Yet, it is easy to build a perfect classifier (lower image in the illustration above) : just project both classes on the blue line F. The two projected classes are then completely separated, and it's easy to find a threshold T0 such that the above assignment rule gives a "0" misclassification rate.
Is this classifier a 1-variable or a 2-variable classifier ?
* Both x1 and x2 are involved in defining F, so it seems that we have a 2-variable classifier.
* But if we denote by x the abscissa of the projection of a data point on F (with respect to an arbitrary origin), then considering the value of x alone is enough to have a perfect assignement rule. So we in fact have a 1-variable-only classifier, and this variable is x.
So we have found a data reduction technique more complex than variable selection, but also obviously more powerful. Fisher's linear discriminant is just a formalization of the above example.
Fisher's linear discriminant deals with :
* Two classes only C1 and C2,
* Described by an arbitrary large number of original variables x1 , x2 , ..., xp,
* The goal being to identify the 1-Dimensional straight line F on which the class projections will be "as separated as possible".
Ideally, we would therefore like to define some measure of overlap between two sets of points, and find the projection direction that makes this measure minimal. Unfortunately, there is no simple and elegant measure of overlap between two sets of points, and we will have to be satisfied with defining a criterion that :
* gets larger with the distance between the projected class barycenters,
* gets larger when the projected classes become more "compact" according to some measure,
hoping that if the class projections are compact and the projected barycenters are far from each other, then the class projections will exhibit little overlap.
In general, "distance between projected barycenters" and "class projections compacity" cannot be maximized simultaneously. Fisher's criterion J is a trade-off between these two incompatible objectives.
Fisher's linear discriminant F is the direction of space that makes Fisher's criterion maximum. It is generally unique.
Once it is found, it is then the practioner's responsibility to establish the value T0 of the threshold T that will minimize the misclassification rate, and this turns Fisher's discriminant into a complete classifier.
Fisher's linear discriminant criterion is defined in a somewhat heuristic way, and therefore cannot claim to always define the best 1-D projection space in terms of minimizing the eventual misclassification rate. As a matter of fact, other criteria of "optimal class projections" may be defined and are occasionally used instead of Fisher's criterion.
Yet, in the special case where the two classes are multinormal with identical Covariance Matrices, it can be shown that Fisher's discriminant is indeed optimal. The classifier derived from it is then identical to the discriminant function obtained by the least-squares errors approach. The classification threshold is then also provided by theory, and does not have to be adjusted "by hand".
Fisher's linear discriminant is the first step on the way to Discriminant Factor Analysis.
Piecewise-linear (or quadratic) boundaries may then be defined between
the classes, thus partitioning the data space into decision regions,
each region being associated to one class.
_________________________________________
|
Tutorial 1 |
These questions are addressed in detail in the following Tutorial.
We first explain the reasoning behind the definition of Fisher's criterion, of which we give two forms :
We then show how Fisher's discriminant can be determined from Fisher's criterion.
FISHER'S CRITERION AND FISHER'S DISCRIMINANT
|
Fisher's criterion J Distance between projected barycenters is not good enough Variance of the projection of a cloud, covariance matrix Class projections spread and "Within-class" covariance matrix Alternate expression of Fisher's criterion, "Between-class" matrix Fisher's criterion Fisher's vector Derivative of a quadratic form (result only) Maximizing Fisher's criterion, Fisher's vector Is Fisher's discriminant optimal ? Classification and Fisher's discriminant Class separability |
||
|
TUTORIAL |
||
|
|
|
|
* Classes can be dragged around. |
|
___________________________________________________________________
|
Tutorial 2 |
We then give some mathematical complements that are
needed for maximizing Fisher's criterion when expressed in the second form (needed
for more than two classes).
MAXIMIZING THE GENERAL FORM OF FISHER' CRITERION
|
Derivative of a quadratic form (demonstration) Finding the maximum of the ratio of two quadratic forms The general case Application to Fisher's criterion |
||
|
TUTORIAL |
||
______________________________________
Related readings:
|
Want to contribute to this site ? |