Cost (of misclassification)

A good classifier estimates the probability that an individual belongs to each of the classes. When it is later used to dispatch new individuals to classes, it is only natural to assign each individual to the class with the highest probability. As a matter of fact, it can be shown that, when the estimated probabilities are accurate, this strategy is the one with the lowest misclassification rate.

Now, will this strategy also guarantee the highest level of user satisfaction? Think of medical diagnostics. Imagine a classifier that has been trained to sort lung X-rays into "Healthy" and "Cancer". Should this classifier aim for the lowest possible misclassification rate? Certainly not, for doing so would treat as equally serious the two following mistakes:

    * Assigning a truly "Healthy" picture to the "Cancer" class,

    * Or assigning a genuine "Cancer" picture to the "Healthy" class,

 

whereas in fact the first type of mistake is merely annoying, while the second is disastrous. Clearly, one would rather "twist", or bias, the classifier towards making fewer errors of the second type, even if that means making more mistakes of the first type.
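One common way to express such a bias is to attach a cost to each type of mistake and to choose the class with the lowest expected cost instead of the class with the highest probability. The sketch below illustrates the idea for the X-ray example. The two cost values and the function names are assumptions made up for the illustration, not figures from any real diagnostic setting.

```python
# Hypothetical costs, chosen only to illustrate the asymmetry.
COST_FALSE_ALARM = 1.0     # a truly "Healthy" picture assigned to "Cancer"
COST_MISSED_CANCER = 50.0  # a genuine "Cancer" picture assigned to "Healthy"


def classify_xray(p_cancer: float) -> str:
    """Return the class with the lowest expected cost of misclassification."""
    # Deciding "Healthy" is wrong with probability p_cancer.
    expected_cost_if_healthy = p_cancer * COST_MISSED_CANCER
    # Deciding "Cancer" is wrong with probability 1 - p_cancer.
    expected_cost_if_cancer = (1.0 - p_cancer) * COST_FALSE_ALARM
    if expected_cost_if_cancer < expected_cost_if_healthy:
        return "Cancer"
    return "Healthy"


def cancer_threshold() -> float:
    """Probability of "Cancer" above which "Cancer" is the cheaper decision."""
    return COST_FALSE_ALARM / (COST_FALSE_ALARM + COST_MISSED_CANCER)
```

With these assumed costs, a picture is sent to the "Cancer" class as soon as its estimated probability of cancer exceeds about 2%, far below the 50% used by the highest-probability rule. The classifier makes more mistakes of the first type and fewer of the second.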

 

In the business world, decisions (of the classification type) involve both costs and expected profits. The classifier is then expected to help make the decisions that will maximize expected profits. For example, targeting a mailing involves two types of errors:

    * Sending a catalogue to a customer who will not buy,

    * and not sending a catalogue to a customer who would have bought.

 

Now, the mere misclassification rate is simply not a good enough criterion for deciding whom to send a catalogue to. Incurred costs and missed profits must also be taken into account to optimize the mailing list.
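As a minimal sketch of this reasoning, suppose each customer comes with an estimated probability of buying, and that mailing a catalogue has a fixed cost while an order brings a fixed margin. Both amounts, the customer names and the function names below are hypothetical and serve only the illustration.

```python
# Hypothetical figures, for illustration only.
MAILING_COST = 2.0       # cost of printing and posting one catalogue
PROFIT_PER_ORDER = 40.0  # margin earned when a mailed customer buys


def expected_profit_of_mailing(p_buy: float) -> float:
    """Expected profit of sending one catalogue to a customer."""
    return p_buy * PROFIT_PER_ORDER - MAILING_COST


def build_mailing_list(customers: dict[str, float]) -> list[str]:
    """Keep the customers for whom mailing has a positive expected profit."""
    return [
        name
        for name, p_buy in customers.items()
        if expected_profit_of_mailing(p_buy) > 0.0
    ]


# Every customer here is more likely NOT to buy than to buy.
customers = {"Ann": 0.30, "Bob": 0.04, "Cid": 0.10}
mailing_list = build_mailing_list(customers)
```

Under these assumptions, the highest-probability rule would classify all three customers as "will not buy" and mail nobody, whereas the expected-profit rule mails Ann and Cid and leaves out only Bob.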

 

The same kinds of questions, only more complex, arise when there are more than two classes.

 

There are techniques for "distorting" a classifier's normal behavior so that:

    * Outputs are no longer probabilities to belong to the various classes,

    * Dispatching individuals to classes on the basis of the "highest output value" will maximize the expected profit.
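One simple technique of this kind, shown here only as a sketch, is to weight the class probabilities by a profit matrix. Each output then becomes the expected profit of a decision, and picking the highest output maximizes expected profit. The three classes, the profit values and the function names are all assumptions invented for the example.

```python
CLASSES = ["A", "B", "C"]

# PROFIT[decision][true_class]: hypothetical profit (negative = cost)
# of assigning an individual of class `true_class` to class `decision`.
PROFIT = {
    "A": {"A": 10.0, "B": -5.0, "C": -5.0},
    "B": {"A": -1.0, "B": 20.0, "C": -1.0},
    "C": {"A": -50.0, "B": -50.0, "C": 100.0},
}


def distorted_outputs(probs: dict[str, float]) -> dict[str, float]:
    """Turn class probabilities into the expected profit of each decision."""
    return {
        decision: sum(probs[true] * PROFIT[decision][true] for true in CLASSES)
        for decision in CLASSES
    }


def dispatch(probs: dict[str, float]) -> str:
    """Assign the individual to the class with the highest distorted output."""
    outputs = distorted_outputs(probs)
    return max(outputs, key=outputs.get)


probs = {"A": 0.5, "B": 0.3, "C": 0.2}
decision = dispatch(probs)
```

In this example the most probable class is "A", yet the individual is dispatched to "B", because that decision has the highest expected profit under the assumed matrix. The distorted outputs no longer sum to 1 and can be negative, so they cannot be read as probabilities.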

____________________________________________________________

 

Related readings

Classification
