Bayes' theorem

Also known as "Bayes' rule" or "Bayes' formula".

 

A most important theorem of Probability theory.

The discrete case

We introduce Bayes' theorem within the context of Classification. Suppose you have to assign observations to classes on the basis of the value measured on one attribute x. This attribute is first assumed to be discrete, and we will denote its values x(l).

 _____
 

 

Bayes' theorem establishes the connection between these quantities. It states that:

 

 

 

So what Bayes' theorem says is:

    "You want to know what class a new observation belongs to by measuring the value of attribute x? You'll never know for sure, but you can improve on your initial guesses P{Ck} by calculating the posterior probability P{Ckx(l)} of each class, as they now take into account the result of the measurement on x.

For this purpose, for each class:

    * Take the prior probability P{Ck} of the class. It depends only on the class label.

    * Multiply it by P{x(l) | Ck}, the class-conditional probability of the measured value x(l) of the attribute,

    * and divide the result by P{x(l)}, the unconditional probability of the the measured value of the attribute. This last number is the same for all classes, and        depends only on l".

__________________

 

The denominator P{x(l)} plays the role of a normalization factor. To see that, sum all the posterior probabilities across the classes:

as an observation certainly belongs to one class (even if you don't know which one), and one only (exhaustive and mutually exclusive events).

Replace each term in the sum by its value as given by Bayes' formula to obtain:

Now Bayes' formula can be written:

where the normalizing role of the denominator becomes clear.

The continuous case

If the attribute is continuous, Bayes' formula still holds with few changes. The definition of the class priors remains the same. But:

        * Unconditional probabilities P{x(l)} have to be replaced by the unconditional  probability density p(x), which is the probability density function of x if class labels are ignored.

        * Class-conditional probabilities P{x(l) | Ck} have to be replaced by class-conditional probability densities p(x | Ck). Each one is the p.d.f. of x within one class when all other classes are ignored.

 

and Bayes' formula now reads:

 

 

The general formulation

    Bayes' theorem is not limited to Classification problems. In fact, its most general formulation is to be found in the basis of Probability Theory, and does not explicitly refer to Classification or any other application. It states that:

    * If A is an event,

    * and {B1, B2, ..., Bn} are events that are both exhaustive and mutually exclusive,

 

then, for any i:

 

One can think of P{Bi} as our initial guess about the likelihood of the event Bi. Now if event A occurs, then we should modify our guess from P{Bi}  to

P{BiA }.

________________________________________________

 

Download this Glossary