Jensen's inequality

Jensen's inequality is a mathematical result that is widely used in probability theory and statistics. It concerns convex (resp. concave) functions and, for our purposes, the transformation of random variables by such functions.

Convex functions

A function is said to be convex if its graph is turned "upwards", as in the illustration below.


A convex function need not have a minimum. For example, y = exp(x) is convex and has no minimum.

The concept of "convex function" is formalized as follows (lower image in the illustration below) :

    * Let A and B be two points on the graph of f(x).

    * Let S be the segment joining A to B.

    * Then, any point of S lies on or above the graph of f(x).

 

This is expressed mathematically by :

    * For any a and b, a < b

    * For any $\lambda$, $0 < \lambda < 1$ (for any point of the segment) :

$$f(\lambda a + (1 - \lambda) b) \le \lambda f(a) + (1 - \lambda) f(b)$$
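The chord condition above can be probed numerically. The following is a minimal sketch, not a proof: the helper name `satisfies_convexity`, the grid size, and the tolerance are all illustrative assumptions. A grid check of this kind can show that a function is not convex on an interval, but it can only suggest that a function is convex.

```python
import math

def satisfies_convexity(f, a, b, steps=99, tol=1e-12):
    """Check f(lam*a + (1-lam)*b) <= lam*f(a) + (1-lam)*f(b) on a grid of lam in (0, 1)."""
    for k in range(1, steps + 1):
        lam = k / (steps + 1)                       # strictly between 0 and 1
        point = lam * a + (1 - lam) * b             # abscissa of a point of the segment
        chord = lam * f(a) + (1 - lam) * f(b)       # height of the segment at that abscissa
        if f(point) > chord + tol:                  # graph rises above the chord
            return False
    return True

print(satisfies_convexity(math.exp, -1.0, 2.0))     # exp is convex
print(satisfies_convexity(math.sin, 0.0, math.pi))  # sin is concave on [0, pi]
```

The first call reports that the chord stays above the graph of exp; the second finds points where the graph of sin rises above its chord.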

 


A function f is said to be "concave" if -f is convex.

Transformation by a convex function

Imagine a finite set of points $\{x_1, x_2, \ldots, x_n\}$ evenly spaced on the x axis. The illustration below suggests that the convexity of f "stretches" the set of images $f(x_i)$ upward, and all the more so for larger values of x (or y). One can therefore reasonably expect the mean of the $f(x_i)$ to be larger than the transform $f(\bar{x})$ of the mean $\bar{x}$ of the $x_i$.
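This intuition can be tried on a small example. The sketch below assumes f = exp and nine evenly spaced points from 0 to 4; both choices are purely illustrative and not taken from the text.

```python
import math

xs = [0.5 * k for k in range(9)]     # evenly spaced points: 0.0, 0.5, ..., 4.0
mean_x = sum(xs) / len(xs)           # mean of the points

mean_of_images = sum(math.exp(x) for x in xs) / len(xs)   # mean of the f(x_i)
image_of_mean = math.exp(mean_x)                           # f applied to the mean

print(mean_x)
print(mean_of_images, image_of_mean)
```

The large images on the right pull the mean of the images well above the image of the mean.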

 

 

This intuition is correct, and Jensen's inequality is just its mathematical formulation. In fact, we'll see that it is more general than what we just described and applies to :

    * Any finite set of points (whether evenly spaced or not).

    * Not just to the mean, but to the barycenter of the set of points under any weighting.

The two versions of Jensen's inequality

Jensen's inequality exists in two versions : finite and continuous.

The finite version of Jensen's inequality

We consider a finite set of points $\{x_1, x_2, \ldots, x_n\}$.

    * This set is weighted by an arbitrary set of positive weights $\lambda_i$ such that $\sum_i \lambda_i = 1$.

    * Let f(x) be a convex function.

Then :

 

$$\sum_i \lambda_i f(x_i) \ge f\left(\sum_i \lambda_i x_i\right)$$

 

In other words, the ordinate of the barycenter of the points $(x_i, f(x_i))$, taken with these weights, is at least as large as the transform of the barycenter of the points $x_i$.

If all the weights are taken equal to 1/n, we have :

The mean of the transforms by a convex function is at least as large as the transform of the mean.
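The finite version can be checked on points that are not evenly spaced and weights that are not equal. The sketch below is illustrative only: the helper `jensen_gap`, the choice f(x) = x², and the sample points and weights are assumptions made for the example.

```python
def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i) for positive weights summing to 1."""
    if len(xs) != len(weights):
        raise ValueError("xs and weights must have the same length")
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    barycenter = sum(w * x for w, x in zip(weights, xs))         # weighted mean of the points
    mean_of_images = sum(w * f(x) for w, x in zip(weights, xs))  # weighted mean of the images
    return mean_of_images - f(barycenter)

square = lambda x: x * x                    # a convex function
xs = [-1.0, 0.5, 3.0, 10.0]                 # not evenly spaced

print(jensen_gap(square, xs, [0.1, 0.2, 0.3, 0.4]))  # arbitrary weights
print(jensen_gap(square, xs, [0.25] * 4))            # equal weights: the plain mean
```

Both gaps are nonnegative, as the inequality requires. For f(x) = x² the gap is the weighted variance of the points, which gives a familiar reading of the result.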

The continuous version of Jensen's inequality

If, instead of considering a finite set of points, we consider a probability density function p(x), the above result still holds in the following form :

$$\int f(x)\, p(x)\, dx \ge f\left(\int x\, p(x)\, dx\right)$$

where p(x) plays the role of the weights of the finite version.

If E[X] denotes the expectation of a random variable X, we then have :

 

$$E[f(X)] \ge f(E[X])$$

 

or in words :

The expectation of the transform of a random variable X by a convex function is at least as large as the transform of the expectation of X.
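A simulation gives a feel for the continuous version. The sketch below assumes X is standard normal and f = exp, both chosen only for illustration, and replaces the two expectations by sample averages. In this case the exact values are known: E[exp(X)] = exp(1/2), while exp(E[X]) = exp(0) = 1.

```python
import math
import random

rng = random.Random(0)                       # fixed seed so the run is repeatable
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)                                # estimate of E[X]
mean_of_images = sum(math.exp(x) for x in samples) / len(samples)   # estimate of E[f(X)]

print(mean_of_images)        # sample estimate of E[f(X)]
print(math.exp(mean_x))      # f applied to the sample estimate of E[X]
```

The sample averages are themselves an instance of the finite version with equal weights, so the first printed value cannot fall below the second, apart from rounding error.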

Use of Jensen's inequality

On this site, we use Jensen's inequality to prove the nonnegativity of the Kullback-Leibler distance.
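As a sketch of how such an argument can go, assume two discrete distributions $p$ and $q$ on the same finite set, with all $p_i > 0$ and $q_i > 0$ (these assumptions, and the notation $D(p \,\|\, q)$, are chosen for illustration and may differ from the site's own presentation). Applying the finite version to the convex function $-\log$, with the $p_i$ as weights and the ratios $q_i / p_i$ as points, gives:

```latex
\begin{aligned}
D(p \,\|\, q) &= \sum_i p_i \log\frac{p_i}{q_i}
              = \sum_i p_i \left(-\log\frac{q_i}{p_i}\right) \\
              &\ge -\log\left(\sum_i p_i \, \frac{q_i}{p_i}\right)
              = -\log\left(\sum_i q_i\right)
              = -\log 1 = 0 .
\end{aligned}
```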

_________________________________________________________

 

Tutorial

 

We prove here Jensen's inequality in both the finite and the continuous case.

For the continuous case, we prove only the most common case, where the convex function is differentiable everywhere.
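One standard argument for the differentiable case runs as follows; it is given here only as a sketch, and the tutorial's own proof may proceed differently. Write $\mu = E[X]$ (a notation assumed for this sketch). The graph of a differentiable convex function lies on or above each of its tangent lines, so taking the tangent at $\mu$, multiplying by the density $p(x)$, and integrating yields:

```latex
\begin{aligned}
f(x) &\ge f(\mu) + f'(\mu)\,(x - \mu) \quad \text{for all } x, \\
E[f(X)] = \int f(x)\, p(x)\, dx
     &\ge f(\mu) \int p(x)\, dx + f'(\mu) \int (x - \mu)\, p(x)\, dx \\
     &= f(\mu) + f'(\mu)\,\bigl(E[X] - \mu\bigr) = f(E[X]) .
\end{aligned}
```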

 

DEMONSTRATION OF JENSEN'S INEQUALITY

Finite case

Continuous case


 ____________________________________________

Related readings :

Kullback-Leibler distance

 

 

 

 
