Jensen's inequality
Jensen's inequality is a result in mathematics that is widely used in probability theory and statistics. It concerns convex (resp. concave) functions and, for our purposes, more specifically the transformation of random variables by convex (resp. concave) functions.
A function is said to be convex if its graph curves "upwards", as in the illustration below.
A convex function need not have a minimum. For example, y = exp(x) is convex and has no minimum.
The notion of "convex function" is formalized as follows (lower image in the illustration below):
* Let A and B be two points on the graph of f(x).
* Let S be the segment joining A to B.
* Then every point of S lies on or above the graph of f(x) between A and B.
This is expressed mathematically as follows:
* For any a and b with a < b,
* For any λ with 0 < λ < 1 (that is, for any point of the segment):

$$ f(\lambda a + (1 - \lambda) b) \le \lambda f(a) + (1 - \lambda) f(b) $$
A function is said to be "concave" if -f is convex.
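The chord condition above can be spot-checked numerically. The following is a minimal Python sketch; the helper names `chord_above_graph` and `looks_convex`, the sampling interval, the seed and the rounding tolerance are illustrative assumptions, not part of the original text. Random sampling can only find counterexamples: a True result is evidence of convexity, not a proof.

```python
import math
import random

def chord_above_graph(f, a, b, lam):
    """Check f(lam*a + (1-lam)*b) <= lam*f(a) + (1-lam)*f(b).

    A small relative tolerance absorbs floating-point rounding error.
    """
    lhs = f(lam * a + (1 - lam) * b)
    rhs = lam * f(a) + (1 - lam) * f(b)
    return lhs <= rhs + 1e-12 * max(1.0, abs(rhs))

def looks_convex(f, lo, hi, trials=10_000, seed=0):
    """Test the chord inequality at random a < b in [lo, hi] and random lam in [0, 1).

    Returns False as soon as a counterexample is found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
        lam = rng.random()
        if not chord_above_graph(f, a, b, lam):
            return False
    return True

print(looks_convex(math.exp, -5, 5))                # True: exp is convex
print(looks_convex(lambda x: -math.exp(x), -5, 5))  # False: -exp is concave
```

Since -exp is concave, its chords lie below the graph, so the check fails for it almost immediately.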
Imagine a finite set of points $\{x_1, x_2, \dots, x_n\}$ evenly spaced on the x axis. The illustration below suggests that the convexity of f "stretches out" the set of images $f(x_i)$ upward, and all the more so as we consider larger values of x (or y). It is therefore reasonable to expect the mean $\frac{1}{n}\sum_i f(x_i)$ of the $f(x_i)$ to be larger than the transform $f(\bar{x})$ of the mean $\bar{x} = \frac{1}{n}\sum_i x_i$ of the $x_i$.
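A small worked example makes this concrete. The points and the function are chosen here for illustration and are not taken from the original figure: five evenly spaced points $x_i = 0, 1, 2, 3, 4$ and the convex function $f(x) = x^2$.

```latex
\bar{x} = \frac{0 + 1 + 2 + 3 + 4}{5} = 2, \qquad f(\bar{x}) = 2^2 = 4,
\qquad
\frac{1}{5}\sum_{i=1}^{5} f(x_i) = \frac{0 + 1 + 4 + 9 + 16}{5} = 6 \;\ge\; 4 .
```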

This intuition is correct, and Jensen's inequality is its mathematical formulation. In fact, we'll see that the result is more general than what we just described and applies to:
* Any finite set of points (whether evenly spaced or not).
* Not just the mean, but the barycenter (weighted mean) of the points, for any choice of nonnegative weights.
Jensen's inequality comes in two versions: finite and continuous.
We consider a finite set of points $\{x_1, x_2, \dots, x_n\}$.
* This set is weighted by an arbitrary set of nonnegative weights $\lambda_i$ such that $\sum_i \lambda_i = 1$.
* Let f(x) be a convex function.
Then:

$$ \sum_i \lambda_i f(x_i) \ge f\Big(\sum_i \lambda_i x_i\Big) $$
In other words, the y-coordinate of the barycenter of the points $(x_i, f(x_i))$ (weighted by the $\lambda_i$) is at least as large as the image under f of the barycenter of the $x_i$.
If all the $\lambda_i$ are taken equal to $1/n$, we have:

$$ \frac{1}{n}\sum_{i=1}^{n} f(x_i) \ge f\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big) $$

The mean of the transforms by a convex function is at least as large as the transform of the mean.
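The finite version, with equal or arbitrary weights, translates directly into code. This is a minimal Python sketch; the name `jensen_gap` and the sample points and weights are illustrative choices, not taken from the text. It returns the difference between the two sides, which Jensen's inequality says is nonnegative for a convex f.

```python
import math
import random

def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i).

    For a convex f and nonnegative weights summing to 1, Jensen's
    inequality says this gap is >= 0 (up to rounding error).
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    barycenter = sum(w * x for w, x in zip(weights, xs))
    mean_of_images = sum(w * f(x) for w, x in zip(weights, xs))
    return mean_of_images - f(barycenter)

def square(x):
    return x * x

xs = [0, 1, 2, 3, 4]

# Equal weights 1/n: mean of squares (6) minus square of the mean (4), about 2.
equal = [1 / 5] * 5
print(jensen_gap(square, xs, equal))

# Arbitrary positive weights, normalized so that they sum to 1.
rng = random.Random(1)
raw = [rng.random() for _ in xs]
weights = [r / sum(raw) for r in raw]
print(jensen_gap(square, xs, weights) >= 0)  # True
```

For $f(x) = x^2$ the gap is exactly the weighted variance of the $x_i$, which is one way to see why it can never be negative.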
If, instead of a finite set of points, we consider a probability density function p(x), the above result still holds in the following form:

$$ \int f(x)\, p(x)\, dx \ge f\Big(\int x\, p(x)\, dx\Big) $$

where p(x) plays the role of the $\lambda_i$ of the finite version.
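For instance, with the uniform density $p(x) = 1$ on $[0, 1]$ (and $0$ elsewhere) and the convex function $f(x) = x^2$, both chosen here for illustration:

```latex
\int_0^1 x^2 \, dx = \frac{1}{3}
\;\ge\;
\Big(\int_0^1 x \, dx\Big)^2 = \Big(\frac{1}{2}\Big)^2 = \frac{1}{4} .
```

The gap $1/3 - 1/4 = 1/12$ is the variance of the uniform distribution on $[0, 1]$: for $f(x) = x^2$ the two sides always differ by exactly the variance.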
If E[X] denotes the expectation of a random variable X, we then have:

$$ E[f(X)] \ge f(E[X]) $$
or in words:
The expectation of the transform of a random variable X by a convex function is at least as large as the transform of the expectation of X. For a concave function, the inequality is reversed.
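The expectation form can be illustrated by simulation. The sketch below is a Monte Carlo estimate under assumptions chosen for illustration only: X is standard normal, f = exp, and the name `monte_carlo_jensen`, the sample size and the seed are hypothetical. For this choice, exactly, $E[e^X] = e^{1/2} \approx 1.649$ while $e^{E[X]} = 1$.

```python
import math
import random

def monte_carlo_jensen(f, sample, n=100_000, seed=42):
    """Estimate E[f(X)] and f(E[X]) from n independent draws of X.

    `sample` receives a random.Random instance and returns one draw of X.
    """
    rng = random.Random(seed)
    draws = [sample(rng) for _ in range(n)]
    mean_x = sum(draws) / n                  # estimate of E[X]
    mean_fx = sum(f(x) for x in draws) / n   # estimate of E[f(X)]
    return mean_fx, f(mean_x)

# X ~ Normal(0, 1), f = exp (assumed example distribution and function).
e_fx, f_ex = monte_carlo_jensen(math.exp, lambda rng: rng.gauss(0.0, 1.0))
print(f"E[f(X)] ~ {e_fx:.3f}   f(E[X]) ~ {f_ex:.3f}")
```

Because a sample average is itself an equal-weight barycenter, the two estimates satisfy the finite inequality exactly for any sample; randomness only affects how close they come to the true values.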
On this site, we use Jensen's inequality to prove the nonnegativity of the Kullback-Leibler distance.
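As a hedged illustration of the standard argument (the site's own proof may differ in detail), apply Jensen's inequality to the concave function log, with the $p_i$ as weights: $\sum_i p_i \log(q_i/p_i) \le \log \sum_{p_i > 0} p_i (q_i/p_i) = \log \sum_{p_i > 0} q_i \le \log 1 = 0$, hence $D(p \| q) = \sum_i p_i \log(p_i/q_i) \ge 0$. The Python sketch below checks this on one pair of discrete distributions chosen for illustration; the function names `kl_divergence` and `jensen_bound` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions p and q.

    Terms with p_i = 0 contribute 0 by convention; if some q_i = 0 while
    p_i > 0, the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def jensen_bound(p, q):
    """log of sum_{p_i > 0} p_i * (q_i / p_i), i.e. log of sum_{p_i > 0} q_i.

    Because log is concave, Jensen gives
    sum_i p_i * log(q_i / p_i) <= jensen_bound(p, q) <= log(1) = 0.
    """
    s = sum(qi for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) if s > 0 else -math.inf

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q))  # 0.25 * ln 2, about 0.173
print(-kl_divergence(p, q) <= jensen_bound(p, q) <= 0.0)  # True
```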
_________________________________________________________
Tutorial
Here we prove Jensen's inequality in the finite case and in the continuous case.
For the continuous case, we prove only the most common case, where the convex function is differentiable everywhere.
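For a convex f that is differentiable everywhere, one standard argument (offered here as a sketch; the tutorial's own proof may be organized differently) uses the fact that the graph of such an f lies on or above each of its tangent lines. Let $\mu = E[X]$, assumed finite along with $E[f(X)]$:

```latex
\begin{aligned}
f(x) &\ge f(\mu) + f'(\mu)\,(x - \mu) && \text{for every } x \quad \text{(tangent line at } \mu\text{)}\\
E[f(X)] &\ge f(\mu) + f'(\mu)\,\big(E[X] - \mu\big) && \text{(expectation is linear and preserves } \ge\text{)}\\
&= f(\mu) = f(E[X]).
\end{aligned}
```

Replacing the expectation with the weighted sum $\sum_i \lambda_i(\cdot)$ and taking $\mu = \sum_i \lambda_i x_i$ gives the finite version by the same two steps.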
DEMONSTRATION OF JENSEN'S INEQUALITY
* Finite case
* Continuous case