Cramér-Rao lower bound
The Cramér-Rao lower bound is a limit to the variance that can be attained by an unbiased estimator of a parameter q of a distribution.
Let q be a parameter of a distribution. Recall that an estimator q* of q is said to be "unbiased" if its expectation is equal to q for all values of q:
E[q*] = q
What is really expected from an estimator is a low Mean Squared Error (MSE), and this does not demand that the estimator be unbiased. But unbiased estimators enjoy great popularity because they are easier to study than estimators that might have a lower MSE, but that are biased. In particular, the MSE of an unbiased estimator is just its variance.
A parameter may have several unbiased estimators, or even infinitely many. For example, the sample mean and the sample median are both unbiased estimators of the mean of a normal distribution, and the variance of the sample mean is known to be lower than the variance of the sample median. So, in this particular case, the sample mean is to be preferred to the sample median as an estimator of the distribution mean.
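The comparison between the two estimators can be checked by simulation. The sketch below is only an illustration: the values of `mu`, `sigma`, `n` and the number of simulated samples are arbitrary choices, not taken from the text.

```python
import numpy as np

def compare_mean_and_median(mu=0.0, sigma=1.0, n=25, n_samples=20000, seed=0):
    """Empirical variances of the sample mean and sample median over many n-samples."""
    rng = np.random.default_rng(seed)
    # Each row is one n-sample drawn from N(mu, sigma^2).
    samples = rng.normal(mu, sigma, size=(n_samples, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)
    return means.var(ddof=1), medians.var(ddof=1)

var_mean, var_median = compare_mean_and_median()
print(f"variance of the sample mean:   {var_mean:.4f}")
print(f"variance of the sample median: {var_median:.4f}")
```

With these settings the variance of the sample mean should be close to sigma²/n, and visibly smaller than that of the sample median.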
One may therefore ask :
Given a parameter q of a distribution, what is the smallest variance that can be attained by an unbiased estimator of q ?
The answer is given, in part, by the Cramér-Rao lower bound that we now describe.
Let p(x, q) be a probability distribution (either discrete or continuous) that depends on the parameter q. Let's draw an n-sample X = (x_1, x_2, ..., x_n) from p(x, q). The likelihood L of the sample is :
$$L = \prod_{i=1}^{n} p(x_i, q)$$
As usual, we'll rather consider the log-likelihood LL :
$$LL = \log(L) = \sum_{i=1}^{n} \log p(x_i, q)$$
LL depends on q, and as a matter of fact, the basic idea of Maximum Likelihood estimation is to identify the value of q that will make the sample (log-) likelihood largest.
If p(x, q) is regular enough, then LL is a differentiable function and its derivative is called the score of the sample. We'll denote it s(X, q).
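So, by definition :
$$s(X, q) = \frac{\partial LL}{\partial q} = \sum_{i=1}^{n} \frac{\partial}{\partial q} \log p(x_i, q)$$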
Maximum Likelihood estimation therefore tries to identify the value of q that makes the score of the sample equal to 0.
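As a concrete illustration, here is a minimal sketch for a normal distribution whose mean is the parameter q and whose standard deviation is assumed known. The names `SIGMA`, `log_likelihood` and `score` are introduced for this example only. The analytical score is compared with a numerical derivative of the log-likelihood, and then evaluated at the sample mean, which is the Maximum Likelihood estimate in this model.

```python
import numpy as np

SIGMA = 2.0  # assumed known standard deviation (illustrative value)

def log_likelihood(x, q, sigma=SIGMA):
    """LL = sum_i log p(x_i, q) for the N(q, sigma^2) density."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - q) ** 2 / (2 * sigma**2))

def score(x, q, sigma=SIGMA):
    """Analytical derivative of LL with respect to q."""
    return np.sum(x - q) / sigma**2

rng = np.random.default_rng(1)
x = rng.normal(5.0, SIGMA, size=50)

# Central finite difference of LL at an arbitrary value q0.
q0, eps = 4.0, 1e-5
numerical = (log_likelihood(x, q0 + eps) - log_likelihood(x, q0 - eps)) / (2 * eps)

print(score(x, q0), numerical)   # the two values should agree closely
print(score(x, x.mean()))        # essentially 0 at the ML estimate
```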
The value of the score of a sample is a measure of the sensitivity of the sample log-likelihood to small changes of the value of q. If the value of the score is small for a given value of q, the likelihood of the sample (that is, its probability density) will be essentially unaffected by small changes of q :
* Some of the observations in the sample will see their contributions to the log-likelihood increase,
* While the other observations will see their contributions to the log-likelihood decrease,
with an overall variation of the log-likelihood just about 0.
So observations can't agree among themselves as to how q should be changed to increase the likelihood of the sample.
One can therefore fairly conclude that the sample has little to say about the value of q.
Although this line of thinking is reminiscent of that leading to Maximum Likelihood estimation, we are not within the context of Maximum Likelihood estimation. Our goal is not to maximize the likelihood.
For a given value of q, the score depends on the sample and is therefore a random variable. What are the properties of this r.v. ?
We'll show that, under some regularity conditions, the score is centered (expectation is 0) :
E[s(X, q)] = 0
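A short sketch of why this holds, written for a continuous distribution and assuming that differentiation with respect to q and integration with respect to x can be interchanged (one of the regularity conditions). For a single observation x :

$$\int p(x, q)\,dx = 1 \;\Longrightarrow\; 0 = \frac{\partial}{\partial q}\int p(x, q)\,dx = \int \frac{\partial \log p(x, q)}{\partial q}\,p(x, q)\,dx = E\!\left[\frac{\partial \log p(x, q)}{\partial q}\right]$$

The score of an n-sample is the sum of n such terms, so its expectation is also 0.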
What about its variance Var(s(X, q)) ?
Suppose that this variance is very small for a given value of q. This means that almost all the samples have a score value near 0 (the expectation of the score), and therefore that almost all the samples contain little information about the true value of q (see above).
Now consider q*, any unbiased estimator of q. If the variance of the score is very small, and therefore almost all the samples contain only little information about the true value of q, one certainly cannot expect q* to do a good job (i.e. have a low variance) at estimating this value : we should expect the variance of q* to be large.
One may therefore anticipate a negative relation between :
* The variance of the score, and
* The variance of any unbiased estimator of q.
This intuition is justified. We'll show that under certain regularity conditions, any unbiased estimator q* of q is such that :
$$\mathrm{Var}(q^*) \ge \frac{1}{\mathrm{Var}(s(X, q))}$$
This relation establishes a lower bound to the variance that can be attained by an unbiased estimator of q.
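The argument can be sketched in two steps, assuming that the score is centered and that differentiation under the integral sign is allowed. Here L(X, q) denotes the likelihood of the sample, and the integrals run over all possible n-samples.

$$\mathrm{Cov}(s, q^*) = E[s\,q^*] = \int q^*(X)\,\frac{\partial \log L(X, q)}{\partial q}\,L(X, q)\,dX = \frac{d}{dq}\int q^*(X)\,L(X, q)\,dX = \frac{d}{dq}E[q^*] = 1$$

$$1 = \mathrm{Cov}(s, q^*)^2 \le \mathrm{Var}(s)\,\mathrm{Var}(q^*) \;\Longrightarrow\; \mathrm{Var}(q^*) \ge \frac{1}{\mathrm{Var}(s(X, q))}$$

The second line is the Cauchy-Schwarz inequality applied to the covariance of the score and the estimator.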
Although quite simple, the above expression cannot be used as is because Var(s(X, q)) is unknown. We therefore have to calculate Var(s(X, q)) as a function of what is known, that is p(x, q). This calculation is detailed in the Tutorial below. It will lead us to the first operational form of the Cramér-Rao inequality :
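$$\mathrm{Var}(q^*) \ge \frac{1}{n\,E\!\left[\left(\frac{\partial \log p(x, q)}{\partial q}\right)^2\right]}$$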
The denominator is called the (Fisher's) information I_n(q) of the distribution for n-samples. The reason for this name is clear : the larger the "information" (i.e. about the value of the parameter), the more accurate will be the predictions of an unbiased estimator attaining the Cramér-Rao lower bound (see below).
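A quick Monte Carlo check of these quantities, again for a normal distribution with mean q and known standard deviation sigma (an illustrative choice, with arbitrary numerical values). In that model the score is the sum of (x_i - q)/sigma², and the information for n-samples is n/sigma².

```python
import numpy as np

def score_and_mean(q=1.0, sigma=2.0, n=10, n_samples=50000, seed=2):
    """Simulate many n-samples and return their scores (at the true q) and sample means."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(q, sigma, size=(n_samples, n))
    scores = (samples - q).sum(axis=1) / sigma**2
    return scores, samples.mean(axis=1)

scores, means = score_and_mean()
info_n = 10 / 2.0**2   # analytical information n / sigma^2 for the default settings

print(scores.mean())        # close to 0: the score is centered
print(scores.var(ddof=1))   # close to info_n
print(means.var(ddof=1))    # close to 1 / info_n: the sample mean attains the bound
```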
It is therefore common to see the Cramér-Rao inequality written as :
$$\mathrm{Var}(q^*) \ge \frac{1}{I_n(q)}$$
Under certain regularity conditions that we'll detail, the Cramér-Rao inequality may also be given another operational form :
$$\mathrm{Var}(q^*) \ge \frac{-1}{n\,E\!\left[\frac{\partial^2 \log p(x, q)}{\partial q^2}\right]}$$
which is sometimes easier to use than the first form, as we'll see in the Tutorial below.
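The agreement between the two forms can be illustrated numerically. The sketch below uses a Poisson distribution of parameter q (an arbitrary choice for the example), for which log p(x, q) = -q + x log q - log x!, so the first derivative is x/q - 1 and the second derivative is -x/q². Both forms should give an information of 1/q per observation.

```python
import numpy as np

q = 3.0   # illustrative parameter value
rng = np.random.default_rng(3)
x = rng.poisson(q, size=200000)

first_derivative = x / q - 1.0    # d/dq log p(x, q)
second_derivative = -x / q**2     # d^2/dq^2 log p(x, q)

# Information of one observation, estimated by each form.
info_first_form = np.mean(first_derivative**2)
info_second_form = -np.mean(second_derivative)

print(info_first_form, info_second_form, 1.0 / q)   # all three should be close
```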
The Cramér-Rao lower bound is a limit below which the variance of an unbiased estimator cannot fall (under the proper regularity conditions), but nothing is said so far about whether or not this bound can be attained : there is no clue as to the existence of an unbiased estimator of q whose variance is equal to the Cramér-Rao lower bound. As a matter of fact, one can exhibit examples where, although the regularity conditions are satisfied, the Cramér-Rao lower bound is equal to 0, and therefore clearly inaccessible.
We'll review two necessary and sufficient conditions for the existence of an efficient estimator.
We'll show that a necessary and sufficient condition for the existence of an unbiased estimator of q whose variance is equal to the Cramér-Rao lower bound is that the score function can be written as :
s(X, q) = a(q).[h(X) - q]
where :
* a(q) is a function of q only, and
* h(X) is a function that depends on the sample only, and not on q (a statistic).
Then (and only then) is q* = h(X) an unbiased estimator attaining the Cramér-Rao lower bound. Such an estimator is called efficient.
We'll show that the variance of q* is then equal to 1/a(q).
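As an illustration, take the normal distribution with mean q and known standard deviation σ (σ and the sample mean x̄ are notation introduced for this example). The score of an n-sample is :

$$s(X, q) = \sum_{i=1}^{n} \frac{x_i - q}{\sigma^2} = \frac{n}{\sigma^2}\,(\bar{x} - q)$$

which has the required form with a(q) = n/σ² and h(X) = x̄. The sample mean is therefore an efficient estimator of q, and its variance is 1/a(q) = σ²/n.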
-----
But the issue of the existence of an efficient estimator must be viewed in a broader context. The "good" question is : "Does there exist a function g such that there exists an efficient estimator of g(q) ?".
A simple modification of the demonstration of the above result shows that a necessary and sufficient condition for this double existence is that the score can be written as :
s(X, q) = a(q).[h(X) - g(q)]
h(X) is then an efficient estimator of g(q), whose variance is equal to g'(q)/a(q).
-----
From this result, we'll deduce that there is at most one function g(q) that can be efficiently estimated. Consequently, if there exists a function g(q) verifying the above condition, and if this function is not the identity function, then q has no efficient estimator.
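A standard example, sketched here with the exponential density p(x, q) = q exp(-qx) for x ≥ 0 (this parametrization by the rate q, and the notation x̄ for the sample mean, are assumptions made for the illustration). The score of an n-sample is :

$$s(X, q) = \frac{n}{q} - \sum_{i=1}^{n} x_i = -n\left(\bar{x} - \frac{1}{q}\right)$$

This is of the form a(q).[h(X) - g(q)] with a(q) = -n, h(X) = x̄ and g(q) = 1/q. So the sample mean is an efficient estimator of 1/q, with variance g'(q)/a(q) = (-1/q²)/(-n) = 1/(nq²). Since g is not the identity function, the rate q itself has no efficient estimator.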
There is another characterization of the existence of an efficient estimator. We'll show that, with some reservations, there exists a function g(q) that can be efficiently estimated if and only if the distribution p(x, q) can be written as :
p(x, q) = exp[A(x)B(q) + C(x) + D(q)]
and therefore belongs to the exponential family.
We'll then identify g(q) as well as its efficient estimator.
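For instance, the Poisson distribution of parameter q (chosen here only as an illustration) can be put in this form :

$$p(x, q) = \frac{e^{-q}\,q^{x}}{x!} = \exp\left[x \log q - \log x! - q\right]$$

with A(x) = x, B(q) = log q, C(x) = -log x! and D(q) = -q.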
* An efficient estimator is clearly a Minimum Variance Unbiased Estimator (MVUE).
* But if the above condition is not fulfilled, the variance of an MVUE will be larger than the Cramér-Rao lower bound.
* Conversely, if the regularity conditions needed to derive the Cramér-Rao lower bound are not fulfilled, the quantities in the Cramér-Rao inequality may sometimes still be calculated, but there may then exist unbiased estimators whose variances are smaller than the (then meaningless) Cramér-Rao "lower bound". We'll see that such is the case with the uniform distribution.
Let p(x, q) be a probability distribution.
We'll show that :
If there exists a function g(q) admitting an efficient estimator, then this estimator is a sufficient statistic for q.
So it appears that the concept of "sufficient statistic" is somewhat weaker than that of "efficient estimator" :
* If q* is an efficient estimator of a function g(q), then q* is a sufficient statistic for q.
* But it may be that q has a sufficient statistic and yet no function g(q) can be efficiently estimated.
If q* is an efficient estimator of q, then it is the only efficient estimator of q : the variance of any other unbiased estimator of q will be larger than the variance of q* :
If q has an efficient estimator q*, then q* is the only efficient estimator of q.
This property is not specific to efficient estimators. It is true for any Minimum Variance Unbiased Estimator, and is demonstrated here.
There is a close relationship between :
* Efficient estimators, and
* Maximum Likelihood estimators.
a) We'll show that if an unbiased estimator q* of q attains the Cramér-Rao lower bound (efficient estimator), then it is equal to the (then unique) Maximum Likelihood estimator q*ML of q.
$$q^* \text{ efficient} \;\Longrightarrow\; q^* = q^*_{ML}$$
The converse is not true : a Maximum Likelihood estimator is not necessarily efficient. But we state now that it is nevertheless asymptotically efficient.
b) One of the most important properties of Maximum Likelihood estimators (MLE) is that an MLE attains the Cramér-Rao bound at least asymptotically.
More specifically, let :
* p(x, q) be a probability distribution,
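* {q*_n} a sequence of MLEs of q, where q*_n is computed from an n-sample,
then, under some regularity conditions, it can be shown that :
$$\sqrt{n}\,(q^*_n - q) \xrightarrow{\;d\;} N\!\left(0,\ I_1(q)^{-1}\right)$$
where I_1(q) is the Fisher information of a single observation.
So an MLE is not only consistent and asymptotically normally distributed, it is also asymptotically efficient.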
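This asymptotic behavior can be illustrated by simulation. The sketch below assumes an exponential density q exp(-qx) with rate q, for which the MLE is the inverse of the sample mean and the information of one observation is 1/q². The model, the parameter values and the function name are illustrative choices, not taken from the text.

```python
import numpy as np

def scaled_mle_variance(q=2.0, n=200, n_samples=20000, seed=4):
    """Return n * Var(q*_n) for the MLE q*_n = 1 / (sample mean) of an exponential rate q."""
    rng = np.random.default_rng(seed)
    # NumPy parametrizes the exponential by its scale, which is 1 / rate.
    samples = rng.exponential(scale=1.0 / q, size=(n_samples, n))
    mle = 1.0 / samples.mean(axis=1)
    return n * mle.var(ddof=1)

q = 2.0
print(scaled_mle_variance(q))   # should be near 1 / I_1(q) = q**2 for large n
```

For finite n the MLE is slightly biased and its scaled variance slightly exceeds q², the gap shrinking as n grows.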
The Cramér-Rao inequality may be generalized in several different ways :
So far we considered unbiased estimators only because of their natural attractiveness. But we might have considered the more general issue of a lower bound for the Mean Square Error (MSE) of a biased estimator.
Let q* be a (possibly) biased estimator of q.
The reader can just introduce some small changes to the demonstration of the "unbiased" case of the Cramér-Rao inequality to obtain :
$$\mathrm{Var}(q^*) \ge \frac{\left(\frac{d}{dq}E[q^*]\right)^2}{I_n(q)}$$
If the estimator turns out to be unbiased, d/dq E[q*] = 1, and we are back to the unbiased case.
But we can go one step further. Recall that the MSE of an estimator is :
MSE = Var + Bias²
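and it is then possible to find a lower bound to the MSE of a biased estimator. Denoting by b(q) = E[q*] - q the bias of q*, we have then :
$$\mathrm{MSE}(q^*) \ge \frac{\left(1 + b'(q)\right)^2}{I_n(q)} + b(q)^2$$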
where " ' " denotes the differentiation with respect to q.
Instead of estimating the parameter q, one may be interested in estimating some function of q : for example, one may want to estimate a standard deviation instead of a variance.
So let g(q) be the function of the parameter q to be estimated. Another slight modification of the demonstration then leads to the following result for an unbiased estimator q* of g(q) :
$$\mathrm{Var}(q^*) \ge \frac{g'(q)^2}{I_n(q)}$$
The most general form of the Cramér-Rao inequality, for the MSE of :
* A biased estimator of bias b(q),
* Of a function g(q) of the parameter q
is then :
$$\mathrm{MSE}(q^*) \ge \frac{\left(g'(q) + b'(q)\right)^2}{I_n(q)} + b(q)^2$$
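The general bound, (g'(q) + b'(q))² / I_n(q) + b(q)², is easy to evaluate once its ingredients are known. The helper below is a hypothetical convenience function written for this page, not part of any library. It is applied to a shrinkage estimator c·x̄ of a normal mean q with known standard deviation sigma, an illustrative choice for which the bias is (c - 1)q and the exact MSE is c²sigma²/n + (c - 1)²q².

```python
def cramer_rao_mse_bound(info_n, g_prime=1.0, bias=0.0, bias_prime=0.0):
    """Lower bound on the MSE of an estimator of g(q) with bias b(q).

    info_n     : Fisher information I_n(q) of the n-sample
    g_prime    : g'(q), equal to 1 when q itself is estimated
    bias       : b(q)
    bias_prime : b'(q)
    """
    return (g_prime + bias_prime) ** 2 / info_n + bias ** 2

# Illustrative values: normal mean q, known sigma, estimator c * (sample mean).
q, sigma, n, c = 1.0, 2.0, 10, 0.8
info_n = n / sigma**2

bound = cramer_rao_mse_bound(info_n, bias=(c - 1) * q, bias_prime=c - 1)
exact_mse = c**2 * sigma**2 / n + ((c - 1) * q) ** 2
print(bound, exact_mse)   # the two coincide for this estimator
```

With the default arguments (no bias, g the identity) the function reduces to the unbiased bound 1 / I_n(q).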
___________________________________________________________________
Tutorial 1
In this first Tutorial :
* We establish the "basic" expression of the Cramér-Rao inequality, which involves only the score variance.
* We then calculate the score variance as a function of p(x, q), which leads to the first form of the Cramér-Rao inequality.
* We establish the second form of the Cramér-Rao inequality.
We conclude by making explicit the regularity conditions needed in the foregoing demonstrations.
THE CRAMÉR-RAO LOWER BOUND
* The "score" function
    * Definition of the score
    * Expectation of the score
* The Cramér-Rao inequality
    * Rationale and outline of the demonstration
    * Covariance of the score and an unbiased estimator
    * The "basic" Cramér-Rao inequality
    * Variance of the score
    * First form of the Cramér-Rao inequality
    * Second form of the Cramér-Rao inequality
* Regularity conditions
    * Fixed integration limits
    * Variable integration limits
_______________________________________________________
Tutorial 2
We now derive a necessary and sufficient condition for a probability distribution p(x, q) to possess an unbiased estimator of q that attains the Cramér-Rao lower bound (efficient estimator), and calculate a useful expression for the variance of this estimator.
This condition is easily generalized to a function of the parameter, and we'll then establish that there is at most one function of q that can be efficiently estimated. If this function is not the "identity" function, then q has no efficient estimator.
-----
We then establish two important consequences of this condition :
* If the parameter q of p(x, q) has an efficient estimator, this estimator is equal to the (then unique) Maximum Likelihood estimator of q.
* If there exists a function g(q) admitting an efficient estimator, then this estimator is a sufficient statistic for q.
EFFICIENT ESTIMATORS
* Necessary and sufficient condition for the Cramér-Rao bound to be attained
    * The condition is both necessary and sufficient
    * Variance of the efficient estimator
* Efficient estimation of a function of the parameter
* Efficiency and Maximum Likelihood
* Efficiency and Sufficient statistic
    * The Factorization Theorem revisited
    * An efficient estimator is a sufficient statistic
____________________________________________________________________________
Tutorial 3
We now review some applications of the Cramér-Rao lower bound to determine whether some classical estimators are, or are not, efficient. We'll do that for continuous distributions (normal for mean and variance, exponential for mean) as well as for some discrete distributions (Bernoulli, Poisson). For this purpose, we'll use the first and the second form of the Cramér-Rao inequality, as well as the necessary and sufficient condition for the lower bound to be attained.
-----
We then spend some time on the uniform distribution U[0, q]. We show that the regularity conditions needed for establishing the Cramér-Rao inequality are not satisfied, and we verify directly that some of the calculations involved in establishing the Cramér-Rao lower bound are then unjustified.
We further identify an unbiased estimator of q based on the highest order statistic of the sample and show that its variance is smaller than the (meaningless) Cramér-Rao "lower bound".
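The situation can be previewed by simulation. The sketch below makes two assumptions that are not spelled out above: the unbiased estimator is taken to be (n + 1)/n times the largest observation, and the formal "information" is computed as n/q² from log p(x, q) = -log q. The parameter values are arbitrary.

```python
import numpy as np

def uniform_estimator_stats(q=1.0, n=10, n_samples=100000, seed=5):
    """Empirical mean and variance of (n + 1)/n * max(x_i) for n-samples from U[0, q]."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0.0, q, size=(n_samples, n))
    estimates = (n + 1) / n * samples.max(axis=1)
    return estimates.mean(), estimates.var(ddof=1)

q, n = 1.0, 10
mean_est, var_est = uniform_estimator_stats(q, n)

formal_bound = q**2 / n                 # 1 / (n / q^2), the meaningless "lower bound"
exact_variance = q**2 / (n * (n + 2))   # theoretical variance of this estimator

print(mean_est)                               # close to q: the estimator is unbiased
print(var_est, exact_variance, formal_bound)  # the variance falls well below the "bound"
```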
EXAMPLES OF APPLICATIONS OF THE CRAMÉR-RAO LOWER BOUND
* Mean of the normal distribution
    * Cramér-Rao lower bound
    * The sample mean is efficient
* Variance of the normal distribution
    * The bound can be attained
    * The distribution mean is known
    * The distribution mean is unknown
* Mean of the exponential distribution
    * Cramér-Rao lower bound
    * The sample mean is efficient
* Parameter of the Bernoulli distribution
    * Cramér-Rao lower bound
    * The sample mean is efficient
    * Second method : N and S condition for efficiency
* Poisson distribution
    * Cramér-Rao lower bound
    * The sample mean is efficient
* Uniform distribution
    * The regularity conditions are not fulfilled
    * Information in the sample
    * An unbiased estimator whose variance is smaller than the CR "lower bound"
        * The unbiased estimator
        * Variance of the estimator