
Cramér-Rao lower bound

The Cramér-Rao lower bound is a limit to the variance that can be attained by an unbiased estimator of a parameter θ of a distribution.

Unbiased estimators

Let θ be a parameter of a distribution. Recall that an estimator θ* of θ is said to be "unbiased" if its expectation is equal to θ for all values of θ:

E[θ*] = θ 

 

What is really expected from an estimator is to have a low Mean Squared Error (MSE), and this does not demand that the estimator be unbiased. But unbiased estimators enjoy a great popularity because they are easier to study than estimators that might have a lower MSE, but that are biased. In particular, the MSE of an unbiased estimator is just its variance.

A parameter may have several (or even infinitely many) unbiased estimators. For example, the sample mean and the sample median are both unbiased estimators of the normal distribution mean, and it is known that the variance of the sample mean is then lower than the variance of the sample median. So, in this particular case, the sample mean is to be preferred to the sample median as an estimator of the distribution mean.
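A quick simulation makes this comparison concrete. The sketch below is only an illustration: the sample size, the number of replicates, the seed and the function name are arbitrary choices, not values taken from the text.

```python
import random
import statistics


def compare_mean_and_median(n=25, replicates=4000, mu=0.0, sigma=1.0, seed=0):
    """Return (variance of the sample mean, variance of the sample median)
    estimated over repeated n-samples drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = compare_mean_and_median()
print(f"variance of the sample mean:   {var_mean:.4f}")
print(f"variance of the sample median: {var_median:.4f}")
```

With these settings the variance of the sample mean should come out close to σ²/n, and the variance of the sample median noticeably larger.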

 

One may therefore ask :

Given a parameter θ of a distribution, what is the smallest variance that can be attained by an unbiased estimator of θ ?

 

The answer is given, in part, by the Cramér-Rao lower bound that we now describe.

The "score" function

Definition of the score function

Let p(x; θ) be a probability distribution (either discrete or continuous) that depends on the parameter θ. Let's draw an n-sample X = (x1, x2, ..., xn) from p(x; θ). The likelihood L of the sample is :

L = Πi p(xi; θ)

 

As usual, we'll rather consider the log-likelihood LL :

LL = log(L) = Σi log( p(xi; θ))

 

LL depends on θ, and as a matter of fact, the basic idea of Maximum Likelihood estimation is to identify the value of θ that will make the sample (log-) likelihood largest.

If p(x; θ) is regular enough, then LL is a differentiable function of θ, and its derivative with respect to θ is called the score of the sample. We'll denote it s(X; θ).

So, by definition :

s(X; θ) = ∂LL/∂θ = Σi ∂/∂θ log( p(xi; θ))

Maximum Likelihood estimation therefore tries to identify the value of θ  that makes the score of the sample equal to 0.
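As an illustration, take the normal distribution N(μ, σ²) with σ known (an example chosen here, not taken from the text above). Differentiating log p(x; μ) = -(x - μ)²/(2σ²) + constant with respect to μ gives the score Σi (xi - μ)/σ², which vanishes at the sample mean. The function name and the data below are hypothetical.

```python
def normal_score(sample, mu, sigma=1.0):
    """Score with respect to mu of a sample from N(mu, sigma^2), sigma known."""
    return sum(x - mu for x in sample) / sigma**2


# Illustrative data (made up for this sketch).
sample = [4.1, 5.3, 4.8, 5.9, 4.9]
mu_hat = sum(sample) / len(sample)  # sample mean, the ML estimate of mu

for mu in (4.0, mu_hat, 6.0):
    print(f"mu = {mu:.2f}   score = {normal_score(sample, mu):+.3f}")
```

The score is positive to the left of the sample mean, negative to its right, and (up to rounding) zero at the sample mean itself.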

Interpretation of the score

The value of the score of a sample is a measure of the sensitivity of the sample log-likelihood to small changes of the value of θ. If the value of the score is small for a given value of θ, the likelihood of the sample (that is, its probability density) will be essentially unaffected by small changes of θ :

    * Some of the observations in the sample will see their contributions to the log-likelihood increase,

    * While the other observations will see their contributions to the log-likelihood decrease,

with an overall variation of the log-likelihood just about 0.

So observations can't agree among themselves as to how θ should be changed to increase the likelihood of the sample.

 

One can therefore fairly conclude that the sample has little to say about the value of θ.


Although this line of thinking is reminiscent of that leading to Maximum Likelihood estimation, we are not within the context of Maximum Likelihood estimation. Our goal is not to maximize the likelihood.

The score is a random variable

For a given value of θ, the score depends on the sample and is therefore a random variable. What are the properties of this r.v. ?

We'll show that, under some regularity conditions, the score is centered (expectation is 0) :

E[s(X; θ)] = 0
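For a distribution with finitely many outcomes, this property can be checked by direct enumeration. The sketch below uses a single Bernoulli(p) observation as an assumed example; its score is x/p - (1 - x)/(1 - p), obtained by differentiating log p(x; p) = x log p + (1 - x) log(1 - p). The function names are hypothetical.

```python
def bernoulli_score(x, p):
    """Score of one Bernoulli(p) observation x (0 or 1) with respect to p."""
    return x / p - (1 - x) / (1 - p)


def score_moments(p):
    """Exact expectation and variance of the score of one observation."""
    probs = {1: p, 0: 1 - p}
    mean = sum(prob * bernoulli_score(x, p) for x, prob in probs.items())
    var = sum(prob * (bernoulli_score(x, p) - mean) ** 2 for x, prob in probs.items())
    return mean, var


for p in (0.1, 0.3, 0.5):
    mean, var = score_moments(p)
    print(f"p = {p}: E[score] = {mean:.2e}, Var(score) = {var:.4f}, 1/(p(1-p)) = {1 / (p * (1 - p)):.4f}")
```

The expectation is 0 up to rounding, and the variance equals 1/(p(1 - p)). Since the score of an n-sample is a sum of n independent such terms, its expectation is also 0 and its variance is n/(p(1 - p)).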

 

What about its variance Var(s(X; θ)) ?

Suppose that this variance is very small for a given value of θ. This means that almost all the samples have a score value near 0 (the expectation of the score), and therefore that almost all the samples contain little information about the true value of θ (see above).

Now consider θ*, any unbiased estimator of θ. If the variance of the score is very small, and therefore almost all the samples contain only little information about the true value of θ, one certainly cannot expect θ* to do a good job (i.e. have a low variance) at estimating this value : we should expect the variance of θ* to be large.

 

One may therefore anticipate a negative relationship between :

    * The variance of the score,  and

    * The variance of any unbiased estimator of θ.

Cramér-Rao lower bound

The "basic" form

This intuition is justified. We'll show that under certain regularity conditions, any unbiased estimator θ* of θ is such that :

Var(θ*) ≥ 1 / Var(s(X; θ))

This expression establishes a lower bound to the variance that can be attained by an unbiased estimator of θ.
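The inequality Var(θ*) ≥ 1 / Var(s(X; θ)) can be explored by simulation. The sketch below assumes an exponential distribution with mean θ, for which the score of one observation is (x - θ)/θ², and takes the sample mean as the unbiased estimator. The distribution, the settings and the function name are illustrative assumptions.

```python
import random
import statistics


def simulate_basic_bound(theta=2.0, n=20, replicates=5000, seed=1):
    """Return (empirical variance of the sample mean, 1 / empirical variance of the score)
    for n-samples from an exponential distribution with mean theta."""
    rng = random.Random(seed)
    estimates, scores = [], []
    for _ in range(replicates):
        # random.expovariate takes the rate, i.e. 1 / mean.
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
        scores.append(sum((x - theta) / theta**2 for x in sample))
    return statistics.pvariance(estimates), 1.0 / statistics.pvariance(scores)


var_estimator, bound = simulate_basic_bound()
print(f"Var(sample mean) = {var_estimator:.4f}")
print(f"1 / Var(score)   = {bound:.4f}")
```

In this particular case the two numbers should nearly coincide (both are close to θ²/n), which suggests that the sample mean attains the bound here.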

First "operational" form

Although quite simple, the above expression cannot be used as is because Var(s(X; θ)) is unknown. We therefore have to calculate Var(s(X; θ)) as a function of what is known, that is p(x; θ). This calculation is detailed in the Tutorial below. It will lead us to the first operational form of the Cramér-Rao inequality :

Var(θ*) ≥ 1 / ( n.E[ (∂/∂θ log p(x; θ))² ] )

Information

The denominator is called the (Fisher's) information In(θ) of the distribution for n-samples. The reason for this name is clear : the larger the "information" (i.e. about the value of the parameter), the more accurate will be the predictions of an unbiased estimator attaining the Cramér-Rao lower bound (see below).

It is therefore common to see the Cramér-Rao inequality written as :

Var(θ*) ≥ 1 / In(θ)

Second "operational" form

Under certain regularity conditions that we'll detail, the Cramér-Rao inequality may also be given another operational form :

Var(θ*) ≥ -1 / ( n.E[ ∂²/∂θ² log p(x; θ) ] )

which is sometimes easier to use than the first form, as we'll see in the Tutorial below.

 
Note the "-" sign.
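The agreement between the two forms, E[(∂/∂θ log p)²] on one side and -E[∂²/∂θ² log p] on the other, can be checked numerically. The sketch below assumes a Poisson distribution of parameter λ, for which log p(x; λ) = -λ + x log λ - log x!, so that the first derivative is x/λ - 1 and the second is -x/λ². The truncation point x_max and the function names are arbitrary choices.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of x, computed on the log scale for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def information_both_forms(lam, x_max=200):
    """Information of one observation, computed from the squared first derivative
    and from minus the second derivative of log p(x; lam)."""
    first = sum(poisson_pmf(x, lam) * (x / lam - 1) ** 2 for x in range(x_max + 1))
    second = -sum(poisson_pmf(x, lam) * (-x / lam**2) for x in range(x_max + 1))
    return first, second


first_form, second_form = information_both_forms(3.0)
print(f"E[(d/dlam log p)^2]  = {first_form:.6f}")
print(f"-E[d2/dlam2 log p]   = {second_form:.6f}")
print(f"1 / lam              = {1 / 3.0:.6f}")
```

Both sums give 1/λ, so for an n-sample the bound is λ/n under either form. The second form needed only E[x], which is why it is often the easier one to use.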

Is the bound attained ? Efficient estimator

The Cramér-Rao lower bound is a limit below which the variance of an unbiased estimator cannot fall (under the proper regularity conditions), but nothing has been said so far about whether this bound can be attained : there is no clue as to the existence of an unbiased estimator of θ whose variance is equal to the Cramér-Rao lower bound. As a matter of fact, one can exhibit examples where, although the regularity conditions are satisfied, the Cramér-Rao lower bound is equal to 0, and therefore clearly inaccessible.

An unbiased estimator whose variance attains the Cramér-Rao lower bound is said to be efficient.

-----

We'll show that a necessary and sufficient condition for the existence of an unbiased estimator of θ whose variance is equal to the Cramér-Rao lower bound is that the score function can be written as :

 

s(X; θ) = a(θ).[h(X) - θ]

 where :

    * a(θ) is a function of θ only,  and

    * h(X) is a function that depends on the sample only, and not on θ (a statistic).

 

Then (and only then) is θ* = h(X) an unbiased estimator attaining the Cramér-Rao lower bound : it is efficient.

 

We'll show that the variance of θ* is then equal to 1/a(θ).
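The Bernoulli distribution gives a simple instance of this factorization. Summing the individual scores x/p - (1 - x)/(1 - p) over an n-sample gives n/(p(1 - p)).[x̄ - p], where x̄ is the sample mean, so that a(p) = n/(p(1 - p)) and h(X) = x̄. The sketch below checks this identity numerically on made-up data; the function names are hypothetical.

```python
def direct_score(sample, p):
    """Score of a Bernoulli sample, computed term by term from the definition."""
    return sum(x / p - (1 - x) / (1 - p) for x in sample)


def factored_score(sample, p):
    """The same score written as a(p).[h(X) - p], with h(X) the sample mean."""
    n = len(sample)
    a = n / (p * (1 - p))
    h = sum(sample) / n
    return a * (h - p)


# Illustrative data (made up for this sketch).
sample = [1, 0, 0, 1, 1, 0, 1, 1]
for p in (0.2, 0.5, 0.9):
    print(f"p = {p}: direct = {direct_score(sample, p):+.6f}, factored = {factored_score(sample, p):+.6f}")
```

The sample mean is therefore efficient for p, with variance 1/a(p) = p(1 - p)/n, which is indeed the variance of the mean of n Bernoulli variables.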

-----

In fact, the issue of the existence of an efficient estimator must be viewed in a broader context. The "good" question is : "Does there exist a function g(θ) such that there exists an efficient estimator of g(θ) ?".

A simple modification of the demonstration of the above result shows that a necessary and sufficient condition for this double existence is that the score can be written as :

 

s(X; θ) = a(θ).[h(X) - g(θ)]

 

 

h(X) is then an efficient estimator of g(θ), whose variance is equal to g'(θ)/a(θ).
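As an assumed example, consider the exponential distribution Exp(λ) parametrized by its rate λ, with log p(x; λ) = log λ - λx. The score of an n-sample is n/λ - Σi xi = -n.[x̄ - 1/λ], so that a(λ) = -n, h(X) = x̄ and g(λ) = 1/λ. The formula g'(λ)/a(λ) then gives 1/(nλ²). The simulation below compares this value with the empirical variance of x̄; the settings and the function name are illustrative.

```python
import random
import statistics


def check_exponential_rate(lam=1.5, n=30, replicates=5000, seed=2):
    """Return (empirical variance of the sample mean, g'(lam) / a(lam))
    for n-samples drawn from Exp(lam)."""
    rng = random.Random(seed)
    h_values = []
    for _ in range(replicates):
        sample = [rng.expovariate(lam) for _ in range(n)]
        h_values.append(statistics.fmean(sample))
    a = -n                    # a(lam) in the factorization of the score
    g_prime = -1.0 / lam**2   # derivative of g(lam) = 1 / lam
    return statistics.pvariance(h_values), g_prime / a


empirical_var, predicted_var = check_exponential_rate()
print(f"empirical Var(h(X)) = {empirical_var:.5f}")
print(f"g'(lam) / a(lam)    = {predicted_var:.5f}")
```

Here the efficiently estimated function is the mean 1/λ, not the rate λ itself.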

-----

From this result, we'll deduce that there is at most one function g(θ) that can be efficiently estimated. Consequently, if there exists a function g(θ) verifying the above condition, and if this function is not the identity function, then θ has no efficient estimator.

Uniqueness

If θ* is an efficient estimator of g(θ), then it is the only efficient estimator of g(θ) : the variance of any other unbiased estimator of g(θ) will be larger than the variance of θ* :

 

If g(θ) has an efficient estimator θ*, then θ* is the only efficient estimator of g(θ).

 

 

This property is not specific to efficient estimators. It is true for any Uniformly Minimum Variance Unbiased Estimator, and is demonstrated here.

Efficiency, Sufficiency and Exponential family

Let p(x; θ) be a probability distribution.

We'll show that :

 

If there exists a function g(θ) admitting an efficient estimator, then this estimator is a sufficient statistic for θ.

 

 

The converse is just a little bit more complicated. We'll show that under some additional regularity conditions, if the parameter θ of the family of distributions p(x; θ) admits a sufficient statistic, then p(x; θ) is necessarily an exponential family. We'll further deduce that there then exists a single function g(θ) that can be efficiently estimated. This function will be clearly identified, as well as its efficient estimator.

Efficient estimator and Uniformly Minimum Variance Unbiased Estimator (UMVUE)

    * An efficient estimator is clearly a Uniformly Minimum Variance Unbiased Estimator (UMVUE). But the converse is false : a UMVUE is not always efficient, as shown here.

The concept of efficient estimator is thus stronger than the concept of UMVUE. This reinforcement goes along with a similar reinforcement of the link between these estimators and the concept of sufficient statistic :

        - A UMVUE is just a function of a sufficient statistic,

        - Whereas an efficient estimator is a sufficient statistic, as we just saw.

_____

    * If the regularity conditions needed to derive the Cramér-Rao lower bound are not fulfilled, the quantities in the Cramér-Rao inequality may sometimes still be calculated, but there may then exist unbiased estimators whose variances are smaller than the (then meaningless) Cramér-Rao "lower bound". We'll see that such is the case with the uniform distribution.
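The uniform case can be previewed by simulation. For U[0, θ], log p(x; θ) = -log θ on the support, so a purely formal calculation gives an "information" of n/θ² and a "bound" of θ²/n. The sketch below assumes the estimator (n + 1)/n.max(xi), which is unbiased and has variance θ²/(n(n + 2)); the settings and the function name are illustrative.

```python
import random
import statistics


def uniform_counterexample(theta=1.0, n=10, replicates=20000, seed=3):
    """Return (mean, variance) of the estimator (n + 1) / n * max(sample)
    over repeated n-samples from U[0, theta], and the formal bound theta^2 / n."""
    rng = random.Random(seed)
    estimates = [
        (n + 1) / n * max(rng.uniform(0, theta) for _ in range(n))
        for _ in range(replicates)
    ]
    formal_bound = theta**2 / n
    return statistics.fmean(estimates), statistics.pvariance(estimates), formal_bound


mean_est, var_est, formal_bound = uniform_counterexample()
print(f"mean of the estimator     = {mean_est:.4f}")
print(f"variance of the estimator = {var_est:.5f}")
print(f"formal 'lower bound'      = {formal_bound:.5f}")
```

The estimator is centered on θ, yet its variance falls well below the formal bound, which has no meaning here because the support of the distribution depends on θ.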

Efficiency and Maximum Likelihood

There is a close relationship between :

    * Efficient estimators,  and

    * Maximum Likelihood estimators.

 

    a) We'll show that if an unbiased estimator θ* of θ attains the Cramér-Rao lower bound (efficient estimator), then it is equal to the (then unique) Maximum Likelihood estimator θ*ML of θ.

 

θ* efficient   ⇒   θ* = θ*ML

 

 

The converse is not true : a Maximum Likelihood estimator is not necessarily efficient. But we state now that it is nevertheless asymptotically efficient.

 

    b) One of the most important properties of Maximum Likelihood estimators (MLE) is that an MLE attains the Cramér-Rao bound at least asymptotically.

More specifically, let :

    * p(x; θ) be a probability distribution,

    * {θ*n} a sequence of MLE of θ,

 

then, under some regularity conditions, it can be shown that :

√n.(θ*n - θ)     converges in distribution to    N(0, 1/I1(θ))

where I1(θ) is the information of a single observation (n = 1).

So an MLE is not only consistent and asymptotically normally distributed, it is also asymptotically efficient.
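This asymptotic behaviour can be observed by simulation. The sketch below assumes the exponential distribution Exp(λ), whose MLE of the rate is 1/x̄ and whose single-observation information is 1/λ², so that the limiting variance of √n.(θ*n - λ) should be λ². The sample size, number of replicates and function name are arbitrary choices.

```python
import math
import random
import statistics


def scaled_mle_variance(lam=2.0, n=400, replicates=3000, seed=4):
    """Return (empirical variance of sqrt(n) * (MLE - lam), limiting variance lam^2)
    for the MLE 1 / (sample mean) of the rate of Exp(lam)."""
    rng = random.Random(seed)
    scaled = []
    for _ in range(replicates):
        sample = [rng.expovariate(lam) for _ in range(n)]
        lam_hat = 1.0 / statistics.fmean(sample)
        scaled.append(math.sqrt(n) * (lam_hat - lam))
    return statistics.pvariance(scaled), lam**2


empirical, limiting = scaled_mle_variance()
print(f"empirical variance of sqrt(n).(MLE - lam) = {empirical:.3f}")
print(f"limiting variance 1 / I1(lam)             = {limiting:.3f}")
```

For finite n the MLE 1/x̄ is slightly biased and is not efficient, but the two variances should already be close for n of a few hundred.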

Generalizations

The Cramér-Rao inequality may be generalized in several different ways :

Biased estimators

So far we considered unbiased estimators only because of their natural attractiveness. But we might have considered the more general issue of a lower bound for the Mean Square Error (MSE) of a biased estimator.

Let θ* be a (possibly) biased estimator of θ.

The reader can just introduce some small changes to the demonstration of the "unbiased" case of the Cramér-Rao inequality to obtain :

Var(θ*) ≥ (d/dθ E[θ*])² / In(θ)

If the estimator turns out to be unbiased, d/dθ E[θ*] = 1, and we are back to the unbiased case.

But we can go one step further. Recall that the MSE of an estimator is :

MSE = Var + Bias²

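and it is then possible to find a lower bound to the MSE of a biased estimator. Denoting by b(θ) = E[θ*] - θ the bias of θ*, so that d/dθ E[θ*] = 1 + b'(θ), we then have :

MSE(θ*) ≥ (1 + b'(θ))² / In(θ) + b(θ)²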

where " ' " denotes the differentiation with respect to θ.
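As an illustration, take the bound in the form (1 + b'(θ))²/In(θ) + b(θ)², where b(θ) is the bias, and apply it to a "shrunk" sample mean c.x̄ as an estimator of the mean μ of N(μ, σ²) with σ known. This example is an assumption made here, not part of the text. The bias is b(μ) = (c - 1)μ, so b'(μ) = c - 1, and the information is n/σ². The settings and the function name are arbitrary.

```python
import random
import statistics


def shrinkage_mse(c=0.8, mu=1.0, sigma=2.0, n=10, replicates=20000, seed=5):
    """Return (simulated MSE of c * sample mean, lower bound for the MSE)
    when estimating the mean mu of N(mu, sigma^2)."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(replicates):
        x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        squared_errors.append((c * x_bar - mu) ** 2)
    bias = (c - 1) * mu        # b(mu)
    bias_prime = c - 1         # b'(mu)
    information = n / sigma**2
    bound = (1 + bias_prime) ** 2 / information + bias**2
    return statistics.fmean(squared_errors), bound


simulated_mse, mse_bound = shrinkage_mse()
print(f"simulated MSE of c.x_bar     = {simulated_mse:.4f}")
print(f"lower bound for the MSE      = {mse_bound:.4f}")
print(f"unbiased bound sigma^2 / n   = {2.0**2 / 10:.4f}")
```

With these particular values the biased estimator reaches its own bound and has a lower MSE than any unbiased estimator could, which echoes the earlier remark that a low MSE does not require unbiasedness.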

Function of the parameter

Instead of estimating the parameter θ, one may be interested in estimating some function of θ : for example, one may want to estimate a standard deviation instead of a variance.

So let g(θ) be the function of the parameter θ to be estimated, and let θ* be an unbiased estimator of g(θ). Another slight modification of the demonstration then leads to the following result :

Var(θ*) ≥ g'(θ)² / In(θ)

The most general form

The most general form of the Cramér-Rao inequality, for the MSE of :

    * A biased estimator of bias b(θ),

    * Of a function g(θ) of the parameter θ 

is then :

MSE(θ*) ≥ (g'(θ) + b'(θ))² / In(θ) + b(θ)²

 

___________________________________________________________________

 

 

Tutorial 1

 

In this first Tutorial :

    * We establish the "basic" expression of the Cramér-Rao inequality, which involves only the score variance.

    * We then calculate the score variance as a function of p(x; θ), which leads to the first form of the Cramér-Rao inequality.

    * We establish the second form of the Cramér-Rao inequality.

 

We conclude by making explicit the regularity conditions needed in the foregoing demonstrations.

 

 

 

THE CRAMER-RAO LOWER BOUND

The "score" function

Definition of the score

Expectation of the score

The Cramér-Rao inequality

Rationale and outline of the demonstration

Covariance of the score and an unbiased estimator

The "basic" Cramér-Rao inequality

Variance of the score

First form of the Cramér-Rao inequality

Second form of the Cramér-Rao inequality

Regularity conditions

Fixed integration limits

Variable integration limits

TUTORIAL

_______________________________________________________

 

 

Tutorial 2

 

We now derive a necessary and sufficient condition for a probability distribution p(x; θ) to possess an unbiased estimator of θ that attains the Cramér-Rao lower bound (efficient estimator), and calculate a useful expression for the variance of this estimator.

This condition is easily generalized to a function of the parameter, and we'll then establish that there is at most one function of θ that can be efficiently estimated. If this function is not the "identity" function, then θ has no efficient estimator.

-----

We then establish two important consequences of this condition :

    1) If the parameter θ of p(x; θ) has an efficient estimator, this estimator is equal to the (then unique) Maximum Likelihood estimator of θ. We give two demonstrations of this important result :

        - The first one is a natural extension of the theory of the Cramér-Rao lower bound.
        - The second one rests on the fact that a necessary (but not sufficient) condition for a distribution p(x; θ) to admit an efficient estimator for θ is that the distribution belongs to the exponential family.

    2) If there exists a function g(θ) admitting an efficient estimator, then this estimator is a sufficient statistic for θ.

 

 

 

EFFICIENT ESTIMATORS

Necessary and sufficient condition for the Cramér-Rao bound to be attained

The condition is both necessary and sufficient

Variance of the efficient estimator

Efficient estimation of a function of the parameter

Efficiency and Maximum Likelihood

First demonstration

Second demonstration

Score function in the exponential family

Maximizing the log-likelihood

The efficient estimator is the unique MLE

Efficiency and Sufficient statistic

The Factorization Theorem revisited

An efficient estimator is a sufficient statistic

TUTORIAL

____________________________________________________________________________

 

 

Tutorial 3

 

We now review some applications of the Cramér-Rao lower bound to determine whether some classical estimators are, or are not, efficient. We'll do that for continuous distributions (normal for mean and variance, exponential for mean) as well as for some discrete distributions (Bernoulli, Poisson). For this purpose, we'll use the first and the second form of the Cramér-Rao inequality, as well as the necessary and sufficient condition for the lower bound to be attained.

-----

We then spend some time on the uniform distribution U[0, θ]. We show that the regularity conditions needed for establishing the Cramér-Rao inequality are not satisfied, and we verify directly that some of the calculations involved in establishing the Cramér-Rao lower bound are then unjustified.

We further identify an unbiased estimator of θ based on the highest order statistic of the sample and show that its variance is smaller than the (meaningless) Cramér-Rao "lower bound".

-----

We calculate here the Cramér-Rao lower bound for unbiased estimators of the parameter λ of the exponential distribution Exp(λ). We'll then deduce that the UMVUE for λ is not efficient.

 

 

 

EXAMPLES OF APPLICATIONS
OF THE CRAMER-RAO LOWER BOUND

Mean of the normal distribution

Cramér-Rao lower bound

The sample mean is efficient

Variance of the normal distribution

The bound can be attained

The distribution mean is known

The distribution mean is unknown

Mean of the exponential distribution

Cramér-Rao lower bound

The sample mean is efficient

Parameter of the Bernoulli distribution

Cramér-Rao lower bound

The sample mean is efficient

Second method : N and S condition for efficiency

Poisson distribution

Cramér-Rao lower bound

The sample mean is efficient

Uniform distribution

The regularity conditions are not fulfilled

Information in the sample

An unbiased estimator whose variance is smaller than the CR "lower bound"

The unbiased estimator

Variance of the estimator 

TUTORIAL

 

_________________________________________________________

 

Related readings :

Estimation

Mean Square Error

Uniformly Minimum Variance Unbiased Estimator (UMVUE)

Exponential family

Maximum Likelihood

Sufficient statistic
