Rao-Blackwell theorem
Let p(x; θ) be a probability distribution. The Rao-Blackwell theorem provides a method for reducing the variance of any unbiased estimator of the parameter θ (or of any function g(θ) of the parameter), provided that a sufficient statistic for θ is available.
Let θ* be an unbiased estimator of g(θ).
Following an informal but useful line of thinking often encountered in estimation theory, we'll say that the reason why this estimator is not a Uniformly Minimum Variance Unbiased Estimator (UMVUE) is that it does not carry with it all the information useful for estimating g(θ), but which is nonetheless available in the sample.
Along the same line, a statistic T that is sufficient for θ carries with it all the information in the sample that is useful for estimating g(θ). But a sufficient statistic has no particular reason for being an unbiased estimator.
We can therefore imagine "combining" θ* and T into a new estimator U that would retain the best of both worlds :
* The unbiasedness of θ*,
* And at least some of the information about θ that is present in T but missing from θ*, transferred into the new estimator.
We would then obtain an unbiased estimator of g(θ) that would hopefully be better (lower variance), or at least not worse, than θ*.
-----
The Rao-Blackwell theorem identifies this combination.
Let :
* θ* be an unbiased estimator of g(θ).
* T be a sufficient statistic for θ.
Then the random variable :
U = E[θ* | T]
1) Is a statistic, that is a function of the sample that does not depend on θ.
2) Its expectation is equal to g(θ), and U is therefore an unbiased estimator of g(θ).
3) Its variance is no larger than the variance of θ*.
4) If its variance happens to be equal to that of θ*, then θ* is a function of T.
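Points 2) to 4) can be sketched with the laws of total expectation and total variance. This is only a short outline, assuming that θ* has finite variance; the full proof is the subject of Tutorial 1.

```latex
\begin{align*}
E[U] &= E\big[\,E[\theta^* \mid T]\,\big] = E[\theta^*] = g(\theta), \\[4pt]
\operatorname{Var}(\theta^*)
  &= \operatorname{Var}\big(E[\theta^* \mid T]\big)
     + E\big[\operatorname{Var}(\theta^* \mid T)\big] \\
  &= \operatorname{Var}(U) + E\big[\operatorname{Var}(\theta^* \mid T)\big]
  \;\ge\; \operatorname{Var}(U).
\end{align*}
```

Equality of the variances requires E[Var(θ* | T)] = 0, that is, θ* = U almost surely, which makes θ* a function of T.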
Combining as above a sufficient statistic and an unbiased estimator for the purpose of reducing the variance of this estimator is sometimes dubbed "blackwellizing" the estimator. It is called upon when it is believed that the estimator can possibly be improved, for example because its variance is larger than the Cramér-Rao lower bound.
Yet :
* Blackwellization usually leads to cumbersome calculations (as is usually the case when conditional expectations are involved), as we'll see in the Tutorials below.
* The Rao-Blackwell theorem says nothing about the quality of the new estimator, and certainly not that it is the best possible unbiased estimator (UMVUE), except when the sufficient statistic used for blackwellizing the unbiased estimator is also complete (Lehmann-Scheffé theorem).
_____________________________________________________________
Tutorial 1
In this Tutorial, we prove the Rao-Blackwell theorem.
In a first section, we give an intuitive line of reasoning that leads to this particular combination of an unbiased estimator and a sufficient statistic. We then proceed with the proof proper.
Introducing a sufficient statistic was suggested on heuristic grounds, for the purpose of injecting some additional information about the parameter to be estimated into the new estimator. But it will turn out that the sufficiency of the conditioning statistic plays another fundamental role in overcoming an unexpected difficulty : the expectation of an unbiased estimator of θ conditional on another statistic is usually not a statistic (we'll give a counterexample), unless the conditioning statistic is sufficient for θ.
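One possible illustration of this difficulty (not necessarily the counterexample used in the Tutorial) assumes two independent observations X1, X2 from N(θ, 1), with θ* = X1 as the unbiased estimator. The statistic S = X1 + 2 X2 is chosen here purely for illustration because it is not sufficient, whereas T = X1 + X2 is. Both conditional expectations follow from the standard formula for jointly normal variables.

```latex
\begin{align*}
E[X_1 \mid S]
  &= \theta + \frac{\operatorname{Cov}(X_1, S)}{\operatorname{Var}(S)}\,\big(S - E[S]\big)
   = \theta + \frac{1}{5}\,(S - 3\theta)
   = \frac{S}{5} + \frac{2\theta}{5}, \\[4pt]
E[X_1 \mid T]
  &= \theta + \frac{\operatorname{Cov}(X_1, T)}{\operatorname{Var}(T)}\,\big(T - E[T]\big)
   = \theta + \frac{1}{2}\,(T - 2\theta)
   = \frac{T}{2}.
\end{align*}
```

Conditioning on S leaves θ in the result, so E[X1 | S] cannot be computed from the sample alone and is not a statistic. Conditioning on the sufficient statistic T removes θ and gives the sample mean.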
THE RAO-BLACKWELL THEOREM
* Reducing the variance but preserving the expectation
* Reducing the variance
* Preserving the expectation
* Back to estimation
* The Rao-Blackwell theorem
* U is a statistic
* U is an unbiased estimator
* The variance of U is no larger than that of the original estimator
* Equality of the variances and functional relationship
* Rao-Blackwell and Minimum Variance
____________________________________________________________
Tutorial 2
In this Tutorial, we put the Rao-Blackwell Theorem to work on an example.
The problem is as follows :
* A random variable X is Poisson distributed with unknown parameter λ. A sample {x1, x2, ..., xn} is drawn from the distribution, and we want to use this sample to estimate the probability that X equals 0.

Since the pmf of the Poisson distribution is :

P{X = k} = e^{-λ} λ^k / k!,   k = 0, 1, 2, ...

what we really want to estimate is P{X = 0} = e^{-λ}.
We will first examine a "natural" estimator, which we'll reject because of its bias and the intractable calculations associated with its MSE.
We'll then identify a second estimator that will luckily turn out to be unbiased. We'll reduce (with some difficulty) its variance by a "blackwellization" procedure.
Calculations are a bit cumbersome, but we'll be rewarded by the discovery of a good unbiased estimator that would probably have been impossible to identify by a more direct method.
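As a rough numerical companion to this Tutorial, the sketch below compares two estimators of P{X = 0} by simulation. It assumes that the unbiased starting point is the proportion of zeros in the sample, and it uses the standard closed form obtained by conditioning on T = x1 + ... + xn, namely U = (1 - 1/n)^T. The function names, the default parameter values and the Poisson sampler (Knuth's multiplication method) are illustrative choices, not taken from the Tutorial.

```python
import math
import random


def naive_estimate(sample):
    """Proportion of zeros in the sample (unbiased for P{X = 0})."""
    return sum(1 for x in sample if x == 0) / len(sample)


def rao_blackwell_estimate(sample):
    """Blackwellized estimator (1 - 1/n)^T, where T is the sample sum."""
    n = len(sample)
    return (1.0 - 1.0 / n) ** sum(sample)


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def compare(lam=1.0, n=10, trials=20000, seed=0):
    """Monte Carlo means and variances of both estimators."""
    rng = random.Random(seed)
    naive, improved = [], []
    for _ in range(trials):
        sample = [poisson_draw(lam, rng) for _ in range(n)]
        naive.append(naive_estimate(sample))
        improved.append(rao_blackwell_estimate(sample))
    return {
        "target": math.exp(-lam),
        "naive_mean": sample_mean(naive),
        "naive_var": sample_variance(naive),
        "rb_mean": sample_mean(improved),
        "rb_var": sample_variance(improved),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.5f}")
```

Both simulated means should be close to e^{-λ}, and the simulated variance of the blackwellized estimator should come out below that of the proportion of zeros, as the theorem predicts.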
A FIRST EXAMPLE OF BLACKWELLIZATION
* The problem
* The natural estimator is biased
* An unbiased estimator
* Blackwellizing the estimator
* The sufficient statistic
* Blackwellizing the estimator
* Auxiliary indicator variables
* The new estimator
* Variance of the new estimator
* Comparing variances
* The new estimator is a UMVUE
Later, we'll realize that we got more than we bargained for. We were merely looking for an improved unbiased estimator of e^{-λ}, but we in fact obtained its best possible unbiased estimator (UMVUE). This is because the statistic used for the conditioning part of the Rao-Blackwell theorem is complete : consequently, by the Lehmann-Scheffé theorem, the improved estimator is the unique UMVUE of e^{-λ}.
The UMVUE of e^{-λ} is also obtained elsewhere :
1) By a different method (UMVUE of any analytic function of the parameter λ).
2) By application of the Corollary of the Lehmann-Scheffé theorem.
_________________________________________
Tutorial 3
We now address a second example of blackwellization of an unbiased estimator.
The problem is as follows :
* A random variable X follows an exponential distribution with unknown parameter λ. A sample {x1, x2, ..., xn} is drawn from this distribution, and we want to estimate the probability that X is larger than t :
P{X > t} = ?

So this is a lifetime problem : we want to estimate the probability for a component to live longer than t before breakdown.
Since the cumulative distribution function of the Exp(λ) distribution is F(x) = 1 - e^{-λx} for x ≥ 0, what we really want to estimate is e^{-λt} for any t.
The problem is slightly more difficult than the preceding one. It leads to an improved estimator that lies out of reach of intuition alone. To our knowledge, the variance of this estimator is unknown at this time.
Yet, as in the previous Tutorial, since the conditioning statistic is complete, we can assert that this improved unbiased estimator is in fact the Uniformly Minimum Variance Unbiased Estimator (UMVUE) of P{X > t} = e^{-λt}.
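A simulation can stand in for the variance comparison. The sketch below assumes that the unbiased starting point is the proportion of observations larger than t, and it uses the standard closed form obtained by conditioning on T = x1 + ... + xn : U = (1 - t/T)^(n-1) when T > t, and U = 0 otherwise. The function names and default parameter values are illustrative assumptions, not taken from the Tutorial.

```python
import math
import random


def naive_estimate(sample, t):
    """Proportion of observations larger than t (unbiased for P{X > t})."""
    return sum(1 for x in sample if x > t) / len(sample)


def rao_blackwell_estimate(sample, t):
    """Blackwellized estimator: (1 - t/T)^(n-1) if T > t, else 0."""
    n = len(sample)
    total = sum(sample)
    if total <= t:
        return 0.0
    return (1.0 - t / total) ** (n - 1)


def sample_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def compare(lam=1.0, t=1.0, n=10, trials=20000, seed=0):
    """Monte Carlo means and variances of both estimators."""
    rng = random.Random(seed)
    naive, improved = [], []
    for _ in range(trials):
        # expovariate takes the rate lam, so each draw has mean 1 / lam
        sample = [rng.expovariate(lam) for _ in range(n)]
        naive.append(naive_estimate(sample, t))
        improved.append(rao_blackwell_estimate(sample, t))
    return {
        "target": math.exp(-lam * t),
        "naive_mean": sample_mean(naive),
        "naive_var": sample_variance(naive),
        "rb_mean": sample_mean(improved),
        "rb_var": sample_variance(improved),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.5f}")
```

Both simulated means should be close to e^{-λt}, with the blackwellized estimator showing the smaller simulated variance. Such a simulation only estimates the variance numerically; it does not give a closed-form expression.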
-----
This Table of Contents shows a similarity in the approach to this problem and that used for the previous problem. We comment on this similarity below.
A SECOND EXAMPLE OF BLACKWELLIZATION
* The problem
* The natural estimator is biased
* An unbiased estimator
* Blackwellizing the estimator
* The sufficient statistic
* Blackwellizing the estimator
* Auxiliary indicator variables
* The new estimator
* The new estimator is a UMVUE
______________________________________
These two problems belong to a class of problems to which the Rao-Blackwell Theorem brings powerful and original solutions.
They can be expressed as follows :
* One considers a probability distribution p(x; θ).
* A sample {x1, x2, ..., xn} is drawn from this distribution.
* One wants to estimate quantities like :
P{a ≤ X ≤ b}
This probability can be expressed as a function f(a, b, θ). Usually, a good unbiased estimator of θ is available, and it is therefore natural to try to estimate P{a ≤ X ≤ b} by substituting the estimator of θ for θ into f(.).
Unfortunately :
* The estimator thus obtained is generally biased, because f is usually not linear in θ.
* Studying its properties (bias, variance, MSE) leads to intractable calculations.
-----
Another approach is to estimate P{a ≤ X ≤ b} by the proportion of observations that lie in the interval [a, b]. This "naive" estimator is always unbiased, but it does not take the nature of the distribution into account, and one can therefore suspect that it is rather poor (large variance), and expect that it can be improved. This is what the Rao-Blackwell Theorem does, provided that a sufficient statistic for θ is available (as this statistic is then also sufficient for P{a ≤ X ≤ b}, which is a function of θ).
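Written out, the naive estimator and its blackwellized version take the following form. The notation p̂ and the indicator 1{.} are introduced here for illustration. The last equality assumes an i.i.d. sample and a sufficient statistic T that is symmetric in the observations (a sum, for example).

```latex
\begin{align*}
\hat{p} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{a \le X_i \le b\},
\qquad E[\hat{p}] = P\{a \le X \le b\}, \\[4pt]
U &= E[\hat{p} \mid T]
   = \frac{1}{n} \sum_{i=1}^{n} P\{a \le X_i \le b \mid T\}
   = P\{a \le X_1 \le b \mid T\}.
\end{align*}
```

Under these assumptions, blackwellizing the naive estimator reduces to finding the conditional distribution of a single observation given T. Because T is sufficient, this distribution does not depend on θ.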
Most classical distributions can be successfully approached this way. The improved estimators thus found are usually beyond guessing, and more often than not, one cannot calculate their variances. But the Rao-Blackwell Theorem asserts that these variances are no larger than the variances of the naive estimators.
___________________________________________________________