Quite generally, the term "bias" refers to a systematic (i.e. not random) difference between a quantity and a prediction of this quantity.
Let θ* be an estimator of a parameter θ of a probability distribution. θ* is said to be unbiased if its expectation is equal to the value of the parameter:
E[θ*] = θ
In other words, the value of an unbiased estimator is, on the average, equal to the value to be estimated.
If the expectation of the estimator is different from the value of the parameter, the estimator is said to be "biased", and its bias is then defined as the difference between the expectation of the estimator and the true value of the parameter:
Bias = E[θ*] - θ
The value of the bias depends, in general, on the value of the parameter.
The lack of bias is clearly an attractive feature of an estimator, but it is not a fundamental one. The quality of an estimator, for a given sample size, is rather measured by its Mean Squared Error (MSE), the expected value of the squared difference between the estimator and the true value of the parameter to be estimated:
MSE = E[(θ* - θ)²]
It can be shown that :
MSE = Var(θ*) + Bias(θ*)²
The MSE of an unbiased estimator is then just its variance.
This expression suggests that a lack of bias (Bias(θ*) = 0) is necessary to obtain a low MSE, but things are a bit more complex: it is sometimes possible to introduce a certain amount of bias into a naturally unbiased estimator and discover that, as a result, the variance of the estimator has decreased so much that the overall MSE of the estimator has decreased. A (controlled) bias is then a favorable feature of the estimator.
This fine tuning of the balance between bias and variance for the purpose of obtaining a low MSE is called the bias-variance tradeoff.
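The decomposition MSE = Var + Bias² and the tradeoff it suggests are easy to check numerically. The following minimal Python sketch (the shrinkage factor c and all numeric settings are illustrative choices, not taken from this text) deliberately biases the sample mean of a normal distribution by shrinking it toward 0, and shows that the MSE can nevertheless decrease:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 10, 200_000

# Many independent samples; the estimator of mu is c * (sample mean).
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)

results = {}
for c in (1.0, 0.7):                 # c = 1: unbiased; c < 1: biased but less variable
    est = c * xbar
    bias = est.mean() - mu
    var = est.var()                  # empirical variance (ddof=0)
    mse = ((est - mu) ** 2).mean()
    results[c] = (bias, var, mse)
    print(f"c={c}: bias={bias:+.3f}  var={var:.3f}  "
          f"var+bias^2={var + bias**2:.3f}  mse={mse:.3f}")
```

With these settings the theoretical MSEs are σ²/n = 0.4 for c = 1 and c²σ²/n + (c - 1)²μ² ≈ 0.286 for c = 0.7, so the biased estimator wins; and the printed var + bias² column reproduces the MSE, as the decomposition requires.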
* An example pertaining to the estimation of the variance of the normal distribution is found here.
* Another example is given in the animation and the Tutorial below.
* Ridge Regression is the most famous example of introducing a bias into a naturally unbiased model for the purpose of reducing the MSE of its estimated parameters and of its predictions.
It is sometimes possible (but not necessarily a good idea) to remove the bias of a biased estimator. The most classical example is that of the estimator of a variance:
* The estimator obtained by the method of moments:
s² = (1/n) Σi (Xi - X̄)²   with   X̄ = (1/n) Σi Xi
is biased;
* Whereas the "corrected" estimator:
S² = (1/(n - 1)) Σi (Xi - X̄)²
is unbiased (lower image of the above illustration).
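These two estimators can be compared by simulation. A minimal Python sketch (the normal distribution, σ² = 4 and n = 5 are arbitrary choices made for this demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=0)   # method-of-moments estimator (divisor n)
S2 = samples.var(axis=1, ddof=1)   # corrected estimator (divisor n - 1)

print(f"E[s2] ~ {s2.mean():.3f}   (true sigma2 = {sigma2}, "
      f"theoretical bias -sigma2/n = {-sigma2 / n})")
print(f"E[S2] ~ {S2.mean():.3f}")
```

The first mean settles near (n - 1)/n · σ² = 3.2, exhibiting the bias -σ²/n, while the corrected estimator's mean settles near the true value 4.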
Unfortunately, theoretical results about low MSE estimators are few and far between, whereas the theory of unbiased estimation is rich in deep and far reaching results. Unbiased estimators have therefore been studied extensively.
The most significant result is that in many instances a function g(θ) of a parameter θ admits a unique unbiased estimator that is better (lower variance) than any other unbiased estimator of g(θ), this being true for any value of θ in its range. When it exists, such an estimator is called a "Uniformly Minimum Variance Unbiased Estimator" (UMVUE).
In addition, there exist powerful methods for identifying UMVUEs when they exist.
It is a general and natural tendency to focus on unbiased estimators, if only because they are intuitively appealing and mathematically tractable.
Yet, unbiasedness is no miracle cure:
* First, it is not what ultimately matters. What the analyst wants is an estimator delivering estimates that are most often close to the true value of the quantity to be estimated, that is, an estimator with a low MSE, whether biased or not.
* Unbiased estimators do not always exist.
* Unbiased estimators may sometimes have nearly pathological behaviors.
These points are further developed in the Tutorial below.
This animation illustrates the concept of "unbiasedness" and its limitations.
The animation studies three estimators of the parameter θ of the uniform distribution U[0, θ].
The upper frame displays the segment [0, θ] as well as a sample drawn from the distribution U[0, θ].
Each time you click on "Next", a new sample is drawn from the distribution.
Gray background frames
These three frames will display the empirical histograms of the distributions of three estimators of θ. At the opening of the animation, they only display the theoretical distributions of these estimators.
Twice the sample mean
T1 = 2X̄ is a quite natural candidate for estimating θ. We'll show that it is unbiased, and we'll calculate its variance.
Its distribution has no mathematically closed form and is not displayed. Only the mean and standard deviation of this distribution are displayed at the bottom of the frame.
Hanging from the top of the frame is the current estimate.
In fact, the UMVUE T2 of θ is known and is calculated here. We'll calculate its variance and show that it is less than the variance of T1. In fact, we'll show that as the sample size grows ("Sample size" buttons), its advantage over T1 becomes overwhelming: T1 is therefore a very poor unbiased estimator of θ.
The theoretical distribution of T2 is displayed in the middle gray background frame.
We'll show that introducing a controlled amount of bias into T2 creates a new estimator T3 whose variance is substantially reduced, and whose MSE is then lower than that of T2 (i.e. its variance).
"Lowest MSE" is a misnomer : T3 is only the lowest MSE estimator of the three displayed estimators. It is not claimed that it is the absolute lowest MSE estimator of θ (assuming there is one, which we don't know).
Click on "Go" and observe the progressive build-up of the empirical histograms of the distributions of the three estimators.
Observe that T1 and T2 are unbiased, whereas T3 is clearly biased.
Observe that the variance of T2 is less than that of T1.
The right part of the animation displays graphically the MSEs of the three estimators.
* T1 is clearly the worst of the three, and all the more so as the sample size grows.
* T3 is better than T2 for all sample sizes. The advantage of T3 over T2 is more pronounced for small samples. Admittedly, this advantage is never breathtaking, but the goal of this part of the animation is only to illustrate the fact that a UMVUE may not be the best calculable estimator.
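The comparison of these three estimators can also be reproduced outside the animation. A minimal Python sketch, using the classical closed forms T2 = (n + 1)/n · max(Xi) for the UMVUE and T3 = (n + 2)/(n + 1) · max(Xi) for the lowest-MSE multiple of the sample maximum (standard textbook results, assumed here rather than read off the animation):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 1.0, 10, 200_000

samples = rng.uniform(0.0, theta, size=(reps, n))
M = samples.max(axis=1)                       # sample maximum

estimators = {
    "T1": 2 * samples.mean(axis=1),           # twice the sample mean (unbiased)
    "T2": (n + 1) / n * M,                    # UMVUE of theta (unbiased)
    "T3": (n + 2) / (n + 1) * M,              # biased, but lower MSE
}

mse = {}
for name, T in estimators.items():
    mse[name] = ((T - theta) ** 2).mean()
    print(f"{name}: bias ~ {T.mean() - theta:+.4f}   MSE ~ {mse[name]:.5f}")
```

With n = 10 and θ = 1, the theoretical MSEs are θ²/(3n) ≈ 0.0333 for T1, θ²/(n(n + 2)) ≈ 0.00833 for T2, and θ²/(n + 1)² ≈ 0.00826 for T3, matching the ordering T3 < T2 < T1 observed in the animation.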
At each and every point of the space of the input variables, the role of a model is to make a prediction of the value of a certain quantity (e.g. the value of the response variable in predictive models). As the fitted model depends on the sample, this prediction is a random variable, which is used as an estimator of the quantity whose value is to be predicted.
This estimator may be :
* Unbiased (e.g. Simple Linear Regression under the standard conditions).
* But it may also be biased (see Ridge Regression, and the first part of the animation on the bias-variance tradeoff).
One may also consider the average bias over the range of the input variables, weighted by the joint probability density of these variables. This quantity measures the global, or average, bias of the model, that is, its ability to accommodate the shape of the deterministic part of the process that generated the data (lower image of this illustration):
The reader may refer to the second part of the animation on the bias-variance tradeoff.
In this Tutorial, we examine some of the shortcomings of unbiased estimators in order to counter the blind faith in the virtues of unbiasedness.
* We first examine a few cases where no unbiased estimator exists for the function of the parameter under consideration.
* We then meet again an old acquaintance: the UMVUE of e^(-aλ) for the parameter λ of the Poisson(λ) distribution (whose expression we already established here and here). We'll see that for small samples, this estimator has a rather irregular behavior that fortunately becomes tame when the sample size exceeds a certain threshold. We'll take this opportunity to establish the expression of this UMVUE by a third method (Corollary of the Lehmann-Scheffé theorem).
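The irregular behavior for small samples is easy to exhibit numerically. The sketch below uses the classical closed form (1 - a/n)^T, with T = Σi Xi, for this UMVUE (a standard result, assumed here); note that for n < a the base is negative, so the estimator of the probability e^(-aλ) oscillates between positive and negative values, while remaining unbiased:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, a, reps = 1.5, 2.0, 400_000
target = np.exp(-a * lam)                    # quantity to estimate: e^(-a*lam)

means = {}
for n in (1, 3, 50):                         # sample sizes
    T = rng.poisson(n * lam, size=reps)      # sufficient statistic: sum(X_i) ~ Poisson(n*lam)
    est = (1.0 - a / n) ** T                 # UMVUE of e^(-a*lam)
    means[n] = est.mean()
    print(f"n={n:2d}: E[est] ~ {means[n]:+.4f} (target {target:.4f}), "
          f"range [{est.min():+.2f}, {est.max():+.2f}]")
```

For n = 1 (and a = 2) the estimator only takes the values +1 and -1, a decidedly odd way of estimating a probability; yet its mean is already on target, and for n = 50 the estimator is well behaved.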
* We'll identify a biased estimator of the variance of the Bernoulli distribution with a lower MSE than its UMVUE identified here.
* We'll finally study in some detail the case of the estimation of the parameter θ of the uniform distribution U[0, θ] (see animation above). We'll see that a plausible unbiased estimator (twice the sample mean) is in fact a very poor estimator, and that a biased estimator can be calculated which has a lower MSE than the UMVUE of θ.
LIMITATIONS OF UNBIASED ESTIMATORS
Unbiased estimators do not always exist
Reciprocal 1/p of the parameter
Log-odds log[p/(1 - p)]
Reciprocal of the parameter of the Poisson distribution
An unbiased estimator may have irregular behaviors
UMVUE of exp(-aλ) of the Poisson(λ) distribution
A biased estimator may be better than a UMVUE
Variance of the Bernoulli distribution
Uniform distribution U[0, θ]
Twice the sample mean
Variance of the UMVUE of θ
A lower MSE estimator
Mean and bias
Related readings: