Bias

Quite generally, the term "bias" refers to a systematic (i.e. not random) difference between a quantity and a prediction of that quantity.

# Bias of an estimator

Let θ* be an estimator of a parameter θ of a probability distribution. θ* is said to be unbiased if its expectation is equal to the value of the parameter:

E[θ*] = θ

In other words, the value of an unbiased estimator is, on the average, equal to the value to be estimated.

If the expectation of the estimator is different from the value of the parameter, the estimator is said to be "biased", and its bias is then defined as the difference between the expectation of the estimator and the true value of the parameter:

Bias = E[θ*] - θ

The value of the bias depends, in general, on the value of the parameter.

## Bias and Mean Squared Error

The lack of bias is clearly an attractive feature of an estimator, but it is not a fundamental one. The quality of an estimator, for a given sample size, is rather measured by its Mean Squared Error (MSE), the expected value of the squared difference between the estimator and the true value of the parameter to be estimated:

MSE = E[(θ* - θ)²]

It can be shown that :

MSE = Var(θ*) + Bias(θ*)²

The MSE of an unbiased estimator is then just its variance.
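This decomposition is easy to check numerically. Below is a minimal Monte Carlo sketch in Python/NumPy (an added illustration, not part of the original material), using a deliberately biased estimator of θ for the uniform distribution U[0, θ], namely the sample maximum:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 1.0, 10, 200_000

# Sample maximum as an estimator of theta for U[0, theta]: biased downward,
# since the maximum of a sample is always below theta.
est = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)

bias = est.mean() - theta
var = est.var()                      # ddof=0, matching the MSE below
mse = np.mean((est - theta) ** 2)

print(f"Bias² + Var = {bias**2 + var:.5f}")
print(f"MSE         = {mse:.5f}")
```

The two printed values agree (up to floating-point rounding), since MSE = Var + Bias² is an algebraic identity when all three quantities are computed over the same samples.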

-----

This expression suggests that a lack of bias (Bias(θ*) = 0) is necessary to obtain a low MSE, but things are a bit more complex: it is sometimes possible to introduce a certain amount of bias into a naturally unbiased estimator and discover that, as a result, the variance of the estimator has decreased so much that the overall MSE of the estimator has decreased. A (controlled) bias is then a favorable feature of the estimator.

This fine tuning of the balance between bias and variance for the purpose of obtaining a low MSE is called the bias-variance tradeoff.

* An example pertaining to the estimation of the variance of the normal distribution is found here.

* Another example is given in the animation and the Tutorial below.

* Ridge Regression is the most famous example of introduction of a bias into a naturally unbiased model for the purpose of reducing the MSE of its estimated parameters and of its predictions.
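As a toy illustration of the tradeoff (a hypothetical example added here, distinct from the cases cited above): shrinking the sample mean of a normal sample toward zero introduces bias but reduces variance, and for suitable parameter values the shrunken estimator has the lower MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 2.0, 5, 200_000

# Sample means of many N(mu, sigma²) samples of size n.
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

def mse(estimates):
    return np.mean((estimates - mu) ** 2)

# Unbiased estimator vs. a deliberately shrunken (biased) one.
print("MSE of the sample mean         :", mse(xbar))        # ≈ sigma²/n = 0.8
print("MSE of the shrunken (0.8×) mean:", mse(0.8 * xbar))  # lower, despite the bias
```

Here the shrunken estimator wins because the squared bias it introduces, (0.2 μ)² = 0.04, is much smaller than the variance it removes; with a larger μ or a larger n the comparison would tilt the other way.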

## Removing the bias of an estimator

It is sometimes possible (but not necessarily a good idea) to remove the bias of a biased estimator. The most classical example is that of the estimator of a variance:

* The estimator obtained by the method of moments:

s² = 1/n . Σi (Xi - X̄)²        with        X̄ = 1/n . Σi Xi

is biased;

* Whereas the "corrected" estimator:

S² = 1/(n - 1) . Σi (Xi - X̄)²

is unbiased (lower image of the above illustration).
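A quick simulation (an added Python/NumPy sketch; normal data is assumed here only for concreteness, the bias result holds for any distribution with finite variance) makes the bias and its correction visible:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 5, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=0)   # method-of-moments estimator s²
S2 = x.var(axis=1, ddof=1)   # "corrected" estimator S²

print("E[s²] ≈", s2.mean(), "  theory: (n - 1)/n · σ² =", (n - 1) / n * sigma2)
print("E[S²] ≈", S2.mean(), "  theory: σ² =", sigma2)
```

On average s² falls short of σ² by the factor (n − 1)/n, while S² lands on σ²: this is exactly the bias removed by the n − 1 denominator.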

## Uniformly Minimum Variance Unbiased Estimator (UMVUE)

Unfortunately, theoretical results about low MSE estimators are few and far between, whereas the theory of unbiased estimation is rich in deep and far-reaching results. Unbiased estimators have therefore been studied extensively.

The most significant result is that in many instances a function g(θ) of a parameter θ admits a unique unbiased estimator that is better (lower variance) than any other unbiased estimator of g(θ), this being true for any value of θ in its range. When it exists, such an estimator is called a "Uniformly Minimum Variance Unbiased Estimator" (UMVUE).

In addition, powerful methods are available for identifying UMVUEs when they exist.

## Shortcomings of unbiased estimators

It is a general and natural tendency to focus on unbiased estimators, if only because they are intuitively appealing and mathematically tractable.

Yet, unbiasedness is no miracle cure:

* First, it is not the right target. What the analyst wants is an estimator delivering estimates that are most often close to the true value of the quantity to be estimated, that is, a low MSE estimator, which need not be unbiased.

* Unbiased estimators do not always exist.

* Unbiased estimators may sometimes have nearly pathological behaviors.

-----

These points are further developed in the Tutorial below.

# Animation

This animation illustrates the concept of "unbiasedness" and its limitations.


The animation studies three estimators of the parameter θ of the uniform distribution U[0, θ].

* Upper frame: the upper frame displays the segment [0, θ] as well as a sample drawn from the distribution U[0, θ]. Each time you click on "Next", a new sample is drawn from the distribution.

* Gray background frames: these three frames will display the empirical histograms of the distributions of three estimators of θ. At the opening of the animation, they only display the theoretical distributions of these estimators.

* Twice the sample mean: T1 = 2X̄ is a quite natural candidate for estimating θ. We'll show that it is unbiased, and we'll calculate its variance. Its distribution has no closed mathematical form and is not displayed. Only the mean and standard deviation of this distribution are displayed at the bottom of the frame. Hanging from the top of the frame is the current estimate.

* UMVUE: in fact, the UMVUE T2 of θ is known and is calculated here. We'll calculate its variance and show that it is less than the variance of T1. In fact, we'll show that as the sample size grows ("Sample size" buttons), its advantage over T1 becomes overwhelming: T1 is therefore a very poor unbiased estimator of θ. The theoretical distribution of T2 is displayed in the middle gray background frame.

* Lowest MSE: we'll show that introducing a controlled amount of bias into T2 creates a new estimator T3 whose variance is substantially reduced, and whose MSE is then lower than that of T2 (i.e. its variance). "Lowest MSE" is a misnomer: T3 is only the lowest MSE estimator of the three displayed estimators. It is not claimed that it is the absolute lowest MSE estimator of θ (assuming there is one, which we don't know).

* Animation: click on "Go" and observe the progressive build-up of the empirical histograms of the distributions of the three estimators. Observe that T1 and T2 are unbiased, whereas T3 is clearly biased. Observe that the variance of T2 is less than that of T1.

* MSEs: the right part of the animation displays graphically the MSEs of the three estimators.
    * T1 is clearly the worst of the three, and all the more so as the sample is large.
    * T3 is better than T2 for all sample sizes. The advantage of T3 over T2 is more pronounced for small samples. Admittedly this advantage is never breathtaking, but the goal of this part of the animation is only to illustrate the fact that a UMVUE may not be the best calculable estimator.

Notes:

1. The vertical scale is the same for all three displays, but changes with the sample size so that the displays always have approximately the same height.
2. The horizontal scale of the MSE display changes with the sample size so that the longest bar always has approximately the same length.
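The comparison made by the animation can be reproduced numerically. In the sketch below (added here; the closed forms T2 = (n + 1)/n · max Xᵢ for the UMVUE and T3 = (n + 2)/(n + 1) · max Xᵢ for the lowest-MSE multiple of the maximum are standard results assumed by this example, not read off the animation):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 10, 300_000

x = rng.uniform(0.0, theta, size=(reps, n))
xbar, xmax = x.mean(axis=1), x.max(axis=1)

T1 = 2.0 * xbar                    # twice the sample mean (unbiased)
T2 = (n + 1) / n * xmax            # UMVUE of theta
T3 = (n + 2) / (n + 1) * xmax      # biased, but lowest MSE of the three

for name, t in [("T1", T1), ("T2", T2), ("T3", T3)]:
    bias = t.mean() - theta
    mse = np.mean((t - theta) ** 2)
    print(f"{name}: bias ≈ {bias:+.4f}, MSE ≈ {mse:.5f}")
```

With n = 10 the theoretical MSEs are θ²/(3n) ≈ 0.0333 for T1, θ²/(n(n + 2)) ≈ 0.00833 for T2 and θ²/(n + 1)² ≈ 0.00826 for T3: T1 is by far the worst, and the biased T3 edges out the UMVUE T2.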

# Bias of a model

## Local bias

At each and every point of the space of the input variables, the role of a model is to make a prediction of the value of a certain quantity (e.g. the value of the response variable in predictive models). As the fitted model depends on the sample, this prediction is a random variable, which is used as an estimator of the quantity whose value is to be predicted.

This estimator may be :

* Unbiased (e.g. Simple Linear Regression under the standard conditions).

* But it may also be biased (see Ridge Regression, and the first part of the animation on the bias-variance tradeoff).
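This contrast can be sketched directly (an added Python/NumPy illustration; the one-parameter through-the-origin model, the prediction point x0 and the penalty value are all assumptions of this example): the least-squares prediction is unbiased at a given point, whereas the ridge-penalized prediction is systematically shrunken.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, alpha = 20, 100_000, 5.0

x = np.linspace(-1.0, 1.0, n)      # fixed design
x0 = 0.8                           # point at which we predict
true_y0 = 2.0 * x0                 # true model: y = 2x + noise

y = 2.0 * x + rng.normal(0.0, 1.0, size=(reps, n))
b_ols = y @ x / (x @ x)            # least-squares slope (no intercept)
b_ridge = y @ x / (x @ x + alpha)  # ridge slope: shrunken toward 0

print("OLS   bias at x0 ≈", (b_ols * x0).mean() - true_y0)    # ≈ 0
print("Ridge bias at x0 ≈", (b_ridge * x0).mean() - true_y0)  # clearly negative
```

The ridge prediction is biased toward zero by the factor (xᵀx)/(xᵀx + α); whether the accompanying variance reduction pays off in MSE terms is exactly the tradeoff discussed above.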

## Global bias

One may also consider the average bias over the range of the input variables, weighted by the joint probability density of these variables. This quantity measures the global, or average, bias of the model, that is, its ability to accommodate the shape of the deterministic part of the process that generated the data (lower image of this illustration).

The reader may refer to the second part of the animation on the bias-variance tradeoff.

___________________________________________________________________________

# Tutorial

In this Tutorial, we examine some of the shortcomings of unbiased estimators in order to counter the blind faith in the virtues of unbiasedness.

* We first examine a few cases where no unbiased estimator exists for the function of the parameter under consideration.

* We then meet again an old acquaintance: the UMVUE of exp(-aλ), where λ is the parameter of the Poisson(λ) distribution (whose expression we already established here and here). We'll see that for small samples, this estimator has a rather irregular behavior that fortunately becomes tame when the sample size exceeds a certain threshold.
We'll take this opportunity to establish the expression of this UMVUE by a third method (a corollary of the Lehmann-Scheffé theorem).

* We'll identify a biased estimator of the variance of the Bernoulli distribution with a lower MSE than its UMVUE identified here.

* We'll finally study in some detail the case of the estimation of the parameter θ of the uniform distribution U[0, θ] (see animation above). We'll see that a plausible unbiased estimator (twice the sample mean) is in fact a very poor estimator, and that a biased estimator can be calculated which has a lower MSE than the UMVUE of θ.
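The Poisson case can be previewed numerically. This added sketch assumes the standard closed form of the UMVUE of exp(-aλ), namely ((n − a)/n)^S with S = Σᵢ Xᵢ (the form the Tutorial derives); it checks unbiasedness for a moderate sample, and exhibits the pathological sign-alternating behavior when n < a:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, reps = 1.5, 400_000

def umvue_exp(n, a, size):
    # UMVUE of exp(-a*lam) from a Poisson(lam) sample of size n,
    # as a function of the sufficient statistic S ~ Poisson(n*lam).
    S = rng.poisson(n * lam, size=size)
    return ((n - a) / n) ** S

# Unbiased for n = 5, a = 2:
est = umvue_exp(5, 2, reps)
print("mean of UMVUE:", est.mean(), " vs exp(-a·λ) =", np.exp(-2 * lam))

# Pathological for n = 1, a = 2: the estimator of a positive
# quantity takes the values (-1)**S, i.e. -1 or +1.
weird = np.unique(umvue_exp(1, 2, 1000))
print("values for n = 1, a = 2:", weird)
```

An "unbiased" estimator of a quantity between 0 and 1 that only ever returns −1 or +1 is a vivid instance of the nearly pathological behaviors mentioned above.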

