Ancillary statistic

Definition of an ancillary statistic

This illustration represents the uniform distribution U[θ, θ + 1], as well as a sample {x1, x2, ..., xn} drawn from this distribution. It is intuitively clear, and we'll show, that the distribution of the rv X(n) - X(1) (the difference between the largest and the smallest observations, or "range") does not depend on θ: changing the value of θ merely shifts the distribution and the sample without changing the value of x(n) - x(1), and therefore without changing the distribution of the range.

 

 

It is just as clear that the distribution of the difference between two order statistics of any given ranks does not depend on θ either.

We have thus exhibited a family of statistics of the distribution U[θ, θ + 1] whose distributions do not depend on the value of the parameter θ.
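The invariance of the range can be checked numerically. The following is only a minimal simulation sketch using the Python standard library; the function name `simulate_ranges`, the sample size n = 5, the two values of θ and the seeds are arbitrary choices made for this illustration. For n observations, the expected range of a uniform distribution of width 1 is (n - 1)/(n + 1), whatever the value of θ.

```python
import random
import statistics

def simulate_ranges(theta, n=5, reps=20000, seed=0):
    """Draw `reps` samples of size n from U[theta, theta + 1]; return their ranges."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        sample = [theta + rng.random() for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges

# Two different values of theta, simulated with independent random streams.
ranges_a = simulate_ranges(theta=0.0, seed=0)
ranges_b = simulate_ranges(theta=2.0, seed=1)

mean_a = statistics.fmean(ranges_a)
mean_b = statistics.fmean(ranges_b)

# Both averages should be close to (n - 1)/(n + 1) = 4/6 for n = 5.
print(round(mean_a, 3), round(mean_b, 3))
```

The two empirical means agree up to simulation noise, which is what one expects if the distribution of the range does not depend on θ.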

This concept can be generalized and formalized into the following definition:

 

A statistic of the distribution p(x, θ) is said to be ancillary if its distribution does not depend on θ.

 

Location families

A "location family" is a family of distributions such that the sole purpose of the parameter θ is to define the horizontal position of the distribution along the x axis. U[θ, θ + 1] is clearly a location family. The general probability density function (pdf) of a location family is f(x - θ).

We'll establish that the following statistics of a location family are ancillary:

    * Draw two independent observations from a distribution belonging to a location family. The difference between these two observations is ancillary.

    * The difference between two order statistics of given ranks is ancillary.

    * The sample variance is an ancillary statistic of any location family.

    * For any i, the difference between the ith observation and the sample mean is ancillary.

    * For any i, the difference between the ith order statistic and the sample mean is ancillary.
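All five claims rest on the same observation, sketched here under the assumption that an observation from the location family is written as X_i = θ + Z_i, where the Z_i (a notation introduced only for this sketch) are drawn from the reference density f. Adding the same constant θ to every observation preserves their order, so θ cancels out of every difference:

```latex
\begin{align*}
X_i - X_j &= (\theta + Z_i) - (\theta + Z_j) = Z_i - Z_j, \\
X_{(i)} - X_{(j)} &= Z_{(i)} - Z_{(j)}, \\
X_i - \bar{X} &= Z_i - \bar{Z}, \qquad X_{(i)} - \bar{X} = Z_{(i)} - \bar{Z}, \\
S^2 = \frac{1}{n-1}\sum_{k=1}^{n} (X_k - \bar{X})^2 &= \frac{1}{n-1}\sum_{k=1}^{n} (Z_k - \bar{Z})^2 .
\end{align*}
```

Each right-hand side is a function of the Z_i alone, whose joint distribution does not involve θ.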

Scale family

If the sole effect of the parameter θ > 0 is to "stretch" a reference distribution around the origin, the family of distributions is called a "scale family". The general form of the pdf of a scale family is (1/θ) f(x/θ). A typical example of a scale family is provided by the exponential distribution Exp(θ), whose pdf is y = (1/θ) e^(-x/θ) for x ≥ 0.

We'll establish that the following statistics of a scale family are ancillary:

    * Draw two independent observations from a distribution belonging to a scale family: the ratio of these two observations is ancillary. It will follow that if a statistic Z is a function g of the ratios X_1/X_n, X_2/X_n, ..., X_{n-1}/X_n:

Z = g(X_1/X_n, X_2/X_n, ..., X_{n-1}/X_n)

then Z is ancillary. Of course, the index n may be replaced by any other index i.

    * The ratio of two order statistics of given ranks is ancillary.

    * The ratio of the ith observation and the sample mean is ancillary.

    * The ratio of the ith order statistic and the sample mean is ancillary.
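The scale case can be illustrated in the same spirit. The sketch below assumes that an observation is written as X_i = θ Z_i with Z_i drawn from Exp(1); the function name `ratio_to_mean`, the sample size and the two values of θ are arbitrary choices for this illustration. With the same underlying Z_i, the ratio X_1 / (sample mean) takes the same value whatever θ is, because θ cancels between numerator and denominator.

```python
import math
import random
import statistics

def ratio_to_mean(theta, n=5, reps=20000, seed=0):
    """Return `reps` simulated values of X_1 / mean(X) for a scale parameter theta."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        # X_i = theta * Z_i with Z_i ~ Exp(1): exponential distribution with mean theta
        sample = [theta * rng.expovariate(1.0) for _ in range(n)]
        ratios.append(sample[0] / statistics.fmean(sample))
    return ratios

# Same seed, hence same Z_i: only the scale parameter differs.
ratios_small = ratio_to_mean(theta=0.5)
ratios_large = ratio_to_mean(theta=8.0)

same_values = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(ratios_small, ratios_large)
)
mean_ratio = statistics.fmean(ratios_small)
print(same_values, round(mean_ratio, 2))
```

The average of the simulated ratios is also close to 1, the expectation of the ratio of an observation to the sample mean.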

Generalization

These examples are not the only possible ancillary statistics. We'll establish the following general sufficient condition for a statistic to be ancillary:

 

* Let S be a sufficient statistic for θ whose support does not depend on θ.

* Let also T be a statistic.

If S and T are independent, then T is ancillary.

 

Because of the condition on the support of the sufficient statistic, the various families of uniform distributions are not within the scope of this theorem.

The converse is false, but it becomes true if, in addition to being sufficient, the statistic S is also complete. This is Basu's Theorem, which is stated below.

Ancillary statistics and minimal sufficient statistics

A minimal sufficient statistic may still contain a certain amount of information that is "useless" as far as estimating θ is concerned: it is then not complete. Because the statistic is minimal, this extraneous information cannot be removed without losing sufficiency.

It is sometimes possible to make this information visible: it then appears as an ancillary statistic "within" the minimal sufficient statistic. For example, the statistic {X(1), X(n)} is minimal sufficient but not complete for the family of uniform distributions U[θ, θ + 1]. Since the image of a minimal sufficient statistic by a one-to-one function is still minimal sufficient, the statistic {X(n) - X(1), X(1) + X(n)} is minimal sufficient for U[θ, θ + 1]. But we just saw that X(n) - X(1) is ancillary. So the statistic {X(1), X(n)} "contains" an ancillary statistic which prevents it from being complete.
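The way the range prevents completeness can be made explicit with a short worked calculation. It uses the standard expectations of the extreme order statistics of n uniform observations on an interval of length 1; the function g is a notation introduced only for this sketch.

```latex
\begin{align*}
\mathrm{E}_\theta\!\left[X_{(n)} - X_{(1)}\right]
  &= \left(\theta + \frac{n}{n+1}\right) - \left(\theta + \frac{1}{n+1}\right)
   = \frac{n-1}{n+1} \quad \text{for every } \theta, \\
g\!\left(X_{(1)}, X_{(n)}\right) &:= X_{(n)} - X_{(1)} - \frac{n-1}{n+1}
  \;\Longrightarrow\;
  \mathrm{E}_\theta\!\left[g\!\left(X_{(1)}, X_{(n)}\right)\right] = 0 \quad \text{for every } \theta .
\end{align*}
```

Since g is not almost surely equal to zero, the statistic {X(1), X(n)} fails the definition of completeness.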

Why ancillary statistics?

Why bother with statistics whose distributions do not depend on θ, and therefore seem to bring no usable information about the value of θ, while parameter estimation is one of the central activities of Statistics?

Basu's Theorem

It is true that when a complete sufficient statistic exists, ancillary statistics are superfluous. This is a consequence of Basu's Theorem:

 

* If S is a complete sufficient statistic for the parameter θ,

* And if T is an ancillary statistic for this parameter,

* Then S and T are independent rvs.

 

In other words, a complete sufficient statistic is independent of any ancillary statistic.
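A simulation cannot prove independence, but it can make the theorem plausible. The sketch below takes the normal distribution with known variance 1, for which the sample mean is a complete sufficient statistic for the mean and the sample variance is ancillary, and estimates the correlation between the two. As a contrast added for this illustration only, the same computation is run on Exp(1) samples, where the sample variance is not ancillary and the theorem does not apply. The helper names `pearson` and `mean_and_variance_pairs` are hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mean_and_variance_pairs(draw, n=5, reps=20000, seed=0):
    """Simulate `reps` samples of size n; return their sample means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        m = sum(sample) / n
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in sample) / (n - 1))
    return means, variances

# Normal with mean 3 and known variance 1: correlation expected to be near 0.
normal_corr = pearson(*mean_and_variance_pairs(lambda rng: rng.gauss(3.0, 1.0)))
# Exp(1): sample mean and sample variance are clearly positively correlated.
expo_corr = pearson(*mean_and_variance_pairs(lambda rng: rng.expovariate(1.0)))

print(round(normal_corr, 2), round(expo_corr, 2))
```

A correlation close to zero is only a necessary consequence of independence, not a proof of it; the point of the contrast is that the near-zero value is specific to the normal case.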

Basu's Theorem is demonstrated below.

 

So Basu's Theorem confirms the idea according to which:

    * A complete sufficient statistic contains all the necessary information for the estimation of θ, and no other information.

    * Whereas an ancillary statistic contains no useful information about the value of θ.

Unfortunately, things are not quite that simple when no complete statistic exists, as explained in the next section.

Minimally sufficient but not complete statistic

Let T be a minimally sufficient, but not complete statistic. Then T still "contains" an ancillary statistic whose value may provide some information about the accuracy of the estimation made by an unbiased estimator.

Consider for example the uniform distribution U[θ - 1, θ + 1] from which the sample X = {x1, x2, ..., xn} is drawn. It can easily be shown that the mid-range (X(1) + X(n))/2 is an unbiased estimator of θ. The statistic X(n) - X(1) is ancillary, yet suppose that its value is close to 2 (the width of the support of the distribution). This can happen only if x(1) is close to θ - 1 and x(n) is close to θ + 1. But then the mid-range (x(1) + x(n))/2 is necessarily very close to θ.

So, although it is ancillary, the statistic X(n) - X(1) certainly has something useful to say about the estimation of θ. This may be related to the fact that there is no complete statistic for θ.
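This effect is easy to observe by simulation. The sketch below draws samples from U[θ - 1, θ + 1], computes the error of the mid-range (x(1) + x(n))/2, and compares samples whose range is large with samples whose range is small. The value θ = 3, the sample size n = 5 and the thresholds 1.8 and 1.0 are arbitrary choices made for this illustration.

```python
import random
import statistics

theta = 3.0           # "true" value, used only to measure the estimation error
n, reps = 5, 20000
rng = random.Random(0)

wide_errors, narrow_errors = [], []
for _ in range(reps):
    sample = [rng.uniform(theta - 1, theta + 1) for _ in range(n)]
    sample_range = max(sample) - min(sample)
    # Absolute error of the mid-range estimator for this sample
    error = abs((min(sample) + max(sample)) / 2 - theta)
    if sample_range > 1.8:
        wide_errors.append(error)
    elif sample_range < 1.0:
        narrow_errors.append(error)

mean_error_wide = statistics.fmean(wide_errors)
mean_error_narrow = statistics.fmean(narrow_errors)
print(round(mean_error_wide, 3), round(mean_error_narrow, 3))
```

When the range r is observed, the mid-range cannot be further than 1 - r/2 from θ, so a range above 1.8 guarantees an error below 0.1, while a small range allows much larger errors.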

-----

Although early attempts at clarifying the role of ancillary statistics in estimation date back to 1925, the issue is still debated today and has so far yielded more controversy than ground-breaking results, so we'll not pursue the topic any further.

    __________________________________________________________________

 

 

 

 

Tutorial

 

In this Tutorial, we first give a simple example of an ancillary statistic of the uniform distribution U[θ, θ + 1].

We then generalize this result to location families by giving four examples of ancillary statistics for such families.

Similar results are then obtained for scale families.

These examples do not exhaust the concept of ancillary statistic, and we'll establish a general sufficient condition for a statistic to be ancillary.

We then demonstrate Basu's Theorem according to which a complete sufficient statistic is independent of any ancillary statistic.

We conclude by giving three examples of applications of Basu's Theorem:

    1) The first one is about the non-completeness of the minimal sufficient statistic of the U[θ, θ + 1] distribution, a result already established here.

    2) The second one is a particularly sophisticated way of establishing the independence of the sample mean and the sample variance of the normal distribution, a classical result already obtained here by using more basic techniques.

    3) The third one is an elegant way of calculating the expectation of the ratio of an observation to the sample mean of the exponential Exp(λ) distribution.


The second demonstration of Basu's Theorem calls on a result in probability theory that we state without proof.

 

 

ANCILLARY STATISTICS
BASU'S THEOREM

Range of the uniform distribution U[θ, θ + 1]
Ancillary statistics of a location family
    Difference of two independent observations
    Differences of order statistics
    Differences with the sample mean
        Observation and sample mean
        Order statistic and sample mean
    Sample variance
Ancillary statistics of a scale family
    Ratio of two independent observations
    Ratio of order statistics
    Ratio to the sample mean
        Observation and sample mean
        Order statistic and sample mean
A sufficient condition for a statistic to be ancillary
Basu's Theorem
    First demonstration
    Second demonstration
Examples of application of Basu's Theorem
    Non-completeness of a minimally sufficient statistic
    Independence of the sample mean and sample variance of the normal distribution
        Variance is known
        Variance is unknown
    An expectation in the context of the exponential distribution


 

______________________________________________________

 

Related readings:

Complete statistic

Order statistics
