Order statistics

# Definition of order statistics

Let X be a random variable with a probability density. If several samples of size n are drawn from this distribution, the position of the leftmost observation will vary from sample to sample: it is a random variable.

More generally, for a given n, the random variable $X_{(k)}$ is defined as the value of the kth observation from the left (the kth smallest observation), and is called the "kth order statistic" of the original variable X.
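In practice, the order statistics of a sample are obtained by sorting it. The following is a minimal sketch of that idea; the function names `order_statistics` and `kth_order_statistic`, the seed and the sample size are illustrative choices, not part of the original text.

```python
import random

def order_statistics(sample):
    """Return the order statistics X(1) <= ... <= X(n) of a sample."""
    return sorted(sample)

def kth_order_statistic(sample, k):
    """Return X(k), the kth observation from the left (k is 1-based)."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(sample)[k - 1]

rng = random.Random(0)                      # arbitrary fixed seed
sample = [rng.random() for _ in range(5)]   # one sample of size n = 5
print(order_statistics(sample))             # X(1), ..., X(5)
print(kth_order_statistic(sample, 1))       # leftmost observation X(1)
```

Drawing a new sample (a different seed) gives a different value of X(1), which is exactly why X(1) is itself a random variable.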

# Distribution of order statistics

Let's draw samples of size n from a population:

* With probability density $f(x)$,

* And distribution function $F(x)$.

We'll show that the probability density of the kth order statistic is:

$$f_{(k)}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x)]^{k-1}\,[1-F(x)]^{n-k}\,f(x)$$

This expression looks complicated, but in fact its structure and interpretation are simple.

1) The first factor (the fraction) is just a normalization factor that makes the integral of the density $f_{(k)}(x)$ equal to 1.

2) In the general case ($1 < k < n$), the term:

$$[F(x)]^{k-1}\,[1-F(x)]^{n-k}$$

is bell-shaped, as it is the product of a monotone increasing function and a monotone decreasing function. It is 0 at both ends of the range. We let the reader show that the (unique) top of the curve is defined by:

$$F(x) = \frac{k-1}{n-1}$$

3) So the density $f_{(k)}(x)$ is the product of $f(x)$ and a bell-shaped curve that constrains the order statistic to spend most of its time in a region defined by its rank (and by the sample size). In a narrow region around the top of the bell-shaped curve, the density of the order statistic is an almost faithful image of $f(x)$ (up to a multiplicative factor).
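The three remarks above can be checked numerically. The sketch below assumes the standard form of the density (normalizing fraction, times $[F(x)]^{k-1}[1-F(x)]^{n-k}$, times $f(x)$) and uses an exponential parent distribution with rate 1 purely as an illustration; the helper names and the values n = 7, k = 3 are arbitrary.

```python
import math

def order_stat_pdf(x, k, n, pdf, cdf):
    """Density at x of the kth order statistic of n i.i.d. observations."""
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    u = cdf(x)
    return coeff * u ** (k - 1) * (1.0 - u) ** (n - k) * pdf(x)

# Illustrative parent distribution (an assumption): exponential, rate 1.
def exp_pdf(x):
    return math.exp(-x)

def exp_cdf(x):
    return 1.0 - math.exp(-x)

def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

n, k = 7, 3

# 1) The normalizing fraction makes the density integrate to 1.
total = integrate(lambda x: order_stat_pdf(x, k, n, exp_pdf, exp_cdf), 0.0, 40.0)
print(total)

# 2) The bell-shaped factor peaks where F(x) = (k - 1) / (n - 1).
def bell(x):
    u = exp_cdf(x)
    return u ** (k - 1) * (1.0 - u) ** (n - k)

u_top = (k - 1) / (n - 1)
x_top = -math.log(1.0 - u_top)   # inverse of the exponential cdf
print(x_top, bell(x_top - 0.01), bell(x_top), bell(x_top + 0.01))
```

The upper integration limit 40 stands in for infinity; the exponential tail beyond it is negligible at this precision.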

# Animation

The following interactive animation illustrates the distribution of Order Statistics.


Upper frame

 The upper frame is labeled "Density" and displays by default a uniform distribution (in green). You may change this distribution by clicking repeatedly anywhere inside the frame: every new click modifies the current density. You may return to the uniform density by clicking on "Reset".

Lower frame

 The lower frame is labeled "Order statistics" and displays the probability density function of the kth order statistic of the chosen distribution. The shape of this density depends on:

* The sample size (which can be changed with the "Sample size" buttons).

* The rank k of the selected order statistic (which can be changed with the "Rank" buttons).

The vertical scale is arbitrary and adjusted so that the height of the mode is always the same for any density, sample size and rank.

__________________

* In the case of the uniform distribution, the probability density functions of the order statistics are Beta distributions: the kth order statistic of a sample of size n follows Beta(k, n - k + 1). Extreme order statistics have power distributions (which are also Beta distributions). When there are only two observations (Sample size = 2), the densities of the two order statistics are linear.

* The probability density function of an order statistic of an arbitrary distribution is given above.

- Except for the extreme (smallest and largest) order statistics, the density of an order statistic goes to 0 at both ends of the range. This is because either $[F(x)]^{k-1}$ or $[1 - F(x)]^{n-k}$ then goes to 0.

- Create a heavily modulated density in the upper frame (for example, two humps, one at each end of the range), and scan the ranks of the order statistics from 1 to n (that is, up to the current "Sample size"). Observe that the modulations of the density of the chosen order statistic closely duplicate the modulations of the mother density around the area where you would normally expect to find the order statistic. The reason is that the dominant factor in the density function of the order statistic is then $f(x)$ (see above). The statistic is constrained by its adjacent neighbors to remain most of the time in the same narrow area, and its jumping about approximately follows the local pattern of the mother density.

Animation

Select a sample size and a rank. Then click on "Go" and observe the build-up of the histogram of the corresponding order statistic.
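The build-up of such a histogram can be imitated offline. The sketch below is not the animation's code: it is a small simulation for the uniform distribution on [0, 1], with an arbitrary seed, rank, sample size and number of draws. It compares the simulated mean with k/(n + 1), the mean of the Beta(k, n - k + 1) distribution.

```python
import random

def simulate_order_statistic(k, n, draws, rng):
    """Draw `draws` samples of size n from U[0, 1]; keep the kth smallest of each."""
    return [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(draws)]

def histogram(values, bins):
    """Count the values falling in `bins` equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

rng = random.Random(12345)   # arbitrary fixed seed
k, n = 2, 5                  # rank and sample size (illustrative values)
values = simulate_order_statistic(k, n, 20000, rng)
counts = histogram(values, 10)

# Crude text histogram: one '#' per 200 simulated values.
for i, c in enumerate(counts):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}) {'#' * (c // 200)}")

sample_mean = sum(values) / len(values)
print(sample_mean, k / (n + 1))   # simulated mean vs. Beta(k, n - k + 1) mean
```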

_____________________________________________

# Examples of order statistics

Special cases of order statistics that are important in practice are:

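* The smallest (leftmost) observation $X_{(1)}$. According to the foregoing result (with k = 1), its probability density is:

$$f_{(1)}(x) = n\,[1-F(x)]^{n-1}\,f(x)$$

* The largest (rightmost) observation $X_{(n)}$. According to the foregoing result (with k = n), its probability density is:

$$f_{(n)}(x) = n\,[F(x)]^{n-1}\,f(x)$$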
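As a cross-check, the densities of the two extreme observations can also be obtained directly from their distribution functions. The short derivation below assumes, as is usual in this setting, that the n observations $X_1, \dots, X_n$ are independent with common distribution function F and density f:

```latex
\begin{align*}
P\big(X_{(n)} \le x\big) &= P(X_1 \le x, \dots, X_n \le x) = [F(x)]^{n}
  &&\Longrightarrow\quad f_{(n)}(x) = n\,[F(x)]^{n-1} f(x), \\
P\big(X_{(1)} > x\big) &= P(X_1 > x, \dots, X_n > x) = [1 - F(x)]^{n}
  &&\Longrightarrow\quad f_{(1)}(x) = n\,[1 - F(x)]^{n-1} f(x).
\end{align*}
```

In each case the density follows by differentiating the distribution function with respect to x.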

-----

Important functions of order statistics are:

* The median, defined as:

• $X_{((n+1)/2)}$ if n is odd,
• $(X_{(n/2)} + X_{(n/2+1)})/2$ if n is even.

* The sample range, defined as the distance between the two extreme observations:

• $X_{(n)} - X_{(1)}$

* The sample mid-range, defined as:

• $(X_{(1)} + X_{(n)})/2$

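which is an estimate of the central tendency of the distribution of X. Because it depends only on the two extreme observations, it is sensitive to outliers and should not be regarded as robust.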
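These three functions of the order statistics are straightforward to compute from a sorted sample. The following is a minimal sketch; the function names are illustrative.

```python
def sample_median(sample):
    """Median: X((n+1)/2) if n is odd, mean of X(n/2) and X(n/2+1) if n is even."""
    xs = sorted(sample)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # 0-based index of X((n+1)/2)
    return (xs[mid - 1] + xs[mid]) / 2    # X(n/2) and X(n/2+1)

def sample_range(sample):
    """Distance between the two extreme observations, X(n) - X(1)."""
    return max(sample) - min(sample)

def sample_midrange(sample):
    """Midpoint of the two extreme observations, (X(1) + X(n)) / 2."""
    return (min(sample) + max(sample)) / 2

data = [4.0, 1.0, 3.0, 2.0]
print(sample_median(data), sample_range(data), sample_midrange(data))
```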

# Order statistics and estimation

Order statistics are frequently encountered in parameter estimation. Here are some examples.

## Uniform distribution U[0, θ]

* It can be shown that the statistic $(n + 1)X_{(n)}/n$ is an unbiased estimator of θ.

* It can be shown that $X_{(n)}$ is a minimal sufficient statistic for θ.
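The unbiasedness claim can be verified in a few lines. The computation below assumes i.i.d. observations from U[0, θ], so that $F(x) = x/\theta$ and $f(x) = 1/\theta$ on $[0, \theta]$, and uses the density of the largest observation, $n[F(x)]^{n-1}f(x)$:

```latex
\begin{align*}
f_{(n)}(x) &= \frac{n\,x^{n-1}}{\theta^{n}}, \qquad 0 \le x \le \theta, \\
E\big[X_{(n)}\big] &= \int_0^{\theta} x\,\frac{n\,x^{n-1}}{\theta^{n}}\,dx
  = \frac{n}{\theta^{n}}\cdot\frac{\theta^{n+1}}{n+1} = \frac{n}{n+1}\,\theta, \\
E\!\left[\frac{n+1}{n}\,X_{(n)}\right] &= \theta.
\end{align*}
```

The largest observation alone underestimates θ on average, and the factor (n + 1)/n corrects exactly for that bias.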

## Uniform distribution U[θ, θ + 1]

* It can be shown that the statistic $T = (X_{(1)}, X_{(n)})$ is minimal sufficient for θ, but that it is not complete.
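One standard way to see the lack of completeness is to exhibit a non-trivial function of T whose expectation is zero for every θ. The sketch below assumes i.i.d. observations $X_i = \theta + U_i$ with $U_i$ uniform on [0, 1], and uses the known means $E[U_{(1)}] = 1/(n+1)$ and $E[U_{(n)}] = n/(n+1)$:

```latex
\begin{align*}
E_\theta\big[X_{(n)} - X_{(1)}\big] &= E\big[U_{(n)}\big] - E\big[U_{(1)}\big]
  = \frac{n}{n+1} - \frac{1}{n+1} = \frac{n-1}{n+1}, \\
g(T) &= X_{(n)} - X_{(1)} - \frac{n-1}{n+1}
  \quad\Longrightarrow\quad E_\theta\big[g(T)\big] = 0 \ \text{ for all } \theta .
\end{align*}
```

Since $g(T)$ is not almost surely zero, T cannot be complete.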

## Location family

* It can be shown that, in a location family, the difference between any two order statistics is an ancillary statistic.
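The intuition behind ancillarity is that shifting every observation by the same amount shifts every order statistic by that amount, so their differences do not move. The sketch below illustrates this with a normal location family; the family, the seed, the ranks and the helper name `spacing` are illustrative assumptions, and this is a demonstration, not a proof.

```python
import random

def spacing(sample, i, j):
    """Return X(j) - X(i) for 1-based ranks i < j."""
    xs = sorted(sample)
    return xs[j - 1] - xs[i - 1]

rng = random.Random(7)                           # arbitrary fixed seed
base = [rng.gauss(0.0, 1.0) for _ in range(9)]   # sample from the member with location 0

for theta in (-3.0, 0.5, 10.0):
    shifted = [z + theta for z in base]          # same draws, location theta
    # The two spacings agree up to floating-point rounding.
    print(theta, spacing(base, 2, 8), spacing(shifted, 2, 8))
```

Because the spacing is the same function of the unshifted draws whatever θ is, its distribution cannot depend on θ.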

____________________________________________________________________________

 Tutorial 1

In this Tutorial, we establish the probability density function of the order statistics of a random variable with probability density f(x) and distribution function F(x).

We do that in several steps to make the newcomer to this field progressively acquainted with the kind of reasoning commonly encountered when studying order statistics, and which is sometimes perceived as disconcerting on first reading.

* So we start with the simplest case, that of the order statistics of the uniform distribution in [0, 1]. We do it in two different ways:

- First by using an intuitive, semi-rigorous argument.

- Then by calculating, and then differentiating the distribution function of the order statistics. This approach is more rigorous but does not appeal nearly as much to the imagination.

In both approaches, the binomial distribution B(n, p) plays a central role.

* We then show that the distributions of the order statistics of the uniform distribution Uniform[0, θ] can be obtained from these results by a simple change of variable.
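The change of variable mentioned in the last point can be written out as follows. The sketch assumes that $U_{(k)}$ denotes the kth order statistic of n i.i.d. U[0, 1] observations, with density $g_{(k)}$, and that $Y_{(k)} = \theta\,U_{(k)}$ is the corresponding order statistic for Uniform[0, θ]; the symbols g, U and Y are introduced here for illustration only.

```latex
\begin{align*}
g_{(k)}(u) &= \frac{n!}{(k-1)!\,(n-k)!}\,u^{k-1}(1-u)^{n-k}, \qquad 0 \le u \le 1, \\
f_{(k)}(y) &= \frac{1}{\theta}\,g_{(k)}\!\left(\frac{y}{\theta}\right)
  = \frac{n!}{(k-1)!\,(n-k)!}\,\frac{y^{k-1}(\theta-y)^{n-k}}{\theta^{n}}, \qquad 0 \le y \le \theta.
\end{align*}
```

The factor $1/\theta$ is the Jacobian of the scaling $y = \theta u$.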

______________
The reason why we opened this topic with the standard uniform distribution is that its density and distribution function are so simple. But the lines of reasoning we used for the uniform distribution can easily be generalized to any distribution with density f(x) and distribution function F(x), the main difference being that the notation is now more cumbersome.

So we go over the foregoing calculations and adapt them to the general case without much concern for the heavy notation.
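The distribution function approach in the general case can be sketched numerically: $X_{(k)} \le x$ exactly when at least k of the n observations fall at or below x, which is a binomial tail probability with $p = F(x)$. The code below differentiates that distribution function numerically and compares it with the closed-form density. The logistic parent distribution, the evaluation point and the step size are illustrative assumptions.

```python
import math

def order_stat_cdf(x, k, n, cdf):
    """P(X(k) <= x): at least k of the n observations are <= x (binomial tail)."""
    p = cdf(x)
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))

def order_stat_pdf(x, k, n, pdf, cdf):
    """Closed-form density of the kth order statistic."""
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    p = cdf(x)
    return coeff * p ** (k - 1) * (1.0 - p) ** (n - k) * pdf(x)

# Illustrative parent distribution (an assumption): standard logistic.
def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    p = logistic_cdf(x)
    return p * (1.0 - p)

k, n, x, h = 3, 8, -0.4, 1e-5

# Central finite difference of the distribution function.
numeric = (order_stat_cdf(x + h, k, n, logistic_cdf)
           - order_stat_cdf(x - h, k, n, logistic_cdf)) / (2.0 * h)
exact = order_stat_pdf(x, k, n, logistic_pdf, logistic_cdf)
print(numeric, exact)   # the two values should agree closely
```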

DISTRIBUTIONS OF ORDER STATISTICS

* Uniform distribution [0, 1]
  - The intuitive approach
  - The distribution function approach
    - The distribution function
    - The probability density function
* Uniform distribution [0, θ]
  - General case
  - Extreme observations
    - Leftmost observation
    - Rightmost observation
* The general case: any distribution
  - The intuitive approach
  - The distribution function approach

______________________________________

 Tutorial 2

This Tutorial focuses on joint probability densities of order statistics.

* We first calculate, by two different methods, the joint probability density function of the set of all order statistics. As in any problem involving several random variables, this joint probability density contains all the information pertaining to this set of variables, so every result about order statistics can in principle be derived from it.

In practice, these derivations usually involve cumbersome multiple integrals, and the order of integration and the integration limits have to be handled carefully. We illustrate this point with the joint probability density of the order statistics of the uniform distribution on [0, 1], which is constant and equal to n! on the domain defined by $x_1 < x_2 < \dots < x_n$. We show as an exercise how to verify that this density does integrate to 1 over that domain. This will give us a first contact with the intricacies often met in multiple integrations of a joint probability density.

* We then use the joint pdf of all order statistics for deriving again the pdf of a single order statistic. This pdf will be considered as one of the marginal densities of the joint density.

* In the same spirit, we then use the joint pdf of all order statistics for deriving the joint pdf of two order statistics. This result is fundamental for solving many problems about order statistics.

* But because obtaining this last result involved handling heavy multiple integrals, we find it relaxing to conclude this Tutorial by calculating again the joint distribution of two order statistics by an elementary, if not quite rigorous, method similar to the one we used in the previous Tutorial for calculating the pdf of a single order statistic.
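For orientation, the main results discussed in this Tutorial can be summarized as follows. These are the standard textbook forms, stated here under the assumption of n i.i.d. observations with density f and distribution function F; the subscript notation is chosen for this summary.

```latex
\begin{align*}
f_{(1),\dots,(n)}(x_1,\dots,x_n) &= n!\,\prod_{i=1}^{n} f(x_i),
  \qquad x_1 < x_2 < \dots < x_n, \\
\int_0^1\!\int_0^{x_n}\!\cdots\!\int_0^{x_2} n!\;dx_1 \cdots dx_{n-1}\,dx_n
  &= n!\int_0^1 \frac{x_n^{\,n-1}}{(n-1)!}\,dx_n = n!\cdot\frac{1}{n!} = 1
  \qquad \text{(uniform on } [0,1]\text{)}, \\
f_{(i),(j)}(x,y) &= \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,
  [F(x)]^{i-1}\,[F(y)-F(x)]^{j-i-1}\,[1-F(y)]^{n-j}\,f(x)\,f(y),
  \qquad x < y,\; i < j.
\end{align*}
```

In the second line, the inner integrals are taken in the order $x_1, x_2, \dots, x_{n-1}$, each one raising the power of its upper limit by one; this is the kind of bookkeeping on integration limits that the multiple-integral derivations require.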

JOINT DENSITY OF ORDER STATISTICS

* Joint density of all order statistics
  - First method
  - Second method
  - Special case: the uniform distribution
* The joint density: a universal tool
  - Pdf of an order statistic as a marginal distribution of the joint pdf
  - Joint pdf of two order statistics
* The joint density of 2 order statistics: elementary method

____________________________________________________