Let X be a random variable with a probability density. We keep drawing observations from X and, from time to time, an observation is larger than all previously drawn observations: such an observation is called a "record", and its value a record value or, more precisely, an upper record value.
This animation illustrates the concept of "record value".
* The upper frame displays an exponential distribution. It is the distribution from which we are going to draw observations repeatedly.
* The control "Record order" displays the order of the record whose distribution will be displayed.
- The first observation is obviously a record, albeit a trivial one. We call it the record of order 0.
- The order 1 record (or "First record") is the first observation whose value is larger than that of the first one.
- The order 2 record (or "Second record") is the first observation whose value is larger than that of the order 1 record.
- More generally, the order n record (or "nth record") is the first observation whose value is larger than that of the order (n - 1) record.
By default, the selected order is 3. This means that we are going to focus our attention on the distribution of the order 3 record. This distribution is displayed in the lower frame.
* Click on "Next". The first observation is drawn. It is the trivial order 0 record, but a record nonetheless, and as such, it is displayed as a red dot.
* Keep clicking on "Next". Every click generates a new observation.
- If this observation is smaller than the last record, it is displayed as a green dot.
- But if this new observation is larger than the last record, it is a new record, and is displayed as a red dot. The previously obtained record is kept on the scene, but is now pink.
After the third record has been drawn, the display is reset and a new sequence of draws may begin.
* Click on "Go". Observations are now drawn automatically, and the lower frame displays the build-up of the histogram of the distribution of the selected order record. We'll show that this distribution is a Gamma distribution.
The vertical scale is arbitrary, and adjusted so that the height of the mode of the Gamma distribution is always the same whatever the selected record order.
* Click on "Reset", and then select the option "Uniform". We now study the distribution of the lower records of the uniform distribution. We calculate this distribution in the Tutorial below.
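The bookkeeping behind the red dots can be sketched in a few lines of Python. This is only an illustration of the definition of upper records given above, not the animation's actual code; the function name `upper_records` and the sample values are made up for the example.

```python
def upper_records(observations):
    """Return the upper records of a sequence, in order of appearance.

    Position k of the returned list holds the order-k record; the first
    observation is the trivial order-0 record.
    """
    records = []
    for value in observations:
        # A new record must be strictly larger than the current one.
        if not records or value > records[-1]:
            records.append(value)
    return records


# Hypothetical draws, chosen by hand for illustration.
draws = [0.7, 0.2, 1.5, 1.1, 0.9, 2.4, 2.0, 3.1]
print(upper_records(draws))  # [0.7, 1.5, 2.4, 3.1]
```

Here the order 3 record is 3.1, the last value of the list. Lower records are obtained by reversing the comparison.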
The beauty of this animation is that it requires cheating, but in an honest way.
Cheating is necessary. For suppose that a record of the exponential distribution (whose order is lower than the selected order) happens by chance to be rather large. Then you might be sitting in front of your computer screen forever (as we did), waiting for this record to be beaten, and you'd then leave this site sooner than we would like you to.
But at any time, a single observation can be drawn:
* That will beat the current record,
* And whose distribution is the same as the distribution of the genuine new record, had we been patient enough to wait for it.
The argument is based on the memoryless property of the exponential distribution and is developed in the Tutorial, but we encourage you to develop it on your own.
Note that we are not claiming that the value of this phony new record is the same as that of the genuine new record, only that its distribution is the same as that of the new record. This distinction is similar to that made when describing the thought-experiment justifying the definition of a sufficient statistic.
* The control "Tolerated failures" displays the number of honest draws allowed before we run out of patience. If the current record has still not been beaten after "Tolerated failures" draws, the next observation is drawn as explained above, and a new record is obtained in just one draw. This way, we never have to wait longer than ("Tolerated failures" + 1) draws to obtain a new record, however large the previous record.
The control may be modified at any time without resetting the animation. Observe that as you increase the value of "Tolerated failures", the accumulation of new records (of the selected order) becomes slower and more irregular.
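For the exponential case, the mechanism just described might look like the sketch below. It assumes a unit-rate exponential distribution and relies on a standard consequence of memorylessness: an exponential observation conditioned to exceed a value r is distributed as r plus a fresh exponential observation. The function name and its return values are hypothetical, and the animation itself may be implemented differently.

```python
import random


def next_exponential_record(current, tolerated_failures, rng):
    """Return (new_record, number_of_draws, cheated).

    Honest Exp(1) draws are tried first. If none of them beats `current`
    within `tolerated_failures` attempts, one last observation is drawn
    as current + Exp(1), which has the distribution of an Exp(1)
    observation conditioned to exceed `current`.
    """
    for attempt in range(1, tolerated_failures + 1):
        x = rng.expovariate(1.0)
        if x > current:
            return x, attempt, False
    shortcut = current + rng.expovariate(1.0)
    return shortcut, tolerated_failures + 1, True


rng = random.Random(0)  # fixed seed so that runs are reproducible
record, draws, cheated = next_exponential_record(10.0, 5, rng)
print(record, draws, cheated)
```

Whatever the value of `tolerated_failures`, the function never uses more than `tolerated_failures + 1` draws.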
"Cheating" works with the uniform distribution as well, although the uniform distribution does not enjoy the memoryless property of the exponential distribution. The reason why is explained in the Tutorial.
The exponential distribution plays a central role in the study of record values because it may well be the only distribution whose distributions of records can be calculated directly.
Now consider a random variable X with a probability density. How can the exponential distribution help calculate the distributions of its record values? The key is that any two distributions with densities can be connected via the probability integral transformation, and we'll show that this remark allows calculating the distributions of the records of any r.v. with a probability density.
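One common way to write this connection is the map x -> -log(1 - F(x)), where F is the distribution function of X: applied to X, it yields a standard exponential variable, and because it is increasing it sends records to records. The sketch below illustrates this with the standard uniform distribution. The helper names are made up for the example, and the Tutorial's own construction may be phrased differently.

```python
import math


def to_standard_exponential(x, cdf):
    """Map an observation x of X to -log(1 - F(x)), with F the cdf of X.

    The observation must satisfy F(x) < 1, otherwise the log is undefined.
    """
    return -math.log(1.0 - cdf(x))


def uniform_cdf(x):
    """Distribution function of the standard uniform distribution."""
    return min(max(x, 0.0), 1.0)


def record_positions(values):
    """Indices at which a new upper record occurs."""
    positions, best = [], None
    for i, v in enumerate(values):
        if best is None or v > best:
            positions.append(i)
            best = v
    return positions


# Hypothetical uniform draws, chosen by hand for illustration.
draws = [0.30, 0.10, 0.55, 0.40, 0.90]
mapped = [to_standard_exponential(x, uniform_cdf) for x in draws]

# The map is increasing, so records occur at the same positions.
print(record_positions(draws), record_positions(mapped))
```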
The result is as follows :
* Let X be a r.v. with probability density function f(x).
* Let F(x) be its distribution function.
Then we'll show that the probability density function f_n(x) of its nth upper record is given by:
f_n(x) = f(x) [-log(1 - F(x))]^n / n!
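As a quick sanity check, the formula can be evaluated numerically. The sketch below takes the survival function 1 - F(x) as an argument rather than F(x) itself, a choice made here only to avoid rounding problems in the far tail. For the unit-rate exponential distribution, the result should coincide with the Gamma density of shape n + 1. All function names are illustrative.

```python
import math


def record_density(x, n, pdf, survival):
    """Density of the order-n upper record: f(x) [-log(1 - F(x))]^n / n!.

    `survival` is the function x -> 1 - F(x).
    """
    return pdf(x) * (-math.log(survival(x))) ** n / math.factorial(n)


def exp_pdf(x):
    """Density of the unit-rate exponential distribution (x >= 0)."""
    return math.exp(-x)


def exp_survival(x):
    """1 - F(x) for the unit-rate exponential distribution (x >= 0)."""
    return math.exp(-x)


def gamma_pdf(x, shape):
    """Gamma density with integer shape and unit scale."""
    return x ** (shape - 1) * math.exp(-x) / math.factorial(shape - 1)


# Order 3 record of the exponential distribution against Gamma(shape 4).
for x in (0.5, 2.0, 5.0):
    print(x, record_density(x, 3, exp_pdf, exp_survival), gamma_pdf(x, 4))
```

For the upper records of the standard uniform distribution, the same function can be called with `pdf = lambda x: 1.0` and `survival = lambda x: 1.0 - x`, for x strictly between 0 and 1.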
We have focused on the distributions of records because they are the only topic for which interesting results can be obtained with a reasonable amount of effort. But at least two more issues are of interest:
* What is the distribution of the amount of time we have to wait between the order n record and the order (n + 1) record? Readers familiar with stochastic processes will recognize an inter-arrival time problem.
* Given a time interval [t, t + Δt], what is the distribution of the number of records observed during this period ?
Unfortunately, even basic results about these fascinating questions are beyond the bounds of this Glossary.
In this Tutorial:
* We first derive the distributions of the upper records of any order of the exponential distribution. The result is a direct consequence of the memoryless property of the exponential distribution.
We then show how this result can be used to speed up the above animation by generating a single observation that beats the current record and whose distribution is identical to that of a new record.
* We then show how the distributions of the records of the exponential distribution can be used for calculating the distributions of the records of any distribution with a probability density. The result is given above.
The key will be to realize that the Probability Integral Transformation establishes a connection between the exponential distribution and any distribution with a probability density.
The demonstration will call on a difficult result about the distribution function of the Gamma distribution that we'll state without proof.
* We then use this result to calculate the distributions of the records of the uniform distribution.
We also show that, even though the uniform distribution is not memoryless, a short-cut can be found to speed up the above animation. This short-cut will be related to the Broken Stick distribution.
DISTRIBUTIONS OF RECORD VALUES
Record values of the exponential distribution
The memoryless property
Distribution of the order n record
Short-cut of the "Exponential" animation
Distribution of the records of an arbitrary random variable
The probability integral transformation
Linking any distribution to the exponential distribution
Distributions of the records of an arbitrary random variable
Records of the standard uniform distribution
Probability distributions of the records
Short-cut of the "Uniform" animation
The "Broken stick" problem
Related readings :