Quantiles

Also called "fractile".

# Distributions

Let p(x) be a continuous probability distribution. The area under the curve representing  p(x) is equal to 1.

Let a be any number between 0 and 1 :

0 < a < 1

There is a unique point xa such that the area under p(x) between -∞ and xa is equal to a. This point is the quantile of order a of the distribution.

Quantiles are often expressed as percentages (%), and xa is then called the 100·a% quantile of the distribution.

• If a = 25%, x0.25 is called the first quartile of the distribution.
• a = 50%  defines the second quartile (the median), and a = 75% defines the third quartile.

If F(x) is the distribution function of p(x), then a = F(xa), and (lower image of the above illustration) :

xa = F⁻¹(a)

and xa is then defined as the number such that Pr{x ≤ xa} = a.

Some authors define xa by Pr{x > xa} = a instead. This ambiguity is rarely a problem, as the context usually makes it clear which definition is used.
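As a minimal sketch of the relation xa = F⁻¹(a), the snippet below evaluates a few quantiles of the standard normal distribution with Python's standard `statistics.NormalDist` class. The choice of the standard normal and the variable names are illustrative assumptions, not part of the original text.

```python
from statistics import NormalDist

# Standard normal distribution, chosen only for illustration.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Lower-tail convention: x_a is the point with Pr{x <= x_a} = a.
for a in (0.25, 0.50, 0.75):
    x_a = std_normal.inv_cdf(a)          # x_a = F^-1(a)
    check = std_normal.cdf(x_a)          # F(x_a) should give a back
    print(f"a = {a:.2f}   x_a = {x_a:+.4f}   F(x_a) = {check:.4f}")

# Upper-tail convention: the point with Pr{x > x_a} = a is the
# lower-tail quantile of order 1 - a.
a = 0.05
upper_tail_point = std_normal.inv_cdf(1 - a)
print(f"Pr{{x > {upper_tail_point:.4f}}} = {1 - std_normal.cdf(upper_tail_point):.4f}")
```

The last lines show how the two conventions relate: a point defined by an upper-tail probability a is the same as the lower-tail quantile of order 1 - a.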

-----

For most probability distributions, quantiles cannot be expressed in closed form. They therefore have to be computed numerically, using approximations that are quite sufficient for most applications. The values are then "tabulated" and published in tables (or built into software). The analyst or the software can then look up the value of the quantile for a given percentage a.
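One simple way to approximate a quantile when no closed form is available is to solve F(x) = a numerically. The sketch below uses plain bisection on the distribution function. The helper name `quantile_by_bisection`, the bracket [-10, 10], and the tolerance are assumptions made for illustration; production libraries generally use faster, specialized approximations.

```python
from statistics import NormalDist


def quantile_by_bisection(cdf, a, lo, hi, tol=1e-10):
    """Approximate the solution of cdf(x) = a on [lo, hi].

    Assumes cdf is continuous and non-decreasing, and that the
    bracket satisfies cdf(lo) <= a <= cdf(hi).
    """
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    if not cdf(lo) <= a <= cdf(hi):
        raise ValueError("the bracket [lo, hi] does not contain the quantile")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < a:
            lo = mid      # the quantile lies to the right of mid
        else:
            hi = mid      # the quantile lies at or to the left of mid
    return (lo + hi) / 2


# Compare with the library's own inverse distribution function.
approx = quantile_by_bisection(NormalDist().cdf, 0.975, -10.0, 10.0)
reference = NormalDist().inv_cdf(0.975)
print(approx, reference)
```

Bisection is slow but robust: it only needs the distribution function to be evaluable, which is why it is a convenient reference method.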

-----

Quantiles are necessary for interval estimation and for tests. Both rely on the ability to establish that the probability for the value of a certain statistic to fall outside a certain range is less than a given probability α. The limits of the range are directly related to the quantiles of the distribution of this statistic.

For example, let m be the mean of a sample of size n drawn from a normal distribution whose variance σ² is known but whose mean µ is not. The mean µ is estimated by m. A confidence interval with confidence level (1 - α) is given by :

m - zα/2 · σ/√n  ≤  µ  ≤  m + zα/2 · σ/√n

where zα/2 denotes the (1 - α/2) quantile of the standard normal distribution.
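A short sketch of this use of quantiles: for a normal sample of size n with known σ, the usual interval is m ± zα/2 · σ/√n. The function name `mean_confidence_interval`, the sample values, and σ = 0.25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, alpha=0.05):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(sample)
    m = fmean(sample)                          # point estimate of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)    # (1 - alpha/2) quantile
    half_width = z * sigma / sqrt(n)
    return m - half_width, m + half_width


# Hypothetical measurements, with sigma assumed known.
sample = [9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 9.7, 10.2]
low, high = mean_confidence_interval(sample, sigma=0.25, alpha=0.05)
print(f"95% confidence interval for the mean: [{low:.3f}, {high:.3f}]")
```

Lowering α widens the interval, because the quantile zα/2 moves further into the tail of the standard normal distribution.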

# Sample

Quantiles may also be defined for samples (or any finite set of points) :

• For a given number a (0 < a < 1), the quantile of order a of the sample is the observation defined as in the illustration below. Because of the "staircase" shape of the empirical distribution function (with all steps 1/n high), a range of values of a will designate the same observation.
If a = k/n, two adjacent observations satisfy this definition, and the larger of the two is chosen by convention.

• For the same reason, an observation defines a range of values of a, and the smallest value of this range is retained as the quantile order associated with that observation.
Any abscissa between two observations leads to the same value of a (lower image of the above illustration). Whenever it is necessary to define quantiles unambiguously (as in Q-Q plots), one will interpolate between quantiles defined by two consecutive observations. For example, if there are 4 observations labeled from 1 to 4 from the left, then the 50% quantile (median) will be defined as the middle point between observations 2 and 3.
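The rules above can be sketched as follows, reading the empirical distribution function as a staircase with steps of height 1/n. The function name `sample_quantile` and the `interpolate` flag are illustrative assumptions. The midpoint rule is applied only at a = k/n, which covers the median example but is not a general interpolation scheme.

```python
import math


def sample_quantile(data, a, interpolate=False):
    """Quantile of order a (0 < a < 1) of a finite sample.

    With interpolate=False, the larger of the two candidate observations
    is returned when a = k/n. With interpolate=True, their midpoint is
    returned instead.
    """
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    pos = n * a
    k = round(pos)
    # a = k/n: the level a coincides with a step of the staircase, so
    # observations k and k + 1 (1-based) both satisfy the definition.
    if math.isclose(pos, k) and 1 <= k <= n - 1:
        lower, upper = xs[k - 1], xs[k]
        return (lower + upper) / 2 if interpolate else upper
    # Otherwise a falls strictly inside one step of the staircase.
    return xs[math.floor(pos)]


observations = [40, 10, 30, 20]
print(sample_quantile(observations, 0.5))                    # larger of observations 2 and 3
print(sample_quantile(observations, 0.5, interpolate=True))  # midpoint of observations 2 and 3
print(sample_quantile(observations, 0.3))                    # inside the second step
```

`math.isclose` is used to detect a = k/n because products such as 0.7 * 10 are not exact in floating-point arithmetic.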

# Q-Q plots

Quantiles are helpful for graphical representations of samples, and more specifically :

• For pitting a sample against a reference theoretical distribution (usually normal), for what might be perceived as a "visual goodness-of-fit test".
• For pitting two samples against each other, for what might be perceived as a "visual identity test".

These questions are developed below.

_____________________________________________________________

 Tutorial 1

A "Quantile-Quantile plot", or "Q-Q plot" is a visualization technique that allows comparing :

• A sample and a reference probability distribution (usually a normal distribution), for the purpose of assessing how likely it is that the sample was generated by the distribution, and of visually analyzing the reasons that may have led to rejecting this hypothesis. In loose terms, a QQ plot may therefore be perceived as a "visual normality test".
• Or two samples, for the purpose of deciding how likely it is that these samples were generated by the same distribution (that one does not try to identify). Again, visual inspection of the plot should provide clues as to why this hypothesis is rejected.

In both applications, the analyst's judgement is essential. Conclusions drawn from a QQ plot are therefore necessarily subjective. Only tests can reject the normality hypothesis about a distribution known through a sample, or the identity hypothesis about the two distributions behind two samples. But tests do not provide any detailed insight about the sample(s) as QQ plots do.
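The sketch below computes the points of a QQ plot of a sample against the standard normal distribution, without drawing it. The plotting positions (i - 0.5)/n and the standardization by the sample mean and standard deviation are common conventions assumed here, not taken from the text. The function name and the data are illustrative.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(sample):
    """Return (theoretical quantile, standardized observation) pairs."""
    xs = sorted(sample)
    n = len(xs)
    m, s = fmean(xs), stdev(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        a = (i - 0.5) / n                    # assumed plotting position
        theoretical = std_normal.inv_cdf(a)  # quantile of order a
        points.append((theoretical, (x - m) / s))
    return points


# Hypothetical sample.
sample = [4.9, 5.6, 5.1, 4.4, 5.3, 6.2, 4.7, 5.0, 5.8, 5.2]
for theoretical, observed in normal_qq_points(sample):
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

If the sample is compatible with a normal distribution, the pairs lie close to the line y = x. Systematic curvature at either end points to tails that are heavier or lighter than normal.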

QQ-PLOTS (TUTORIAL)

• Sample vs a reference distribution
• Standardizing the data
• Why quantiles ?
• The QQ plot
• Systematic departure from the ideal configuration
• Rankit plot
• Sample vs Sample

_________________________________________________________________

 Tutorial 2

Quantile Inter-quartile (QIQ) transformations can be used to isolate the shape of an empirical distribution relative to any theoretical distribution that is not completely specified (such as a Normal distribution with an unknown mean µ and/or standard deviation σ).

In addition, QIQ transformations can be used to classify the tails of an empirical distribution (long, medium, or short) so as to help guide the choice of theoretical distributions to consider. Before studying QIQ transformations, it is important to understand two lesser-known statistical concepts on which the QIQ methods are based : the mid-distribution function and the continuous quantile function.
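The text does not give the formula of the transformation, so the following is only a sketch under an assumed normalization: (Q(u) - Q(0.5)) / (2 · (Q(0.75) - Q(0.25))), with Q approximated by linear interpolation between order statistics rather than by the mid-distribution construction the tutorial refers to. It illustrates the key property, namely that the result does not depend on location or scale.

```python
def interpolated_quantile(sorted_xs, u):
    """Piecewise-linear quantile function (a stand-in convention)."""
    n = len(sorted_xs)
    pos = u * (n - 1)
    i = int(pos)
    if i + 1 >= n:
        return sorted_xs[-1]
    frac = pos - i
    return sorted_xs[i] + frac * (sorted_xs[i + 1] - sorted_xs[i])


def qiq(sample, u):
    """Assumed QIQ normalization: centre on the median, scale by 2 x IQR."""
    xs = sorted(sample)
    median = interpolated_quantile(xs, 0.5)
    iqr = interpolated_quantile(xs, 0.75) - interpolated_quantile(xs, 0.25)
    if iqr == 0:
        raise ValueError("inter-quartile range is zero")
    return (interpolated_quantile(xs, u) - median) / (2 * iqr)


# Hypothetical data and a shifted, rescaled copy of it.
sample = [2.1, 3.4, 1.7, 5.9, 4.2, 3.8, 2.9, 7.3]
rescaled = [3 * x + 7 for x in sample]
for u in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"u = {u:.2f}   {qiq(sample, u):+.4f}   {qiq(rescaled, u):+.4f}")
```

Both columns agree up to rounding. This is why such a transformation can be compared with a theoretical distribution whose location and scale are unknown.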

THE QUANTILE INTER-QUARTILE TRANSFORMATION (TUTORIAL)

• The mid-distribution function
• The continuous quantile function
• The Quantile Inter-quartile (QIQ) transformation

 We thank Mr. Matthew BATES for this Tutorial

______________________________________________________________

 Case study

The purpose of this example is to demonstrate both the importance and use of modeling data with a proper selection of common statistical distributions.  Some major topics used to accomplish this end are:

• Describing samples using quantile inter-quartile methods (QIQ transformations)
• Parametric probability density estimation
• Simulation of distributions of differences between two random variables whose distributions are known and that are assumed to be independent.

The following case study is based on a real industrial problem : a car manufacturer is faced with the problem of fitting parts that were machined independently. The distributions of the dimensions of the parts will have to be estimated, and the adjustment of the machine-tools will have to be specified so as to minimize the probability of misfit between two parts that need to be assembled.

The probability density functions will be estimated by the empirical quantile methods to determine the form of the distribution (i.e. such as Normal, Exponential, etc.).  We will in particular demonstrate how to choose a model without relying on estimation of regular parameters (location and scale) pertaining to a given theoretical distribution.
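The simulation step mentioned above can be sketched as a Monte Carlo draw of the difference between two independent dimensions. Everything specific here is hypothetical: the Normal model for one part, the shifted Exponential model for the other, the parameter values, and the rule that a negative clearance means a misfit. None of it comes from the case study's data.

```python
import random
from statistics import quantiles


def simulate_clearance(n_draws, hole_mu, hole_sigma,
                       shaft_lower, shaft_mean_excess, seed=0):
    """Simulate hole - shaft for two independently machined parts.

    hole  ~ Normal(hole_mu, hole_sigma)                        (assumed model)
    shaft ~ shaft_lower + Exponential(mean=shaft_mean_excess)  (assumed model)
    """
    rng = random.Random(seed)   # fixed seed for reproducibility
    diffs = []
    for _ in range(n_draws):
        hole = rng.gauss(hole_mu, hole_sigma)
        shaft = shaft_lower + rng.expovariate(1.0 / shaft_mean_excess)
        diffs.append(hole - shaft)
    return diffs


# Hypothetical parameter values, in arbitrary length units.
diffs = simulate_clearance(20000, hole_mu=10.05, hole_sigma=0.02,
                           shaft_lower=9.95, shaft_mean_excess=0.03)

# Estimated probability that the shaft does not fit into the hole.
p_misfit = sum(d < 0 for d in diffs) / len(diffs)

# 5%, 50% and 95% quantiles of the simulated clearance.
cuts = quantiles(diffs, n=20)
q05, q50, q95 = cuts[0], cuts[9], cuts[18]
print(p_misfit, q05, q50, q95)
```

In a setting like this, one would rerun the simulation for different machine-tool adjustments (here `hole_mu` and `shaft_lower`) and keep the setting with the smallest estimated misfit probability.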

QUANTILE MODELING FOR IMPROVED PROCESS CONTROL (CASE STUDY)

• The problem
• The data
• The solution
• Step 1 : Identification of the distributions with QIQ-plots
• Step 2 : Estimating the parameters of the distributions
• Normal distribution
• Exponential distribution
• Estimating the mean
• Estimating the lower bound
• Step 3 : Simulating the distributions

 We thank Mr. Matthew BATES for this Case Study
