Fisher's F distribution
Also known as the Fisher-Snedecor distribution.
Given two samples drawn from two independent normal distributions, the question is: are the variances of these two normal distributions equal?
1) The sample sizes may be different.
2) The means of the populations play no role in what follows.
If the variances of the two normal distributions are indeed equal, one expects the variances of the two samples to be approximately equal as well. On the other hand, if the two samples have vastly different variances (lower image of the above illustration), one would suspect that the two parent normal distributions also have different variances.
This line of thinking clearly leads to a test based on the comparison of the two sample variances. All we have to do is to identify an appropriate test statistic.
Let the two normal distributions be N(µ1, σ1²) and N(µ2, σ2²) (again, the means are irrelevant).
We want to test:
* The null hypothesis H0 : σ1² = σ2²
* Against the alternative hypothesis H1 : σ1² ≠ σ2².
We tentatively consider the ratio of the two sample variances as the test statistic F. Recall that for any distribution, the quantity S² defined by:
S² = 1/(n − 1) · Σi (xi − x̄)²
is called the "corrected sample variance", and is an unbiased estimator of the distribution variance.
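A minimal numerical sketch (Python with numpy assumed; the sample values are made up for illustration): the corrected sample variance is exactly what numpy returns with ddof=1.

```python
import numpy as np

# Hypothetical sample; any numbers would do.
x = np.array([2.1, 3.4, 1.9, 2.8, 3.0])
n = len(x)

# Corrected sample variance: divide by (n - 1), not n.
s2_manual = np.sum((x - x.mean()) ** 2) / (n - 1)
s2_numpy = np.var(x, ddof=1)  # ddof=1 gives the unbiased estimator

print(s2_manual, s2_numpy)
```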
So, with obvious notations, we now consider the quantity:
F = S1² / S2²
which is an attractive candidate as a test statistic: we'd naturally tend to reject H0 if the value of F is too different from 1.
Let:
* n be the size of the first sample,
* m be the size of the second sample.
We know that:
* (n − 1)S1²/σ1² ~ χ²n−1
* (m − 1)S2²/σ2² ~ χ²m−1
so that, under the null hypothesis σ1² = σ2², the quantity:
F = S1² / S2²
is distributed as [χ²n−1/(n − 1)] / [χ²m−1/(m − 1)]
which does not depend on the common variance σ² of the two normal distributions, and can therefore be used as a test statistic.
It happens that the distribution of F can be calculated explicitly, and is known as Fisher's F distribution Fn−1, m−1. It depends on two indices, known as its two degrees of freedom.
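This result can be checked by simulation. The sketch below (numpy and scipy assumed; sample sizes chosen arbitrarily) draws many pairs of samples under H0 and compares the distribution of the variance ratio with F(n−1, m−1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m = 8, 12          # sample sizes (chosen arbitrarily)
reps = 20000

# Under H0 both samples come from normals with the same variance.
x = rng.normal(0.0, 2.0, size=(reps, n))
y = rng.normal(5.0, 2.0, size=(reps, m))   # different means are irrelevant

# Ratio of the two corrected sample variances, for each replication.
F = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)

# The simulated ratios should follow an F(n-1, m-1) distribution.
ks = stats.kstest(F, stats.f(n - 1, m - 1).cdf)
print(ks.pvalue)
```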
You'll find below an interactive animation that displays the various shapes of the F distribution depending on the values of its degrees of freedom.
It is easy to show that for a given quantile order α, one has:
F(α)n, m = 1 / F(1 − α)m, n
This symmetry is used to turn a naturally two-sided test into a more convenient one-sided test. In practice, one always considers the ordering of the two samples that leads to a value of F larger than 1.
For a given significance level α, the null hypothesis H0 : σ1² = σ2² is then rejected if the observed F is larger than F(α)n−1, m−1, which is the left limit of the yellow area in this illustration:
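A sketch of the test in Python (scipy assumed; the two samples are synthetic, with deliberately different variances, and the sizes are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two hypothetical samples of sizes n = 10 and m = 15.
x = rng.normal(0.0, 1.0, 10)
y = rng.normal(0.0, 3.0, 15)

s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)

# Order the samples so that the larger variance sits in the numerator (F >= 1).
if s1 >= s2:
    F, df1, df2 = s1 / s2, len(x) - 1, len(y) - 1
else:
    F, df1, df2 = s2 / s1, len(y) - 1, len(x) - 1

alpha = 0.05
crit = stats.f.ppf(1 - alpha, df1, df2)   # critical value F(alpha)
print(F, crit, F > crit)

# Quantile symmetry: F(alpha)_{n,m} = 1 / F(1-alpha)_{m,n}
sym_err = abs(stats.f.ppf(alpha, df1, df2) - 1 / stats.f.ppf(1 - alpha, df2, df1))
```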
The following interactive animation illustrates Fisher's distribution.
Two standard normal distributions are displayed, together with two samples generated from these distributions, with respective sizes n and m. You may change the sample sizes with the "Sample" buttons.
The estimated variances of the distributions are displayed below their respective frames.
The Fn−1, m−1 distribution curve is also displayed. Note that the degrees of freedom ("df1" and "df2") are respectively (n − 1) and (m − 1).
The green line hanging from the top of the frame is the mean of the distribution.
The position and height of the mode are also displayed.
The value of the Fisher statistic (ratio of the estimated variances) is displayed at the bottom of the frame.
Change the size of each sample and observe the changes of the F curve.
* As the ratio of two positive numbers, F is defined only for positive values.
* F extends to infinity: nothing prevents the ratio of the two variance estimates from being as large as desired, although with smaller and smaller probability for larger and larger values of the ratio.
* For n ≥ 4 (df1 ≥ 3), all curves have the same general one-sided, asymmetrical bell shape. They all start at the origin.
* For n = 3 (df1 = 2), we observe a drastic change: the curves are now monotonically decreasing for all values of m. The value of the intercept is 1 for all m.
* For n = 2 (df1 = 1), the curves are still monotonically decreasing for all values of m, but the vertical axis is now an asymptote.
* When the mode exists, it is always smaller than 1.
* Make n progressively larger: the mode always moves to the right, but is always to the left of 1. For any given m, the height of the mode first decreases, then increases again.
* For a given m, the curves become narrower for larger values of n.
* As m gets larger (second sample), the mode moves to the right, but is always to the left of 1. The mode gets higher too (for a fixed n), and the curve gets narrower.
* We show below that as n and m grow simultaneously without limit, the mode tends to 1 from below.
* The mean is always larger than 1.
* It does not depend on n. It varies only with m.
* It tends to 1 from above as m → +∞.
* The mean gets larger very fast when m gets smaller. In fact, for m = 3 and m = 2 (df2 = 2 and 1), the F distribution has no mean for any value of n (see below). So we have here two examples of distributions with no mean (the most classical example of a distribution with no mean being the Cauchy distribution).
Mode and mean converge to the same value (1) as n and m grow without limit, and consequently the F curve then departs less and less from symmetry.
Choose n and m, then click on "Go" and observe the build-up of the histogram of the corresponding F distribution.
We'll show that the probability density function of Fisher's Fn, m distribution is:
f(x) = [Γ((n + m)/2) / (Γ(n/2) · Γ(m/2))] · (n/m)^(n/2) · x^((n/2) − 1) · [1 + (n/m)·x]^(−(n + m)/2), for x > 0
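As a numerical cross-check of this density (scipy assumed; degrees of freedom chosen arbitrarily), one can compare the standard F pdf formula with scipy's implementation:

```python
import numpy as np
from math import gamma
from scipy import stats

def f_pdf(x, n, m):
    """Density of Fisher's F distribution with (n, m) degrees of freedom."""
    c = gamma((n + m) / 2) / (gamma(n / 2) * gamma(m / 2)) * (n / m) ** (n / 2)
    return c * x ** (n / 2 - 1) * (1 + n * x / m) ** (-(n + m) / 2)

xs = np.linspace(0.1, 5, 50)
ours = f_pdf(xs, 5, 7)
ref = stats.f.pdf(xs, 5, 7)
print(np.max(np.abs(ours - ref)))
```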
We'll show that the mean of the F distribution, when it exists, is:
E(F) = m / (m − 2) (for m > 2)
Notice that:
1) The mean does not exist for m = 1 and m = 2.
2) When the mean exists, its value depends only on the second degree of freedom (denominator), and not at all on the first one (n at the numerator).
3) The mean tends to 1 from above as m grows without limit.
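These properties of the mean are easy to check numerically (scipy assumed; the degrees of freedom below are arbitrary):

```python
from scipy import stats

# Mean of F(n, m) should be m / (m - 2), whatever n (requires m > 2).
errs = [abs(stats.f.mean(n, m) - m / (m - 2))
        for n in (1, 3, 10, 50)
        for m in (5, 10, 100)]
print(max(errs))
```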
We'll show that, in the general case, the F distribution has a single mode positioned at:
Mode = [(n − 2)/n] · [m/(m + 2)] (for n > 2)
Notice that:
1) The mode is always smaller than 1, but clearly converges to 1 from below as both m and n grow without limit.
2) The general calculation does not apply to the case n = 1. We'll show that the vertical axis is then an asymptote for the F distribution.
3) The mode is 0 for n = 2. The F distribution is then monotonically decreasing, and we'll show that it intersects the vertical axis at height 1 whatever the value of m.
4) The mode is always smaller than the mean (which is always larger than 1), as should be expected from a right-skewed distribution.
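A quick numerical check of the mode formula (scipy assumed; degrees of freedom arbitrary, with n > 2): the density should peak at the claimed position.

```python
from scipy import stats

n, m = 6, 10                          # arbitrary degrees of freedom, n > 2
mode = ((n - 2) / n) * (m / (m + 2))  # claimed position of the mode

# The density at the mode should dominate nearby points.
eps = 1e-3
peak = stats.f.pdf(mode, n, m)
left = stats.f.pdf(mode - eps, n, m)
right = stats.f.pdf(mode + eps, n, m)
print(mode, peak, left, right)
```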
We'll show that the variance of the F distribution, when it exists, is:
Var(F) = 2m²(n + m − 2) / [n(m − 2)²(m − 4)] (for m > 4)
Notice that the variance does not exist for m = 1, 2, 3, 4.
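The variance formula can be checked the same way (scipy assumed; arbitrary degrees of freedom with m > 4):

```python
from scipy import stats

# Variance of F(n, m): 2 m^2 (n + m - 2) / (n (m - 2)^2 (m - 4)), for m > 4.
def var_formula(n, m):
    return 2 * m**2 * (n + m - 2) / (n * (m - 2) ** 2 * (m - 4))

errs = [abs(stats.f.var(n, m) - var_formula(n, m))
        for n in (2, 5, 20)
        for m in (5, 8, 30)]
print(max(errs))
```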
We'll show that:
The inverse of an Fn, m distributed r.v. is Fm, n distributed.
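A numerical illustration of this inversion property (scipy assumed; degrees of freedom arbitrary), comparing the two cumulative distribution functions:

```python
from scipy import stats

n, m = 4, 9   # arbitrary degrees of freedom
# If X ~ F(n, m), then 1/X ~ F(m, n):
# P(1/X <= x) = P(X >= 1/x) = 1 - CDF of F(n, m) at 1/x.
errs = [abs(stats.f.cdf(x, m, n) - (1 - stats.f.cdf(1 / x, n, m)))
        for x in (0.3, 1.0, 2.5)]
print(max(errs))
```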
We show here that the square of a t-distributed r.v. with m degrees of freedom has an F1, m distribution.
If T ~ tm then T ² ~ F1, m
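This identity can be verified on the cumulative distribution functions (scipy assumed; m arbitrary):

```python
from scipy import stats

m = 12  # arbitrary degrees of freedom
# If T ~ t_m then T^2 ~ F(1, m):
# P(T^2 <= x) = P(-sqrt(x) <= T <= sqrt(x)).
errs = [abs(stats.f.cdf(x, 1, m)
            - (stats.t.cdf(x ** 0.5, m) - stats.t.cdf(-x ** 0.5, m)))
        for x in (0.5, 1.0, 4.0)]
print(max(errs))
```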
We may now give a more formal definition of the Fn, m distribution without making reference to the problem that led to its identification. Let:
* X ~ χ²n
* Y ~ χ²m
* X and Y independent.
Then the ratio (X/n) / (Y/m) follows the Fn, m distribution.
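Under these assumptions, the quantity (X/n)/(Y/m) follows the Fn, m distribution; a small simulation sketch (numpy and scipy assumed, degrees of freedom arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m = 5, 9
reps = 20000

X = rng.chisquare(n, reps)   # X ~ Chi-square with n degrees of freedom
Y = rng.chisquare(m, reps)   # Y ~ Chi-square with m df, independent of X

F = (X / n) / (Y / m)        # should follow F(n, m)
ks = stats.kstest(F, stats.f(n, m).cdf)
print(ks.pvalue)
```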
This formal definition is useful because several important quantities that are not related to estimated variances will turn out to be the ratio of two independent Chi-square variables, each divided by its own number of degrees of freedom. This is the case for:
* The F statistic of ANOVA.
* The F statistic of the global test of validity of a Simple or Multiple Linear Regression.
* The squared Mahalanobis distance between the means of two multinormal distributions.
Fisher's F distribution is related to the Beta distribution as follows.
Let F be a r.v. distributed as Fisher's Fn, m. We show here that:
nF / (m + nF) ~ Beta(n/2, m/2)
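Assuming the standard form of this link, nF/(m + nF) ~ Beta(n/2, m/2), here is a numerical check on the cumulative distribution functions (scipy assumed; degrees of freedom arbitrary):

```python
from scipy import stats

n, m = 6, 11  # arbitrary degrees of freedom
# If F ~ F(n, m), then nF / (m + nF) ~ Beta(n/2, m/2).
# Equivalently, the Beta cdf at b equals the F cdf at x = m b / (n (1 - b)).
errs = [abs(stats.beta.cdf(b, n / 2, m / 2)
            - stats.f.cdf(m * b / (n * (1 - b)), n, m))
        for b in (0.2, 0.5, 0.8)]
print(max(errs))
```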
In this Tutorial, we establish some basic properties of Fisher's F distribution.
Probability density function
* We remind you that we established this density function here by calling on the general results about the distribution of the ratio of two random variables.
* But in this Tutorial, we use a less popular, although just as effective and convenient, approach. We first calculate the cumulative distribution function of the F distribution, which appears in integral form, and then differentiate it to obtain the probability density function.
We use this same approach here for calculating the pdf of Student's t distribution.
Calculating the mean and variance of Fisher's F distribution by the direct method is quite feasible, but a little intricate, so we defer this approach to the next Tutorial.
In this Tutorial, we use an indirect but simpler method. We consider a moment of an F distributed r.v. as the expectation of a function of two Chi-square r.v., and then call on the LOTUS theorem in its two-variable version to calculate this expectation. This allows us to derive the first and second order moments of the F distribution quite easily, and then to derive its variance from these two results.
Inverse of an F random variable
We finally show that the inverse of an Fn, m distributed r.v. is Fm, n distributed.
PROPERTIES OF FISHER'S F DISTRIBUTION
Probability density function of Fisher's F distribution
Joint probability density function
Cumulative distribution function
Probability density function
Re-introducing the degrees of freedom
n = 1
First moments of Fisher's F distribution
Second order moment
Distribution of the inverse of a Fisher's random variable
We now revisit the issue of calculating the mean and variance of Fisher's F distribution, and describe a second method quite different from the one used in the previous Tutorial. The reader is warned that the content of this Tutorial is exclusively mathematical, and adds no further understanding of the F distribution from a statistical viewpoint. Yet, the demonstration is interesting in its own right, and deserves to be presented.
The basic idea is as follows: the numerator of the pdf of Fisher's F distribution is a power of x, so the pdf and all the moments have very similar mathematical structures. As it turns out, all these quantities belong to a certain family of integrals that will be shown to satisfy two distinct recursion relations, which we will combine to establish a link between a moment and a higher order moment. The normalization condition of the pdf will then lead us to the expression for the mean, from which we'll easily derive the expression for the second order moment of the distribution, and hence for the variance.
If needed, the method can be used for calculating higher order moments.
MEAN AND VARIANCE OF FISHER'S F DISTRIBUTION
Integral I(α, β) defining the moments
First recursion relation
Second recursion relation
The final recursion relation
Normalization of the pdf
Second order moment