ANOVA
Acronym for "ANalysis Of VAriance".
One of the fundamental tests of Statistics.
-----
This illustration shows three groups of observations generated by three probability distributions that are known to be normal with identical variances.

Nothing is known about the means of these probability distributions : they may, or may not be equal.
ANOVA is a test whose goal is to assess the plausibility of the hypothesis stating that the means of these normal distributions are indeed equal.
-----
More generally, we have k groups of observations known to have been generated by k independent normal distributions with identical variances and respective means µ1, µ2, ..., µk. The groups need not have the same size.
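ANOVA tests the null hypothesis H0 : µ1 = µ2 = ... = µk against the alternative H1 : at least two of the means differ.
The analyst chooses a significance level α (typically 0.05 or 0.01). ANOVA produces a p-value, and H0 is rejected when this p-value is smaller than α.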
So ANOVA may be perceived as a generalization of Student's t test to more than two groups.
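The link with Student's t test can be checked numerically. The sketch below is only an illustration, not part of the original page: the function names and the data values are made up, and only the Python standard library is used. It computes the ANOVA F statistic for any number of groups and, in the two-group case, compares it with the square of the pooled-variance t statistic.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ssf = ssr = 0.0
    for g in groups:
        m = mean(g)
        ssf += len(g) * (m - grand_mean) ** 2   # between-group part
        ssr += sum((x - m) ** 2 for x in g)     # within-group part
    return (ssf / (k - 1)) / (ssr / (n_total - k))

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5

# Hypothetical measurements, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [6.2, 6.0, 5.7, 6.5]

f_two = f_statistic([group_a, group_b])
t_two = pooled_t_statistic(group_a, group_b)
print(f_two, t_two ** 2)  # the two values agree up to rounding error
```

With two groups, F is exactly the square of t, so both tests lead to the same decision. With three or more groups, only the F statistic remains available.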
-----
ANOVA is used in the hope of showing that there is a difference between distribution means. For example, several groups of patients, all suffering from high blood pressure, are given several new treatments (one treatment per group). These treatments are expected to have different efficacies, and it is hoped that some of them will turn out to be particularly effective.
After the treatments, ANOVA will be used on blood pressure measurements, and it is hoped that it will reject the hypothesis that all treatments are equally effective (or ineffective).
ANOVA is a global test. If the hypothesis that all means are equal is rejected, ANOVA produces no detailed explanation as to why it rejected it. But the above example shows that it would be very informative to identify the group(s) responsible for this rejection.
This is a difficult question.
Statistics has developed several tests meant to be used after ANOVA has rejected the "equal means" hypothesis. Their purpose is to analyze the reasons why the hypothesis was rejected. These tests are collectively referred to as "post hoc" or "a posteriori" tests.
For example, the goal of Dunnett's test is to identify groups whose means are significantly different from the mean of a single reference group (typically, a "placebo" group).
ANOVA relies on restrictive assumptions (normality and homogeneity of the variances of the distributions) that may sometimes be deemed unrealistic. Before running an ANOVA, these assumptions should therefore be checked or, better, tested with a normality test and a test of homogeneity of variances.
If the data appear to be incompatible with the ANOVA assumptions, it is still possible to test the hypothesis of equality of the means by resorting to a nonparametric test : the Kruskal-Wallis test.
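To give an idea of how the Kruskal-Wallis test works, here is a minimal sketch of its statistic H, which replaces the observations by their ranks in the pooled sample. The function names are hypothetical, the data are made up, and the usual correction for ties is deliberately left out. Under the null hypothesis and for groups that are not too small, H is approximately Chi-square distributed with k - 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Three perfectly separated hypothetical groups: rank sums are 6, 15 and 24.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```

Because only ranks are used, the statistic does not depend on the normality assumption.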
Recall that "ANOVA" means "Analysis of Variance". It may be surprising to find the term "variance" in the name of a test that bears on means, not variances. The reason is as follows : if the means of the distributions are different, then the variance of the "mega-group" that pools all the observations together irrespective of their origin will very probably be larger than the common variance of the distributions, as estimated from the groups. It then turns out that means are indirectly compared through comparing variances.
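This effect is easy to observe on a small example. In the sketch below (made-up data and hypothetical function names, standard library only), three groups with equal sample means are shifted apart. The within-group estimate of the common variance does not change, whereas the variance of the "mega-group" increases markedly.

```python
from statistics import mean, variance

def pooled_within_variance(groups):
    """Estimate of the common variance using within-group deviations only."""
    n_total = sum(len(g) for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return ss_within / (n_total - len(groups))

def mega_group_variance(groups):
    """Sample variance of all observations pooled, ignoring their origin."""
    return variance([x for g in groups for x in g])

# Hypothetical data: each base group has a sample mean of 0.
base = [
    [-1.2, 0.3, 0.9, -0.4, 0.4],
    [0.8, -0.7, 0.1, -0.5, 0.3],
    [-0.2, 1.1, -0.9, 0.2, -0.2],
]
shifts = [0.0, 2.0, 4.0]  # assumed differences between the group means
shifted = [[x + s for x in g] for g, s in zip(base, shifts)]

for name, data in (("equal means", base), ("different means", shifted)):
    print(name, pooled_within_variance(data), mega_group_variance(data))
```

Comparing the two variance estimates is therefore an indirect way of comparing the means.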
ANOVA is used on "native" samples, as in the example described above. But it is also met in other contexts :
One of the classical approaches to Classification is to identify directions such that the projections of the classes on these directions are "as widely separated as possible". At the heart of ANOVA is Fisher's F statistic (see the tutorials below), which can be perceived as a measure of the separation of samples generated by normal distributions with equal variances. It is therefore quite natural that the significance of a discriminant direction found by Fisher's Discriminant Analysis (FDA) be assessed by an ANOVA.
The significance test of a Linear Regression is summarized by a so-called "ANOVA table". The reason is that this test relies on the principle of "variance decomposition" of the response variable, which follows the same mathematical path as the ANOVA described on this page, although there are now no "groups".
___________________________________________
Tutorial 1
In this first Tutorial, we use an example to describe the basic but fundamental principle of univariate ANOVA.
AN OVERVIEW OF ONE-WAY ANOVA
Reminder : the goal of ANOVA
The principle of ANOVA
The null hypothesis is true
The null hypothesis is false
The test
_______________________________________
Tutorial 2
We then describe the procedure known as "variance decomposition". It is similar to that found in the validity test of Linear Regression. It is purely geometric, with no reference to probabilities.
VARIANCE DECOMPOSITION
Notations
Variance decomposition
Total Sum of Squares (SST)
Decomposition of SST
Factorial Sum of Squares (SSF)
Residual Sum of Squares (SSR)
The "variance decomposition" equation
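The "variance decomposition" equation, SST = SSF + SSR, can be verified on any data set, since it is a purely algebraic identity. The following sketch uses made-up groups of unequal sizes and a hypothetical helper function. Each sum of squares is computed directly from its definition, and the two sides of the equation are then compared.

```python
def sums_of_squares(groups):
    """Return (SST, SSF, SSR), each computed from its own definition."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Total: deviations of every observation from the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ssf = ssr = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Factorial: deviations of group means from the grand mean.
        ssf += len(g) * (group_mean - grand_mean) ** 2
        # Residual: deviations of observations from their own group mean.
        ssr += sum((x - group_mean) ** 2 for x in g)
    return sst, ssf, ssr

# Hypothetical groups; they need not have the same size.
groups = [[4.1, 5.0, 5.9], [6.2, 7.1, 6.8, 7.5], [5.5, 4.8]]
sst, ssf, ssr = sums_of_squares(groups)
print(sst, ssf + ssr)  # equal up to rounding error
```

No probabilistic assumption is needed here: the identity holds whatever the distribution of the data.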
_________________________________________
Tutorial 3
The various Sums of Squares are random variables. We study their distributions and their properties as estimators of σ². Unfortunately, the crucial point of ANOVA is the distribution of the Factorial Sum of Squares, whose derivation is beyond the scope of this Glossary, so we state the result without proof.
DISTRIBUTIONS OF THE SUMS OF SQUARES
Total Sum of Squares
    Distribution
    Estimation of σ²
Residual Sum of Squares
    Distribution
    Estimation of σ²
A premature attempt
Factorial Sum of Squares (no demonstration)
    Distribution
    Estimation of σ²
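The role of the Sums of Squares as estimators of σ² can be explored by simulation. The sketch below rests on assumptions of our own: k = 3 groups of n = 5 observations, all drawn from the same normal distribution (so the null hypothesis is true), σ = 2, and a fixed random seed. It averages the three "mean squares" SST/(N-1), SSF/(k-1) and SSR/(N-k) over many simulated data sets; under the null hypothesis, all three averages should be close to σ² = 4.

```python
import random

def mean_squares(groups):
    """Return SST/(N-1), SSF/(k-1) and SSR/(N-k) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ssr = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ssr += sum((x - group_mean) ** 2 for x in g)
    ssf = sst - ssr  # from the variance decomposition SST = SSF + SSR
    return sst / (n_total - 1), ssf / (k - 1), ssr / (n_total - k)

rng = random.Random(0)               # fixed seed, for reproducibility
sigma, k, n, reps = 2.0, 3, 5, 4000  # assumed simulation settings
totals = [0.0, 0.0, 0.0]
for _ in range(reps):
    # Null hypothesis true: every group has the same mean (here 10).
    groups = [[rng.gauss(10.0, sigma) for _ in range(n)] for _ in range(k)]
    for i, ms in enumerate(mean_squares(groups)):
        totals[i] += ms
averages = [t / reps for t in totals]
print(averages)  # three values close to sigma ** 2
```

If the group means were made different, the residual estimate would stay close to σ², while the other two would be inflated.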
__________________________________________
Tutorial 4
We finally describe the ANOVA statistic. It turns out to be a Fisher (F) random variable, so that ANOVA is a classical F test. The results are presented in the ubiquitous "ANOVA table".
ANOVA'S F TEST
The ANOVA statistic
Fisher's F statistic
Mean Squares
The F test
ANOVA table
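As an illustration, the sketch below assembles a bare-bones ANOVA table (Sums of Squares, degrees of freedom, Mean Squares) and the F statistic for a small made-up data set. The function name and the layout of the table are our own choices. The p-value is deliberately omitted: it would be obtained from Fisher's F distribution with (k-1, N-k) degrees of freedom, typically with statistical software.

```python
def anova_table(groups):
    """Return the rows of a one-way ANOVA table and the F statistic."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ssf = ssr = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ssf += len(g) * (group_mean - grand_mean) ** 2
        ssr += sum((x - group_mean) ** 2 for x in g)
    # Each row: (source, sum of squares, degrees of freedom, mean square).
    rows = [
        ("Factorial", ssf, k - 1, ssf / (k - 1)),
        ("Residual", ssr, n_total - k, ssr / (n_total - k)),
        ("Total", ssf + ssr, n_total - 1, None),
    ]
    f_stat = rows[0][3] / rows[1][3]  # ratio of the two Mean Squares
    return rows, f_stat

# Hypothetical data: group means are 2, 3 and 6.
rows, f_stat = anova_table([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for source, ss, df, ms in rows:
    ms_text = "" if ms is None else f"{ms:10.3f}"
    print(f"{source:<10}{ss:10.3f}{df:5d}{ms_text}")
print("F =", round(f_stat, 3))
```

For these data, the Factorial Mean Square is 13 times the Residual Mean Square, which suggests that the group means differ.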
____________________________________________