JUSTIFICATION OF THE DEFINITION OF A "SUFFICIENT STATISTIC"
The definition of a sufficient statistic is often perceived as abstract and as offering little to intuition. We describe here a "thought experiment" intended to make this important concept easier to grasp.
Let p(x; θ) be a probability distribution that is known except for the value of one of its parameters, θ.
Let also x = {x1, x2, ..., xn} be an n-sample drawn from this distribution.
Finally, let T be a statistic, and t the value of this statistic on the sample. We denote by gθ(t) the distribution of T. Whether gθ(t) is known or not is of no importance for the remainder of this section.
-----
We have two statisticians, call them S1 and S2. Both know p(x; θ), except for the value of θ. S1 is given the complete sample x. S2 is given only the value t of the statistic T on that sample.
Statistician S1 thinks:
"I'm in a good position because I have the detailed description of x, which contains all the information one will ever get about the value of θ."
Statistician S2 thinks:
"I'm at a disadvantage because the information I received is only a small part of the information contained in the sample. In particular, I do not know the values of the observations, so I will never know the value of any statistic other than T (or a function of T) on the sample. So, whatever question is asked about θ, there is nothing I can do except hope that T = t has something to do with this question.
My colleague S1 is, of course, in a much better position because he can do all sorts of calculations on the sample and therefore will very likely be able to provide a better answer to the question than I can."
-----
In general, all of the above is true, but there is a special and very important circumstance where statistician S2 is in just as good a position as his apparently luckier colleague S1.
S2 knows the analytical form of p(x; θ), just as S1 does (he just doesn't know the sample x). So he can calculate the theoretical distribution of the sample conditional on the value of the statistic T:
Lθ(X | T = t )
Usually, this distribution will depend on θ (this is why we indexed L with θ), which is unknown, so S2 cannot do anything practical with this theoretical distribution.
But suppose that S2 discovers, to his surprise, that θ does not show up in the mathematical expression of Lθ(X | T = t), that is, that Lθ(X | T = t) does not, in fact, depend on θ, and can therefore just be written L(X | T = t), with the "θ" index removed.
L(X | T = t) is then a completely defined distribution for any value t, and S2 can use it as he wishes.
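As a concrete illustration of a conditional distribution that is free of θ, here is a minimal sketch. It assumes a model that the text does not specify: n independent Bernoulli(θ) observations, with T taken to be the sum of the sample. The function name `conditional_given_sum` is hypothetical and introduced only for this example.

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(theta, n, t):
    """Return {sample: P(X = sample | T = t)} for n Bernoulli(theta) draws,
    where T is the sum of the sample (illustrative choice of model and statistic)."""
    def joint(sample):
        k = sum(sample)
        return theta ** k * (1 - theta) ** (n - k)

    # All samples compatible with the observed value T = t
    compatible = [s for s in product((0, 1), repeat=n) if sum(s) == t]
    g_t = sum(joint(s) for s in compatible)  # g_theta(t) = P(T = t)
    return {s: joint(s) / g_t for s in compatible}

n, t = 4, 2
for theta in (0.2, 0.5, 0.9):
    cond = conditional_given_sum(theta, n, t)
    # Every compatible sample has probability 1 / C(n, t), whatever theta is
    assert all(isclose(p, 1 / comb(n, t)) for p in cond.values())
```

In this assumed model θ cancels out of the ratio, so S2 could write down L(X | T = t) without knowing θ: it is simply the uniform distribution over the arrangements of t ones among n positions.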
Then we can imagine the following thought experiment: S2 draws a sample y from L(X | T = t), where t is the value of T on the sample x held by S1.
x is a realization of a random vector X, and y is a realization of another random vector Y.
We'll show that:
X and Y have identical probability distributions.
Actually, this result is true whether the statistic T is sufficient or not for θ. The sufficiency of T plays no role in the demonstration: it is there only to ensure that statistician S2 can use the conditional distribution L(X | T = t).
Now let θ* be an estimator of θ. The two values:
* θ*(x) and
* θ*(y)
are generally different, but they follow the same probability distribution and are therefore, on average, equally good (or poor!) estimates of θ.
-----
If we apply this scheme to our thought experiment:
* S1 draws sample x directly from L(X; θ) by drawing n observations from p(x; θ).
* S2:
- First "draws" a realization of T from gθ(t). In fact, he doesn't actually do it: he just uses the value t given to him by S1. Note that S1 does not have to know gθ(t) because drawing a sample x and then calculating t is just the same as drawing a realization of T from gθ(t).
- Then S2 draws a sample y from L(X | T = t).
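The two-step scheme above can be sketched in code. The example assumes a Bernoulli(θ) model with T equal to the sum of the sample, neither of which comes from the text. It uses exact enumeration rather than random simulation so that the comparison is deterministic. All function names (`dist_x`, `dist_y`, `draw_y`, `estimator_dist`) are hypothetical.

```python
import random
from itertools import product
from math import comb, isclose

def dist_x(theta, n):
    """Exact distribution of S1's sample X: n independent Bernoulli(theta) draws."""
    return {s: theta ** sum(s) * (1 - theta) ** (n - sum(s))
            for s in product((0, 1), repeat=n)}

def dist_y(theta, n):
    """Exact distribution of S2's sample Y: t ~ g_theta, then y ~ L(X | T = t)."""
    px = dist_x(theta, n)
    py = {}
    for t in range(n + 1):
        g_t = sum(p for s, p in px.items() if sum(s) == t)  # g_theta(t)
        for s in px:
            if sum(s) == t:
                # theta-free conditional: uniform over the C(n, t) arrangements
                py[s] = g_t / comb(n, t)
    return py

def draw_y(t, n, rng):
    """What S2 actually does: place t ones at random among n positions."""
    ones = set(rng.sample(range(n), t))
    return tuple(1 if i in ones else 0 for i in range(n))

def estimator_dist(dist, estimator):
    """Distribution of estimator(sample) when the sample follows `dist`."""
    out = {}
    for s, p in dist.items():
        v = estimator(s)
        out[v] = out.get(v, 0.0) + p
    return out

theta, n = 0.3, 4
px, py = dist_x(theta, n), dist_y(theta, n)
assert all(isclose(px[s], py[s]) for s in px)       # X and Y: same distribution

first_obs = lambda s: s[0]                          # an estimator that is not a function of T
ex, ey = estimator_dist(px, first_obs), estimator_dist(py, first_obs)
assert all(isclose(ex[v], ey[v]) for v in ex)       # same distribution of estimates

y = draw_y(2, n, random.Random(0))                  # one sample S2 could draw from t = 2
assert sum(y) == 2
```

Note that `dist_y` needs θ only to weight the values of t, which mirrors the fact that S2 never draws t himself. The second step, `draw_y`, uses no knowledge of θ at all.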

In summary, just because we have been able to identify a statistic T such that the distribution of the sample conditional on the value t of this statistic does not depend on the value of the parameter θ, we can dispense with knowing the details of the sample and, using only the value t of the statistic T, do as good a job as a statistician who knows the complete sample.
Such a statistic, when it exists, is called a sufficient statistic for θ. The term "sufficient" here means: "Don't bother to give me the complete description of the sample, it is sufficient for me to know the value t of T on the sample to do as good a job as if I knew the sample itself".
________________________________________________________
GEOMETRIC INTERPRETATION OF A SUFFICIENT STATISTIC
The concept of "sufficient statistic" has a geometric interpretation, which we illustrate with the normal distribution N(µ, σ²). We assume that σ² is fixed. The parameter θ of the foregoing section is therefore the mean µ.
We consider 2-samples, whose distribution Lµ(x1, x2) in the (x1, x2) plane is bivariate normal with circular symmetry. We index L with µ as a reminder that this distribution depends on the value of µ.
The above illustration represents Lµ(x1, x2) for µ = 0.
It can be shown that the sample mean m = (x1 + x2)/2 is a sufficient statistic for µ. For a given number t, the samples satisfying m = t lie on a line at 45° to the axes (blue line). The conditional distribution Lµ=0(x1, x2 | m = t) is the distribution encountered while moving along this blue line. Its values are the values of Lµ=0(x1, x2) on the line, normalized so that the integral of these values is 1.
This conditional distribution is normal.
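The normality of this conditional distribution can be verified with a short derivation. The symbol d = x1 − x2, which locates a point along the blue line, is introduced here for convenience and does not appear in the text.

```latex
\begin{align*}
(x_1-\mu)^2 + (x_2-\mu)^2
  &= 2\,(m-\mu)^2 + \tfrac{1}{2}\,d^2,
  \qquad m = \tfrac{x_1+x_2}{2},\quad d = x_1 - x_2, \\[4pt]
L_\mu(x_1, x_2)
  &= \frac{1}{2\pi\sigma^2}
     \exp\!\left(-\frac{(m-\mu)^2}{\sigma^2}\right)
     \exp\!\left(-\frac{d^2}{4\sigma^2}\right).
\end{align*}
```

On the line m = t the first exponential is a constant, so after normalization the density along the line is proportional to exp(−d²/(4σ²)): in terms of d, it is a centered normal distribution with variance 2σ². The mean µ appears only in the factor that is constant on the line.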
Now consider another value for µ. The distribution Lµ(x1, x2) is different (lower image of the above illustration). The expression m = t is represented by another blue line just above the first one. Because m is a sufficient statistic, the distributions on these two blue lines are identical (but of course, this classical result can also be demonstrated directly).
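The equality of the distributions on the two lines can also be checked numerically. The sketch below evaluates the joint density along the line m = t for two values of µ, normalizes each with a simple Riemann sum, and compares the results. The grid limits, the step size, the coordinate d = x1 − x2, and the function names are choices made for this illustration, not part of the text.

```python
from math import exp, pi, sqrt, isclose

def joint_density(x1, x2, mu, sigma):
    """Density L_mu(x1, x2) of two independent N(mu, sigma^2) observations."""
    q = (x1 - mu) ** 2 + (x2 - mu) ** 2
    return exp(-q / (2 * sigma ** 2)) / (2 * pi * sigma ** 2)

def density_on_line(mu, sigma, t, half_width=8.0, step=0.01):
    """Normalized values of L_mu along the line m = t, indexed by d = x1 - x2."""
    n_points = int(round(2 * half_width / step)) + 1
    ds = [-half_width + i * step for i in range(n_points)]
    # Points on the line: x1 = t + d/2, x2 = t - d/2, so that (x1 + x2)/2 = t
    raw = [joint_density(t + d / 2, t - d / 2, mu, sigma) for d in ds]
    total = sum(raw) * step  # Riemann-sum approximation of the integral over d
    return ds, [v / total for v in raw]

sigma, t = 1.0, 0.5
ds, cond_mu0 = density_on_line(0.0, sigma, t)
_, cond_mu2 = density_on_line(2.0, sigma, t)

# Same conditional density on the line for both values of mu
assert all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(cond_mu0, cond_mu2))

# It matches a centered normal density in d with variance 2 * sigma^2
normal_d = [exp(-d ** 2 / (4 * sigma ** 2)) / sqrt(4 * pi * sigma ** 2) for d in ds]
assert all(isclose(a, b, abs_tol=1e-6) for a, b in zip(cond_mu0, normal_d))
```

The unnormalized values on the line do differ between the two values of µ. Only after normalization do the two curves coincide, which is exactly what sufficiency of m requires.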