Point estimation
Estimation and Tests are the two main branches of what is known as "inferential Statistics".
-----
Point estimation (or often, just "estimation") is one of the central activities of the statistician. Probability distributions are usually known only through random samples, and estimation is the art of extracting valuable information from a sample about the probability distribution that generated it.
The term "Estimation" covers several different but closely related viewpoints, which we analyze here.
The above mentioned task may be made simpler by narrowing down the question. This can be done in two ways :
This is called estimating the parameters of a distribution, or simply parameter estimation.
The distinction between these two types of estimation is moot when the parameters in the mathematical expression of the family are themselves properties of the distribution, as is the case, for example, for the normal distribution (mean and variance).
Finally, a model, whether predictive or descriptive, may be perceived as a particular type of description of a probability distribution. A parametric model contains parameters, whose values are calculated from the sample. These parameters are therefore random variables that have distributions of their own, and identifying these distributions is an important task in Data Modeling, for analyzing these distributions will tell us how reliable the model is (in a nutshell, models whose parameters have broad distributions are unreliable).
This is called estimating the parameters of a model.
We now review some aspects of parameter estimation.
Let q be a parameter of a distribution, whose true (and unknown) value is q0.
An estimator is a function of the observations in the sample (a "statistic") whose value will be used as a guess of the true value q0 of the parameter q. The value taken by an estimator on a given sample is called an estimate (of q0).
We will denote by :
So :
q* = q *(sample)
Estimates are expected to be close to the true value q0. But the sample being random, an estimator is a random variable. So it can never be said with certainty that an estimate is close to the true value of a parameter of the distribution (or of the model).
So it appears that Estimation theory will focus not on individual estimates, which taken alone carry no indication of their own accuracy, but on the properties of estimators considered as random variables, that is, their probability distributions, or some restricted aspects thereof (mostly mean and variance).
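The idea that an estimator is a random variable can be illustrated with a small simulation. The sketch below is only an illustration under assumptions of our own: the distribution is normal, the estimator is the sample mean, and the names (sample_mean_estimates, mu, sigma) and numerical values are hypothetical.

```python
import random
import statistics

def sample_mean_estimates(n, n_samples, mu=5.0, sigma=2.0, seed=0):
    """Draw n_samples independent samples of size n from N(mu, sigma) and
    return the estimate (here the sample mean) obtained on each of them."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

# Each sample gives a different estimate: the estimator has a distribution.
estimates = sample_mean_estimates(n=25, n_samples=2000)
print("first estimates:", [round(e, 2) for e in estimates[:5]])
print("mean of the estimator:", round(statistics.fmean(estimates), 3))
print("standard deviation of the estimator:", round(statistics.stdev(estimates), 3))
```

The last two printed numbers summarize the distribution of the estimator, which is what Estimation theory studies.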
We will see that a given parameter of a distribution may have several different estimators to choose from. So the two central questions of Estimation theory are :
Nothing in the structure of the equation defining a statistic tells whether it is, or is not, an estimator of something. It can be said that there is no such thing as a definition of an estimator. But a particular statistic will be used to estimate a parameter if it possesses certain desirable properties, which we briefly review here.
A central idea of Statistics is that very large samples are (unless you're terribly unlucky) a reasonably faithful image of the distribution itself. In other words, the empirical distribution function is hoped to be pretty close to the real distribution function for large samples. For example, in the case of a continuous distribution, this means that there should be many observations in regions where the probability density is high, and few where the probability density is low.
So you would certainly expect from a "good" estimator q * of a parameter q that it produces estimates q* that are closer and closer to the true value q0 as you consider larger and larger samples.
Yet, again, because an estimator is a random variable, this convergence of the estimates towards q0 as the sample size n grows without limit cannot be expected to happen in a deterministic way.
It can happen only in a probabilistic way, and we will have to be satisfied with the following weaker property : however small the width e, and however close to 1 the probability P, all you have to do is consider large enough samples for the probability of the estimate q* being within a bracket of width e centered on q0 to be larger than P.
In other words, beyond a certain sample size, at least 100 P % of the estimates are expected to fall inside this narrow bracket around q0.
A statistic with such a property is called a consistent estimator of the parameter q.
Consistency is the least you can demand from a statistic to qualify as an estimator.
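The property above can be checked by simulation. The following sketch is an illustration only, assuming a normal distribution and the sample mean as the estimator; the function name prob_within_bracket and all numerical values are hypothetical. It estimates the probability that the estimate falls inside a bracket of width e centered on q0, for growing sample sizes.

```python
import random
import statistics

def prob_within_bracket(n, width, q0=5.0, sigma=2.0, n_samples=1000, seed=1):
    """Monte Carlo estimate of P(|q* - q0| < width / 2), where q* is the
    sample mean of n draws from N(q0, sigma)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        estimate = statistics.fmean(rng.gauss(q0, sigma) for _ in range(n))
        if abs(estimate - q0) < width / 2:
            hits += 1
    return hits / n_samples

# The probability of landing inside the bracket grows with the sample size.
probabilities = {n: prob_within_bracket(n, width=0.5) for n in (10, 100, 1000)}
for n, p in probabilities.items():
    print(f"n = {n:5d}   P(estimate inside the bracket) ~ {p:.3f}")
```

Choosing a narrower bracket simply pushes further out the sample size beyond which a given probability P is exceeded.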
Except for some rare and pathological exceptions, an estimator's distribution (like that of any other non-trivial statistic) becomes narrower and narrower, and more and more normal-like, as larger and larger samples are considered. If we take for granted the fact that the variance of the estimator will tend to 0 as the sample size grows without limit, what consistency really means is that the mean of the estimator's distribution tends to q0 as the sample size grows without limit, as shown in the upper and lower images below :
1) In technical terms, a consistent estimator is a sequence of random variables indexed by n (the sample size) that converges in probability to q0.
2) The Weak Law of Large Numbers identifies a consistent estimator : the sample mean, as an estimator of the distribution mean.
3) For an exception to the rule of the "narrowing of the distribution of a statistic for larger and larger samples", see the Cauchy distribution. There we show that, despite the symmetry of the distribution, the sample mean is not a consistent estimator of the distribution median (the distribution has no mean). The sample median, though, is a consistent estimator of the distribution median.
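The Cauchy exception of note 3 can be observed numerically. The sketch below is an illustration under our own assumptions (standard Cauchy distribution, hypothetical function names). Since the variance does not exist, the spread of each estimator is measured by the interquartile range of its estimates.

```python
import math
import random
import statistics

def cauchy(rng, location=0.0, scale=1.0):
    """One draw from a Cauchy distribution, obtained by inverting its CDF."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range: a measure of spread that needs no variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_estimator(estimator, n, n_samples=1000, seed=2):
    """Interquartile range of the estimates over n_samples samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([cauchy(rng) for _ in range(n)]) for _ in range(n_samples)]
    return iqr(estimates)

estimators = {"mean": statistics.fmean, "median": statistics.median}
spreads = {
    (name, n): spread_of_estimator(estimator, n)
    for name, estimator in estimators.items()
    for n in (10, 1000)
}
for (name, n), value in spreads.items():
    print(f"sample {name:6s}  n = {n:4d}  spread of the estimates ~ {value:.3f}")
```

The spread of the sample median shrinks as n grows, while that of the sample mean does not, which is the behavior described in the note.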
Consistency is an asymptotic property : defining consistency requires considering arbitrarily large samples. In real life, sample size will be limited by time or budget constraints. So it is natural to consider what quality should be expected from an estimator based on samples of a fixed size n.
Then you would certainly hope the central region of the distribution of the estimator to be close to the true value q0 of the parameter. One way of expressing this idea is to consider estimators whose distribution mean is equal to q0, the true value of the parameter q, for any value of n. Such an estimator is said to be unbiased, and unbiasedness translates into :
E[q *]n = q0 for any n
This definition is somewhat arbitrary, as the
mean has no magic virtue other than its mathematical convenience. Any other
measure of central tendency, like the median or the mode, would have provided
other adequate embodiments of the idea of "lack of bias", except
for the fact that further calculations might then prove intractable.
For a given n, an estimator whose expectation is not equal to q0 is said to be biased. For example :
Yet, these two estimators are consistent, as their means tend to the true values (resp. of the variance and the correlation coefficient) as larger and larger samples are considered. So a consistent estimator may be biased for all values of n.
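A biased but consistent estimator can be observed on a classical case. The sketch below assumes that the variance estimator in question is the one dividing the sum of squared deviations by n, whose expectation is known to be (n - 1)/n times the true variance. The normal distribution, the function names and the numerical values are illustrative choices of our own.

```python
import random

def divisor_n_variance(sample):
    """Variance estimate that divides the sum of squared deviations by n."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def mean_of_estimator(n, n_samples=2000, sigma=2.0, seed=3):
    """Average of the divisor-n variance estimate over many samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += divisor_n_variance(sample)
    return total / n_samples

# True variance is sigma ** 2 = 4; theory predicts a mean of 4 * (n - 1) / n.
averages = {n: mean_of_estimator(n) for n in (5, 50, 500)}
for n, avg in averages.items():
    print(f"n = {n:3d}  mean of the estimator ~ {avg:.3f}  theory = {4 * (n - 1) / n:.3f}")
```

The mean of the estimator stays below the true variance for every n, yet approaches it as n grows.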
As we will see, bias does not necessarily
make an estimator useless, even for small samples.
A parameter may have several unbiased estimators. For example, given a symmetrical continuous distribution, both :
are unbiased estimators of the distribution mean (when it exists). Which one should we choose ?
Certainly we should choose the estimator that generates estimates that are closer (in a probabilistic sense) to the true value q0 than estimates generated by the other one. One way to do that is to select the estimator with the lower variance.
This leads to the definition of the relative efficiency of two unbiased estimators. Given two unbiased estimators q *1 and q *2 of the same parameter q, one defines the efficiency of q *2 with respect to q *1 (for a given sample size n) as the ratio of their variances :
Relative efficiency (q *2 with respect to q *1)n = Var(q *1)n / Var(q *2)n
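The ratio above can be approximated by simulation. In the sketch below we assume, for illustration only, that q *1 is the sample mean, that q *2 is the sample median, and that the distribution is normal; the function name relative_efficiency and the sample sizes are hypothetical. For large normal samples the ratio is known to approach 2/pi (about 0.64).

```python
import random
import statistics

def relative_efficiency(n, n_samples=4000, seed=4):
    """Var(sample mean) / Var(sample median), estimated over n_samples
    samples of size n drawn from the standard normal distribution."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

efficiency = relative_efficiency(n=25)
print(f"efficiency of the median with respect to the mean (n = 25) ~ {efficiency:.2f}")
```

A value below 1 means that, in this setting, the sample median has the larger variance, so the sample mean would be preferred.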
One might then wonder if, for a given parameter q, there exists an unbiased estimator which is more efficient than any other unbiased estimator for all sample sizes. The answer is "In general, no.". But it is sometimes possible to identify an unbiased estimator qE with a similar, albeit weaker property :
So qE is indeed the "most efficient estimator", but only in an asymptotic sense. For any given sample size n, there might very well be an unbiased estimator q * more efficient than qE.
Such an estimator is called an efficient estimator.
The question :
* "What is the smallest possible variance of an unbiased estimator ?"
or equivalently :
* "Is there a lower bound to the variance of an unbiased estimator ?"
is both important and difficult.
Should this Glossary live long enough, it will be addressed in due time.
The practitioner is not particularly keen on unbiasedness. What is really important to him is that, on average, the estimate q* be close to the true value q0. So he will tend to favor estimators such that the mean-square error :
E[(q* - q0)²]
be as low as possible, whether q * is biased or not. Such an estimator is called a minimum mean-square-error estimator.
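The mean-square error splits into the variance of the estimator plus the square of its bias, so a small bias can be worth accepting if it buys a lower variance. The sketch below illustrates this on an example of our own choosing: two estimators of the variance of a normal distribution, dividing the sum of squared deviations by n - 1 (unbiased) and by n (biased). All names and values are hypothetical.

```python
import random

def compare_variance_estimators(n=10, n_samples=20000, sigma=1.0, seed=5):
    """Bias, variance and mean-square error of two variance estimators,
    estimated over n_samples normal samples of size n."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    estimates = {"divisor n-1": [], "divisor n": []}
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)
        estimates["divisor n-1"].append(ss / (n - 1))
        estimates["divisor n"].append(ss / n)
    summary = {}
    for name, values in estimates.items():
        mean = sum(values) / n_samples
        bias = mean - true_var
        variance = sum((v - mean) ** 2 for v in values) / n_samples
        mse = sum((v - true_var) ** 2 for v in values) / n_samples
        summary[name] = {"bias": bias, "variance": variance, "mse": mse}
    return summary

summary = compare_variance_estimators()
for name, s in summary.items():
    print(f"{name:12s} bias = {s['bias']:+.3f}  variance = {s['variance']:.3f}  mse = {s['mse']:.3f}")
```

In this setting the biased estimator is expected to show the lower mean-square error, which is the situation the practitioner cares about.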
Given two estimators :
q *2 might prove a better estimator than q *1 in practice (lower image in the illustration below).
As noted above, there is no way to tell a priori if a particular statistic will be useful as an estimator of a certain parameter. Given a statistic, only its behavior in terms of :
will tell if this statistic is worth being
considered for the purpose of estimating the value of this parameter.
So the question of how to construct a statistic that will have
some of the desirable properties of a good estimator remains open. We now briefly
mention three popular methods used for constructing estimators of a given
parameter :
It is the most natural method. Because a large sample is (hoped to be) a faithful image of the unknown distribution D, having this sample is just as good as having the distribution itself. So we proceed as if the empirical distribution function D* were the true distribution. If the parameter q is defined in terms of a function of D :
q = f(D)
then we use :
q*= f(D*)
as an estimate of q.
This is the "common sense" attitude that makes us use the sample mean as an estimate of the mean of the population without thinking twice about it.
This "plug in" method is called the method of moments. In the early days of Statistics, the method of moments was the only one available, and was mostly used for estimating the moments of a distribution (mean, variance and higher order moments), hence its name.
An estimator constructed by the method of moments is generally consistent (although this requires proof), but often suffers from severe bias for small samples.
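As an illustration of the method, here is a minimal sketch for a family that is our own choice : the gamma distribution, whose mean is shape x scale and whose variance is shape x scale². Replacing the mean and variance of the distribution by those of the sample and solving for the two parameters gives the estimates. The function name and the numerical values are hypothetical.

```python
import random
import statistics

def gamma_method_of_moments(sample):
    """Estimate the shape and scale of a gamma distribution by equating the
    distribution mean and variance to the sample mean and variance."""
    m = statistics.fmean(sample)
    v = statistics.fmean((x - m) ** 2 for x in sample)  # variance of the empirical distribution
    shape = m * m / v   # from mean = shape * scale and variance = shape * scale ** 2
    scale = v / m
    return shape, scale

rng = random.Random(6)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]  # true shape 3, true scale 2
shape_hat, scale_hat = gamma_method_of_moments(sample)
print(f"estimated shape ~ {shape_hat:.2f}, estimated scale ~ {scale_hat:.2f}")
```

With a large sample the estimates land close to the true values; with a handful of observations they can be far off.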
Given a sample and a candidate distribution, the Likelihood is a measure of "how likely" it is that the sample was generated by that particular distribution.
Given a family of distributions (usually summarized by a mathematical expression containing a few numerical parameters), the Method of Maximum Likelihood (ML) selects that particular distribution in the family that makes the Likelihood maximum. The values of the parameters thus obtained are called the Maximum Likelihood estimates of the parameters of the (unknown) underlying distribution.
The Method of Maximum Likelihood is established on much more solid theoretical grounds than the Method of Moments. In particular, it can be shown that, under very general conditions, a ML estimator is :
ML estimation is the most widely used method of estimation.
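The principle can be sketched on a one-parameter family of our own choosing : the exponential distribution with rate parameter "rate". The log-likelihood of a sample is n log(rate) - rate x (sum of the observations), and it is maximal for rate = 1 / (sample mean). The code below maximizes it numerically with a golden-section search (a simple choice made here for illustration, on an assumed search interval) and compares the result with the closed form. All names are hypothetical.

```python
import math
import random

def exponential_log_likelihood(rate, n, total):
    """Log-likelihood of an exponential sample of size n whose sum is total."""
    return n * math.log(rate) - rate * total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return (a + b) / 2

rng = random.Random(7)
sample = [rng.expovariate(0.5) for _ in range(2000)]  # true rate 0.5
n, total = len(sample), sum(sample)

rate_numeric = golden_section_max(
    lambda rate: exponential_log_likelihood(rate, n, total), 1e-6, 50.0
)
rate_closed_form = n / total  # 1 / sample mean
print(f"numerical ML estimate ~ {rate_numeric:.4f}, closed form ~ {rate_closed_form:.4f}")
```

For families with several parameters, or without a closed-form solution, the same idea applies but a general-purpose numerical optimizer is needed.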
The usual estimate of the mean of a random variable is the sample mean m, which also has the property of minimizing the sum :
S = Σi (xi - a)²
where a is an adjustable parameter. S is minimal for a = m.
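This property follows from the identity S(a) = S(m) + n (a - m)², which shows that moving a away from m can only increase S. A small numerical check, on made-up data and with a hypothetical function name, might look like this :

```python
import statistics

def sum_of_squares(xs, a):
    """S(a): sum of the squared differences between the observations and a."""
    return sum((x - a) ** 2 for x in xs)

xs = [2.0, 3.5, 1.0, 4.5, 4.0]   # made-up observations
m = statistics.fmean(xs)

for a in (m - 1.0, m - 0.1, m, m + 0.1, m + 1.0):
    print(f"a = {a:.2f}   S(a) = {sum_of_squares(xs, a):.3f}")
```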
The function y = f(x) is the regression function of y on x if, for any x, f(x) is the mean of the values of y for this value of x. Therefore, regression may be perceived as simultaneously estimating the means of an infinity of random variables, one for each value of x.
Least Squares estimation is an extension of the above-mentioned property of the sample mean. The parameters of a regression model y = f(x) are usually calculated by imposing that the sum of the squares of the differences between the observed values of y and the values predicted by the model be minimal.
Least Squares is the technique used for calculating the models in Simple and Multiple Linear Regression.
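For Simple Linear Regression, the Least Squares parameters have a well-known closed form : the slope is the ratio of the sum of cross-products of deviations to the sum of squared deviations of x, and the line goes through the point of means. The sketch below applies it to made-up data; the function names and the data-generating values are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least Squares slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values of y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data scattered around the line y = 1 + 2 x
rng = random.Random(9)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope ~ {slope:.3f}, intercept ~ {intercept:.3f}")
print(f"residual sum of squares ~ {residual_sum_of_squares(xs, ys, slope, intercept):.3f}")
```

Any other choice of slope and intercept gives a larger residual sum of squares on the same data.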
For more on Least Squares estimation, please see here.
What we described in this page is known as "point estimation", the reason being that the action of estimating produces just a number, the estimate. The weakness of point estimation is that this estimate comes with no clue about how credible it is.
It is sometimes possible to bring about some additional information concerning this credibility. This is the goal of a different kind of estimation known as "interval estimation".
In a nutshell, given a sample, interval estimation builds a segment such that it is possible to calculate the probability for this segment (known as "confidence interval") to contain q0, the true value of the parameter. For a given probability (known as "confidence level"), the shorter the confidence interval, the better the precision with which q0 has been localized.
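The meaning of the confidence level can be checked by simulation. The sketch below is an illustration under assumptions of our own : the parameter is the mean of a normal distribution, and the interval is the usual large-sample approximation "sample mean plus or minus z times the standard error". The function names and numerical values are hypothetical. It counts how often the interval contains q0 over many samples.

```python
import math
import random
import statistics

def confidence_interval(sample, level=0.95):
    """Approximate (large-sample) confidence interval for the distribution mean."""
    n = len(sample)
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s / math.sqrt(n)
    return m - half_width, m + half_width

def coverage(n=50, n_samples=2000, q0=5.0, sigma=2.0, level=0.95, seed=8):
    """Fraction of samples whose confidence interval contains the true mean q0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(q0, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, level)
        if low <= q0 <= high:
            hits += 1
    return hits / n_samples

observed = coverage()
print(f"fraction of intervals containing q0 ~ {observed:.3f} (nominal level 0.95)")
```

The observed fraction should be close to the nominal confidence level, slightly below it here because the interval relies on a large-sample approximation.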
You may read more about interval estimation here.
____________________________________________________________
In summary, the goal of Estimation is to extract complete or partial information about a probability distribution from a sample generated by this distribution. This information is necessarily probabilistic, and is embodied in an estimate (of a parameter of the distribution). An estimator is a statistic whose properties as a random variable let us expect that its value for the sample at hand (the estimate) is close to the true value of the parameter that is being estimated.
A few general techniques are available for constructing useful estimators (moments, maximum likelihood, least-squares).
It is often possible to associate a confidence interval, and a confidence level for this interval, with a point estimate.
_______________________________________________________