A REALISTIC EXAMPLE OF WEIGHTED SIMPLE LINEAR REGRESSION
WHERE VAR(Y) IS INDEED PROPORTIONAL TO X
Let T be a task to be executed repetitively.
For example, T might be "Driving 1 mile on a given type of
road", or "Hammering down a nail into a block of wood".
The nominal duration of the task is d minutes, but because of minor
uncontrollable variations in the conditions of execution, the completion time
of T is in fact a random variable t with mean d and variance σ².
Let T be executed n times in a row. We assume
that the duration of the ith completion is independent
of the duration of any other completion. How long does it take to
complete the series of n executions?
Let tn be the time needed to complete the n executions
of the task. For a given n, tn is a random variable: it is the
sum of n independent random variables, each with the same distribution as t.
Because the expected value of a sum of random variables is just the sum of
the expected values of each variable (independence is not even needed for
this), the expected value of tn is:
E(tn) = E(t) + ... + E(t) = d + ... + d = n.d
Because the variance of a sum of independent random variables is just the sum of the variances of each variable, the variance of tn is:
var(tn) = var(t) + ... + var(t) = σ² + ... + σ² = n.σ²
So it appears that the variance of the duration of
n completions of T is proportional to n.
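The two results above can be checked numerically. The following is only a minimal sketch: it assumes, purely for illustration, that individual durations are normally distributed (the argument above does not require any particular distribution), and the values chosen for d and σ, the number of trials and the function name are all hypothetical.

```python
import random
import statistics


def simulate_total_times(n, d, sigma, trials, rng):
    """Return `trials` simulated values of tn.

    Each value is the sum of n independent durations drawn from a normal
    distribution with mean d and standard deviation sigma (an assumption
    made for this illustration only).
    """
    return [sum(rng.gauss(d, sigma) for _ in range(n)) for _ in range(trials)]


rng = random.Random(0)   # fixed seed so that the run is reproducible
d, sigma = 2.0, 0.5      # hypothetical nominal duration and standard deviation
results = {}             # n -> (sample mean of tn, sample variance of tn)

for n in (1, 4, 16):
    totals = simulate_total_times(n, d, sigma, 20000, rng)
    results[n] = (statistics.mean(totals), statistics.variance(totals))
    # Compare the sample mean with n.d and the sample variance with n.sigma^2
    print(n, results[n][0], n * d, results[n][1], n * sigma ** 2)
```

For each n, the sample mean should be close to n.d and the sample variance close to n.σ², so that multiplying n by 4 multiplies the variance by roughly 4 as well.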
Suppose
now that we have a table of measurements of tn for
various values of n. We want to estimate d, the unknown nominal
completion time of T.
We are convinced that if it were not for the
random nature of t, we would have tn = n.d,
that is, that the relationship between the independent variable n and
the response variable tn is indeed linear. Simple Linear Regression therefore
appears to be the adequate tool for estimating d as the slope of the Least
Squares Line. But because we just showed that the variance of tn is proportional
to n, and not constant as ordinary Least Squares assumes, we should use the
Weighted Least Squares approach instead to estimate d.
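As an illustration, here is a minimal sketch of such a weighted fit. Weighted Least Squares gives each observation a weight inversely proportional to its variance, so with var(tn) = n.σ² the weights are taken as 1/n (the unknown factor σ² cancels out of the estimates). The function name `wls_line`, the simulated measurements, the normal distribution used to generate them and the numerical values are all assumptions made for this example, not part of the text above.

```python
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x. Returns (a, b).

    Minimizes sum(w_i * (y_i - a - b*x_i)**2) using the closed-form
    solution based on weighted means.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


rng = random.Random(1)        # fixed seed so that the run is reproducible
d_true, sigma = 2.0, 0.5      # hypothetical values used to simulate the table

# Simulated table of measurements: 20 values of tn for each value of n
ns = [1, 2, 5, 10, 20, 50, 100] * 20
tns = [sum(rng.gauss(d_true, sigma) for _ in range(n)) for n in ns]

# var(tn) is proportional to n, so each measurement gets weight 1/n
weights = [1.0 / n for n in ns]

intercept, d_hat = wls_line(ns, tns, weights)
print("estimated d (slope):", d_hat)
print("estimated intercept:", intercept)
```

The slope `d_hat` is the estimate of d. The intercept is fitted here because Simple Linear Regression normally includes one; since we expect tn = n.d, it should come out close to 0.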