A REALISTIC EXAMPLE OF WEIGHTED SIMPLE LINEAR REGRESSION

WHERE VAR(Y) IS INDEED PROPORTIONAL TO X

 


Let T be a task to be executed repeatedly. For example, T might be "Driving 1 mile on a given type of road", or "Hammering a nail into a block of wood".
The nominal duration of the task is d minutes, but because of minor uncontrollable variations in the conditions of execution, the completion time of T is in fact a random variable t with mean d and variance σ².
Let T be executed n times in a row. We assume that the duration of the ith completion is independent of the duration of any other completion. How long does it take to complete the series of n executions?

Let tn be the time needed to complete the n executions of the task. For a given n, tn is a random variable: the sum of n independent random variables, each with the same distribution as t.
Because the expected value of a sum of random variables is just the sum of the expected values of each variable (this holds even without independence), the expected value of tn is:

E(tn) = E(t) + ... + E(t) = d + ... + d = n.d

Because the variance of a sum of independent random variables is just the sum of the variances of each variable, the variance of tn is:

var(tn) = var(t) + ... + var(t) = σ² + ... + σ² = n.σ²

So it appears that the variance of the duration of n completions of T is proportional to n.
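
The two results above can be checked numerically. The following is a minimal simulation sketch, not part of the original argument. It assumes, purely for illustration, that each individual duration is normally distributed, and the values d = 2.0 and σ = 0.5 are hypothetical. The derivation itself only needs independence and a common mean and variance.

```python
import numpy as np

def simulate_totals(n, d, sigma, trials, rng):
    """Return `trials` simulated values of tn, each the sum of n independent durations."""
    # Assumption (illustration only): a single duration is normal(mean=d, sd=sigma).
    durations = rng.normal(loc=d, scale=sigma, size=(trials, n))
    return durations.sum(axis=1)

rng = np.random.default_rng(0)
d, sigma = 2.0, 0.5  # hypothetical nominal duration and standard deviation

for n in (1, 5, 20):
    totals = simulate_totals(n, d, sigma, trials=100_000, rng=rng)
    # Compare the sample mean with n.d and the sample variance with n.sigma^2.
    print(n, totals.mean(), n * d, totals.var(), n * sigma**2)
```

For each n, the sample mean should land close to n.d and the sample variance close to n.σ², so the spread of tn grows with n rather than staying constant.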

Suppose now that we have a table of measurements of tn for various values of n. We want to estimate d, the unknown nominal completion time of T.
We are convinced that, if it were not for the random nature of t, we would have tn = n.d; that is, the relationship between the independent variable n and the response variable tn is indeed linear. Simple Linear Regression therefore appears to be the appropriate tool for estimating d as the slope of the Least Squares Line. But because we just showed that the variance of tn is proportional to n, and not constant as ordinary least squares assumes, we should use the Weighted Least Squares approach instead to estimate d, giving each measurement a weight proportional to 1/n.
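
As an illustration of the weighted fit, here is a minimal sketch under stated assumptions. It fits the no-intercept model tn = n.d + error, which is what the relationship tn = n.d suggests, although the text does not say whether an intercept should be included. The helper name wls_slope_through_origin, the simulated data, and the values d = 2.0 and σ = 0.5 are all hypothetical.

```python
import numpy as np

def wls_slope_through_origin(n, tn, weights):
    """Weighted least squares slope for the no-intercept model tn = n.d + error."""
    n = np.asarray(n, dtype=float)
    tn = np.asarray(tn, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Minimizing sum(w * (tn - d*n)**2) over d gives this ratio.
    return np.sum(w * n * tn) / np.sum(w * n * n)

# Hypothetical measurements: one simulated tn per value of n, with var(tn) = n.sigma^2.
rng = np.random.default_rng(0)
d_true, sigma = 2.0, 0.5
n = np.array([1, 2, 5, 10, 20, 50, 100])
tn = n * d_true + rng.normal(0.0, sigma * np.sqrt(n))

d_wls = wls_slope_through_origin(n, tn, 1.0 / n)                        # weights proportional to 1/var(tn)
d_ols = wls_slope_through_origin(n, tn, np.ones_like(n, dtype=float))   # unweighted, for comparison
print(d_wls, d_ols)
```

With weights 1/n and no intercept, the weighted estimate simplifies to (sum of all tn) / (sum of all n), that is, total time divided by total number of executions.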