
Standardization

Imagine that a bank's database contains two fields (among many others):

    1) Customer's average monthly balance.
    2) Value of the dream house that he wants to borrow money to buy.
 

Both fields are expressed in dollars, but chances are that the numbers in the second field will be considerably larger than the numbers in the first field.
 

It is then generally considered that many models will give much more weight to the second field and tend to neglect the first one, even if the first one has more predictive power. This will show up when an a posteriori analysis of the importance of individual variables is conducted.
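To make the effect concrete, here is a minimal sketch assuming a distance-based method such as nearest neighbors (the text does not say which model the bank uses). The customer figures are hypothetical. A 9-fold difference in average balance counts for less than a 3% difference in house value, simply because house values are expressed in much larger numbers.

```python
import math

# Hypothetical customers: (average monthly balance, value of dream house), in dollars.
customers = {
    "A": (1_000.0, 300_000.0),
    "B": (9_000.0, 300_000.0),  # balance 9 times larger than A's, same house value
    "C": (1_000.0, 310_000.0),  # same balance as A, house value only ~3% higher
}

def euclidean(p, q):
    """Plain Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d_ab = euclidean(customers["A"], customers["B"])
d_ac = euclidean(customers["A"], customers["C"])
print(f"A-B: {d_ab:.0f}  A-C: {d_ac:.0f}")  # prints "A-B: 8000  A-C: 10000"
```

On raw dollars, customer C looks "farther" from A than B does, even though B's balance is wildly different.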

In order to put all (numerical) variables on the same footing, it is customary to apply a transformation to each numerical variable before building the model. After the transformation, the new variables all have a mean equal to 0 and a variance equal to 1. These variables are said to have been standardized (or sometimes, improperly, "normalized").

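The transformation is linear (strictly speaking, affine), and is defined by:

$$ x_i' = \frac{x_i - \mu_i}{\sigma_i} $$

for each variable $x_i$, where $\mu_i$ and $\sigma_i$ denote the mean and the standard deviation of $x_i$.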
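A minimal sketch of this transformation in Python, assuming the mean and standard deviation are computed from the data themselves, with the population convention (divisor n) so that the result has variance exactly 1. The function name `standardize` and the data are illustrative only. Dividing by n - 1 instead (`statistics.stdev`) is another common convention.

```python
import statistics

def standardize(values):
    """Return (x - mean) / std for each value.

    Uses the population standard deviation (divisor n), so the
    transformed values have mean 0 and variance exactly 1.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]

balances = [1200.0, 850.0, 3400.0, 560.0, 2100.0]  # hypothetical monthly balances
z = standardize(balances)
print([round(v, 3) for v in z])
print(statistics.fmean(z), statistics.pstdev(z))  # approximately 0 and 1
```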

 

Note that each variable is standardized independently of the other variables: standardization is not a multivariate transformation. A more complex transformation can not only make every variable standard, but also turn the covariance matrix of the distribution into the identity matrix $I_n$, where n is the number of variables (which ordinary standardization doesn't do). The distribution is then said to have been sphericized.
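Sphericizing can be done in several ways. The sketch below uses one of them, chosen here for simplicity and not implied by the text: factor the covariance matrix as Σ = L Lᵀ (Cholesky decomposition) and map each observation x to L⁻¹(x − μ), whose covariance is L⁻¹ Σ L⁻ᵀ = I. It is written in plain Python for a small hypothetical data set, assumes the covariance matrix is positive definite, and uses the divisor-n convention.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows):
    """Covariance matrix with divisor n (population convention)."""
    n, d = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_substitute(L, b):
    """Solve L y = b for y when L is lower triangular."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def sphere(rows):
    """Map each row x to L^{-1} (x - mean), where cov = L L^T.

    The transformed rows have zero mean and identity covariance.
    """
    mu = mean_vector(rows)
    L = cholesky(covariance(rows))
    return [forward_substitute(L, [x - m for x, m in zip(r, mu)]) for r in rows]

# Hypothetical, correlated two-variable data.
data = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 6.0), (5.0, 5.5)]
white = sphere(data)
for row in covariance(white):
    print([round(v, 6) for v in row])  # approximately the 2 x 2 identity
```

Standardizing each column separately would give unit diagonal entries but leave the off-diagonal correlation untouched; the Cholesky step is what removes it.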

 

The illustration below shows the effect of standardization on the distribution of a variable: the mean is shifted to 0, and the distribution is squeezed (or stretched) so as to have unit variance.


The distribution that generated the sample is usually unknown, and so are its true mean and standard deviation. The analyst will therefore have to be satisfied with standardizing the sample, using the sample mean and standard deviation, as illustrated below.
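A small sketch of the difference, assuming a hypothetical normal distribution with mean 50 and standard deviation 10 (in practice these would be unknown). Standardizing with the true parameters gives a sample whose mean and standard deviation are only close to 0 and 1. Standardizing with the sample's own estimates gives exactly 0 and 1, up to rounding.

```python
import random
import statistics

random.seed(0)
TRUE_MU, TRUE_SIGMA = 50.0, 10.0  # hypothetical generating distribution, normally unknown
sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(20)]

# If the distribution were known: use its true parameters.
z_true = [(x - TRUE_MU) / TRUE_SIGMA for x in sample]

# What the analyst can actually do: estimate mean and std from the sample itself.
m, s = statistics.fmean(sample), statistics.pstdev(sample)
z_sample = [(x - m) / s for x in sample]

print("true parameters: ", statistics.fmean(z_true), statistics.pstdev(z_true))
print("sample estimates:", statistics.fmean(z_sample), statistics.pstdev(z_sample))
```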

__________________________________________________________

 

The following interactive animation illustrates the standardization of a sample. You need Flash Player to view it. If you don't have it, you can download it for free at www.macromedia.com/downloads/


The upper sample (red points) is the original, non-standardized sample. Its mean is marked by a vertical blue line.

The lower sample (blue points) is the standardized sample:

    * Its mean is always 0.

    * Its standard deviation is always 1.

 

Move the red points around with your mouse, and observe the corresponding changes in the standardized sample.


No scale is shown for the original sample, as any choice of scale leads to the same standardized sample.
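The reason is that standardization absorbs any change of units and origin: if x' = a·x + b with a > 0, then the mean becomes a·μ + b and the standard deviation becomes a·σ, so (x' − (a·μ + b)) / (a·σ) = (x − μ) / σ. (With a < 0, the standardized values would flip sign.) A quick check on hypothetical data:

```python
import statistics

def standardize(values):
    """(x - mean) / population standard deviation, for each value."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical sample in dollars, then the same sample in thousands of dollars
# shifted by an arbitrary offset (x' = a*x + b with a = 0.001, b = 7).
dollars = [1200.0, 850.0, 3400.0, 560.0, 2100.0]
rescaled = [x / 1000.0 + 7.0 for x in dollars]

z1, z2 = standardize(dollars), standardize(rescaled)
print(all(abs(u - v) < 1e-9 for u, v in zip(z1, z2)))  # prints True
```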

____________________________________________________________

 

Related readings

Standard Deviation

Variance
