Associations (analysis)

A large retail store sells mineral water. It also sells milk. At the checkout counter, some trolleys will contain mineral water but no milk, others will contain milk but no mineral water, and some will contain both (and, of course, most will contain neither of these two items).

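Now, if we assume that buying mineral water bears absolutely no relation whatsoever to buying milk, it is possible to predict just how many trolleys should contain both milk and mineral water: the total number of trolleys, times the proportion of trolleys containing mineral water, times the proportion containing milk.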

If the observed number of trolleys containing both of the above items exceeds this prediction, then a (positive) association has been discovered.
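The prediction above is simple arithmetic. The following is a minimal Python sketch, using hypothetical counts (not data from any real store). The ratio of observed to expected co-occurrences, commonly called "lift", is greater than 1 when there is a positive association. The function names are illustrative, and the sketch does not test whether an excess is large enough to be more than chance; that would require a statistical test.

```python
def expected_both(n_total: int, n_water: int, n_milk: int) -> float:
    """Expected number of trolleys holding both items if the purchases are independent."""
    # Under independence, P(water and milk) = P(water) * P(milk), so the
    # expected count is n_total * (n_water / n_total) * (n_milk / n_total).
    return n_water * n_milk / n_total


def lift(n_total: int, n_water: int, n_milk: int, n_both: int) -> float:
    """Observed co-occurrences divided by the independence prediction (> 1: positive association)."""
    return n_both / expected_both(n_total, n_water, n_milk)


# Hypothetical counts, for illustration only.
N, WATER, MILK, BOTH = 10_000, 1_200, 2_500, 450

print(expected_both(N, WATER, MILK))  # 300.0
print(lift(N, WATER, MILK, BOTH))     # 1.5
```

Here 450 trolleys hold both items where independence predicts 300, so the association is positive (lift 1.5).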

 

This discovery is of great value. It may reveal a lot about:

 

Associations Analysis may also be used to identify differences between several stores of the same chain. The fact that different stores show different associations may be quite revealing:


Conducting Associations Analysis over time sheds light on changing consumer habits. Suppose an association is weak, but its strength keeps increasing month after month: extrapolating the trend may put the store in an excellent position to meet the full-blown need when it becomes "mature". Conversely, a decreasing association may warn against pushing products whose future is perhaps doomed.
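One simple way to "extrapolate the trend" is to fit a straight line to the monthly strength of an association and project it forward. The sketch below assumes the strength is measured as monthly lift and that a least-squares line is an adequate model; both are assumptions, and a real trend may level off rather than grow linearly. The data and function names are hypothetical.

```python
def linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares fit of values[t] = a + b * t for t = 0, 1, ..., n - 1; returns (a, b)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    b = num / den
    return y_mean - b * t_mean, b


def extrapolate(values: list[float], months_ahead: int) -> float:
    """Project the fitted straight line `months_ahead` months past the last observation."""
    a, b = linear_trend(values)
    return a + b * (len(values) - 1 + months_ahead)


# Hypothetical monthly lift of one product pair (1.0 = no association).
monthly_lift = [1.02, 1.05, 1.09, 1.11, 1.16, 1.19]
_, slope = linear_trend(monthly_lift)
print(f"slope per month: {slope:+.4f}")
print(f"projected lift in 6 months: {extrapolate(monthly_lift, 6):.2f}")
```

A positive slope corresponds to the strengthening association described above; a negative slope to the decreasing one.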

 

Although retail is still the main user of Associations Analysis, it is beginning to spread into other areas, such as:

 

Now, with all these very positive (and real) capabilities, why isn't Associations Analysis everywhere? The reason is that, despite its great simplicity, it runs into several operational snags.

1) The calculations, although simple, are heavy and can exceed the capabilities of even powerful computers. This is due to the large number of combinations of items that need to be taken into account (a store stocking n items already has n(n-1)/2 possible pairs, and far more triplets). Cumbersome methodologies have been developed to keep computer time within reasonable bounds, but they require a lot of human expertise and can conceal an otherwise interesting association.
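The sketch below illustrates both halves of this point. First, it counts how many item combinations a hypothetical store with 10,000 products could contain. Second, it shows one widely used way of keeping computation tractable, a minimum-support threshold (the idea behind the Apriori algorithm), and how such a threshold can discard a rare but possibly strong association. The text does not say which methodologies it has in mind, so this choice is an assumption, and the function only illustrates the threshold itself, not a full Apriori implementation. The product counts and names are hypothetical.

```python
from math import comb


def n_candidate_itemsets(n_items: int, k: int) -> int:
    """Number of distinct k-item combinations a store stocking n_items products could contain."""
    return comb(n_items, k)


def prune_by_min_support(pair_counts: dict, n_baskets: int, min_support: float) -> dict:
    """Keep only pairs found in at least `min_support` (a fraction) of all baskets."""
    return {pair: c for pair, c in pair_counts.items() if c / n_baskets >= min_support}


# Hypothetical store stocking 10,000 products: the counts grow explosively with k.
for k in (2, 3, 4):
    print(k, n_candidate_itemsets(10_000, k))

# Hypothetical pair counts over 10,000 baskets. The rare pair may be strongly
# associated, but a 1% support threshold discards it before its lift is examined.
pairs = {("mineral water", "milk"): 450, ("caviar", "champagne"): 8}
print(prune_by_min_support(pairs, 10_000, 0.01))  # {('mineral water', 'milk'): 450}
```

Pairs alone number about 50 million here, and triplets exceed 166 billion, which is why some form of pruning is hard to avoid.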
 

2) Associations Analysis is plagued by the laborious discovery of "trivial" associations. What is the value of discovering (after many hours of computing) that people who buy washing machines usually also buy an "Extended Warranty Contract"? Any junior salesman knows that. But computers don't: they crunch numbers without regard for whether the outcome is of any interest, unless tedious care has been taken to prevent them from wasting their time on obvious associations.
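A minimal sketch of the "tedious care" mentioned above, assuming the business maintains a hand-written list of associations it already knows: pairs on that list are skipped before they are counted. The list, the baskets, and the function name are hypothetical; building and maintaining such a list is precisely the human effort the text refers to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical list of associations the business already knows about.
KNOWN_PAIRS = {frozenset({"washing machine", "extended warranty"})}


def count_pairs(baskets, known_pairs=KNOWN_PAIRS) -> Counter:
    """Count co-occurring item pairs, skipping pairs listed as already known."""
    counts = Counter()
    for basket in baskets:
        # set() drops duplicate items; sorted() makes the iteration order deterministic
        for a, b in combinations(sorted(set(basket)), 2):
            pair = frozenset((a, b))
            if pair in known_pairs:
                continue  # obvious association: not worth the computer's time
            counts[pair] += 1
    return counts


baskets = [
    ["washing machine", "extended warranty", "detergent"],
    ["washing machine", "extended warranty"],
    ["detergent", "washing machine"],
]
for pair, n in count_pairs(baskets).items():
    print(sorted(pair), n)
```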
 

3) The results of Associations Analysis may be biased by previous promotional campaigns. Rather than detecting a genuine trend, the analysis will then merely measure the effectiveness of the campaign.
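When the dates of a past campaign are known, one simple check is to compute the association separately inside and outside the campaign window: a lift that is high only during the campaign suggests the campaign, not a genuine trend, is driving it. The sketch below assumes dated baskets and uses lift as the measure of association; the transactions, dates, and function names are hypothetical, and four baskets per side is far too few for a real conclusion.

```python
from datetime import date


def lift_from_baskets(baskets, a, b) -> float:
    """Lift of items a and b within a list of baskets (sets of items); NaN if undefined."""
    n = len(baskets)
    n_a = sum(a in bk for bk in baskets)
    n_b = sum(b in bk for bk in baskets)
    n_ab = sum(a in bk and b in bk for bk in baskets)
    if n_a == 0 or n_b == 0:
        return float("nan")
    return n_ab * n / (n_a * n_b)


def lift_inside_outside(transactions, a, b, start: date, end: date):
    """Split dated baskets at a campaign window [start, end] and compute the lift on each side."""
    inside = [bk for d, bk in transactions if start <= d <= end]
    outside = [bk for d, bk in transactions if not (start <= d <= end)]
    return lift_from_baskets(inside, a, b), lift_from_baskets(outside, a, b)


# Hypothetical transactions; a chips-and-soda promotion ran from 1 to 15 June.
tx = [
    (date(2024, 6, 3), {"chips", "soda"}),
    (date(2024, 6, 5), {"chips", "soda"}),
    (date(2024, 6, 9), {"bread"}),
    (date(2024, 6, 12), {"bread"}),
    (date(2024, 7, 2), {"chips", "soda"}),
    (date(2024, 7, 4), {"chips"}),
    (date(2024, 7, 8), {"soda"}),
    (date(2024, 7, 9), {"bread"}),
]
print(lift_inside_outside(tx, "chips", "soda", date(2024, 6, 1), date(2024, 6, 15)))  # (2.0, 1.0)
```

In this toy data the pair looks associated only while the promotion runs; outside it, the lift falls back to 1.0 (no association).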
 

4) Occasional, non-recurrent causes (weather conditions, temporary product shortages, an aggressive move by a local competitor...) may also bias the analysis. The team conducting the analysis may very well be unaware of these exceptional conditions.
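Since the team may not know about such exceptional conditions, it can help to let the data point at suspicious periods. The sketch below flags weeks whose lift lies far from the median, measured in median absolute deviations (MAD), a robust rule that is not thrown off by the very outliers it looks for. The choice of weekly lift, the MAD rule, and the threshold of 3 are all assumptions, and the data is hypothetical. The code only flags weeks; finding the actual cause (weather, shortage, competitor) remains a human task.

```python
from statistics import median


def flag_unusual_weeks(weekly_lift: list[float], threshold: float = 3.0) -> list[int]:
    """Indices of weeks whose lift lies far from the median, measured in median absolute deviations."""
    med = median(weekly_lift)
    mad = median([abs(x - med) for x in weekly_lift])
    if mad == 0:
        # Most weeks have exactly the median value: flag any week that differs.
        return [i for i, x in enumerate(weekly_lift) if x != med]
    return [i for i, x in enumerate(weekly_lift) if abs(x - med) / mad > threshold]


# Hypothetical weekly lift for one pair; week 4 stands out for some unknown reason.
weekly = [1.10, 1.12, 1.09, 1.11, 1.60, 1.10]
print(flag_unusual_weeks(weekly))  # [4]
```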
 

5) It often happens that an unquestionable association resists interpretation. Maybe "What you don't know can't hurt you", but is the same true of "What you know, but don't know what to do with"? There will be a temptation to force an interpretation onto the association. If that interpretation is wrong, the decisions based on it will hurt more than they help.

 

So, Associations Analysis follows the basic philosophy of Data Mining: your business data is not random, and contains information worth extracting. But even the simplest techniques (from an algorithmic point of view) cannot dispense with full human participation in building the model.