TRAINING COURSE "DECISION AND SEGMENTATION TREES in Data Mining"

Decision Trees are among the most popular Classification techniques in Data Mining. They are fast, they look simple, and more often than not they produce reasonably accurate results when properly handled. Their Segmentation capabilities are also greatly appreciated, to the point that they are often called "Segmentation Trees".

Despite these real qualities, Decision Trees suffer from drawbacks that must be understood before they can be used efficiently. In addition, their strongly heuristic nature has given rise to a great many variations on a central theme, whose subtle differences are not easily perceived by newcomers.

This one-day training course (see outline below) will acquaint you with the fundamentals of today's most popular Decision Trees.

Course outline

Goals of Decision Trees

Classification

Segmentation

Rule extraction

Identification of pertinent variables

Applications of Decision Trees

Targeting (e.g. mailing), credit scoring, hiring and salary policies, sales analysis, diagnostics, etc.

Basics of Decision Trees

Impurity of a population before and after splitting

Discriminating power of a variable
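As a rough illustration of impurity before and after a split, the sketch below uses the Gini index (one of several possible impurity measures) and scores a candidate split by the drop in impurity it produces. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, children):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    after = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - after


# Hypothetical node: 5 "yes" and 5 "no", split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(gini(parent))                                        # 0.5
print(round(impurity_decrease(parent, [left, right]), 4))  # 0.18
```

Computing this decrease for every candidate split of a variable and keeping the largest value gives one common measure of that variable's discriminating power.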

Recursive splitting of a data base

Nodes, branches and leaves
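A compact sketch of recursive splitting, assuming numerical input variables only, Gini impurity, and binary tests of the form x <= t: each call either returns a leaf (holding its class counts) or a node with two branches, and the same procedure is applied again to each branch. The names `grow`, `best_split`, `max_depth` and `min_size` are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Best binary split "x[f] <= t" over all features and observed values."""
    n = len(labels)
    parent = gini(labels)
    best = None  # (feature, threshold, impurity decrease)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty branches
            after = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - after > best[2]:
                best = (f, t, parent - after)
    return best


def grow(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recursively split the data base; leaves store their class counts."""
    counts = Counter(labels)
    split = None
    if len(counts) > 1 and len(labels) >= min_size and depth < max_depth:
        split = best_split(rows, labels)
    if split is None or split[2] <= 0:
        return {"leaf": dict(counts)}
    f, t, _ = split
    mask = [r[f] <= t for r in rows]
    left = grow([r for r, m in zip(rows, mask) if m],
                [y for y, m in zip(labels, mask) if m],
                depth + 1, max_depth, min_size)
    right = grow([r for r, m in zip(rows, mask) if not m],
                 [y for y, m in zip(labels, mask) if not m],
                 depth + 1, max_depth, min_size)
    return {"feature": f, "threshold": t, "left": left, "right": right}


# Hypothetical one-variable data base.
rows = [[1], [2], [3], [10], [11], [12]]
labels = ["a", "a", "a", "b", "b", "b"]
print(grow(rows, labels))
# {'feature': 0, 'threshold': 3, 'left': {'leaf': {'a': 3}}, 'right': {'leaf': {'b': 3}}}
```

The nested dictionaries map directly onto the vocabulary of the outline: every dictionary is a node, "left" and "right" are its branches, and dictionaries holding "leaf" are the leaves.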

Graphical interpretation

How should priors be taken into account?

How should misclassification costs be taken into account?

Probabilistic classification
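One common way, in the spirit of C&RT, to combine priors, misclassification costs and class probabilities in a leaf is sketched below: the class counts of the leaf are reweighted by the priors, normalised into class probabilities, and the leaf is assigned the class with the lowest expected misclassification cost. The function name, the credit-scoring classes and all the numbers are hypothetical.

```python
def leaf_decision(counts, class_totals, priors, cost):
    """
    counts[j]:       learning cases of class j that fall in the leaf
    class_totals[j]: learning cases of class j in the whole sample
    priors[j]:       assumed prior probability of class j
    cost[i][j]:      cost of predicting class i when the true class is j
    Returns (assigned class, class probabilities in the leaf).
    """
    # Prior-weighted share of each class that reaches the leaf.
    weight = {j: priors[j] * counts.get(j, 0) / class_totals[j] for j in priors}
    total = sum(weight.values())
    proba = {j: w / total for j, w in weight.items()}
    # Expected cost of each possible decision, given those probabilities.
    expected = {i: sum(cost[i][j] * proba[j] for j in priors) for i in priors}
    return min(expected, key=expected.get), proba


class_totals = {"good": 900, "bad": 100}
leaf = {"good": 60, "bad": 20}
empirical = {"good": 0.9, "bad": 0.1}
unit_cost = {"good": {"good": 0, "bad": 1}, "bad": {"good": 1, "bad": 0}}
# Predicting "good" for a true "bad" case is assumed to cost 5.
skewed_cost = {"good": {"good": 0, "bad": 5}, "bad": {"good": 1, "bad": 0}}

print(leaf_decision(leaf, class_totals, empirical, unit_cost)[0])    # good
print(leaf_decision(leaf, class_totals, empirical, skewed_cost)[0])  # bad
print(leaf_decision(leaf, class_totals, {"good": 0.5, "bad": 0.5}, unit_cost)[0])  # bad
```

With empirical priors the leaf probabilities reduce to its class frequencies (here 0.75 and 0.25); changing the priors or the costs can flip the decision of a leaf without changing the Tree itself.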

Resubstitution error of a Tree

Validation, cross-validation
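To show why the resubstitution error of a fully grown Tree is optimistic, the sketch below uses a deliberately extreme stand-in for such a Tree: a model that memorises every learning case (the hypothetical `fit_lookup`). Its resubstitution error is zero, while a simple k-fold cross-validation reveals its real error rate. Folds are formed by taking every k-th case, an assumption made to keep the example deterministic; in practice cases are usually shuffled or stratified first.

```python
from collections import Counter


def fit_lookup(X, y):
    """Stand-in for a fully grown Tree: memorises every learning case."""
    table = dict(zip(X, y))
    majority = Counter(y).most_common(1)[0][0]
    return table, majority


def predict_lookup(model, x):
    table, majority = model
    return table.get(x, majority)  # unseen case: fall back on the majority class


def error_rate(model, X, y, predict):
    return sum(predict(model, xi) != yi for xi, yi in zip(X, y)) / len(y)


def kfold_error(X, y, fit, predict, k=5):
    """k-fold cross-validated error rate; fold i holds cases i, i+k, i+2k, ..."""
    n = len(y)
    wrong = 0
    for i in range(k):
        test_idx = set(range(i, n, k))
        train = [j for j in range(n) if j not in test_idx]
        model = fit([X[j] for j in train], [y[j] for j in train])
        wrong += sum(predict(model, X[j]) != y[j] for j in test_idx)
    return wrong / n


# Hypothetical data: case identifiers carry no information about the class.
X = list(range(20))
y = ["a"] * 14 + ["b"] * 6

model = fit_lookup(X, y)
print(error_rate(model, X, y, predict_lookup))        # 0.0 (resubstitution)
print(kfold_error(X, y, fit_lookup, predict_lookup))  # 0.3 (5-fold cross-validation)
```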

Estimating the performance of a single leaf

Regression with Trees
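For Regression with Trees, impurity is commonly measured by the within-node sum of squared deviations from the node mean, and each leaf predicts the mean of its cases. A minimal sketch of the best binary split on one numerical variable under that assumption (names and data are hypothetical):

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Threshold minimising the total SSE of the two child nodes."""
    pairs = sorted(zip(xs, ys))
    best = None  # (threshold, total SSE, left mean, right mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total, sum(left) / len(left), sum(right) / len(right))
    return best


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 11, 9, 30, 31, 29]
print(best_regression_split(xs, ys))  # (3.5, 4.0, 10.0, 30.0)
```

The two means returned are the values that the resulting leaves would predict.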

The various splitting criteria

• Chi-square (CHAID Trees)
• Entropy (C5 Trees)
• Gini index (C&RT)
• "Twoing"
• QUEST
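To compare several of these criteria on the same candidate split, the sketch below scores one hypothetical two-class split with the Gini decrease, the entropy decrease (information gain), the Pearson chi-square statistic and the twoing value. It only illustrates the formulas: CHAID is usually described as working with the p-value of the chi-square test (with adjustments) rather than the raw statistic, C4.5 and C5 as normalising the information gain into a "gain ratio", and QUEST is left out because it separates variable selection from split-point selection instead of scoring splits with a single formula.

```python
import math


def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def impurity_decrease(measure, left, right):
    """left/right: class counts in each branch, in the same class order."""
    parent = [l + r for l, r in zip(left, right)]
    n, nl, nr = sum(parent), sum(left), sum(right)
    return measure(parent) - nl / n * measure(left) - nr / n * measure(right)


def chi_square(left, right):
    """Pearson chi-square of the branch-by-class table (every class must occur)."""
    col = [l + r for l, r in zip(left, right)]
    n = sum(col)
    stat = 0.0
    for row in (left, right):
        for obs, c in zip(row, col):
            expected = sum(row) * c / n
            stat += (obs - expected) ** 2 / expected
    return stat


def twoing(left, right):
    """(pL * pR / 4) * (sum over classes of |p(j|left) - p(j|right)|) ** 2"""
    nl, nr = sum(left), sum(right)
    n = nl + nr
    diff = sum(abs(l / nl - r / nr) for l, r in zip(left, right))
    return (nl / n) * (nr / n) / 4 * diff ** 2


# Hypothetical split: 40/10 cases of classes (A, B) on the left, 10/40 on the right.
left, right = [40, 10], [10, 40]
print(round(impurity_decrease(gini, left, right), 4))     # 0.18
print(round(impurity_decrease(entropy, left, right), 4))  # 0.2781
print(chi_square(left, right))                            # 36.0
print(round(twoing(left, right), 4))                      # 0.09
```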

Dealing with various data types

Categorical variables

Subsets of categories

Should categories be bundled?
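For a categorical variable with k categories, a binary split sends a subset of the categories to each branch, and there are 2^(k-1) - 1 distinct ways to do so; this number grows quickly with k, which is one motivation for bundling categories beforehand. A brute-force sketch, assuming a Gini criterion; the function names and data are hypothetical:

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def binary_partitions(categories):
    """Yield every split of the categories into two non-empty groups, once each."""
    cats = sorted(set(categories))
    first, rest = cats[0], cats[1:]
    for size in range(len(rest) + 1):
        for combo in combinations(rest, size):
            left = {first, *combo}  # fixing `first` on the left avoids mirror duplicates
            right = set(cats) - left
            if right:
                yield left, right


def best_category_split(values, labels):
    """Partition of the categories with the largest Gini decrease."""
    n = len(labels)
    parent = gini(labels)
    best = None  # (left categories, right categories, decrease)
    for left_cats, right_cats in binary_partitions(values):
        left = [y for v, y in zip(values, labels) if v in left_cats]
        right = [y for v, y in zip(values, labels) if v in right_cats]
        decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or decrease > best[2]:
            best = (left_cats, right_cats, decrease)
    return best


colour = ["red", "red", "green", "green", "blue", "blue"]
bought = ["yes", "yes", "no", "no", "yes", "yes"]
print(len(list(binary_partitions(colour))))    # 3 = 2**(3-1) - 1
left_cats, right_cats, _ = best_category_split(colour, bought)
print(sorted(left_cats), sorted(right_cats))   # ['blue', 'red'] ['green']
```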

Ordinal variables

Numerical variables

Automatic or manual Discretisation?

Binary splits on numerical variables
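On a numerical variable, only cuts between consecutive distinct values need to be examined. Sorting the cases once and moving them one at a time from the right branch to the left branch lets all these cuts be scored in a single pass. A minimal sketch, assuming Gini impurity and the common (but not universal) convention of placing the threshold at the midpoint between neighbouring values; names and data are hypothetical:

```python
from collections import Counter


def gini_counts(counts, n):
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs, ys):
    """Best binary cut "x <= t" found by one sweep over the sorted values."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    left, right = Counter(), Counter(y for _, y in pairs)
    parent = gini_counts(right, n)
    best = None  # (threshold, impurity decrease)
    for i in range(n - 1):
        x, y = pairs[i]
        left[y] += 1   # move case i from the right branch to the left branch
        right[y] -= 1
        next_x = pairs[i + 1][0]
        if x == next_x:
            continue  # cannot cut between two equal values
        nl, nr = i + 1, n - i - 1
        after = (nl * gini_counts(left, nl) + nr * gini_counts(right, nr)) / n
        if best is None or parent - after > best[1]:
            best = ((x + next_x) / 2, parent - after)
    return best


age = [25, 32, 47, 51, 62, 70]
answered = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(age, answered))  # (49.0, 0.5)
```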

Combining numerical variables

How should missing values be handled?

Should Tree growth be stopped?

Overfitting

The various stopping criteria

Based on node properties

Based on split properties

Weaknesses of stopping criteria

Fully grown Trees and pruning
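One widely used way of pruning a fully grown Tree is the cost-complexity ("weakest link") pruning of C&RT. For each internal node t it computes g(t) = (R(t) - R(T_t)) / (|T_t| - 1), where R(t) is the error if t were turned into a leaf, R(T_t) the error of the subtree below t, and |T_t| the number of leaves of that subtree; the node with the smallest g(t) is collapsed first, and repeating this yields a nested sequence of pruned Trees. The sketch below assumes each node stores its resubstitution misclassifications in a hypothetical "errors" field; choosing the final Tree in the sequence (usually by validation or cross-validation) is not shown.

```python
def leaves(node):
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def internal_nodes(node):
    if "children" not in node:
        return []
    return [node] + [m for child in node["children"] for m in internal_nodes(child)]


def weakest_link(tree, n_cases):
    """Return (g, node) for the internal node with the smallest g(t)."""
    best = None
    for node in internal_nodes(tree):
        r_node = node["errors"] / n_cases                                 # R(t)
        r_subtree = sum(leaf["errors"] for leaf in leaves(node)) / n_cases  # R(T_t)
        g = (r_node - r_subtree) / (len(leaves(node)) - 1)
        if best is None or g < best[0]:
            best = (g, node)
    return best


def pruning_sequence(tree, n_cases):
    """Collapse weakest links until only the root remains (modifies `tree`)."""
    alphas = []
    while "children" in tree:
        g, node = weakest_link(tree, n_cases)
        del node["children"]  # the node becomes a leaf
        alphas.append(g)
    return alphas


# Hypothetical Tree grown on 100 cases; "errors" = misclassified cases at the node.
tree = {"errors": 40, "children": [
    {"errors": 15, "children": [{"errors": 5}, {"errors": 8}]},
    {"errors": 5},
]}
print([round(a, 4) for a in pruning_sequence(tree, 100)])  # [0.02, 0.2]
```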

How to interpret a Tree

Rule syntax
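Each path from the root to a leaf reads as one IF ... THEN ... rule whose conditions are the tests met along the way. A minimal sketch, assuming a hypothetical nested-dictionary Tree with binary "feature <= threshold" tests; the rule syntax shown is only one possible convention:

```python
def extract_rules(node, conditions=()):
    """Return one "IF ... THEN ..." string per leaf, left branch first."""
    if "leaf" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN class = {node['leaf']}"]
    test = f"{node['feature']} <= {node['threshold']}"
    negated = f"{node['feature']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + (test,))
            + extract_rules(node["right"], conditions + (negated,)))


tree = {
    "feature": "income", "threshold": 30000,
    "left": {"leaf": "refuse"},
    "right": {"feature": "age", "threshold": 25,
              "left": {"leaf": "refuse"},
              "right": {"leaf": "accept"}},
}
for rule in extract_rules(tree):
    print(rule)
# IF income <= 30000 THEN class = refuse
# IF income > 30000 AND age <= 25 THEN class = refuse
# IF income > 30000 AND age > 25 THEN class = accept
```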

Topology stability