TRAINING COURSE "DECISION AND SEGMENTATION TREES in Data Mining"
Decision Trees are among the most popular Classification techniques in Data Mining. They are fast, they look simple, and more often than not they produce reasonably accurate results when properly handled. Moreover, their Segmentation capabilities are so widely appreciated that they are often called "Segmentation Trees".
Despite these real qualities, Decision Trees suffer from drawbacks that have to be understood before they can be put to effective use. In addition, their markedly heuristic nature has given rise to a great many variations on a central theme, and the subtle differences between these variations are not easily perceived by the newcomer.
This one-day training course (see outline below) will acquaint you
with the fundamentals of today's most popular Decision Trees.
Goals of Decision Trees
Identification of pertinent variables
Applications of Decision Trees
Targeting (e.g. mailing), credit scoring, hiring and
salary policies, sales analysis, diagnostics, etc.
Basics of Decision Trees
Impurity of a population before and after splitting
Discriminating power of a variable
Recursive splitting of a database
Nodes, branches and leaves
How should priors be taken into account?
How should misclassification costs be taken into account?
Resubstitution Tree error
Validation, cross-validation
Estimating the performance of a single leaf
Regression with Trees
The various splitting criteria
Dealing with various data types
Subsets of modalities
Should modalities be bundled?
Automatic or "manual" Discretisation?
Binary splits on numerical variables
Combining numerical variables
How to manage missing values?
Should a Tree's growth be stopped?
The various stopping criteria
On node properties
On split properties
Weaknesses of stopping criteria
Fully grown Trees and pruning
How to interpret a Tree
Masked variables and forced branching
Real and potential effect of a variable