TRAINING COURSE "DECISION AND SEGMENTATION TREES IN DATA MINING"
Decision Trees are among the most popular Classification techniques in Data Mining. They are fast, they look simple and, more often than not, they produce reasonably accurate results when properly handled. Moreover, their Segmentation capabilities are so highly appreciated that they are often called "Segmentation Trees".
Despite these real qualities, Decision Trees suffer from drawbacks that must be understood before they can be used efficiently. In addition, their markedly heuristic nature has given rise to a great many variations on a central theme, whose subtle differences are not easily perceived by newcomers.
This one-day training course (see outline below) will acquaint you with the fundamentals of today's most popular Decision Trees.
Course outline
Goals of Decision Trees
Classification
Segmentation
Rule extraction
Identification of pertinent variables
Applications of Decision Trees
Targeting (e.g. mailing), credit scoring, hiring and salary policies, sales analysis, diagnostics, etc.
Basics of Decision Trees
Impurity of a population before and after splitting
Discriminating power of a variable
Recursive splitting of a database
Nodes, branches and leaves
Graphical interpretation
How should priors be taken into account?
How should misclassification costs be taken into account?
Probabilistic classification
Resubstitution error of a Tree
Validation, cross-validation
Estimating the performance of a single leaf
Regression with Trees
The various splitting criteria
Dealing with various data types
Categorical variables
Subsets of modalities
Should modalities be bundled?
Bonferroni adjustment
Ordinal variables
Numerical variables
Automatic or manual discretisation?
Binary splits on numerical variables
Combining numerical variables
How should missing values be handled?
Should Tree growth be stopped?
Overfitting
The various stopping criteria
On node properties
On split properties
Weaknesses of stopping criteria
Fully grown Trees and pruning
How to interpret a Tree
Rule syntax
Topology stability
Masked variables and forced branching
Real and potential effect of a variable
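The outline items on the impurity of a population before and after splitting, the discriminating power of a variable, and binary splits on numerical variables can be illustrated with a minimal Python sketch. It assumes the Gini index as the impurity measure, which is only one of the various splitting criteria the course compares, and the function names are hypothetical rather than taken from any particular Decision Tree package.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a population: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(labels, left, right):
    """Impurity before the split minus the size-weighted impurity of the two children."""
    n = len(labels)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - after


def best_binary_split(values, labels):
    """Try every threshold halfway between consecutive distinct values of a
    numerical variable; return (threshold, impurity decrease) of the best one.
    The decrease serves here as a measure of the variable's discriminating power."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        gain = impurity_decrease(labels, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical toy data: age of a customer and whether they responded to a mailing.
ages = [22, 25, 30, 35, 40, 45]
responses = ["no", "no", "no", "yes", "yes", "yes"]
print(best_binary_split(ages, responses))  # (32.5, 0.5)
```

In this toy case the parent node has a Gini impurity of 0.5 and the split at 32.5 yields two pure leaves, so the whole impurity is removed. A real Tree would apply this search to every candidate variable at every node, which is the recursive splitting listed above.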