This technique has two different versions:

1) One for density estimation, and therefore descriptive.

2) The other one for probabilistic classification, and therefore predictive.

In both cases, variables have to be numerical. Each observation can then be represented as a point in a space with as many dimensions as there are variables. The sample may then be thought of as a "cloud" of points.

___________________

In both versions of K-NN, a fixed natural number K is first chosen by the user.

1) K-Nearest Neighbors and density estimation

Let P be any point of the space. One determines the smallest sphere centered on P that contains exactly K observations. The density around P is then very naturally estimated as inversely proportional to the volume V of that sphere: with n observations in total, the estimate is K / (nV).
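This estimate can be sketched in a few lines. The following is a minimal illustration, assuming Euclidean distance; the function name knn_density and its signature are our own, not from the text.

```python
import numpy as np
from math import gamma, pi

def knn_density(P, sample, k):
    """K-NN density estimate at point P: k / (n * V), where V is the
    volume of the smallest sphere centered on P containing exactly k
    of the n observations."""
    P = np.asarray(P, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n, d = sample.shape
    # The k-th smallest distance from P to an observation is the radius
    # of the smallest sphere containing exactly k observations.
    r = np.sort(np.linalg.norm(sample - P, axis=1))[k - 1]
    # Volume of a d-dimensional sphere of radius r.
    volume = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d
    return k / (n * volume)
```

For instance, on a sample spread uniformly over [0, 1], the estimate near the middle of the interval comes out close to 1, the true uniform density.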

2) K-Nearest Neighbors and probabilistic classification

Every observation now belongs to one of N classes.

As before, one determines the smallest sphere centered on P that contains exactly K observations. P is then said to belong to the class that has the most representatives within the sphere.

One can even do a little better than that. The probability that P belongs to class i may be estimated as the ratio of:

• The number of observations in the sphere that belong to class i
• To the total number of observations in the sphere.

Note that this approach to estimating class probabilities is similar to that used in the leaves of Decision Trees.
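The classification rule and the probability estimate above can be sketched together. This is a minimal illustration, again assuming Euclidean distance; the function name knn_class_probs is our own.

```python
import numpy as np
from collections import Counter

def knn_class_probs(P, sample, labels, k):
    """Estimate class probabilities at P as, for each class i, the
    fraction of the k nearest observations that belong to class i."""
    P = np.asarray(P, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Indices of the k observations closest to P (the "sphere").
    nearest = np.argsort(np.linalg.norm(sample - P, axis=1))[:k]
    counts = Counter(labels[i] for i in nearest)
    return {cls: c / k for cls, c in counts.items()}

def knn_classify(P, sample, labels, k):
    """Assign P to the class with the most representatives among
    its k nearest neighbors."""
    probs = knn_class_probs(P, sample, labels, k)
    return max(probs, key=probs.get)
```

A point surrounded only by class-"a" observations gets probability 1 for "a"; a point whose sphere mixes classes gets the corresponding fractions.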

_______________________

K-Nearest Neighbors is clearly non-parametric.

In both versions of the technique, K has to be chosen by the user.

• If K is too large, spheres are too big, and the detailed features of the various density functions are smoothed out (high bias).
• If K is too small, the uncertainty about the "true" size of the sphere that encompasses the K observations is large. The model becomes quite uncertain and unstable from sample to sample (high variance).

Note that the behavior of the model when K varies from "too large" to "too small" is quite similar to that of histograms. Of course, the reasons are the same. And of course, there is no "magic formula" to help the user determine the "best" value for K.
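The variance side of this trade-off is easy to see numerically. The sketch below (1-D, standard normal data; the helper name knn_density_1d is our own) estimates the density at 0 over many independent samples, once with a small K and once with a large K, and compares the spread of the estimates.

```python
import numpy as np

def knn_density_1d(p, sample, k):
    # 1-D K-NN density estimate: k / (n * 2r), where r is the
    # distance from p to the k-th nearest observation.
    r = np.sort(np.abs(sample - p))[k - 1]
    return k / (len(sample) * 2 * r)

rng = np.random.default_rng(0)
# Repeat the estimation on 200 fresh samples of size 500 for each K.
for k in (2, 50):
    ests = [knn_density_1d(0.0, rng.standard_normal(500), k)
            for _ in range(200)]
    print(f"K={k:3d}  mean={np.mean(ests):.3f}  std={np.std(ests):.3f}")
```

The small-K estimates scatter much more widely from sample to sample (high variance), while the large-K estimates are more stable but smooth the density out (the true value at 0 is about 0.399).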

Kohonen maps

Kohonen maps are virtually the only unsupervised neural networks available in commercial software.

Kohonen maps do two things simultaneously: dimensionality reduction and clustering.

1) Dimensionality reduction.
In loose terms (to which genuine statisticians generally object), Kohonen maps may be perceived as non-linear extensions of Principal Component Analysis (PCA) because of their capacity to stretch and twist so as to "hug" the data points as closely as possible.

2) Clustering.
At first, the quantization properties of Kohonen maps may be considered a nuisance. It took some years to realize that these properties could also turn Kohonen maps into clustering tools with a remarkable property shared by no other clustering technique: they produce clusters with a natural "topology". That is, each cluster has natural "neighbor clusters" whose observations are quite similar to its own. This unique property makes Kohonen maps extremely valuable clustering tools.
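The mechanics behind both properties can be sketched in a short training loop. This is a minimal self-organizing map, not a production implementation; the function name train_som and all hyperparameter values (grid size, learning-rate and neighborhood schedules) are our own choices for illustration.

```python
import numpy as np

def train_som(data, grid_shape=(4, 4), epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 2-D Kohonen map. Each grid node holds a weight vector
    in data space. For every observation, the best-matching unit (BMU)
    and, more weakly, its grid neighbors are pulled toward it; the grid
    topology is what gives neighboring nodes similar weights."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n, d = data.shape
    weights = rng.standard_normal((rows, cols, d))
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)               # learning rate decays
        sigma = sigma0 * (1 - frac) + 0.5   # neighborhood radius shrinks
        for x in data[rng.permutation(n)]:
            # BMU: the node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), (rows, cols))
            # Gaussian neighborhood on the grid around the BMU.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights
```

After training on data containing two well-separated clusters, some nodes settle near each cluster (the quantization/clustering effect), and nodes that are neighbors on the grid end up with similar weight vectors (the topology).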