RBF networks

What are RBF networks?

RBF stands for "Radial Basis Functions".

RBF networks are one of the two main families of supervised Neural Networks (the other one is the Multilayer Perceptron).

RBF networks rely on the fact that any function that is continuous on a compact domain can be approximated as closely as one wishes by a sum of appropriately chosen gaussian functions. If the gaussians are constrained to all have the same height, replace "sum" by "linear combination", and the property still holds.

The above illustration represents a function y = f(x) together with the gaussians that sum up to f(x).

This property is true in any dimension. The unidimensional gaussians are then replaced by multidimensional gaussians G(x1, x2, ..., xn).
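As a quick numeric illustration of this property (the target function, number of gaussians and width below are our own illustrative choices, not taken from the text), a linear combination of a dozen equal-width gaussians reproduces a smooth one-dimensional function almost exactly:

```python
import numpy as np

def gaussian(x, center, width):
    """Unidimensional gaussian basis function of fixed height 1."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

# Target function to approximate on [0, 1]
x = np.linspace(0.0, 1.0, 200)
f = np.sin(2.0 * np.pi * x)

# Pave the interval with equally spaced gaussians of identical width
centers = np.linspace(0.0, 1.0, 12)
width = 0.08
G = np.column_stack([gaussian(x, c, width) for c in centers])

# Coefficients of the linear combination, fitted by least squares
coeffs, *_ = np.linalg.lstsq(G, f, rcond=None)
approx = G @ coeffs

max_error = np.max(np.abs(approx - f))
```

Because the gaussians all share the same height, only a weighted linear combination (not a plain sum) can match the function; the coefficients supply the weights.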

-----

The primary goal of RBF networks is therefore regression, that is, to reconstruct a good approximation of a function that is known only through a finite number of noisy data points {xi, yi} (for more on regression, see here).

As a direct consequence, RBF networks can also do classification by doing regression on class indicators.
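A minimal sketch of "regression on class indicators", assuming a hypothetical 3-class problem: each class gets a 0/1 indicator column, the network regresses one output per column, and the predicted class is the output with the largest value:

```python
import numpy as np

# Hypothetical labels for five samples of a 3-class problem
labels = np.array([0, 2, 1, 0, 1])

# One 0/1 indicator column per class (one-hot encoding)
indicators = np.eye(3)[labels]          # shape (5, 3)

# Two hypothetical network outputs after regressing on the indicators;
# the predicted class is the column with the largest output
outputs = np.array([[0.9, 0.2, -0.1],
                    [0.1, 0.3,  0.8]])
predicted = outputs.argmax(axis=1)
```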

None of this is specific to RBF networks, and the same could be said about any function approximation technique. So why RBF networks ?

Why RBF networks?

As the foregoing illustration shows, every region of the feature space is covered by only a small number of gaussians. This is because the gaussian function drops off very rapidly, and can therefore contribute significantly to the reconstructed function only in a small region around its mean value. So RBF networks take a local approach to regression (contrary to most other regression techniques, from Simple Linear Regression to Multilayer Perceptrons). One might then expect that "learning" (calculating the coefficients of the model) will be greatly facilitated.

Learning of RBF networks

What are the parameters of the model?

• The parameters of each basis gaussian function: its position (mean vector) and its covariance matrix.
• The coefficients of the linear combination of the gaussians. There are as many coefficients as there are gaussians.

As we said, each gaussian covers only a small region of space. As we wish to reconstruct f(x) across its whole range, it is necessary to pave the feature space with a large number of gaussians. Each gaussian is, in turn, defined by a large number of parameters (see above). As a consequence, a fully operational RBF network is defined by a huge number of parameters (frequently several thousands).
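To make the count concrete (the figures below are hypothetical choices, assuming a full symmetric covariance matrix per gaussian):

```python
n_inputs = 10        # dimensionality of the feature space (hypothetical)
n_gaussians = 50     # number of gaussians paving that space (hypothetical)

center_params = n_inputs                            # one mean vector per gaussian
covariance_params = n_inputs * (n_inputs + 1) // 2  # symmetric covariance matrix
per_gaussian = center_params + covariance_params    # 10 + 55 = 65

# Plus one linear-combination coefficient per gaussian
total = n_gaussians * per_gaussian + n_gaussians    # 3300 parameters
```

Even this modest configuration already yields several thousand parameters, which is why direct optimization quickly becomes impractical.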

Calculating the values of thousands of parameters with a traditional optimization technique becomes hopelessly slow (practitioners already complain about the long training times of the Multilayer Perceptron, which contains far fewer parameters).

The "effective number of parameters" of RBF networks is in fact much smaller because of their local architecture. This number is the one to take into account when pondering the bias-variance dilemma issue. We will not elaborate on the concept of effective number of parameters here.

Fortunately, because of their local behavior, RBF networks are quite tolerant of how the parameter values are calculated. There exist many fast heuristics for setting the positions and covariance matrices of the gaussian basis functions without resorting to any optimization technique. Calculating the coefficients of the linear combination is then also simple and fast.

The coefficient values are then not optimal, but the network still works reasonably well.
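The fast heuristic procedure described above can be sketched as follows. The particular heuristics used here (centers drawn at random from the data, one shared width derived from the data range) are illustrative choices among many, not the only ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 1-D training data for a hypothetical target function
x_train = rng.uniform(0.0, 1.0, 80)
y_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.1, 80)

# Heuristic 1: place the gaussian centers on a random sample of the data
centers = rng.choice(x_train, size=15, replace=False)

# Heuristic 2: one shared width, set from the typical spacing of the centers
width = (x_train.max() - x_train.min()) / len(centers)

def design_matrix(x, centers, width):
    """One column of gaussian activations per basis function."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

# No iterative optimization: the coefficients follow from a single
# linear least-squares solve
G = design_matrix(x_train, centers, width)
coeffs, *_ = np.linalg.lstsq(G, y_train, rcond=None)

def predict(x):
    return design_matrix(np.asarray(x, dtype=float), centers, width) @ coeffs
```

The only computational cost is one matrix factorization, which is why building such a network is fast even with many basis functions.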

Pros and cons of RBF networks

So, for a time, it seemed we had a new technique with nice theoretical properties, but one that was useless because of prohibitive learning times. It turns out that things are just the other way around: building RBF networks is fast and easy, and this is their main advantage.

On the performance side, RBF networks can't compete with more sophisticated techniques (in particular, the Multilayer Perceptron). More specifically, they do poorly:

• In high dimensional spaces (many input variables). This is true of all local techniques (think of the histogram).
• On noisy data. Again, because the coefficients are adjusted to accommodate local data, RBF networks do not average out noise over the entire data space (compare with Linear Regression, whose sole purpose is precisely to average out noise over the data space).

________________

In conclusion, RBF networks are a credible alternative to the Multilayer Perceptron for moderately difficult regression or classification problems, and their learning speed and ease of use make them convenient tools.
