Stochastic Discrimination: Introduction









Stochastic discrimination (SD) is a general methodology for constructing classifiers appropriate for pattern recognition. It is based on combining arbitrary numbers of very weak components, which are usually generated by some pseudo-random process, and it has the property that the very complex and accurate classifiers produced in this way retain the ability, characteristic of their weak component pieces, to generalize to new data. This phenomenon is well understood from a theoretical point of view, and has been consistently observed in applications, where theoretical norms are not usually met. In fact, it is often observed that classifier performance on test sets continues to rise as more weak components are added, even after performance on training sets seems to have reached a maximum. This phenomenon is also understood mathematically; it turns out that even though the formal error rate on the training set may have reached a minimum, more sophisticated measures intrinsic to this method indicate that classifier performance on both training and test sets continues to improve as complexity increases.

Stochastic discrimination might appear to fall in the general category of pattern recognition methods based on the notion of combining classifiers. But there is a fundamental difference which epitomizes the essence of stochastic discrimination. Indeed, most combination methods deal with component classifiers which, even if they may be weak for the problem at hand, are still somewhat ``expert'' in that they will tend to agree with one another about the classification of at least some of the ``easy'' points in the feature space. However, in the case of stochastic discrimination, not only does this tend not to happen - it is essential to the success of the method that it be avoided to the greatest extent possible. We are not talking here about simply trying to maintain some degree of orthogonality among component classifiers. In the case of SD, all points of a given class in the feature space must be ``viewed equivalently'' by the full set of weak component classifiers. This notion, which in the theory of stochastic discrimination is known as ``uniformity,'' captures the essence of stochastic discrimination, making it, in effect, a method for combining ``classifiers'' which, modulo specific points in the feature space, have no commonality what-so-ever. In a sense, the weak component classifiers are not really classifiers at all - they are simply subsets of the feature space selected from a randomly generated stream of such subsets solely by virtue of their having an error rate for the problem at hand slightly better than the average for the stream.


kleinbrg@math.buffalo.edu
Last modified on Saturday, 25-Feb-2006 18:37:48 EST