







|
Stochastic discrimination (SD) is a general methodology for
constructing classifiers appropriate for pattern recognition. It is
based on combining arbitrary numbers of very weak components, which are
usually generated by some pseudo-random process, and it has the property
that the very complex and accurate classifiers produced in this way retain
the ability, characteristic of their weak component pieces, to generalize
to new data. This phenomenon is well understood from a theoretical point
of view, and has been consistently observed in applications, where
theoretical norms are not usually met. In fact, it is often observed that
classifier performance on test sets continues to rise as more weak
components are added, even after
performance on training sets seems to have reached a maximum.
This phenomenon is also
understood mathematically; it turns out that even though the formal error
rate on the training set may have reached a minimum, more sophisticated
measures intrinsic to this method indicate that classifier performance
on both training and test sets continues to improve as complexity increases.
Stochastic discrimination might appear to fall in
the general category of pattern
recognition methods based on the notion of combining classifiers.
But there is a fundamental difference which epitomizes the essence
of stochastic discrimination. Indeed, most combination methods
deal with component classifiers which, even if they may be weak for
the problem at hand, are still somewhat ``expert'' in that they
will tend to agree with one another about the classification of at
least some of the ``easy'' points in the feature space.
However, in the case of stochastic discrimination, not only does this
tend not to happen - it is essential to the success of the method
that it be avoided to the greatest extent possible. We are not
talking here about
simply trying to maintain some degree of orthogonality among
component classifiers. In the case of SD, all points
of a given class in the feature
space must be ``viewed equivalently'' by the full set
of weak component classifiers.
This notion, which in the theory
of stochastic discrimination is known as ``uniformity,'' captures
the essence of stochastic discrimination, making it, in effect, a
method for combining ``classifiers'' which, modulo specific points in
the feature space, have no commonality what-so-ever.
In a sense, the weak component classifiers are not really
classifiers at all - they are simply subsets of the feature
space selected from a randomly generated stream of such subsets
solely by virtue of their having an error rate for the problem at
hand slightly better than the average for the stream.
|