Stochastic Discrimination and its Implementation
Stochastic discrimination (SD) is a general methodology
for carrying out supervised learning. It is based on a
mathematically rigorous theory. Of particular note is
the fact, provable from the theory and verified
experimentally, that classifiers built by SD are
unusually resistant to overtraining. For example, unlike
classifiers built by other methods such as neural
networks, classifiers built by SD can continue to
improve on the test set as training time increases,
even after training set performance has peaked.
In theory, it is possible to build arbitrarily good classifiers using SD. The method itself is also robust, so in practice, where the theoretical requirements are usually not met exactly, the classifiers built by SD are still remarkably good. This has been borne out experimentally on numerous occasions. The results section presents a systematic, strictly controlled, comparative study on standard benchmark problems in which a generic implementation of SD outperforms most other classification methods in almost all cases, across a wide variety of data sets.

This site aims to provide a broad overview of the theory underlying SD, to describe the key factors in creating a working implementation of the theory, and, finally, to present experimental results obtained by a particular implementation of the method known as SDK.

The scientifically inclined reader is urged to read the recent publications on SD available for download on this site. [emk000] contains an implementation-oriented introduction to the subject, which also includes both theoretical and experimental results. [emk001] contains a theoretical overview of supervised learning and the relevance of SD in that context. [emk96] contains a fairly complete mathematical treatment of SD, including proofs of the main underlying theorems. The original paper on stochastic discrimination is reference [emk90].

The site is arranged as follows. The introduction section gives an overview of SD. The background section lays out the history behind SD. The implementation section discusses some of the implementation issues. Experimental results obtained from this implementation are discussed in the results section. Finally, the references section lists the relevant references and links to those that are available for download.

We are in the process of implementing a protocol that will enable serious researchers to experiment with our implementation, SDK. If you are interested in having us try out SDK on your dataset, please contact Gene Kleinberg.
Note: Some material displayed on this site may be subject to the following: © 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.