Stochastic Discrimination: Implementation |
|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]()
|
Anyone with a sufficient mathematical background should be able to
implement a working version of SD using the published papers, particularly
[emk000] and [emk96],
as references. To this point in time, we are aware
of at least 7 independent groups of people who have done exactly this, that
is, created working implementations in this way.
The degree of success of each implementation varies from problem to
problem, but all seem to perform along the lines predicted by the theory.
Based on results reported to us, no one of these current implementations,
including our own,
outperforms all the others on the full set of benchmark problems, and so
we feel an optimal implementation is yet to be written.
Our particular implementation of SD is a program known as SDK.
It is structured along the lines of the pseudo-code described in
the IEEE PAMI paper. The original
version of SDK was written in C and implemented on a Sun SparcStation
running SunOS 4.1.1.
The goal has always been to develop the theory first, and then implement
it to see if the experimental results accurately reflect the theoretical
predictions. As such, the focus of this site is more to draw attention to
the soundness of the theory underlying SD, rather than any particular
implementation.
Implementation hints from SDKGiven a general, n-dimensional feature space F, SDK first embeds F into Euclidean n-space (by mapping any non-numeric field values to nonnegative integers). For each coordinate k between 1 and n inclusive, it then determines maximum and minimum values, maxk and mink, respectively, among training points for that coordinate. Weak models are formed by taking finite unions of rectangular parallelopipeds contained in F. Each such rectangular parallelopiped, R, is generated as follows: (a) randomly choose a point q = (q1, q2, ..., qn) in the training set; (b) for each k between 1 and n inclusive, randomly generate a subinterval [uprk,lwrk] of [mink,maxk] such that q is a member of [uprk,lwrk]; (c) let R denote the n-fold cartesian product of the intervals [uprk,lwrk].By default, it is assumed that there exists a spatial relationship between training and test points of like class, and so SDK's weak models generalize from training to test data by virtue of being "thick" subsets of the feature space. This "thickness" can be further increased by pushing a number of lwrk and uprk out to their limits mink and maxk, respectively. By default, SDK previews the feature space prior to the start of the process of collecting weak models, by simultaneously adjusting both thickness of rectangles, and number of rectangles, so as to have a typical weak model cover a certain percentage range, [x,y] (default is currently [10%,20%] of the feature space). (Note that weak models of this sort can alternately be viewed simply as finite unions of finite intersections of regions determined by splitting F with axis-orthogonal hyperplanes.) Once the collection process begins, SDK simply iterates the following steps:
Many people have expressed interest in receiving more detail concerning step (c) of the above process. The only subtlety in generalizing the uniformity promotion algorithm mentioned in [emk000] to "real-world" circumstances, involves the realization that the theoretical requirement in the formal definition of uniformity asks that for each pair (x,y) such that there exist weak models in the collection P[i] with associated (Frac.red,Frac.green) = (x,y), coverages of points of like class by such weak models must be the same. There are many ways to algorithmically achieve this, either by the compartmentalization of the algorithm for each such pair, or by defining a measure of uniformity promotion for a given weak model by normalizing based its associated values Frac.red and Frac.green. We are aware of implementations which use each of these general approaches. The implementation SDK uses normalization. The use of uniformity in the proof of theorem 1 in [emk96], provides the method for this. |