Stochastic Discrimination: Implementation









Anyone with a sufficient mathematical background should be able to implement a working version of SD using the published papers, particularly [emk000] and [emk96], as references. To this point in time, we are aware of at least 7 independent groups of people who have done exactly this, that is, created working implementations in this way. The degree of success of each implementation varies from problem to problem, but all seem to perform along the lines predicted by the theory. Based on results reported to us, no one of these current implementations, including our own, outperforms all the others on the full set of benchmark problems, and so we feel an optimal implementation is yet to be written.

Our particular implementation of SD is a program known as SDK. It is structured along the lines of the pseudo-code described in the IEEE PAMI paper. The original version of SDK was written in C and implemented on a Sun SparcStation running SunOS 4.1.1.
Today, SDK has been implemented in C under Sun/Solaris as well as Linux. As the theory behind SD continues to evolve, so does the implementation SDK.

The goal has always been to develop the theory first, and then implement it to see if the experimental results accurately reflect the theoretical predictions. As such, the focus of this site is more to draw attention to the soundness of the theory underlying SD, rather than any particular implementation.
We do understand, however, that readers may be interested in implementing their own version of SD. For these readers, we provide below some hints on our implementation SDK. By not explicitly laying out the pseudo-code for SDK, we hope to encourage the development new implementational approaches. As mentioned previously, it is our belief that the "best" implementation is yet to come.

Implementation hints from SDK

Given a general, n-dimensional feature space F, SDK first embeds F into Euclidean n-space (by mapping any non-numeric field values to nonnegative integers). For each coordinate k between 1 and n inclusive, it then determines maximum and minimum values, maxk and mink, respectively, among training points for that coordinate. Weak models are formed by taking finite unions of rectangular parallelopipeds contained in F. Each such rectangular parallelopiped, R, is generated as follows: (a) randomly choose a point q = (q1, q2, ..., qn) in the training set; (b) for each k between 1 and n inclusive, randomly generate a subinterval [uprk,lwrk] of [mink,maxk] such that q is a member of [uprk,lwrk]; (c) let R denote the n-fold cartesian product of the intervals [uprk,lwrk].

By default, it is assumed that there exists a spatial relationship between training and test points of like class, and so SDK's weak models generalize from training to test data by virtue of being "thick" subsets of the feature space. This "thickness" can be further increased by pushing a number of lwrk and uprk out to their limits mink and maxk, respectively. By default, SDK previews the feature space prior to the start of the process of collecting weak models, by simultaneously adjusting both thickness of rectangles, and number of rectangles, so as to have a typical weak model cover a certain percentage range, [x,y] (default is currently [10%,20%] of the feature space). (Note that weak models of this sort can alternately be viewed simply as finite unions of finite intersections of regions determined by splitting F with axis-orthogonal hyperplanes.)

Once the collection process begins, SDK simply iterates the following steps:

  • (a) generate a new weak model, M;
  • (b) test M for enrichment;
  • (c) if M passes the enrichment test (b) above, then test M for uniformity with respect to the current pool of weak models, P[i], thus far collected;
  • (d) if M passes the uniformity test (c) above, add it the the current pool of weak models, that is, let P[i+1] be the union of P[i] with {M};
  • (e) goto (a).

Many people have expressed interest in receiving more detail concerning step (c) of the above process. The only subtlety in generalizing the uniformity promotion algorithm mentioned in [emk000] to "real-world" circumstances, involves the realization that the theoretical requirement in the formal definition of uniformity asks that for each pair (x,y) such that there exist weak models in the collection P[i] with associated (Frac.red,Frac.green) = (x,y), coverages of points of like class by such weak models must be the same. There are many ways to algorithmically achieve this, either by the compartmentalization of the algorithm for each such pair, or by defining a measure of uniformity promotion for a given weak model by normalizing based its associated values Frac.red and Frac.green.

We are aware of implementations which use each of these general approaches. The implementation SDK uses normalization. The use of uniformity in the proof of theorem 1 in [emk96], provides the method for this.


kleinbrg@math.buffalo.edu
Last modified on Saturday, 25-Feb-2006 18:37:48 EST