Stochastic Discrimination: Experimental Results
We carried out a series of experiments using an
algorithmic implementation of SD known as SDK.
To compare SD with other pattern recognition techniques, we obtained publicly accessible datasets from two major repositories of standardized pattern recognition problems: the University of California at Irvine (UCI) Machine Learning Repository and the Statlog collection available from the University of Porto, Portugal. From the UCI repository we chose 17 datasets: Australian credit (henceforth denoted "crx"), Pima diabetes ("dia"), glass ("gls"), Cleveland heart ("hrt"), hepatitis ("hep"), ionosphere ("ion"), iris ("iri"), labor ("lab"), letter ("let"), satimage ("sat"), segment ("seg"), sonar ("son"), soybean-large ("soy"), splice ("spl"), vehicle ("veh"), vote ("vot"), and Wisconsin breast cancer ("wsc"). Our choice was based solely on popularity: among the many recent papers containing comparative studies of pattern recognition methods, these 17 sets were studied more often than any others from the Irvine collection. We compared our results with those reported by Freund and Schapire, because their paper focuses on boosting and bagging, two of the most popular and powerful methods currently being studied in pattern recognition. That paper [] reports on three underlying "weak learning algorithms", FindAttrTest (henceforth denoted "FIA"), FindDecRule ("FID"), and Quinlan's C4.5 ("C45") (see []), together with their boosted versions (ABO, DBO, and 5BO, respectively) and bagged versions (ABA, DBA, and 5BA, respectively), for a total of 9 methods. In our runs on these datasets we used the same study paradigms as Freund and Schapire (either 10-fold cross-validation or a training/test split, depending on the dataset). Our only change, made because of time constraints, concerned two of the datasets that used training/test sets, letter and satimage: rather than rerunning the study 20 times with different seeds, we report the result of a single run (seed 1) of each problem.
For the 10-fold cross-validation studies, however, we reran each cross-validation 10 times with different initial seeds, for a total of 100 runs per study, and for the remaining training/test problem, soybean-large, we reran the study 20 times with different seeds and averaged the results. The reader should refer to [] for the details. Our results, together with those reported in [], are presented in the figure below; the values listed under each method are error rates on the different problems. In Figure 1, we summarize these results graphically, focusing on the relative ranks of the methods over the 17 problems. Specifically, for each of the 10 methods (the 9 methods from [] plus SD, listed along the x-axis), we draw a bar ranging from the best rank that method achieved on any problem to its worst rank. Each bar has a left tick at the mean of that method's ranks across all 17 problems and a right tick at the mode of its ranks (if a mode exists for that method). The methods are listed in order of mean rank, and a line graph connects the mean ranks.
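The rank summary plotted in Figure 1 can be computed along the following lines. This is a minimal Python sketch, not SDK code, and it rests on assumptions the text does not state: rank 1 goes to the lowest error rate on each problem, tied methods share the average of the ranks they span, and a method has a mode only when a single rank value occurs at least twice and more often than any other value. The function names and the toy error table are hypothetical, not results from the study.

```python
from collections import Counter
from statistics import mean


def ranks_for_problem(errors):
    """Rank methods on one problem; rank 1 = lowest error rate.

    Tied methods share the average of the ranks they span
    (an assumption; the text does not say how ties were handled).
    """
    order = sorted(errors, key=errors.get)
    ranks = {}
    i = 0
    while i < len(order):
        # Extend j over the run of methods whose error equals order[i]'s.
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for method in order[i:j + 1]:
            ranks[method] = shared
        i = j + 1
    return ranks


def summarize_ranks(table):
    """table: {problem: {method: error_rate}}.

    Returns one dict per method with its best, worst, mean and mode rank,
    sorted by mean rank (best first), mirroring the bars of Figure 1.
    """
    per_method = {}
    for errors in table.values():
        for method, rank in ranks_for_problem(errors).items():
            per_method.setdefault(method, []).append(rank)

    summary = []
    for method, ranks in per_method.items():
        counts = Counter(ranks).most_common()
        top_value, top_count = counts[0]
        # Assumed rule: a mode exists only if one rank value occurs at least
        # twice and strictly more often than every other value.
        unique_top = len(counts) == 1 or counts[1][1] < top_count
        mode = top_value if top_count > 1 and unique_top else None
        summary.append({
            "method": method,
            "best": min(ranks),
            "worst": max(ranks),
            "mean": mean(ranks),
            "mode": mode,
        })
    summary.sort(key=lambda s: s["mean"])
    return summary


# Toy error rates for hypothetical methods A, B, C (not data from the study).
toy = {
    "p1": {"A": 0.10, "B": 0.12, "C": 0.15},
    "p2": {"A": 0.20, "B": 0.18, "C": 0.25},
    "p3": {"A": 0.05, "B": 0.05, "C": 0.07},
}
for row in summarize_ranks(toy):
    print(row)
```

Averaging tied ranks keeps the sum of ranks on every problem equal, so no method is favoured by the order in which ties happen to be listed; other tie rules would shift the mean ranks slightly.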
The Statlog collection has 10 publicly available datasets. However, two of them (the heart data and the German credit data) involve nontrivial cost matrices, and since SDK does not take misclassification costs into account, we did not use these sets. One additional set, the shuttle dataset, has extremely underrepresented classes (for example, only 2 test points for class 7 out of a total of 58,000 training and test examples), so we eliminated that set as well. On the remaining 7 sets, we carried out studies under the same conditions as []. In particular, the Australian Credit Approval problem (henceforth denoted "crx") used 10-fold cross-validation; the Pima Indians Diabetes problem ("dia") used 12-fold cross-validation; the DNA problem ("dna") used training (2000 points) and test (1186 points) sets; the Letter Image Recognition problem ("let") used training (15000 points) and test (5000 points) sets; the Satellite Image problem ("sat") used training (4435 points) and test (2000 points) sets; the Image Segmentation problem ("seg") used 10-fold cross-validation; and the Vehicle Silhouette problem ("veh") used 9-fold cross-validation. Our results, together with those reported in [], are presented in the table below. In Figure 2, we represent the relative ranks graphically, as in Figure 1.
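The evaluation protocols listed above, together with the seeded reruns used for the cross-validation studies, can be sketched as follows. This is a hypothetical Python sketch, not the code used in the study. It assumes plain shuffled (unstratified) k-fold splits, fixed training/test sets stored with the training points first, a seed that is passed both to the fold shuffle and to a caller-supplied `count_errors(train_idx, test_idx, seed)` callback standing in for the classifier, and a reported figure equal to the mean of the per-run error rates. The text specifies none of these details.

```python
import random
from statistics import mean

# Protocols for the seven Statlog problems, as listed in the text.
# ("cv", k): k-fold cross-validation; ("split", n_train, n_test): fixed sets.
STATLOG_PROTOCOLS = {
    "crx": ("cv", 10),
    "dia": ("cv", 12),
    "dna": ("split", 2000, 1186),
    "let": ("split", 15000, 5000),
    "sat": ("split", 4435, 2000),
    "seg": ("cv", 10),
    "veh": ("cv", 9),
}


def kfold_indices(n, k, seed):
    """Yield (train, test) index lists for a shuffled, unstratified k-fold
    split of points 0..n-1. Fold sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # first n % k folds get one extra
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def evaluation_splits(name, n_points, seed):
    """Yield the (train, test) runs of one study of the named problem.
    For "split" problems we assume the n_train training points are stored
    first, followed by the n_test test points; the seed does not change them."""
    protocol = STATLOG_PROTOCOLS[name]
    if protocol[0] == "cv":
        yield from kfold_indices(n_points, protocol[1], seed)
    else:
        _, n_train, n_test = protocol
        if n_points != n_train + n_test:
            raise ValueError(f"{name}: expected {n_train + n_test} points, got {n_points}")
        yield list(range(n_train)), list(range(n_train, n_points))


def repeated_error_rate(name, n_points, seeds, count_errors):
    """Mean test error rate over all runs for all seeds
    (e.g. 10 seeds x 10 folds = 100 runs).

    count_errors(train_idx, test_idx, seed) is a hypothetical callback that
    trains a classifier and returns the number of misclassified test points.
    """
    rates = []
    for seed in seeds:
        for train, test in evaluation_splits(name, n_points, seed):
            rates.append(count_errors(train, test, seed) / len(test))
    return mean(rates)
```

For instance, `repeated_error_rate("crx", n, range(1, 11), count_errors)` would cover the 100 runs of a 10-fold study rerun with 10 seeds, where `n` is the number of points in the dataset. For a fixed training/test problem, the same call with 20 seeds would vary only the seed handed to the classifier.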