Stochastic Discrimination: Experimental Results









We carried out a series of experiments using SDK, an algorithmic implementation of stochastic discrimination (SD).

To compare SD with other pattern recognition techniques, we obtained publicly accessible datasets from two major repositories of standardized problems in pattern recognition: the University of California at Irvine (UC Irvine) Machine Learning Repository and the Statlog datasets available from the University of Porto in Portugal.

From the UC Irvine repository, we chose 17 datasets: Australian credit (henceforth denoted "crx"), Pima diabetes ("dia"), glass ("gls"), Cleveland heart ("hrt"), hepatitis ("hep"), ionosphere ("ion"), iris ("iri"), labor ("lab"), letter ("let"), satimage ("sat"), segment ("seg"), sonar ("son"), soybean-large ("soy"), splice ("spl"), vehicle ("veh"), vote ("vot"), and Wisconsin breast cancer ("wsc"). Our choice of sets was based solely on popularity: of the many recent papers containing comparative studies of pattern recognition methods, these 17 sets tended to be studied more than any others from the Irvine collection.

We compared our results to those reported by Freund and Schapire. We chose this paper because it focuses on boosting and bagging, two of the most popular and powerful methods currently being studied in pattern recognition. Three underlying "weak learning algorithms", FindAttrTest (henceforth denoted "FIA"), FindDecRule ("FID"), and Quinlan's C4.5 ("C45") (see []), along with their boosted versions (ABO, DBO, and 5BO) and bagged versions (ABA, DBA, and 5BA), for a total of 9 methods, are reported on in [].

In our runs on these datasets, we used the same study paradigms as Freund and Schapire (either 10-fold cross-validation or a training/test split, depending on the dataset). Our only change, made because of time constraints, concerned two of the datasets that used training/test sets, namely letter and satimage: we did not rerun the study 20 times with different seeds, but report the results of a single run (seed 1) of each problem. For the 10-fold cross-validation studies, we reran each cross-validation 10 times using different initial seeds, for a total of 100 runs per study. For the remaining training/test set problem, soybean-large, we reran the study 20 times with different seeds and averaged the results. The reader should refer to [] for the details.
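The repeated cross-validation scheme described above (10 repetitions of 10-fold cross-validation, each repetition with its own seed, for 100 runs per study) can be sketched as follows. This is only an illustration in Python, not the SDK code: the helper names, the majority-class stand-in classifier, and the toy labels are all hypothetical, and the way a seed drives the fold assignment is an assumption.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 with the given seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class_error(train_labels, test_labels):
    """Hypothetical stand-in classifier: always predict the most common training label."""
    guess = Counter(train_labels).most_common(1)[0][0]
    return sum(y != guess for y in test_labels) / len(test_labels)

def repeated_cv_error(labels, k=10, seeds=range(1, 11)):
    """Average the held-out error over len(seeds) repetitions of k-fold cross-validation."""
    errors = []
    for seed in seeds:
        for held_out in k_fold_indices(len(labels), k, seed):
            held = set(held_out)
            train = [y for i, y in enumerate(labels) if i not in held]
            test = [labels[i] for i in held_out]
            errors.append(majority_class_error(train, test))
    return sum(errors) / len(errors), len(errors)

# Toy labels for illustration only; not one of the datasets listed above.
labels = [0] * 60 + [1] * 40
mean_error, n_runs = repeated_cv_error(labels)
print(n_runs, round(mean_error, 3))
```

Replacing `majority_class_error` with a real train-and-test routine, and `labels` with a real dataset, would give the averaged error of the kind reported for the cross-validation problems.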

The results for our runs, as well as those reported in [], are presented in Table 1. The values listed under each method are error rates, in percent, for the different problems. In Figure 1, we summarize these results graphically. Our focus is on the relative ranks of the various methods over the 17 problems considered. Specifically, for each of the 10 methods (listed on the x-axis), we provide a bar ranging from the best rank that method achieved on any of the problems to its worst rank. Each bar also has a left tic at the mean of that method's ranks across all 17 problems, and a right tic at the mode of the ranks (assuming a mode exists for that method). The methods are listed in order of their mean rank, and the graph includes a line connecting the means of the ranks.

data  size   FIA   ABO   ABA   FID   DBO   DBA   C45   5BO   5BA   SDK
crx    690  14.5  14.4  14.5  14.5  13.5  14.5  15.8  13.8  13.6  12.4
dia    768  26.1  24.4  26.1  27.8  25.3  26.4  28.4  25.7  24.4  25.5
gls    214  51.5  51.1  50.9  49.7  48.5  47.2  31.7  22.7  25.7  20.3
hrt    303  27.8  18.8  22.4  27.4  19.7  20.3  26.6  21.7  20.9  17.4
hep    155  19.7  18.6  16.8  21.6  18.0  20.1  21.2  16.3  17.5  16.2
ion    351  17.8   8.5  17.3  10.3   6.6   9.3   8.9   5.8   6.2   6.2
iri    150  35.2   4.7  28.4  38.3   4.3  18.8   5.9   5.0   5.0   4.2
lab     57  25.1   8.8  19.1  24.0   7.3  14.6  15.8  13.1  11.3   6.1
let  20000  92.9  92.9  91.9  92.3  91.8  91.8  13.8   3.3   6.8   3.3
sat   6435  58.3  58.3  58.3  57.6  56.5  56.7  14.8   8.9  10.6   8.7
seg   2310  75.8  75.8  54.5  73.7  53.3  54.3   3.6   1.4   2.7   1.9
son    208  25.9  16.5  25.9  31.4  15.2  26.1  28.9  19.0  24.3  10.6
soy    683  64.8  64.5  59.0  73.6  73.6  73.6  13.3   6.8  12.2   5.9
spl   3190  37.0   9.2  35.6  29.5   8.0  29.5   5.8   4.9   5.2   4.9
veh    846  64.3  64.4  57.6  61.3  61.2  61.0  29.9  22.6  26.1  22.1
vot    435   4.4   3.7   4.4   4.0   4.4   4.4   3.5   5.1   3.6   3.5
wsc    699   8.4   4.4   6.7   8.1   4.1   5.3   5.0   3.3   3.2   2.6
Table 1: Experimental Results - Freund-Schapire

Figure 1: Irvine Comparisons
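The rank summary plotted in Figure 1 (best rank, worst rank, mean rank, and mode of the ranks for each method) can be computed along the following lines. This is a sketch under stated assumptions rather than the procedure actually used to draw the figure: it uses only three of the 17 rows of Table 1, it assumes tied methods share the best available rank, and it treats a mode as existing only when a single rank is most frequent. The function names are hypothetical.

```python
from statistics import mean, multimode

METHODS = ["FIA", "ABO", "ABA", "FID", "DBO", "DBA", "C45", "5BO", "5BA", "SDK"]

# Error rates (percent) for three of the 17 problems, copied from Table 1.
ERRORS = {
    "crx": [14.5, 14.4, 14.5, 14.5, 13.5, 14.5, 15.8, 13.8, 13.6, 12.4],
    "ion": [17.8, 8.5, 17.3, 10.3, 6.6, 9.3, 8.9, 5.8, 6.2, 6.2],
    "vot": [4.4, 3.7, 4.4, 4.0, 4.4, 4.4, 3.5, 5.1, 3.6, 3.5],
}

def ranks(errors):
    """Rank 1 is the lowest error; tied methods share the best rank (an assumption)."""
    return [1 + sum(other < e for other in errors) for e in errors]

def rank_summary(table):
    """Collect each method's ranks over all problems and summarize them."""
    per_method = {m: [] for m in METHODS}
    for errors in table.values():
        for method, r in zip(METHODS, ranks(errors)):
            per_method[method].append(r)
    summary = {}
    for method, rs in per_method.items():
        modes = multimode(rs)
        summary[method] = {
            "best": min(rs),
            "worst": max(rs),
            "mean": mean(rs),
            # Report a mode only when it is unique (an assumption).
            "mode": modes[0] if len(modes) == 1 else None,
        }
    return summary

summary = rank_summary(ERRORS)
# List the methods in order of mean rank, as on the x-axis of the figure.
for method in sorted(METHODS, key=lambda m: summary[m]["mean"]):
    print(method, summary[method])
```

Extending `ERRORS` to all 17 rows of Table 1 would give the full set of bars, subject to the same tie-handling assumption.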

The Statlog database contains 10 publicly available sets. Two of these sets (heart data and German credit data) involve nontrivial cost matrices, and since SDK does not take cost into account, we did not use them. One additional set, the shuttle dataset, is extremely underrepresented in some classes (2 test points for class 7 out of a total of 58,000 training and test examples), so we also eliminated it from consideration. On the remaining 7 sets, we carried out studies under the same conditions as []. In particular, the Australian Credit Approval problem (henceforth denoted "crx") used 10-fold cross-validation, the Pima Indians Diabetes problem ("dia") used 12-fold cross-validation, the DNA problem ("dna") used training (2000 points) and test (1186 points) sets, the Letter Image Recognition problem ("let") used training (15000 points) and test (5000 points) sets, the Satellite Image problem ("sat") used training (4435 points) and test (2000 points) sets, the Image Segmentation problem ("seg") used 10-fold cross-validation, and the Vehicle Silhouette problem ("veh") used 9-fold cross-validation.

The results for our runs, as well as those reported in [], are presented in Table 2. In Figure 2, we graphically represent the relative ranks, as we did in Figure 1.

method        crx     dia     dna     let     sat     seg     veh
Ac2         0.181   0.276   0.245   0.245   0.157   0.031   0.296
Alloc80     0.201   0.301   0.064   0.064   0.132   0.030   0.173
BackProp    0.154   0.248   0.327   0.327   0.139   0.054   0.207
BayTree     0.171   0.271   0.124   0.124   0.147   0.033   0.271
Bayes       0.151   0.262   0.529   0.529   0.287   0.265   0.558
C4.5        0.155   0.270   0.132   0.132   0.150   0.040   0.266
Cal5        0.131   0.250   0.253   0.253   0.151   0.062   0.279
Cart        0.145   0.255      NA      NA   0.138   0.040   0.235
Castle      0.148   0.258   0.245   0.245   0.194   0.112   0.505
Cn2         0.204   0.289   0.115   0.115   0.150   0.043   0.314
Default     0.440   0.350   0.960   0.960   0.769   0.760   0.750
Dipol92     0.141   0.224   0.176   0.176   0.111   0.039   0.151
Discrim     0.141   0.225   0.302   0.302   0.171   0.116   0.216
IndCart     0.152   0.271   0.130   0.130   0.138   0.045   0.298
Itrule      0.137   0.245   0.594   0.594      NA   0.455   0.324
KNN         0.181   0.324   0.068   0.068   0.094   0.077   0.275
Kohonen        NA   0.273   0.252   0.252   0.179   0.067   0.340
LVQ         0.197   0.272   0.079   0.079   0.105   0.046   0.287
LogDisc     0.141   0.223   0.234   0.234   0.163   0.109   0.192
NewId       0.181   0.289   0.128   0.128   0.150   0.034   0.298
QuaDisc     0.207   0.262   0.113   0.113   0.155   0.157   0.150
Radial      0.145   0.243   0.233   0.233   0.121   0.069   0.307
SDK         0.126   0.233   0.033   0.038  0.0865   0.021   0.201
Smart       0.158   0.232   0.295   0.295   0.159   0.052   0.217
Table 2: Experimental Results - Statlog

Figure 2: Statlog Comparisons
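Because Table 2 contains missing entries (NA), ranking the methods on a given problem requires a decision about those entries. The sketch below, in Python, ranks only the methods that reported a result on each problem. How NA entries were actually treated when Figure 2 was drawn is not stated here, so skipping them is an assumption, as is the tie handling; the function names are hypothetical, and only five of the 24 rows are included.

```python
PROBLEMS = ["crx", "dia", "dna", "let", "sat", "seg", "veh"]

# Five of the 24 rows of Table 2, copied as they appear there.
ROWS = """\
Cart 0.145 0.255 NA NA 0.138 0.040 0.235
Dipol92 0.141 0.224 0.176 0.176 0.111 0.039 0.151
Itrule 0.137 0.245 0.594 0.594 NA 0.455 0.324
Kohonen NA 0.273 0.252 0.252 0.179 0.067 0.340
SDK 0.126 0.233 0.033 0.038 0.0865 0.021 0.201
"""

def parse(rows):
    """Turn whitespace-separated rows into {method: [error or None, ...]}."""
    table = {}
    for line in rows.splitlines():
        name, *cells = line.split()
        table[name] = [None if c == "NA" else float(c) for c in cells]
    return table

def ranks_for(table, column):
    """Rank the methods that reported a result in this column; NA entries are skipped."""
    reported = {m: v[column] for m, v in table.items() if v[column] is not None}
    # Rank 1 is the lowest error; ties share the best rank (an assumption).
    return {m: 1 + sum(o < e for o in reported.values()) for m, e in reported.items()}

table = parse(ROWS)
for i, problem in enumerate(PROBLEMS):
    print(problem, ranks_for(table, i))
```

With only these five rows the ranks are relative to the excerpt, not to the full table; parsing all 24 rows in the same way would give ranks over the full set of methods.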


kleinbrg@math.buffalo.edu
Last modified on Saturday, 25-Feb-2006 18:37:48 EST