Title :
Generating multiple noise elimination filters with the ensemble-partitioning filter
Author :
Khoshgoftaar, Taghi M. ; Rebours, Pierre
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
We present the ensemble-partitioning filter, a generalization of several common filtering techniques from the literature. Filtering the training dataset, i.e., removing noisy data, can improve the accuracy of the induced data mining learners. Tuning the few parameters of the ensemble-partitioning filter adapts the filtering to a given data mining problem. For example, the ensemble-partitioning filter can be specialized into the classification, ensemble, multiple-partitioning, or iterative-partitioning filter. The predictions of the filtering experts are then combined: an instance is identified as noisy if it is misclassified by a certain number of experts or learners. The conservativeness of the ensemble-partitioning filter depends on the filtering level and the number of filtering iterations. A case study of software metrics data from a high-assurance software project analyzes the similarities between the filters obtained by specializing the ensemble-partitioning filter. We show that over 25% of the time, the filters at different levels of conservativeness agree on labeling instances as noisy. In addition, the classification filter has the lowest agreement with the other filters.
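The voting rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `flag_noisy`, the parameter names, and the reading of "a certain number of experts" as "at least `filtering_level` experts" are assumptions made for illustration. The sketch also takes the experts' predictions as given and does not model how the experts are trained or how the data are partitioned.

```python
from typing import List, Sequence


def flag_noisy(
    labels: Sequence[int],
    expert_predictions: Sequence[Sequence[int]],
    filtering_level: int,
) -> List[int]:
    """Return indices of instances misclassified by at least
    `filtering_level` experts (hypothetical threshold semantics)."""
    noisy = []
    for i, label in enumerate(labels):
        # Count how many experts disagree with the recorded label.
        misclassifications = sum(
            1 for preds in expert_predictions if preds[i] != label
        )
        if misclassifications >= filtering_level:
            noisy.append(i)
    return noisy


# Toy data: four instances, three experts (one row of predictions each).
labels = [0, 1, 0, 1]
experts = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]

majority = flag_noisy(labels, experts, filtering_level=2)
consensus = flag_noisy(labels, experts, filtering_level=3)
aggressive = flag_noisy(labels, experts, filtering_level=1)
```

In this toy setting, a higher `filtering_level` flags fewer instances, which corresponds to a more conservative filter, and a lower level flags more.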
Keywords :
data mining; information filtering; project management; software fault tolerance; software metrics; software quality; classification filter; ensemble-partitioning filter; iterative-partitioning filter; multiple noise elimination filters generation; software faults; software measurement; software metrics data; software project; Computer errors; Filtering; Filters; Labeling; Noise generators; Noise level; Software engineering
Conference_Title :
Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
Print_ISBN :
0-7803-8819-4
DOI :
10.1109/IRI.2004.1431489