Title :
An Efficient Heuristic for Discovering Multiple Ill-Defined Attributes in Datasets
Author_Institution :
Univ. du Quebec a Montreal, Que.
Abstract :
The accuracy of the rules produced by a concept learning system can be hindered by the presence of errors in the data, such as "ill-defined" attributes that are too general or too specific for the concept to be learned. In this paper, we devise a method that uses the Boolean differences computed by a program called Newton to identify multiple ill-defined attributes in a dataset in a single pass. The method is based on a compound heuristic that assigns a real-valued rank to each possible hypothesis based on its key characteristics. We show by extensive empirical testing on randomly generated classifiers that the hypothesis with the highest rank is the correct one with an observed probability quickly converging to 100%. Moreover, the monotonicity of the function enables us to use it as a rough estimator of its own likelihood.
Keywords :
Boolean functions; data mining; learning (artificial intelligence); pattern classification; probability; Boolean differences; Newton program; compound heuristic; concept learning system; multiple ill-defined attribute discovery; Animals; Birds; Computer errors; Error correction; Learning systems; Machine learning; Noise measurement; System testing; Tail
Conference_Title :
Machine Learning and Applications, 2006. ICMLA '06. 5th International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
0-7695-2735-3
DOI :
10.1109/ICMLA.2006.14
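Illustrative_Note :
The abstract refers to Boolean differences without defining them. As a reading aid, the sketch below shows the textbook definition only: the Boolean difference of f with respect to x_i is f with x_i set to 0, XORed with f with x_i set to 1. It is not the Newton program and not the paper's ranking heuristic. The function names and the toy concept are assumptions made for this example.

```python
from itertools import product


def boolean_difference(f, n_vars, i):
    """Truth table of the Boolean difference of f with respect to variable i.

    Maps every assignment (a tuple of 0/1 values) to 1 when flipping
    variable i changes the value of f at that assignment, else 0.
    """
    table = {}
    for bits in product((0, 1), repeat=n_vars):
        with_zero = bits[:i] + (0,) + bits[i + 1:]
        with_one = bits[:i] + (1,) + bits[i + 1:]
        table[bits] = int(bool(f(*with_zero)) != bool(f(*with_one)))
    return table


def depends_on(f, n_vars, i):
    """True when f is sensitive to variable i for at least one assignment."""
    return any(boolean_difference(f, n_vars, i).values())


def toy_concept(a, b, c):
    """Hypothetical concept: true when a and b both hold; c is irrelevant."""
    return a and b


if __name__ == "__main__":
    for index, name in enumerate("abc"):
        print(name, depends_on(toy_concept, 3, index))
```

In this toy case the difference with respect to c is 0 everywhere, so c has no effect on the concept, while the difference with respect to a equals b. How such differences feed the compound heuristic is described in the paper itself, not here.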