Title :
Determine the Critical dimension in data mining (experiments with bioinformatics datasets)
Author :
Suryakumar, Divya ; Sung, Andrew H. ; Liu, Qingzhong
Author_Institution :
Dept. of Comput. Sci. & Eng., New Mexico Tech, Socorro, NM, USA
Abstract :
The "curse of dimensionality" problem, which occurs in many applications involving data mining such as biomedical informatics, digital forensics, risk management, etc., makes it difficult to develop accurate learning machine classifiers when the dataset includes too many irrelevant or insignificant features. Therefore, finding the smallest set of features necessary to obtain the most accurate classifier is an issue of great theoretical and practical interest. In efforts toward developing formal methods for finding the "critical dimension", this paper presents an empirical study of the minimum number of features that are required for a learning machine to perform accurately. The dataset is first feature-ranked; then, iteratively, the least important feature is removed and the performance is plotted against the number of features. The point at which the performance curve drops significantly and does not rise again gives the critical dimension, which is a unique number for each specific combination of learning machine and feature ranking method. It is shown in this paper that the critical dimension phenomenon indeed exists for several of the bioinformatics datasets studied.
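The procedure described in the abstract (rank the features, remove the least important one at a time, record performance, locate the final drop) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `evaluate` callback, the toy scoring function, and the fixed `tolerance` used to decide what counts as a "significant" drop are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, Dict, List, Sequence


def performance_curve(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[int, float]:
    """Score the top-k features for k = n, n-1, ..., 1.

    `ranked_features` is ordered from most to least important.
    `evaluate` is a hypothetical callback that trains and scores a model
    on the given feature subset and returns a performance value.
    """
    kept = list(ranked_features)
    curve: Dict[int, float] = {}
    while kept:
        curve[len(kept)] = evaluate(kept)
        kept.pop()  # remove the least important remaining feature
    return curve


def critical_dimension(curve: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest feature count whose score is within `tolerance` of the best.

    Every smaller feature count scores below the threshold, so this marks
    the point below which the curve drops and does not rise again.
    ASSUMPTION: a fixed `tolerance` stands in for "drops significantly".
    """
    threshold = max(curve.values()) - tolerance
    return min(k for k, score in curve.items() if score >= threshold)


# Toy stand-in for a real classifier: only f1..f3 carry signal (assumption).
INFORMATIVE = {"f1", "f2", "f3"}


def toy_evaluate(features: List[str]) -> float:
    return 0.5 + 0.15 * len(INFORMATIVE.intersection(features))


ranking = ["f1", "f2", "f3", "f4", "f5", "f6"]  # most important first
curve = performance_curve(ranking, toy_evaluate)
print(critical_dimension(curve))  # prints 3
```

In practice `evaluate` would wrap a real learning machine and a cross-validated metric, and the ranking would come from a chosen feature ranking method. As the abstract notes, the resulting number is specific to that combination.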
Keywords :
bioinformatics; data mining; formal verification; learning (artificial intelligence); pattern classification; bioinformatics datasets; biomedical informatics; critical dimension; curse of dimensionality problem; digital forensics; feature ranking method; formal methods; learning machine classifiers; risk management; Decision support systems; Intelligent systems; dimensionality reduction; feature or attribute reduction
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
Conference_Location :
Cordoba
Print_ISBN :
978-1-4577-1676-8
DOI :
10.1109/ISDA.2011.6121702