Title :
Hybrid feature selection methods for online biomedical publication classification
Author :
Long Ma;Yanqing Zhang;Raj Sunderraman;Peter T. Fox;Angela R. Laird;Jessica A. Turner;Matthew D. Turner
Author_Institution :
Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
Abstract :
We review several feature selection methods (Recursive Feature Elimination, Select K Best, and Random Forests) as elements of a processing chain for feature selection in a text mining task. The task is a multi-label classification problem: assigning to published scientific papers the metadata labels that are usually applied by expert curators. In this formulation, the feature space is naturally and inevitably far larger than the available training set. We explore ways to reduce the dimension of the feature space and show that sequential feature selection substantially improves performance on this complex type of data.
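The chained selection described in the abstract might be sketched as follows with scikit-learn. This is only an illustrative sketch, not the authors' implementation. The order of the stages, the feature counts, the synthetic data, the final linear SVM classifier, and the one-pipeline-per-label (binary relevance) treatment of the multi-label problem are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for a term-count matrix with more features than
# training documents (assumed sizes; the paper's data is not reproduced here).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=300, n_classes=4, n_labels=2, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Sequential feature selection: each stage only sees the features that
# survived the previous stage (300 -> 100 -> 30 -> 10 in this sketch).
chain = Pipeline([
    # Stage 1: cheap univariate filter (chi-squared needs non-negative counts).
    ("kbest", SelectKBest(chi2, k=100)),
    # Stage 2: recursive elimination driven by linear SVM weights,
    # dropping 25% of the stage's input features per iteration.
    ("rfe", RFE(LinearSVC(dual=False), n_features_to_select=30, step=0.25)),
    # Stage 3: keep the 10 features ranked highest by random forest importance.
    ("forest", SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        threshold=-np.inf, max_features=10,
    )),
    # Final classifier trained on the reduced feature space.
    ("svm", LinearSVC(dual=False)),
])

# Binary relevance: one independent copy of the chain is fitted per label.
clf = OneVsRestClassifier(chain)
clf.fit(X_train, Y_train)
pred = clf.predict(X_test)
print(pred.shape)
```

Placing the inexpensive univariate filter first keeps the costlier wrapper stages (recursive elimination and the forest) tractable when the vocabulary is large, which is one plausible reason to chain the methods rather than use any one alone.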
Keywords :
"Radio frequency","Metadata","Training","Support vector machines","Training data","Vocabulary","Text mining"
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2015 IEEE Conference on
DOI :
10.1109/CIBCB.2015.7300320