DocumentCode
1800425
Title
An ontology-based dimensionality reduction algorithm for biomedical literature classification
Author
Jing Wang ; Gongqing Wu ; Xuegang Hu
Author_Institution
School of Computer Science and Information Engineering, Hefei University of Technology, China, 230009
fYear
2013
fDate
1-8 Jan. 2013
Firstpage
1
Lastpage
5
Abstract
Dimension reduction is an important component of automatic text categorization, especially biomedical literature classification. Many studies have shown that statistics-based dimension reduction algorithms, such as Information Gain (IG), are very effective in document categorization. However, these algorithms still suffer from two major drawbacks. One is that they tend to use all words as features. The other is that they cannot capture the semantic information underlying the lexical words. To overcome these drawbacks, this paper presents a novel algorithm for reducing the dimensionality of biomedical literature. First, a biomedical concept set is obtained by ontology-based entity extraction and serves as the feature space. Then, semantic relatedness information is incorporated by mapping some original features to “Least-Max-Cover” features, according to the structure of the domain ontology. We demonstrate our method on the problem of classifying MEDLINE-indexed journal abstracts, using C4.5 as the base classifier. The experimental results show that, on average, our method achieves a significant improvement in F-value (3.5%) and recall (5.25%) over other state-of-the-art dimensionality reduction algorithms such as IG, CHI, One-R and LARS.
Keywords
Classification algorithms; Educational institutions; Feature extraction; Ontologies; Prediction algorithms; Semantics; Text categorization; “Least-Max-Cover” strategy; automatic text categorization; dimension reduction; ontology
fLanguage
English
Publisher
IEEE
Conference_Titel
Conference Anthology, IEEE
Conference_Location
China
Type
conf
DOI
10.1109/ANTHOLOGY.2013.6784753
Filename
6784753