• DocumentCode
    1800425
  • Title

    An ontology-based dimensionality reduction algorithm for biomedical literature classification

  • Author

    Jing Wang ; Gongqing Wu ; Xuegang Hu

  • Author_Institution
    School of Computer Science and Information Engineering, Hefei University of Technology, China, 230009
  • fYear
    2013
  • fDate
    1-8 Jan. 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Dimension reduction is an important component of automatic text categorization, especially biomedical literature classification. Many studies have shown that statistics-based dimension reduction algorithms, such as Information Gain (IG), are very effective in document categorization. However, these algorithms still suffer from two major drawbacks. First, they tend to use all words as features. Second, they cannot capture the semantic information that underlies the lexical words. To overcome these drawbacks, this paper presents a novel algorithm for reducing the dimensionality of biomedical literature. First, an ontology-based entity extraction technique is used to obtain a good biomedical concept set, which serves as the feature space. Semantic relatedness information is then incorporated by mapping some of the original features to “Least-Max-Cover” features, according to the structure of the domain ontology. We demonstrate our method on the problem of classifying MEDLINE-indexed journal abstracts, using C4.5 as the base classifier. The experimental results show that our method achieves a significant average improvement in F-value (3.5%) and recall (5.25%) compared with other state-of-the-art dimensionality reduction algorithms such as IG, CHI, One-R and LARS.
  • Keywords
    Classification algorithms; Educational institutions; Feature extraction; Ontologies; Prediction algorithms; Semantics; Text categorization; “Least-Max-Cover” strategy; automatic text categorization; dimension reduction; ontology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Conference Anthology, IEEE
  • Conference_Location
    China
  • Type

    conf

  • DOI
    10.1109/ANTHOLOGY.2013.6784753
  • Filename
    6784753