• DocumentCode
    163219
  • Title

    An improvement of flat approach on hierarchical text classification using top-level pruning classifiers

  • Author

    Phachongkitphiphat, Natchanon ; Vateekul, Peerapon

  • Author_Institution
    Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
  • fYear
    2014
  • fDate
    14-16 May 2014
  • Firstpage
    86
  • Lastpage
    90
  • Abstract
    Hierarchical classification has become a popular research topic, particularly for text categorization on the web. For a large web corpus, the hierarchy can contain hundreds of thousands of topics, so the task is commonly handled with a flat classification approach, which induces binary classifiers only for the leaf-node classes. However, this approach suffers from low prediction accuracy due to class imbalance in the training data. In this paper, we propose two novel strategies: (i) “Top-Level Pruning” to narrow down the candidate classes, and (ii) “Exclusive Top-Level Training Policy” to build more effective classifiers by utilizing the top-level data. Experiments on the Wikipedia dataset show that our system consistently outperforms the traditional flat approach on all hierarchical classification metrics.
  • Keywords
    Internet; pattern classification; text analysis; Wikipedia dataset; binary classifier; exclusive top-level training policy; flat classification approach; hierarchical text classification; leaf-node classes; text categorization; top-level pruning classifiers; flat approach; hierarchical classification; hierarchy pruning; text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
  • Conference_Location
    Chon Buri
  • Print_ISBN
    978-1-4799-5821-4
  • Type

    conf

  • DOI
    10.1109/JCSSE.2014.6841847
  • Filename
    6841847