• DocumentCode
    1066019
  • Title
    Blocking reduction strategies in hierarchical text classification
  • Author
    Sun, Aixin ; Lim, Ee-Peng ; Ng, Wee-Keong ; Srivastava, Jaideep
  • Author_Institution
    Center for Adv. Inf. Syst., Nanyang Technol. Univ., Singapore
  • Volume
    16
  • Issue
    10
  • fYear
    2004
  • Firstpage
    1305
  • Lastpage
    1308
  • Abstract
    One common approach in hierarchical text classification involves associating classifiers with nodes in the category tree and classifying text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category trees. However, all these methods suffer from blocking, which refers to documents that are wrongly rejected by classifiers at higher levels and therefore cannot be passed to classifiers at lower levels. We propose a classifier-centric performance measure known as the blocking factor to determine the extent of the blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and extended multiplicative. Our experiments using support vector machine (SVM) classifiers on the Reuters collection have shown that all three methods could reduce blocking and improve classification accuracy. Our experiments have also shown that the restricted voting method delivered the best performance.
  • Keywords
    data mining; pattern classification; support vector machines; text analysis; trees (mathematics); Reuters collection; blocking factor; category tree; classifier-centric performance measure; extended multiplicative; hierarchical text classification; reduction strategies; restricted voting; support vector machine classifiers; text document classification; text mining; threshold reduction; Classification tree analysis; Learning systems; Sun; Support vector machine classification; Text categorization; Voting; classification
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2004.50
  • Filename
    1324637