• DocumentCode
    945818
  • Title
    Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study
  • Author
    Sarinnapakorn, Kanoksri; Kubat, Miroslav
  • Author_Institution
    Miami Univ., Miami
  • Volume
    19
  • Issue
    12
  • fYear
    2007
  • Firstpage
    1638
  • Lastpage
    1651
  • Abstract
    Text categorization systems often use machine learning techniques to induce document classifiers from preclassified examples. Because each example document can belong to many classes, the computational costs are often very high and sometimes grow exponentially with the number of features. Seeking to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately on subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate several alternative fusion techniques, including our own mechanism inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
  • Keywords
    classification; inference mechanisms; learning (artificial intelligence); text analysis; uncertainty handling; DST-based solution; Dempster-Shafer theory; baseline induction algorithm; document classifier; machine learning technique; text categorization system; machine learning; data fusion; multi-label examples; text categorization
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2007.190663
  • Filename
    4358949
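The abstract names the Dempster-Shafer Theory as the inspiration for one of the fusion techniques but does not spell out the mechanism. For orientation only, the sketch below implements the classical Dempster rule of combination for two subclassifiers, not the authors' DST-inspired method. The helper `confidence_to_mass` and the two-element frame `{"in", "out"}` (document is or is not in a given class) are hypothetical assumptions: each subclassifier's confidence for "in" becomes mass on `{"in"}`, and the remainder is left on the whole frame as ignorance.

```python
from itertools import product

# Hypothetical frame of discernment for one class: the document is "in" or "out".
FRAME = {"in", "out"}


def combine(m1, m2):
    """Classical Dempster rule: fuse two mass functions (dict frozenset -> mass)."""
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass of agreeing focal sets accumulates on their intersection.
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to disjoint focal sets is conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule is undefined")
    # Renormalize so the non-conflicting masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in fused.items()}


def confidence_to_mass(conf, frame, label):
    """Hypothetical mapping: confidence on `label`, remainder as ignorance (whole frame)."""
    return {frozenset([label]): conf, frozenset(frame): 1.0 - conf}


if __name__ == "__main__":
    # Two subclassifiers, each trained on a different feature subset (illustrative values).
    m_a = confidence_to_mass(0.6, FRAME, "in")
    m_b = confidence_to_mass(0.5, FRAME, "out")
    for focal, mass in combine(m_a, m_b).items():
        print(sorted(focal), round(mass, 3))
```

In this example the subclassifiers partly disagree, so 0.3 of the joint mass is conflict and the remaining masses are rescaled by 1/0.7. Dempster's rule is only one possible fusion; averaging the confidences would be a typical "more traditional mechanism" of the kind the abstract compares against.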