DocumentCode
945818
Title
Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study
Author
Sarinnapakorn, Kanoksri ; Kubat, Miroslav
Author_Institution
Miami Univ., Miami
Volume
19
Issue
12
fYear
2007
Firstpage
1638
Lastpage
1651
Abstract
Text categorization systems often use machine learning techniques to induce document classifiers from preclassified examples. The fact that each example document belongs to many classes often leads to very high computational costs that sometimes grow exponentially in the number of features. Seeking to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately for subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate here a few alternative fusion techniques, including our own mechanism that was inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
Keywords
classification; inference mechanisms; learning (artificial intelligence); text analysis; uncertainty handling; DST-based solution; Dempster-Shafer theory; baseline induction algorithm; document classifier; machine learning technique; text categorization system; Machine Learning; data fusion; multi-label examples; text categorization
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2007.190663
Filename
4358949