DocumentCode :
3249036
Title :
Discriminative category matching: efficient text classification for huge document collections
Author :
Fung, Gabriel Pui Cheong ; Yu, Jeffrey Xu ; Lu, Hongjun
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, China
fYear :
2002
fDate :
2002
Firstpage :
187
Lastpage :
194
Abstract :
With the rapid growth of textual information available on the Internet, having a good model for classifying and managing documents automatically is undoubtedly important. When more documents are archived, new terms, new concepts and concept-drift will frequently appear. Without a doubt, updating the classification model frequently, rather than using the old model for a very long period, is absolutely essential. Here, the challenges are: a) obtain a high accuracy classification model; b) consume low computational time for both model training and operation; and c) occupy low storage space. However, none of the existing classification approaches could achieve all of these requirements. In this paper, we propose a novel text classification approach, called discriminative category matching, which could achieve all of the stated characteristics. Extensive experiments using two benchmarks and a large real-life collection are conducted. The encouraging results indicated that our approach is highly feasible.
Keywords :
Internet; computational complexity; data mining; pattern matching; text analysis; computational time; concept-drift; discriminative category matching; document classification; document management; efficient text classification; huge document collections; Computational efficiency; Computer science; Costs; Document handling; Government; Support vector machine classification; Support vector machines; Technology management; Text categorization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183902
Filename :
1183902