DocumentCode :
2418372
Title :
Enhancing text classification using synopses extraction
Author :
Ma, Liping ; Shepherd, John ; Zhang, Yanchun
Author_Institution :
Sch. of Comput. Sci. & Eng., New South Wales Univ., Sydney, NSW, Australia
fYear :
2003
fDate :
10-12 Dec. 2003
Firstpage :
115
Lastpage :
124
Abstract :
This paper describes a novel approach to document classification that uses decision-tree machine learning based on a succinct vector of important terms in each document. The succinct vector itself is generated by a machine-learning approach which builds parsers that can identify significant features in a document by partitioning it into regions based on low-level document characteristics. The fact that the feature vector is succinct overcomes the problem of very large term vectors, which have hindered the application of conventional machine learning to document classification. The fact that the parser can be trained to extract only important terms from documents means that small training sets can be used to achieve the same classification accuracy as with conventional approaches.
Keywords :
classification; decision trees; grammars; learning (artificial intelligence); text analysis; World Wide Web; classification accuracy; decision-tree machine learning; document classification; electronic texts; information management; parsers; succinct vector; synopses extraction; text classification; Australia; Character generation; Computer science; Data mining; Frequency; Machine learning; Management training; Mathematics; Sparse matrices; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Systems Engineering, 2003. WISE 2003. Proceedings of the Fourth International Conference on
Print_ISBN :
0-7695-1999-7
Type :
conf
DOI :
10.1109/WISE.2003.1254475
Filename :
1254475