DocumentCode :
2666018
Title :
Flexible length phrases in document classification
Author :
Radosevic, D. ; Dobsa, J.
Author_Institution :
Fac. of Organization & Informatics, Zagreb Univ.
fYear :
2006
fDate :
0-0 0
Firstpage :
457
Lastpage :
462
Abstract :
In this paper we investigate the possibility of using phrases of flexible length in the classification of textual documents, as an extension to the classic bag-of-words representation, in which documents are represented using single words as index terms. The investigation is conducted on a collection of articles from the Croatian daily newspaper Vecernji list. It is shown that the use of flexible-length phrases improves the precision of automatic document classification, and there are indications that such an approach could be used for genre classification.
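The abstract does not say how the phrases are selected, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a "phrase" is any contiguous sequence of 2 to `max_len` words and that a phrase becomes an index term only if it occurs in at least `min_df` documents; single words are always kept, as in a plain bag-of-words model. The function names `tokenize`, `candidate_phrases`, `build_vocabulary` and `represent`, the parameters `max_len` and `min_df`, and the toy documents are all hypothetical.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def candidate_phrases(tokens: list[str], max_len: int) -> list[str]:
    """All contiguous word sequences of length 1..max_len, joined by spaces."""
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return phrases


def build_vocabulary(docs: list[str], max_len: int = 4, min_df: int = 2) -> set[str]:
    """Index terms: every single word, plus multi-word phrases whose
    document frequency is at least min_df (hypothetical selection rule)."""
    df = Counter()
    for doc in docs:
        # Count each phrase at most once per document (document frequency).
        df.update(set(candidate_phrases(tokenize(doc), max_len)))
    return {p for p, c in df.items() if " " not in p or c >= min_df}


def represent(doc: str, vocab: set[str], max_len: int = 4) -> Counter:
    """Term-frequency vector over the combined word + phrase vocabulary."""
    return Counter(p for p in candidate_phrases(tokenize(doc), max_len) if p in vocab)


# Toy example (hypothetical data, not from the paper's collection).
docs = [
    "the national bank raised interest rates",
    "interest rates fell as the national bank eased policy",
    "the home team won the match",
]
vocab = build_vocabulary(docs, max_len=3, min_df=2)
print(sorted(p for p in vocab if " " in p))
# ['interest rates', 'national bank', 'the national', 'the national bank']
```

Because phrases of different lengths share one vocabulary with single words, the resulting sparse vectors can be passed unchanged to any classifier that accepts a bag-of-words input, such as a support vector machine. The `min_df` threshold is one simple way to keep the phrase vocabulary from growing with every rare word sequence.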
Keywords :
classification; text analysis; vocabulary; automatic document classification; flexible length phrases; genre classification; index terms; textual document classification; word document representation; Design for experiments; Frequency; Indexing; Informatics; Large-scale systems; Stress; Support vector machine classification; Support vector machines; Text categorization; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology Interfaces, 2006. 28th International Conference on
Conference_Location :
Cavtat/Dubrovnik
ISSN :
1330-1012
Print_ISBN :
953-7138-05-4
Type :
conf
DOI :
10.1109/ITI.2006.1708524
Filename :
1708524