DocumentCode
1971310
Title
Document classification efficiency of phrase-based techniques
Author
Kapalavayi, Nagesh ; Murthy, S. N Jayaram ; Hu, Gongzhu
Author_Institution
Dept. of Comput. Sci., Central Michigan Univ., Mount Pleasant, MI
fYear
2009
fDate
10-13 May 2009
Firstpage
174
Lastpage
178
Abstract
Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on textual content. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain datasets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Since the characteristics of the datasets used by each of these research groups are remarkably different, it is not possible to compare the efficiency of these methods. In this paper, we present a study that uses the same dataset to compare the efficiency of a phrase-based technique with keyword-based techniques. Results prove conclusively that the use of phrase-based features is very effective in document classification.
Keywords
classification; statistical analysis; text analysis; document classification; keyword based feature; phrase based technique; statistical dataset; text document; textual content; Computer science; Data engineering; Data mining; Databases; Information retrieval; Natural language processing; Programming profession; Statistics; Synthetic aperture sonar; Text mining; document classification; keyword-based and phrase-based features; text mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Systems and Applications, 2009. AICCSA 2009. IEEE/ACS International Conference on
Conference_Location
Rabat
Print_ISBN
978-1-4244-3807-5
Electronic_ISBN
978-1-4244-3806-8
Type
conf
DOI
10.1109/AICCSA.2009.5069321
Filename
5069321