Title :
Optimization for Vietnamese text classification problem by reducing features set
Author :
Ha Nguyen Thi Thu ; Quynh Nguyen Huu ; Khanh Nguyen Thi Hong ; Hung Le Manh
Author_Institution :
Dept. of Comput. Sci., Vietnam Electr. Power Univ., Hanoi, Vietnam
Abstract :
Vietnamese is a monosyllabic language, so word segmentation is relatively complex: splitting words on whitespace is inaccurate, and existing Vietnamese segmentation tools are not highly effective. In this paper, we propose a new method that uses only topic words in its calculations, to increase the accuracy of the Vietnamese text classification system and to optimize the computation. The experimental results show that our method is more effective than previously proposed approaches, achieving higher accuracy and reducing computational complexity.
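The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how topic words are selected or which classifier is used, so the `TOPIC_WORDS` set, the function names, and the example sentence below are all hypothetical. The sketch only shows that whitespace splitting yields syllables rather than words (for example, "bóng đá", football, is one word written as two syllables), and that keeping only topic words shrinks the feature set.

```python
from collections import Counter

# Hypothetical topic vocabulary at syllable level (an assumption for this
# sketch; the abstract does not describe how topic words are chosen).
TOPIC_WORDS = {"bóng", "đá", "cầu", "thủ", "kinh", "tế", "ngân", "hàng"}


def whitespace_tokens(text):
    """Split on whitespace. For Vietnamese this yields syllables, not words."""
    return text.lower().split()


def reduced_features(text, topic_words=TOPIC_WORDS):
    """Count only the tokens that belong to the topic vocabulary."""
    return Counter(t for t in whitespace_tokens(text) if t in topic_words)


sentence = "Cầu thủ bóng đá ghi bàn trong trận đấu hôm nay"
all_tokens = whitespace_tokens(sentence)
features = reduced_features(sentence)

print(len(all_tokens), "whitespace tokens")
print(len(features), "features kept after topic-word filtering")
```

Under these assumptions, a classifier would score a document using only the retained counts, which is where the reduction in computation would come from.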
Keywords :
classification; computational complexity; natural language processing; optimisation; text analysis; word processing; Vietnamese segmentation tools; Vietnamese text classification problem; Vietnamese text classification system accuracy calculation; calculation process optimization; computational complexity reduction; feature set reduction; split word; topic word; whitespaces; word segmentation; Vietnamese text classification; syllable language
Conference_Title :
2012 6th International Conference on New Trends in Information Science and Service Science and Data Mining (ISSDM)
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-0876-2