DocumentCode :
3275174
Title :
Research on enhancing the effectiveness of the Chinese text automatic categorization based on ICTCLAS segmentation method
Author :
Xiangdong Li ; Cheng Zhang
Author_Institution :
Center for Studies of Inf. Resources, Wuhan Univ., Wuhan, China
fYear :
2013
fDate :
23-25 May 2013
Firstpage :
267
Lastpage :
270
Abstract :
The article proposed a method for improving the classification effect of the ICTCLAS segmentation method: some feature items with low category identification capacity in the ICTCLAS segmentation result are replaced by feature items with better category identification capacity drawn from the 2-gram segmentation result. Experiments using the KNN categorization algorithm and the Naive Bayes text categorization method showed that this approach worked well on the Fudan University corpus. The tests were also used to analyze why the method was relatively ineffective on the Sogou laboratory corpus.
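The replacement idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not name the measure of "category identification capacity", so a max-over-categories chi-square score is assumed here. ICTCLAS itself is not called: `word_docs` stands in for its output, and the sample segmentation is hypothetical. The one-for-one swap of the `k` weakest words for the `k` strongest new bigrams is also an assumed rule. All function names are illustrative.

```python
from collections import Counter


def char_bigrams(text):
    """Overlapping 2-gram segmentation: every pair of adjacent characters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def chi_square(docs, labels):
    """Assumed capacity measure: max chi-square of each term over categories,
    based on document presence/absence."""
    n = len(docs)
    term_df, term_cat_df = Counter(), Counter()
    cat_count = Counter(labels)
    for terms, label in zip(docs, labels):
        for t in set(terms):
            term_df[t] += 1
            term_cat_df[(t, label)] += 1
    scores = {}
    for t, df in term_df.items():
        best = 0.0
        for c, nc in cat_count.items():
            a = term_cat_df[(t, c)]  # docs in c that contain t
            b = df - a               # docs outside c that contain t
            cc = nc - a              # docs in c that lack t
            d = n - df - cc          # docs outside c that lack t
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        scores[t] = best
    return scores


def replace_weak_features(word_docs, raw_texts, labels, k):
    """Hypothetical rule: swap the k lowest-scoring segmented words for the
    k highest-scoring bigrams not already in the vocabulary, but only when
    the bigram scores strictly higher than the word it replaces."""
    word_scores = chi_square(word_docs, labels)
    bigram_scores = chi_square([char_bigrams(t) for t in raw_texts], labels)
    features = set(word_scores)
    weakest = sorted(word_scores, key=word_scores.get)[:k]
    candidates = [g for g in sorted(bigram_scores, key=bigram_scores.get, reverse=True)
                  if g not in features][:k]
    for word, gram in zip(weakest, candidates):
        if bigram_scores[gram] > word_scores[word]:
            features.discard(word)
            features.add(gram)
    return features


# Toy data: the hypothetical segmenter split the unknown word "球赛" into single characters.
raw_texts = ["球赛新闻", "球赛报道", "股市新闻", "股市报道"]
word_docs = [["球", "赛", "新闻"], ["球", "赛", "报道"], ["股市", "新闻"], ["股市", "报道"]]
labels = ["sport", "sport", "econ", "econ"]
print(sorted(replace_weak_features(word_docs, raw_texts, labels, k=1)))
# -> ['报道', '球', '球赛', '股市', '赛']
```

Here the category-neutral word "新闻" (score 0) is replaced by the bigram "球赛", which appears only in the sport documents. A real pipeline would then build KNN or Naive Bayes feature vectors over the resulting feature set.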
Keywords :
natural language processing; text analysis; Chinese text automatic categorization; ICTCLAS segmentation method; KNN categorization algorithm; Naive Bayes text categorization method; Sogou laboratory corpus; category identification capacity items; feature items; Educational institutions; Text categorization; Chinese segmentation; classification effect; high information; mix; text automatic categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering and Service Science (ICSESS), 2013 4th IEEE International Conference on
Conference_Location :
Beijing
ISSN :
2327-0586
Print_ISBN :
978-1-4673-4997-0
Type :
conf
DOI :
10.1109/ICSESS.2013.6615302
Filename :
6615302