Title :
Research on enhancing the effectiveness of the Chinese text automatic categorization based on ICTCLAS segmentation method
Author :
Xiangdong Li ; Cheng Zhang
Author_Institution :
Center for Studies of Inf. Resources, Wuhan Univ., Wuhan, China
Abstract :
This article proposes a method for improving the classification performance of the ICTCLAS segmentation method: feature items with low category identification capacity in the ICTCLAS segmentation result are replaced by feature items with better category identification capacity drawn from the 2-gram segmentation result. Tests using the KNN categorization algorithm and the Naive Bayes text categorization method show that the approach works well on the Fudan University corpus. The article also analyzes, through these tests, why the method is relatively ineffective on the Sogou laboratory corpus.
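The replacement idea described in the abstract can be illustrated with a minimal sketch. Several parts of it are assumptions rather than details taken from the paper: the abstract does not say how "category identification capacity" is measured, so a chi-square score is used as a stand-in; the word tokens are assumed to have been produced beforehand by ICTCLAS (the segmenter itself is not called); and the names `char_bigrams`, `chi_square_scores`, `mixed_features`, `k`, and `n_replace` are hypothetical.

```python
from collections import Counter, defaultdict


def char_bigrams(text):
    """Overlapping 2-gram segmentation of a string, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square value over all categories.

    Assumption: chi-square stands in for the paper's unspecified
    "category identification capacity" measure.
    docs is a list of token lists; labels is the parallel list of categories.
    """
    n = len(docs)
    cat_count = Counter(labels)
    df = Counter()                  # documents containing the term
    df_cat = defaultdict(Counter)   # documents containing the term, per category
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_cat[t][label] += 1
    scores = {}
    for t in df:
        best = 0.0
        for cat, n_cat in cat_count.items():
            a = df_cat[t][cat]      # in category, contains term
            b = df[t] - a           # other categories, contains term
            c = n_cat - a           # in category, lacks term
            d = n - n_cat - b       # other categories, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[t] = best
    return scores


def mixed_features(word_docs, bigram_docs, labels, k, n_replace):
    """Keep the top-k word features, drop the n_replace weakest of them,
    and fill the gap with the best-scoring bigrams not already kept."""
    w = chi_square_scores(word_docs, labels)
    g = chi_square_scores(bigram_docs, labels)
    top_words = sorted(w, key=lambda t: (-w[t], t))[:k]
    kept = top_words[:max(0, len(top_words) - n_replace)]
    kept_set = set(kept)
    ranked_bigrams = sorted(g, key=lambda t: (-g[t], t))
    extra = [t for t in ranked_bigrams if t not in kept_set][:n_replace]
    return kept + extra


# Toy data: word tokens as a word segmenter might return them (assumed),
# and the same texts cut into character 2-grams.
texts = ["足球比赛的", "篮球比赛的", "股票市场的", "基金市场的"]
word_docs = [["足球", "比赛", "的"], ["篮球", "比赛", "的"],
             ["股票", "市场", "的"], ["基金", "市场", "的"]]
labels = ["sports", "sports", "finance", "finance"]
bigram_docs = [char_bigrams(t) for t in texts]

features = mixed_features(word_docs, bigram_docs, labels, k=7, n_replace=1)
print(features)
```

In this toy data the function word "的" occurs in every document, so it carries no category information and is the item that gets replaced by a 2-gram. The resulting mixed feature list could then be fed to a KNN or Naive Bayes classifier, which is outside the scope of this sketch.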
Keywords :
natural language processing; text analysis; Chinese text automatic categorization; ICTCLAS segmentation method; KNN categorization algorithm; Naive Bayes text categorization method; Sogou laboratory corpus; category identification capacity items; feature items; Educational institutions; Text categorization; Chinese segmentation; classification effect; high information; mix; text automatic categorization;
Conference_Title :
Software Engineering and Service Science (ICSESS), 2013 4th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-4997-0
DOI :
10.1109/ICSESS.2013.6615302