DocumentCode :
2396505
Title :
Text categorization based on improved Rocchio algorithm
Author :
Gao, Guanyu ; Guan, Shengxiao
Author_Institution :
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2012
fDate :
19-20 May 2012
Firstpage :
2247
Lastpage :
2250
Abstract :
Text categorization assigns each text document to predefined categories. This paper presents a new method for classifying Chinese text based on the Rocchio algorithm. We first use TF-IDF to extract document vectors from training documents that have already been correctly categorized, and then use those vectors to generate codebooks, which serve as the classification models, using the LBG and Rocchio algorithms. The codebooks are then used to categorize target documents by their vector scores. Experimental results show that the method can achieve better performance.
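The pipeline the abstract describes (TF-IDF vectors, per-category prototypes, scoring a target document against them) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain Rocchio-style centroid per category, that is, a codebook with a single codeword, and leaves out the LBG splitting step that would produce several codewords per category. The function names, the smoothed IDF formula, cosine similarity as the "vector score", and the toy English tokens are all assumptions made for illustration; Chinese text would first need word segmentation.

```python
import math
from collections import Counter, defaultdict


def vectorize(tokens, idf):
    """Build an L2-normalised TF-IDF vector (as a dict) for one tokenised document."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_idf(docs):
    """Compute IDF weights from tokenised training documents (assumed formula: log(N/df) + 1)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def train_centroids(vectors, labels):
    """Average the training vectors of each category (one Rocchio-style prototype per category)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in total.items()}
        for label, total in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, idf, centroids):
    """Assign the category whose prototype scores highest against the document vector."""
    vec = vectorize(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, pre-tokenised training data (hypothetical; not from the paper).
train_docs = [
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
]
train_labels = ["finance", "finance", "sports", "sports"]

idf = fit_idf(train_docs)
centroids = train_centroids([vectorize(d, idf) for d in train_docs], train_labels)
predicted = classify(["bank", "stock"], idf, centroids)
```

Extending this toward the abstract's description would mean replacing each single centroid with a small set of codewords obtained by LBG-style splitting and scoring a target document against the nearest codeword of each category.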
Keywords :
natural language processing; text analysis; vectors; Chinese text; LBG; TFIDF; codebooks; improved Rocchio algorithm; target documents; text categorization; text classification method; text document; training documents; vector scores; Algorithm design and analysis; Classification algorithms; Computational modeling; Support vector machine classification; Training data; Rocchio Algorithm
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4673-0198-5
Type :
conf
DOI :
10.1109/ICSAI.2012.6223499
Filename :
6223499