DocumentCode :
1963721
Title :
An Incremental Chinese Text Classification Algorithm Based on Quick Clustering
Author :
Ma, Houfeng ; Fan, Xinghua ; Chen, Ji
Author_Institution :
Inst. of Comput. Sci. & Technol., Chongqing Univ. of Posts & Telecommun., Chongqing
fYear :
2008
fDate :
23-25 May 2008
Firstpage :
308
Lastpage :
312
Abstract :
Most conventional incremental learning algorithms select only one optimized text sample at a time, which neither considers the relationships between texts in the unlabeled text set nor improves incremental learning efficiency. In addition, because the classifier's stored information is insufficient, the selected text is easily misclassified, and learning from incorrectly labeled text reduces incremental learning precision. To overcome these problems, this paper proposes a new incremental learning algorithm based on quick clustering. On the one hand, it improves incremental learning efficiency by clustering all similar texts in the unlabeled text set. The texts at the centers of the clusters are selected as a representative text set, and the incremental learning process then chooses texts from the representative text set under the 0-1 loss rate. On the other hand, to improve incremental learning precision, a new method for choosing a reasonable learning sequence is proposed, which both strengthens the positive impact of more mature data on classification and weakens the negative impact of noisy data. The experimental results show that the algorithm increases both classification efficiency and precision.
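The step of picking cluster centers as a representative text set can be illustrated with a minimal sketch. It is not the authors' method: the keywords suggest affinity propagation, whereas this sketch substitutes a simple single-pass threshold clustering. The function names, the cosine similarity measure, the 0.5 threshold, and the assumption that texts are already segmented into whitespace-separated tokens are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def quick_cluster(texts, threshold=0.5):
    """Single-pass clustering (an assumed stand-in for the paper's method).

    Each text joins the first cluster whose center is at least
    `threshold` similar; otherwise it becomes a new cluster center.
    Returns the center indices and the member indices of each cluster.
    """
    vectors = [Counter(t.split()) for t in texts]
    centers = []  # indices of the texts acting as cluster centers
    members = []  # members[k] holds the indices assigned to centers[k]
    for i, vec in enumerate(vectors):
        for k, center in enumerate(centers):
            if cosine(vec, vectors[center]) >= threshold:
                members[k].append(i)
                break
        else:
            centers.append(i)
            members.append([i])
    return centers, members


# Hypothetical unlabeled texts, already tokenized by whitespace.
unlabeled = [
    "stock market rises",
    "stock market falls",
    "football match tonight",
    "football match result",
]
centers, members = quick_cluster(unlabeled)
representative_set = [unlabeled[i] for i in centers]
```

In this sketch only the texts in `representative_set` would be passed on to the incremental learner, so the classifier handles one candidate per cluster instead of every unlabeled text.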
Keywords :
classification; learning (artificial intelligence); pattern clustering; text analysis; Chinese text classification algorithm; conventional incremental learning algorithm; information storage classifier; reasonable learning sequence; text clustering; Artificial intelligence; Classification algorithms; Clustering algorithms; Computer science; Information processing; Machine learning algorithms; Noise reduction; Partitioning algorithms; Performance loss; Text categorization; Affinity propagation; Bayes; Incremental learning; Text classification; Text clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Processing (ISIP), 2008 International Symposiums on
Conference_Location :
Moscow
Print_ISBN :
978-0-7695-3151-9
Type :
conf
DOI :
10.1109/ISIP.2008.126
Filename :
4554104