DocumentCode :
595484
Title :
An active learning approach to frequent itemset-based text clustering
Author :
Marcacini, Ricardo M. ; Correa, G.N. ; Rezende, Solange O.
Author_Institution :
Math. & Comput. Sci. Inst., Univ. of Sao Paulo, Sao Carlos, Brazil
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
3529
Lastpage :
3532
Abstract :
Frequent itemset-based text clustering has emerged as a promising approach to the automatic organization of text documents, because it combines high clustering accuracy with understandable cluster descriptors. However, the clustering results may not reflect the user's point of view and may therefore be unsatisfactory. In this context, active learning is an interesting way to incorporate the user's knowledge into the text clustering task by querying users about the data. We introduce an active learning approach to frequent itemset-based text clustering called AL2FIC. In our approach, users can provide feedback directly on the cluster descriptors without needing to know the document labels. An experimental evaluation on real text collections showed that AL2FIC significantly improves text clustering performance even when users select only a few descriptors.
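This record does not describe AL2FIC's internals. The Python sketch below illustrates only the general idea the abstract builds on: frequent term sets serve as readable cluster descriptors, and user feedback is given on those descriptors rather than on document labels. It is a simplified stand-in and not the authors' algorithm. The function names, the `min_support` and `max_size` parameters, and the rule "assign each document to the most specific user-selected descriptor it contains" are all hypothetical. Itemsets are mined with a plain level-wise Apriori search.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=3):
    """Level-wise (Apriori-style) search for term sets contained in at least
    min_support documents. docs is a list of sets of terms."""
    def support(itemset):
        return sum(1 for d in docs if itemset <= d)

    vocab = sorted(set().union(*docs))
    level = [frozenset([t]) for t in vocab if support(frozenset([t])) >= min_support]
    frequent = {}
    size = 1
    while level and size <= max_size:
        for s in level:
            frequent[s] = support(s)
        size += 1
        # Join step: combine frequent itemsets into candidates one term larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset must already be frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
            and support(c) >= min_support
        ]
    return frequent


def assign_with_feedback(docs, selected):
    """Hypothetical feedback rule: each document goes to the largest
    user-selected descriptor it fully contains, or None if it matches none."""
    ordered = sorted(selected, key=lambda s: (-len(s), sorted(s)))
    return [next((s for s in ordered if s <= d), None) for d in docs]


# Toy usage with made-up documents (each document is a set of terms).
docs = [
    {"data", "mining", "cluster"},
    {"data", "mining", "itemset"},
    {"image", "pixel", "cluster"},
    {"image", "pixel", "filter"},
    {"data", "mining"},
]
frequent = frequent_itemsets(docs, min_support=2)
# Suppose the user marks these two frequent itemsets as good cluster descriptors.
selected = [frozenset({"data", "mining"}), frozenset({"image", "pixel"})]
labels = assign_with_feedback(docs, selected)
```

Here the user never labels individual documents; choosing two descriptors is enough to induce a partition, which mirrors, in a very reduced form, the kind of descriptor-level feedback the abstract describes.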
Keywords :
learning (artificial intelligence); pattern clustering; query processing; text analysis; AL2FIC; active learning approach; automatic organization; cluster descriptors; document labels; frequent itemset-based text clustering; text documents; user knowledge; Clustering algorithms; Data mining; Equations; Itemsets; Mathematical model; Power capacitors; Vectors;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460926