DocumentCode
595484
Title
An active learning approach to frequent itemset-based text clustering
Author
Marcacini, Ricardo M. ; Correa, G.N. ; Rezende, Solange O.
Author_Institution
Math. & Comput. Sci. Inst., Univ. of Sao Paulo, Sao Carlos, Brazil
fYear
2012
fDate
11-15 Nov. 2012
Firstpage
3529
Lastpage
3532
Abstract
Frequent itemset-based text clustering has emerged as a promising approach to the automatic organization of text documents, because it combines high clustering accuracy with understandable cluster descriptors. However, the clustering results may not be satisfactory because they do not reflect the user's point of view. In this context, active learning is an interesting approach to incorporating the user's knowledge into the text clustering task by querying the users about the data. We introduce an active learning approach to frequent itemset-based text clustering called AL2FIC. In our approach, the users can provide feedback directly on the cluster descriptors without needing to know the document labels. An experimental evaluation on real text collections demonstrated that our AL2FIC approach significantly increases text clustering performance even when only a few descriptors are selected by the users.
Keywords
learning (artificial intelligence); pattern clustering; query processing; text analysis; AL2FIC; active learning approach; automatic organization; cluster descriptors; document labels; frequent itemset-based text clustering; text documents; user knowledge; Clustering algorithms; Data mining; Equations; Itemsets; Mathematical model; Power capacitors; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location
Tsukuba
ISSN
1051-4651
Print_ISBN
978-1-4673-2216-4
Type
conf
Filename
6460926