• DocumentCode
    595484
  • Title
    An active learning approach to frequent itemset-based text clustering
  • Author
    Marcacini, Ricardo M. ; Correa, G.N. ; Rezende, Solange O.
  • Author_Institution
    Math. & Comput. Sci. Inst., Univ. of Sao Paulo, Sao Carlos, Brazil
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    3529
  • Lastpage
    3532
  • Abstract
    Frequent itemset-based text clustering has emerged as a promising approach to the automatic organization of text documents because it combines high clustering accuracy with understandable cluster descriptors. However, the clustering results may not be satisfactory because they do not reflect the user's point of view. In this context, active learning is an interesting approach for incorporating the user's knowledge into the text clustering task by querying users about the data. We introduce an active learning approach to frequent itemset-based text clustering called AL2FIC. In our approach, users can provide feedback directly on the cluster descriptors without needing to know the document labels. An experimental evaluation on real text collections demonstrated that our AL2FIC approach significantly increases text clustering performance even when users select only a few descriptors.
  • Keywords
    learning (artificial intelligence); pattern clustering; query processing; text analysis; AL2FIC; active learning approach; automatic organization; cluster descriptors; document labels; frequent itemset-based text clustering; text documents; user knowledge; Clustering algorithms; Data mining; Equations; Itemsets; Mathematical model; Power capacitors; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460926