• DocumentCode
    2717943
  • Title
    Document cluster detection on latent projections
  • Author
    Medina, Dora Alvarez; Silva, Hugo Hidalgo
  • Author_Institution
    Univ. Politec. de Baja California, Mexicali, Mexico
  • fYear
    2009
  • fDate
    1-4 Nov. 2009
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Probabilistic models of text data are usually based on Bernoulli or multinomial event models. The main problem in text mining is the large number of zero counts in the matrix representation of a document collection. A document visualization technique that incorporates the Zero Inflated Poisson model into the Generative Topographic Mapping algorithm has recently been proposed, and this probabilistic model can be applied as a text document visualization tool. In this work, an algorithm for automatically extracting the clusters in the visualization results is presented. Combining the visualization and cluster extraction algorithms makes it possible to obtain and evaluate clusters in document collections. Results are presented for the 20 Newsgroups and Reuters data sets.
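    The abstract rests on two ideas: the Zero Inflated Poisson (ZIP) model, which adds extra probability mass at zero to a Poisson count model so that sparse term counts are better explained, and a GTM-style projection of each document onto a 2-D latent grid. The following is an illustrative sketch only, not the authors' implementation. The ZIP probabilities follow the standard definition, while the per-term independence given a grid node, the single shared zero-inflation parameter `pi`, the uniform prior over nodes, and all function names are assumptions made for illustration. The sketch computes a document's responsibilities over latent grid nodes and its posterior-mean position; the paper's cluster-extraction algorithm itself is not described in this record and is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def zip_log_pmf(k: int, lam: float, pi: float) -> float:
    """Log-probability of count k under a Zero-Inflated Poisson ZIP(lam, pi).

    P(0) = pi + (1 - pi) * exp(-lam)
    P(k) = (1 - pi) * exp(-lam) * lam**k / k!   for k >= 1
    Requires lam > 0 and 0 <= pi < 1.
    """
    if k == 0:
        # A zero can come from the "structural zero" part or the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Poisson log-pmf via lgamma so that k! never overflows for large counts.
    log_poisson = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.log(1.0 - pi) + log_poisson


def doc_log_likelihood(counts: Sequence[int], lams: Sequence[float], pi: float) -> float:
    # Assumption (hypothetical): term counts are independent given the latent node.
    return sum(zip_log_pmf(k, lam, pi) for k, lam in zip(counts, lams))


def posterior_mean_projection(
    counts: Sequence[int],
    nodes: Sequence[Tuple[float, float]],
    node_lams: Sequence[Sequence[float]],
    pi: float,
) -> Tuple[List[float], Tuple[float, float]]:
    """Responsibilities over latent grid nodes and the posterior-mean 2-D position.

    Assumes a uniform prior over nodes; node_lams[i] holds the per-term
    Poisson rates associated with latent node i (hypothetical parameterization).
    """
    logs = [doc_log_likelihood(counts, lams, pi) for lams in node_lams]
    top = max(logs)  # subtract the maximum before exponentiating (log-sum-exp trick)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    resp = [w / total for w in weights]
    # Posterior mean of the latent coordinates, weighted by responsibility.
    x = sum(r * node[0] for r, node in zip(resp, nodes))
    y = sum(r * node[1] for r, node in zip(resp, nodes))
    return resp, (x, y)


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0)]
    node_lams = [[4.0, 0.1], [0.1, 4.0]]
    resp, pos = posterior_mean_projection([5, 0], nodes, node_lams, pi=0.5)
    print(resp, pos)
```

    With these example rates, a document containing five occurrences of the first term and none of the second receives nearly all of its responsibility from the first node, so its projected position lies very close to (0, 0). Working in log space keeps the products of many small per-term probabilities from underflowing.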
  • Keywords
    data models; data visualisation; pattern clustering; probability; stochastic processes; text analysis; cluster extraction; document cluster detection; document collection; generative topographic mapping algorithm; latent projection; matrix representation; probabilistic text data modeling; text document visualization; zero inflated Poisson model; Clustering algorithms; Data mining; Data visualization; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2009. ICDIM 2009. Fourth International Conference on
  • Conference_Location
    Ann Arbor, MI
  • Print_ISBN
    978-1-4244-4253-9
  • Electronic_ISBN
    978-1-4244-4254-6
  • Type
    conf
  • DOI
    10.1109/ICDIM.2009.5356765
  • Filename
    5356765