• DocumentCode
    3530906
  • Title
    Topic detection in noisy data sources
  • Author
    Denecke, Kerstin; Brosowski, Marko
  • Author_Institution
    L3S Research Center, Hannover, Germany
  • fYear
    2010
  • fDate
    5-8 July 2010
  • Firstpage
    50
  • Lastpage
    55
  • Abstract
    Automatic topic detection is becoming more important as the volume of electronically available information grows, along with the need to process and filter it. Determining topics correctly is particularly challenging when the language is noisy, as in weblog postings. It is still unclear to what extent existing topic detection algorithms can deal with such noisy material. In this paper, Latent Dirichlet Allocation (LDA) is used to determine topics in weblog sentences. We perform an extensive evaluation of this algorithm on real-world data from different domains. The results show that LDA can successfully determine topics even for short and noisy sentences.
  • Keywords
    Web sites; information filtering; Weblog sentence; automatic topic detection; latent Dirichlet allocation; noisy data sources; Accuracy; Blogs; Context; Correlation; Noise measurement; Pediatrics; Software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management (ICDIM), 2010 Fifth International Conference on
  • Conference_Location
    Thunder Bay, ON
  • Print_ISBN
    978-1-4244-7572-8
  • Type
    conf
  • DOI
    10.1109/ICDIM.2010.5664202
  • Filename
    5664202