DocumentCode
3530906
Title
Topic detection in noisy data sources
Author
Denecke, Kerstin ; Brosowski, Marko
Author_Institution
L3S Res. Center, Hannover, Germany
fYear
2010
fDate
5-8 July 2010
Firstpage
50
Lastpage
55
Abstract
Automatic topic detection is becoming more important as the amount of electronically available information grows and the need to process and filter it increases. Determining topics correctly is particularly challenging when the language is noisy, as in weblog postings, and it remains unclear to what extent existing topic detection algorithms can handle such noisy material. In this paper, Latent Dirichlet Allocation (LDA) is used to determine topics in weblog sentences. We perform an extensive evaluation of this algorithm on real-world data from different domains. The results show that LDA can successfully determine topics even for short and noisy sentences.
Keywords
Web sites; information filtering; weblog sentence; automatic topic detection; latent Dirichlet allocation; noisy data sources; Accuracy; Blogs; Context; Correlation; Noise measurement; Pediatrics; Software
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management (ICDIM), 2010 Fifth International Conference on
Conference_Location
Thunder Bay, ON
Print_ISBN
978-1-4244-7572-8
Type
conf
DOI
10.1109/ICDIM.2010.5664202
Filename
5664202