Title :
Unsupervised multilingual concept discovery from daily online news extracts
Author_Institution :
Nat. Taipei Univ. of Technol., Taipei, Taiwan
Abstract :
Web syndication technologies make it easy to aggregate daily news from diverse sources. However, the sheer volume of information makes it difficult to read, let alone digest, and to focus on the most important events. An efficient method of news extraction and mining is therefore needed. In this paper, we propose an unsupervised approach to multilingual concept discovery from daily online news extracts. First, key terms are extracted statistically from short news extracts. Second, similar term candidates are grouped into concrete concepts using unsupervised term clustering methods. Our goal is automatic news processing with minimal resources and no training in advance. The experimental results show the potential of the proposed approach in terms of efficiency and effectiveness. Further investigation is needed to study the cross-lingual relations between extracted concepts.
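The two steps named in the abstract (statistical key-term extraction, then unsupervised term clustering) can be illustrated with a minimal sketch. The abstract does not say which statistics or clustering method the paper uses, so everything below is an assumption made for illustration: the TF-IDF-style score, the Jaccard similarity over co-occurrence sets, the greedy single-pass grouping, and the hypothetical names `tokenize`, `extract_key_terms`, and `cluster_terms`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens. Assumption: whitespace-delimited languages;
    # languages such as Chinese would need a dedicated segmenter.
    return re.findall(r"\w+", text.lower())


def extract_key_terms(extracts, top_k=5, min_len=4):
    """Rank terms by a TF-IDF-style score summed over all extracts.

    The scoring formula is an illustrative assumption, not the paper's.
    """
    docs = [tokenize(e) for e in extracts]
    n = len(docs)
    # Document frequency: number of extracts containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = Counter()
    for d in docs:
        for term, tf in Counter(d).items():
            if len(term) >= min_len:
                scores[term] += tf * (math.log(n / df[term]) + 1.0)
    return [term for term, _ in scores.most_common(top_k)]


def jaccard(a, b):
    # Overlap of two sets; defined as 0.0 when both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(terms, extracts, threshold=0.5):
    """Greedy single-pass grouping of terms that co-occur in the same extracts.

    A term joins the first cluster holding a member whose set of extracts
    is similar enough; otherwise it starts a new cluster.
    """
    docs = [set(tokenize(e)) for e in extracts]
    occ = {t: {i for i, d in enumerate(docs) if t in d} for t in terms}
    clusters = []
    for t in terms:
        for c in clusters:
            if any(jaccard(occ[t], occ[u]) >= threshold for u in c):
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


# Made-up sample extracts, for illustration only.
extracts = [
    "Earthquake strikes Haiti capital",
    "Haiti earthquake relief effort grows",
    "Google unveils Nexus phone",
    "Nexus phone from Google goes on sale",
]
key_terms = extract_key_terms(extracts, top_k=5)
concepts = cluster_terms(key_terms, extracts)
```

Neither step needs labeled data or a trained model, which matches the stated goal of processing news with minimal resources. The sample extracts are invented and stand in for real feed items.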
Keywords :
Aggregates; Cellular neural networks; Clustering methods; Concrete; Data mining; Feeds; Information security; Text categorization; Text mining; Training data; Term extraction; news summarization; term clustering
Conference_Title :
Intelligence and Security Informatics (ISI), 2010 IEEE International Conference on
Conference_Location :
Vancouver, BC, Canada
Print_ISBN :
978-1-4244-6444-9
DOI :
10.1109/ISI.2010.5484763