Title :
An Incremental Weibo-Oriented Method for Unknown Word and Topic Extraction
Author :
Qianren Liu ; Lei Wang
Author_Institution :
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Chinese Twitter (i.e., weibo) messages exhibit great flexibility in wording and a high correlation between unknown words and unpredictable topics. For this reason, a weibo-oriented method was previously proposed to detect unknown words and topics simultaneously. That method is efficient and precise; however, because it adopts the classical K-means algorithm, it cannot handle a weibo corpus of increasing size. In this paper, a modified version is presented that eliminates this limitation by incorporating an incremental clustering mechanism. Based on the characteristics of weibo, a new similarity measure is introduced that takes into consideration the relevance of recently identified unknown words. With this measure, category evolution is carried out reasonably during incremental clustering, ensuring a balance between the consistency and the timeliness of the method. Experiments show that unknown words and topics can be effectively detected and tracked.
Keywords :
information retrieval; learning (artificial intelligence); pattern clustering; social networking (online); Chinese twitter message; K-means algorithm; incremental Weibo-oriented method; incremental clustering mechanism; similarity measure; topic extraction; unknown word extraction; Clustering algorithms; Correlation; Data mining; Dictionaries; Educational institutions; Semantics; Telecommunications; Categories merge; Cosine similarity; Incremental k-means; Unknown words extraction;
Conference_Title :
Semantics, Knowledge and Grids (SKG), 2013 Ninth International Conference on
Conference_Location :
Beijing
DOI :
10.1109/SKG.2013.13