• DocumentCode
    246990
  • Title
    Topic Detection in Chinese Microblogs Using Hot Term Discovery and Adaptive Spectral Clustering
  • Author
    Chengxu Ye ; Ping Yang ; Shaopeng Liu
  • Author_Institution
    Qinghai Normal Univ., Xining, China
  • fYear
    2014
  • fDate
    8-10 Nov. 2014
  • Firstpage
    110
  • Lastpage
    119
  • Abstract
    Weibo is a popular Chinese microblogging service with millions of users who share short text messages. As an information network, Weibo reveals what people care about as it happens in society. Unfortunately, users struggle to keep up with the ever-growing volume of messages published every day. To help users see the big picture, an efficient and effective topic detection method is urgently needed. Considering the sheer scale and rapid evolution of microblog messages, we investigate a novel method for topic detection in Chinese microblogs within a given time period. The method consists of two major steps. First, hot terms are extracted using a suffix array structure and a TF*SDF term weighting scheme. Second, we calculate co-occurrence information for the extracted hot terms and then group the terms into clusters that represent topics using adaptive spectral clustering. Extensive experiments on real-world data demonstrate that the proposed method is more effective and efficient for topic detection in Chinese microblogs than existing approaches.
  • Keywords
    Web sites; electronic messaging; information networks; information retrieval; natural language processing; pattern clustering; Chinese microblogging service; TF*SDF term weighting scheme; Weibo; adaptive spectral clustering; co-occurrence information; hot term discovery; information network; microblog messages; suffix array structure; text message; topic detection; Adaptation models; Arrays; Clustering algorithms; Data mining; Educational institutions; Real-time systems; Time-frequency analysis; microblog
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2014 Ninth International Conference on
  • Conference_Location
    Guangdong
  • Type
    conf
  • DOI
    10.1109/3PGCIC.2014.44
  • Filename
    7024566