• DocumentCode
    607372
  • Title

    HSPKNN: An effective and practical framework for hot topic detection of Internet news

  • Author

    Ping Lu ; Shengyu Liu ; Zhenjiang Dong ; Shengmei Luo ; Lixia Liu ; Haodi Li ; Qingcai Chen

  • Author_Institution
    ZTE Corp., Nanjing, China
  • fYear
    2012
  • fDate
    3-5 Dec. 2012
  • Firstpage
    888
  • Lastpage
    893
  • Abstract
    With the rapid growth of information on the Internet, many Single-Pass based clustering methods are used in topic detection and tracking (TDT) because Single-Pass supports incremental processing. In Single-Pass based methods, similarities between the feature vectors of news reports and the cluster centers of historical topics are calculated. The accuracy of TDT suffers if the cluster centers cannot precisely represent the topics. To overcome this shortcoming of Single-Pass based methods, this paper proposes an effective and practical framework for hot topic detection of Internet news. First, news report streams are partitioned into segments by a time window, and an agglomerative hierarchical clustering algorithm is used to acquire candidate topics. Then, an algorithm fusing Single-Pass and KNN is proposed to detect topics from the candidate topics. Furthermore, to make it easier for users to understand what the topics discuss, an algorithm that generates descriptive labels for detected topics is proposed. Experimental results show that the proposed framework can outperform Single-Pass based methods and agglomerative hierarchical clustering based methods for TDT. In addition, the proposed framework has been used in the TDT module of an application system. Both the experimental results and the application system demonstrate the effectiveness and practicality of the proposed framework.
  • Keywords
    Internet; feature extraction; information resources; pattern clustering; HSPKNN; Internet news; TDT module; agglomerative hierarchical clustering algorithm; historical topic cluster centers; hot topic detection; incremental processing; news report feature vectors; single-pass based clustering methods; single-pass based methods; single-pass characteristics; single-pass-KNN fusion algorithm; time window; topic detection and tracking; Agglomerative Hierarchical Clustering; Single-Pass; TDT;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-0894-6
  • Type

    conf

  • Filename
    6530461
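
The abstract describes fusing Single-Pass with KNN so that an incoming item is compared against individual historical items rather than a single cluster center. This record does not give the actual fusion rule, so the following is only a minimal sketch under stated assumptions: the function names, the cosine similarity over sparse term-weight dicts, the majority vote, and the `k` and `threshold` parameters are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_knn(vectors, k=3, threshold=0.5):
    """Assign each vector to a topic id in arrival order (one pass).

    Assumed rule (not from the paper): take the k most similar
    already-seen vectors whose similarity is at least `threshold`;
    join the majority topic among them, or open a new topic if
    no earlier vector is similar enough.
    """
    labels = []
    next_topic = 0
    for i, v in enumerate(vectors):
        sims = [(cosine(v, vectors[j]), j) for j in range(i)]
        neighbours = sorted(
            (s for s in sims if s[0] >= threshold), reverse=True
        )[:k]
        if neighbours:
            votes = Counter(labels[j] for _, j in neighbours)
            labels.append(votes.most_common(1)[0][0])
        else:
            labels.append(next_topic)
            next_topic += 1
    return labels
```

In the framework described by the abstract, the inputs to such a step would be the candidate topics produced by agglomerative hierarchical clustering within each time window, not raw news reports. Voting over nearest neighbours instead of comparing against one center is one way to avoid relying on a cluster center that represents its topic poorly.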