  • DocumentCode
    2646364
  • Title
    A frequent keyword-set based algorithm for topic modeling and clustering of research papers
  • Author
    Shubankar, Kumar ; Singh, Aditya Pratap ; Pudi, Vikram
  • Author_Institution
    Centre for Data Eng., IIIT Hyderabad, Hyderabad, India
  • fYear
    2011
  • fDate
    28-29 June 2011
  • Firstpage
    96
  • Lastpage
    102
  • Abstract
    In this paper we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and experimentally show that they are fast, effective and scalable.
  • Keywords
    document handling; information retrieval; pattern clustering; DBLP dataset; frequent keyword set based algorithm; modified PageRank algorithm; topic clustering; topic detection; topic modeling; Authoritative Score; Citation Network; Closed Frequent Keyword-set; Graph Mining; Topic Detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Optimization (DMO), 2011 3rd Conference on
  • Conference_Location
    Putrajaya
  • ISSN
    2155-6938
  • Print_ISBN
    978-1-61284-211-0
  • Electronic_ISBN
    2155-6938
  • Type
    conf
  • DOI
    10.1109/DMO.2011.5976511
  • Filename
    5976511