• DocumentCode
    2924490
  • Title
    Improving keyphrase extraction by using document topic information
  • Author
    Mishra, Anirvana ; Singh, Gaurav
  • Author_Institution
    Dept. of Comput. Eng., Delhi Technol. Univ., New Delhi, India
  • fYear
    2011
  • fDate
    8-10 Nov. 2011
  • Firstpage
    463
  • Lastpage
    467
  • Abstract
    The objective of automatic keyphrase extraction is to generate keyphrases for a large number of documents. A weakness of earlier keyphrase extraction algorithms is that the extracted keyphrases occasionally lack coherence. This paper examines the effect of injecting the document's domain information into the ranking phase of automatic keyphrase extraction. The proposed method uses the statistical similarity between the domain of the document and that of the automatically extracted keyphrases as the criterion for ranking the keyphrases. The method is evaluated on baseline as well as advanced methods such as KEA and yields a considerable improvement in accuracy. To demonstrate the feasibility of this approach, a naive implementation is also provided. The method has the potential to be widely applicable across all keyphrase extraction algorithms.
  • Keywords
    text analysis; word processing; automatic keyphrase extraction algorithm; document topic information; Classification algorithms; Data mining; Encyclopedias; Feature extraction; Internet; Semantics; Document; Document class; Improvement; KEA; Keyphrase Ranking; Keyphrase extraction; TF-IDF; Yahoo term extractor; coherent keyphrases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Granular Computing (GrC), 2011 IEEE International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-1-4577-0372-0
  • Type
    conf
  • DOI
    10.1109/GRC.2011.6122641
  • Filename
    6122641