Improving keyphrase extraction by using document topic information

Author

Mishra, Anirvana; Singh, Gaurav

Author_Institution

Dept. of Comput. Eng., Delhi Technol. Univ., New Delhi, India

fYear

2011

fDate

8-10 Nov. 2011

Firstpage

463

Lastpage

467

Abstract

The objective of automatic keyphrase extraction is to generate keyphrases for a large number of documents. A weakness of earlier keyphrase extraction algorithms is that the keyphrases they extract are occasionally not very coherent with one another. This paper examines the effect of injecting the document's domain information into the ranking phase of automatic keyphrase extraction. The proposed method uses the statistical similarity between the domain of the document and the domain of the automatically extracted keyphrases as the criterion for ranking the keyphrases. The method is evaluated on baseline as well as advanced methods such as KEA, and it resulted in a considerable improvement in accuracy. To demonstrate the feasibility of this approach, a naive implementation is also provided. The method has the potential to be widely applicable to all keyphrase extraction algorithms.
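
The ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy word-to-domain lexicon, the use of cosine similarity as the "statistical similarity", and the function names `domain_vector`, `cosine`, and `rerank` are all assumptions made for illustration, since this record does not state how domains are obtained or compared.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon mapping words to domains. This is an assumption
# for illustration; the record does not say where domain labels come from.
DOMAIN_LEXICON = {
    "neural": "computing", "network": "computing",
    "algorithm": "computing", "training": "computing",
    "protein": "biology", "cell": "biology", "gene": "biology",
}

def domain_vector(text):
    """Count how many words of `text` fall into each domain of the toy lexicon."""
    words = text.lower().split()
    return Counter(DOMAIN_LEXICON[w] for w in words if w in DOMAIN_LEXICON)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank(document, candidates):
    """Order candidate keyphrases by domain similarity to the document.

    `candidates` may come from any upstream extractor (TF-IDF, KEA, ...).
    sorted() is stable, so ties keep the upstream order.
    """
    doc_vec = domain_vector(document)
    return sorted(candidates,
                  key=lambda phrase: cosine(doc_vec, domain_vector(phrase)),
                  reverse=True)
```

Under these assumptions, calling `rerank` on a document that is mostly about computing moves computing-related candidates ahead of off-domain ones, which is one way the extracted set could be made more coherent with the document's topic.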

Keywords

text analysis; word processing; automatic keyphrase extraction algorithm; document topic information; Classification algorithms; Data mining; Encyclopedias; Feature extraction; Internet; Semantics; Document; Document class; Improvement; KEA; Keyphrase Ranking; Keyphrase extraction; TF-IDF; Yahoo term extractor; coherent keyphrases

fLanguage

English

Publisher

IEEE

Conference_Titel

Granular Computing (GrC), 2011 IEEE International Conference on

Conference_Location

Kaohsiung

Print_ISBN

978-1-4577-0372-0

Type

conf

DOI

10.1109/GRC.2011.6122641

Filename

6122641