DocumentCode
2924490
Title
Improving keyphrase extraction by using document topic information
Author
Mishra, Anirvana ; Singh, Gaurav
Author_Institution
Dept. of Comput. Eng., Delhi Technol. Univ., New Delhi, India
fYear
2011
fDate
8-10 Nov. 2011
Firstpage
463
Lastpage
467
Abstract
The objective of automatic keyphrase extraction is to generate keyphrases for a large number of documents. A weakness of earlier keyphrase extraction algorithms is that the keyphrases they extract are occasionally not coherent with one another. This paper examines the effect of injecting the document's domain information into the ranking phase of automatic keyphrase extraction. The proposed method uses the statistical similarity between the domain of the document and that of the automatically extracted keyphrases as the criterion for ranking the keyphrases. The method is evaluated on baseline as well as advanced methods such as KEA, and it yields a considerable improvement in accuracy. To demonstrate the feasibility of this approach, a naive implementation is also provided. The method has the potential to be widely applicable across keyphrase extraction algorithms.
Keywords
text analysis; word processing; automatic keyphrase extraction algorithm; document topic information; Classification algorithms; Data mining; Encyclopedias; Feature extraction; Internet; Semantics; Document; Document class; Improvement; KEA; Keyphrase Ranking; Keyphrase extraction; TF-IDF; Yahoo term extractor; coherent keyphrases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location
Kaohsiung
Print_ISBN
978-1-4577-0372-0
Type
conf
DOI
10.1109/GRC.2011.6122641
Filename
6122641