Title :
Improving keyphrase extraction by using document topic information
Author :
Mishra, Anirvana ; Singh, Gaurav
Author_Institution :
Dept. of Comput. Eng., Delhi Technol. Univ., New Delhi, India
Abstract :
The objective of automatic keyphrase extraction is to generate keyphrases for a large number of documents. A weakness of earlier keyphrase extraction algorithms is that the keyphrases they extract are occasionally not very coherent with one another. This paper examines the effect of injecting the domain information of a document into the ranking phase of automatic keyphrase extraction. The proposed method uses the statistical similarity between the domain of the document and the domain of each automatically extracted keyphrase as the criterion for ranking the keyphrases. The method was evaluated on a baseline method as well as on more advanced methods such as KEA, and it yielded a considerable improvement in accuracy. To demonstrate the feasibility of this approach, a naive implementation is also provided. The method can potentially be applied to any keyphrase extraction algorithm.
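The abstract does not say how the domain similarity is computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the document comes with a probability distribution over document classes, that each candidate keyphrase has per-class occurrence counts taken from some labeled corpus, and that the extractor (for example TF-IDF or KEA) supplies a base score in [0, 1]. It compares the two class distributions with cosine similarity and blends that with the base score through a hypothetical `weight` parameter. All function and parameter names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def class_distribution(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-class counts into a probability distribution (assumed representation)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: v / total for c, v in counts.items()}


def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between two sparse class vectors; 0.0 if either is empty."""
    dot = sum(v * q.get(c, 0.0) for c, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rerank(
    candidates: Dict[str, float],
    doc_classes: Dict[str, float],
    phrase_class_counts: Dict[str, Dict[str, float]],
    weight: float = 0.5,  # hypothetical blend factor between base score and domain similarity
) -> List[Tuple[str, float]]:
    """Re-rank candidate keyphrases by mixing the extractor's base score with
    the similarity between the document's and the phrase's class distributions."""
    scored = []
    for phrase, base in candidates.items():
        phrase_dist = class_distribution(phrase_class_counts.get(phrase, {}))
        sim = cosine(doc_classes, phrase_dist)
        scored.append((phrase, (1 - weight) * base + weight * sim))
    # Highest combined score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data (invented for illustration): a mostly computer-science document.
    candidates = {"stock market": 0.7, "neural network": 0.6}
    doc_classes = {"cs": 0.9, "finance": 0.1}
    phrase_class_counts = {
        "neural network": {"cs": 40, "finance": 2},
        "stock market": {"cs": 1, "finance": 50},
    }
    for phrase, score in rerank(candidates, doc_classes, phrase_class_counts):
        print(f"{phrase}: {score:.3f}")
```

In this toy run, "stock market" has the higher base score, but "neural network" moves to the top because its class profile matches the document's domain. That reordering is the kind of coherence gain the abstract describes. Multiplying the two scores instead of taking a weighted sum would be an equally plausible design choice.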
Keywords :
text analysis; word processing; automatic keyphrase extraction algorithm; document topic information; Classification algorithms; Data mining; Encyclopedias; Feature extraction; Internet; Semantics; Document; Document class; Improvement; KEA; Keyphrase Ranking; Keyphrase extraction; TF-IDF; Yahoo term extractor; coherent keyphrases;
Conference_Title :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122641