Title :
Keyphrase Extraction Abstracts Instead of Full Papers
Author :
Popova, S. ; Danilova, V.
Author_Institution :
St.-Petersburg State Univ., St. Petersburg, Russia
Abstract :
In this paper we consider the problem of keyphrase extraction from scientific articles. Finding an appropriate solution is important for fast navigation in databases and for indexing, clustering, and classification of academic papers. The base collection (SemEval2010) includes keyphrases selected by experts for each text. We show that using abstracts instead of full texts improves on the results obtained by processing full texts or abstracts together with the introduction and conclusion sections. Our approach extracts keyphrases with linguistic (part-of-speech-based) patterns, which are built from an auxiliary dataset. Compared with full texts, abstracts reduce the number of word sequences extracted with the patterns, which makes it possible to simplify or entirely omit the ranking stage. Ranking is usually needed because only 10-15 keyphrases have to be chosen out of many candidates. This stage is the most difficult, and its effectiveness depends on the number of selected keyphrase candidates. Using abstracts considerably reduces the number of candidate phrases while still yielding high recall.
Keywords :
data mining; information retrieval; natural language processing; text analysis; academic paper classification; academic paper clustering; academic paper indexing; keyphrase extraction; linguistic patterns; scientific articles; word sequence extraction; Abstracts; Artificial neural networks; Data mining; Feature extraction; Gold; Pragmatics; Standards; abstract processing; indexing; information retrieval; keyphrase extraction; keyphrase identification;
Conference_Title :
Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
Conference_Location :
Munich
Print_ISBN :
978-1-4799-5721-7
DOI :
10.1109/DEXA.2014.57