Title :
Automating keyphrase extraction with multi-objective genetic algorithms
Author :
Wu, Jia-Long ; Agogino, Alice M.
Author_Institution :
California Univ., Berkeley, CA, USA
Abstract :
Keyphrases have been used extensively in IR systems to facilitate information exchange, organize information, and assist information retrieval. Automation of keyphrase generation is essential for the timely creation of keyphrases for large repositories in new domains where previous thesauri do not exist, or for metacollections in which keyphrases that are meaningful across disparate collections are needed. In this paper, we propose an automated keyphrase extraction algorithm using a non-dominated sorting multi-objective genetic algorithm. The "clumping" property of keyphrases is used to judge the appropriateness of a phrase and is quantified by a condensation clustering measure proposed by Bookstein. The objective is to find the smallest phrase set that has the best precision, as measured by average condensation clustering. Keyphrases were retrieved from a collection of design conference papers, and the results were presented to domain experts for evaluation. Ninety percent of the generated phrases were deemed appropriate for use in a thesaurus for engineering design.
Keywords :
genetic algorithms; information retrieval systems; automated keyphrase extraction; information retrieval system; multiobjective genetic algorithm; Automation; Data mining; Feedback; Genetic algorithms; Indexing; Information retrieval; Natural languages; Thesauri; Unified modeling language; Vocabulary
Conference_Title :
Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004
Print_ISBN :
0-7695-2056-1
DOI :
10.1109/HICSS.2004.1265278
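
Illustrative_Sketch :
The abstract describes two competing objectives: a phrase set that is as small as possible and an average condensation clustering score that is as good as possible. A minimal sketch of the non-dominated sorting step that such a multi-objective genetic algorithm relies on might look like the following. It is not the authors' implementation. The names `dominates` and `non_dominated_fronts`, the candidate values, and the convention that a higher score is better are all assumptions made for illustration; the clustering scores are taken as given inputs rather than computed from Bookstein's measure.

```python
from typing import List, Tuple

# Hypothetical representation of one candidate phrase set:
# (number of phrases in the set, average clustering score).
# Assumption: a smaller set is better and a higher score is better.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated_fronts(population: List[Candidate]) -> List[List[int]]:
    """Split the population into successive Pareto fronts.

    Returns lists of indices into population; the first list holds the
    candidates that no other candidate dominates.
    """
    remaining = list(range(len(population)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate joins the current front if nothing still
        # remaining dominates it.
        front = [
            i for i in remaining
            if not any(
                dominates(population[j], population[i])
                for j in remaining if j != i
            )
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Made-up candidates, for illustration only.
population = [(5, 0.80), (8, 0.90), (5, 0.60), (10, 0.85), (3, 0.50)]
fronts = non_dominated_fronts(population)
print(fronts)
```

In a full genetic algorithm, the front index would typically feed the selection step, so that candidates in earlier fronts are more likely to survive into the next generation. This simple repeated-scan version favors clarity over speed.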