Title :
A modified approach to keyword extraction based on word-similarity
Author :
Wenchao, Meng ; Lianchen, Liu ; Ting, Dai
Author_Institution :
Nat. CIMS Eng. Res. Center, Tsinghua Univ., Beijing, China
Abstract :
Two approaches to keyword extraction are commonly used: one relies only on information about individual words, such as word frequency and TF.IDF; the other is based on the relationships between words. These relationships are usually described as word similarity, which is derived from a corpus (WordNet, HowNet) or a man-made thesaurus. With today's information explosion, the vocabulary in use is growing and changing rapidly, and many new words are not covered by man-made corpora. This paper proposes a new method for building a word similarity thesaurus. Using the semantic information from the thesaurus, together with TF.IDF and a word's first occurrence, a keyword extraction algorithm is presented, along with results and analysis.
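As an illustration of how two of the signals named in the abstract (TF.IDF and a word's first occurrence) could be combined, the following is a minimal sketch and not the authors' algorithm. The function name `score_keywords`, the smoothed IDF variant, and the linear position weight are all assumptions made for the example; the word similarity thesaurus is left out because the abstract does not describe how it is built.

```python
import math
from collections import Counter

def score_keywords(doc_tokens, corpus, top_k=3):
    """Rank words in doc_tokens by TF.IDF scaled by first-occurrence position.

    Hypothetical helper: the weighting scheme is an assumption, not the
    method from the paper.
    """
    n_docs = len(corpus)
    total = len(doc_tokens)
    counts = Counter(doc_tokens)
    scores = {}
    for word, count in counts.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (assumed variant) so unseen words do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        # Relative position of the first occurrence: 0.0 means the first token.
        first_pos = doc_tokens.index(word) / total
        # Assumption: words that appear earlier receive a higher weight.
        position_weight = 1.0 - first_pos
        scores[word] = (count / total) * idf * position_weight
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

doc = ["keyword", "the", "extraction", "keyword"]
corpus = [doc, ["the", "cat"], ["the", "dog"]]
print(score_keywords(doc, corpus, top_k=2))
```

In this toy corpus "the" occurs in every document, so its IDF is low and it ranks below "extraction" even though it appears earlier in the text. A similarity-based term, for example one that boosts words related to other high-scoring words, would be an additional factor on top of this score.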
Keywords :
information retrieval; thesauri; TF.IDF; keyword extraction; word similarity thesaurus; Algorithm design and analysis; Clustering algorithms; Computer integrated manufacturing; Costs; Data mining; Explosions; Frequency; Information analysis; Jensen-Shannon divergence; Naïve Bayes; word similarity
Conference_Title :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
DOI :
10.1109/ICICISYS.2009.5358135