DocumentCode
2734638
Title
A modified approach to keyword extraction based on word-similarity
Author
Wenchao, Meng ; Lianchen, Liu ; Ting, Dai
Author_Institution
Nat. CIMS Eng. Res. Center, Tsinghua Univ., Beijing, China
Volume
3
fYear
2009
fDate
20-22 Nov. 2009
Firstpage
388
Lastpage
392
Abstract
Keyword extraction usually follows one of two approaches: one uses only information about individual words, such as word frequency and TF.IDF; the other is based on the relationships between words. These relationships are usually described as word similarity, which is derived from a corpus (WordNet, HowNet) or a man-made thesaurus. With today's information explosion, the vocabulary in use is growing and changing rapidly, and many new words are not covered by man-made corpora. This paper proposes a new method for building a word-similarity thesaurus. Using the semantic information from the thesaurus, together with TF.IDF and a word's first occurrence, a keyword extraction algorithm is demonstrated; results and analysis are also given.
Keywords
information retrieval; thesauri; TF.IDF; keyword extraction; word similarity thesaurus; Algorithm design and analysis; Clustering algorithms; Computer integrated manufacturing; Costs; Data mining; Explosions; Frequency; Information analysis; Jensen-Shannon divergence; Naïve Bayes; word similarity
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4244-4754-1
Electronic_ISBN
978-1-4244-4738-1
Type
conf
DOI
10.1109/ICICISYS.2009.5358135
Filename
5358135