• DocumentCode
    2734638
  • Title

    A modified approach to keyword extraction based on word-similarity

  • Author

    Wenchao, Meng ; Lianchen, Liu ; Ting, Dai

  • Author_Institution
    Nat. CIMS Eng. Res. Center, Tsinghua Univ., Beijing, China
  • Volume
    3
  • fYear
    2009
  • fDate
    20-22 Nov. 2009
  • Firstpage
    388
  • Lastpage
    392
  • Abstract
    Two approaches to keyword extraction are commonly used: one relies only on information about individual words, such as word frequency and TF.IDF; the other is based on the relationships between words. These relationships are usually described as word similarity, which is derived from a corpus (WordNet, HowNet) or a man-made thesaurus. With today's information explosion, the vocabulary in use is growing and changing rapidly, and many new words are not covered by man-made corpora. This paper proposes a new method for building a word-similarity thesaurus. Using the semantic information from the thesaurus together with TF.IDF and a word's first occurrence, a keyword extraction algorithm is presented, along with results and analysis.
  • Keywords
    information retrieval; thesauri; TF.IDF; keyword extraction; word similarity thesaurus; Algorithm design and analysis; Clustering algorithms; Computer integrated manufacturing; Costs; Data mining; Explosions; Frequency; Information analysis; Jensen-Shannon divergence; Naïve Bayes; word similarity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-4754-1
  • Electronic_ISBN
    978-1-4244-4738-1
  • Type

    conf

  • DOI
    10.1109/ICICISYS.2009.5358135
  • Filename
    5358135