• DocumentCode
    3011663
  • Title

    An improved keyword extraction method using graph based random walk model

  • Author

    Islam, Md Rafiqul ; Islam, Md Rafiqul

  • Author_Institution
    Dept. of Comput. Sci. & Eng. Discipline, Khulna Univ., Khulna
  • fYear
    2008
  • fDate
    24-27 Dec. 2008
  • Firstpage
    225
  • Lastpage
    229
  • Abstract
    Keywords can be considered condensed versions of documents and can play an important role in text processing tasks such as text indexing, summarization, and categorization. However, many digital documents, especially on the Internet, do not have a list of assigned keywords. Assigning keywords to these documents manually is a difficult task and requires appropriate knowledge of the topic. An automatic keyword extraction process can solve this problem. In this paper, we introduce a new, improved method for keyword extraction using a random walk model that considers the position of terms within the document and the information gain of terms with respect to the whole set of documents. We also incorporate the mutual information (MI) of terms into the random walk model to extract keywords from documents. Experiments on standard test collections show that our method outperforms previously proposed methods.
  • Keywords
    graph theory; information retrieval; random processes; text analysis; digital document; document keyword assignment; graph-based random walk model; improved keyword extraction method; mutual information; text processing task; text summary; Casting; Citation analysis; Computer science; Data mining; Indexing; Information technology; Internet; Mutual information; Text processing; Voting; Keyword extraction; information gain; mutual information; random walk model; term position;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
  • Conference_Location
    Khulna
  • Print_ISBN
    978-1-4244-2135-0
  • Electronic_ISBN
    978-1-4244-2136-7
  • Type

    conf

  • DOI
    10.1109/ICCITECHN.2008.4802967
  • Filename
    4802967