• DocumentCode
    2774702
  • Title

    Automatic Keyphrase Extraction from Bengali Documents: A Preliminary Study

  • Author

    Sarkar, Kamal

  • Author_Institution
    Comput. Sci. & Eng. Dept., Jadavpur Univ., Kolkata, India
  • fYear
    2011
  • fDate
    19-20 Feb. 2011
  • Firstpage
    125
  • Lastpage
    128
  • Abstract
    Key phrases are sequences of words that capture the main topics covered in a document. They help readers rapidly understand, organize, access, and share the information in a document. In this paper, we present a preliminary study on key phrase extraction from Bengali documents using two important features: TF*IDF and the phrase's first occurrence in the text. For this study, we design a prototype system that works as follows: it extracts n-grams from a source article, identifies candidate key phrases, and finally ranks the candidates to select the desired number of key phrases. The system has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website, and the preliminary results on Bengali key phrase extraction are reported in this paper.
  • Keywords
    document handling; information retrieval; word processing; Bengali document; Bengali key phrase extraction; TDIL Website; automatic keyphrase extraction; document information sharing; key phrase extraction; n-gram extraction; Computer science; Data mining; Feature extraction; Information retrieval; Thesauri; Training; Bengali keyphrase extraction; Information Retrieval
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Applications of Information Technology (EAIT), 2011 Second International Conference on
  • Conference_Location
    Kolkata
  • Print_ISBN
    978-1-4244-9683-9
  • Type

    conf

  • DOI
    10.1109/EAIT.2011.35
  • Filename
    5734932
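
The abstract describes a three-step pipeline: extract n-grams, identify candidate key phrases, and rank them using TF*IDF and first occurrence. The following is a minimal sketch of that idea, not the paper's implementation. The stopword-boundary filter for candidates, the smoothed IDF, and the multiplicative combination of the two features are all assumptions made for illustration, and the names `tokenize`, `candidate_phrases`, and `rank_keyphrases` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on whitespace and basic punctuation, including the Bengali danda."""
    return [t for t in re.split(r"[\s,.;:!?।]+", text) if t]


def ngrams(tokens, max_n=3):
    """Yield (start_index, phrase) for every n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, " ".join(tokens[i:i + n])


def candidate_phrases(tokens, stopwords, max_n=3):
    """Return (term frequency, first position) for each candidate phrase.

    Assumption: an n-gram is a candidate unless it starts or ends
    with a stopword.
    """
    tf = Counter()
    first = {}
    for i, phrase in ngrams(tokens, max_n):
        words = phrase.split(" ")
        if words[0] in stopwords or words[-1] in stopwords:
            continue
        tf[phrase] += 1
        first.setdefault(phrase, i)  # keeps the earliest start index
    return tf, first


def rank_keyphrases(doc_tokens, corpus_tokens, stopwords, top_k=5, max_n=3):
    """Rank candidates by TF * IDF * (1 - relative first position)."""
    tf, first = candidate_phrases(doc_tokens, stopwords, max_n)
    doc_sets = [{p for _, p in ngrams(d, max_n)} for d in corpus_tokens]
    n_docs = len(corpus_tokens)
    scores = {}
    for phrase, freq in tf.items():
        df = sum(phrase in s for s in doc_sets)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumed)
        position = 1.0 - first[phrase] / len(doc_tokens)  # earlier is better
        scores[phrase] = freq * idf * position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


if __name__ == "__main__":
    doc = tokenize("rice farming in bengal. rice farming needs water")
    corpus = [
        doc,
        tokenize("water is needed in bengal"),
        tokenize("farming needs water"),
    ]
    print(rank_keyphrases(doc, corpus, stopwords={"in", "is"}, top_k=3))
```

The tokenizer splits on explicit separators rather than matching `\w`, because Python's `\w` does not match Bengali combining vowel signs and would break words apart. A real system for Bengali would also need a proper stopword list and possibly stemming, which this sketch leaves out.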