Automatic Keyphrase Extraction from Bengali Documents: A Preliminary Study

Author

Sarkar, Kamal

Author_Institution

Comput. Sci. & Eng. Dept., Jadavpur Univ., Kolkata, India

fYear

2011

fDate

19-20 Feb. 2011

Firstpage

125

Lastpage

128

Abstract

Key phrases are sequences of words that capture the main topics covered in a document. They help readers rapidly understand, organize, access, and share the information in a document. In this paper, we present a preliminary study on key phrase extraction from Bengali documents using two features: TF*IDF and a phrase's first occurrence in the text. For this study, we design a prototype system that works as follows: it extracts n-grams from a source article, identifies candidate key phrases, and finally ranks the candidates to select the desired number of key phrases. The system has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website, and preliminary results on Bengali key phrase extraction are reported in this paper.
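
The pipeline outlined in the abstract (n-gram extraction, candidate identification, ranking by TF*IDF and first occurrence) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not say how candidates are filtered or how the two features are combined, so the stopword-boundary filter, the smoothed IDF, the additive score, and every name here (`tokenize`, `ngrams`, `rank_keyphrases`, `position_weight`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split on whitespace and common punctuation, including the Bengali
    # danda (U+0964). A real system would need a proper Bengali tokenizer.
    return re.findall(r"[^\s\u0964,.;:!?()]+", text.lower())


def ngrams(tokens, max_n=3):
    # Yield (start_index, phrase) for every n-gram of length 1..max_n.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, " ".join(tokens[i:i + n])


def rank_keyphrases(doc, corpus, stopwords=frozenset(), top_k=5,
                    max_n=3, position_weight=1.0):
    tokens = tokenize(doc)
    if not tokens:
        return []

    tf = Counter()   # phrase frequency in the source document
    first = {}       # token index of each phrase's first occurrence
    for i, phrase in ngrams(tokens, max_n):
        words = phrase.split(" ")
        # Assumed candidate filter: drop n-grams that start or end
        # with a stopword.
        if words[0] in stopwords or words[-1] in stopwords:
            continue
        tf[phrase] += 1
        first.setdefault(phrase, i)

    # Document frequency of each phrase over a background corpus.
    corpus_sets = [{p for _, p in ngrams(tokenize(d), max_n)} for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for phrase, count in tf.items():
        df = sum(1 for s in corpus_sets if phrase in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumed form)
        position = 1.0 - first[phrase] / len(tokens)   # earlier occurrence scores higher
        # Assumed combination: a simple weighted sum of the two features.
        scores[phrase] = count * idf + position_weight * position

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

Calling `rank_keyphrases(article_text, background_docs, stopwords=bengali_stopwords, top_k=10)` would return the ten highest-scoring candidate phrases, where `article_text`, `background_docs`, and `bengali_stopwords` are hypothetical inputs supplied by the caller. The additive score is only one plausible way to combine the two features; a product or a learned weighting would fit the abstract's description equally well.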

Keywords

document handling; information retrieval; word processing; Bengali document; Bengali key phrase extraction; TDIL Website; automatic keyphrase extraction; document information sharing; key phrase extraction; n-gram extraction; Computer science; Data mining; Feature extraction; Thesauri; Training; Bengali keyphrase extraction

fLanguage

English

Publisher

IEEE

Conference_Titel

Emerging Applications of Information Technology (EAIT), 2011 Second International Conference on

Conference_Location

Kolkata

Print_ISBN

978-1-4244-9683-9

Type

conf

DOI

10.1109/EAIT.2011.35

Filename

5734932