DocumentCode :
2774702
Title :
Automatic Keyphrase Extraction from Bengali Documents: A Preliminary Study
Author :
Sarkar, Kamal
Author_Institution :
Comput. Sci. & Eng. Dept., Jadavpur Univ., Kolkata, India
fYear :
2011
fDate :
19-20 Feb. 2011
Firstpage :
125
Lastpage :
128
Abstract :
Key phrases are sequences of words that capture the main topics covered in a document. They help readers rapidly understand, organize, access, and share the information in a document. In this paper, we present a preliminary study on key phrase extraction from Bengali documents using two features: TF*IDF and a phrase's first occurrence in the text. For this study, we design a prototype system that extracts n-grams from a source article, identifies candidate key phrases, and finally ranks the candidates to select the desired number of key phrases. The system has been tested on a collection of Bengali documents selected from a Bengali corpus downloadable from the TDIL website, and the preliminary results on Bengali key phrase extraction are reported in this paper.
Keywords :
document handling; information retrieval; word processing; Bengali document; Bengali key phrase extraction; TDIL Website; automatic keyphrase extraction; document information sharing; key phrase extraction; n-gram extraction; Computer science; Data mining; Feature extraction; Thesauri; Training; Bengali keyphrase extraction
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Emerging Applications of Information Technology (EAIT), 2011 Second International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-9683-9
Type :
conf
DOI :
10.1109/EAIT.2011.35
Filename :
5734932