Title :
Automated Bangla text summarization by sentence scoring and ranking
Author :
Efat, Md Iftekharul Alam ; Ibrahim, Mohammad ; Kayesh, Humayun
Author_Institution :
Inst. of Inf. Technol., Univ. of Dhaka, Dhaka, Bangladesh
Abstract :
Document summarization is an area of Natural Language Processing (NLP) that is attracting growing interest from researchers. Although many techniques have been proposed for English, few notable works have addressed Bangla text summarization. This paper presents an extraction-based summarization technique for Bangla text documents. The system summarizes one document at a time. Before a summary is created, the document is pre-processed by tokenization, stop-word removal, and stemming. During summarization, countable features such as word frequency and sentence positional value are used to make the summary more precise and concrete. Attributes such as cue words and the skeleton of the document are also included in the process, which help make the summary more relevant to the content of the document. The proposed technique was compared against summaries of the same documents produced by human professionals; the evaluation shows that 83.57% of the summary sentences selected by the system agreed with those selected by the human summarizers.
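The abstract names the pipeline stages but not their exact formulas. Those stages are pre-processing (tokenization, stop-word removal, stemming), sentence scoring on word frequency, positional value, cue words and document skeleton, and ranking. The Python sketch below shows one plausible way to combine those features; it is not the authors' implementation. Several parts are hypothetical assumptions: the tiny stop-word and cue-word lists, the naive suffix-stripping stemmer, the 1/(i+1) positional value, treating the title as the document "skeleton", and the feature weights in `WEIGHTS`.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; the paper's actual lists are not given.
STOP_WORDS = {"এবং", "ও", "এই", "একটি", "হয়"}
CUE_WORDS = {"সুতরাং", "অতএব", "উপসংহারে"}
# Hypothetical suffixes for naive suffix stripping, standing in for a real Bangla stemmer.
SUFFIXES = ("গুলো", "দের", "টি", "র")
PUNCT = "।,;:!?\"'()"
# Hypothetical feature weights; the paper's weighting scheme is not stated in the abstract.
WEIGHTS = {"freq": 1.0, "pos": 1.0, "cue": 0.5, "skeleton": 1.0}


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least two characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 2:
            return word[: -len(suf)]
    return word


def raw_tokens(text):
    """Whitespace tokenization with punctuation trimmed from each token.
    (Avoids re's \\w, which does not match Bangla vowel signs.)"""
    return [t.strip(PUNCT) for t in text.split() if t.strip(PUNCT)]


def content_terms(text):
    """Pre-processing: tokenize, remove stop words, then stem."""
    return [stem(t) for t in raw_tokens(text) if t not in STOP_WORDS]


def split_sentences(text):
    """Split on the Bangla dari (U+0964) as well as '?' and '!'."""
    return [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]


def score_sentences(title, sentences):
    """Weighted sum of four features per sentence (assumed scoring scheme)."""
    terms = [content_terms(s) for s in sentences]
    freq = Counter(t for ts in terms for t in ts)      # document-wide term frequencies
    title_terms = set(content_terms(title))            # assumed "skeleton" = title terms
    sums = [sum(freq[t] for t in ts) for ts in terms]
    max_sum = max(sums, default=0) or 1
    scores = []
    for i, (sentence, ts) in enumerate(zip(sentences, terms)):
        f = sums[i] / max_sum                          # word-frequency feature in [0, 1]
        p = 1.0 / (i + 1)                              # positional value: earlier is higher
        c = 1.0 if CUE_WORDS & set(raw_tokens(sentence)) else 0.0  # cue-word feature
        k = len(title_terms & set(ts)) / len(title_terms) if title_terms else 0.0
        scores.append(WEIGHTS["freq"] * f + WEIGHTS["pos"] * p
                      + WEIGHTS["cue"] * c + WEIGHTS["skeleton"] * k)
    return scores


def summarize(title, text, k=2):
    """Rank sentences by score, keep the top k, and return them in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(title, sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Example usage with a toy three-sentence document.
title = "বাংলা পাঠ্য সারাংশ"
text = "বাংলা পাঠ্য সারাংশ একটি গুরুত্বপূর্ণ কাজ। আজ আবহাওয়া ভালো। সুতরাং বাংলা সারাংশ দরকার।"
print(summarize(title, text, k=2))
```

In this toy run the first sentence (frequent terms, first position, full title overlap) and the third (cue word plus title terms) outrank the unrelated second sentence. Because the method is extractive, selected sentences are returned verbatim. Replacing the stemmer, word lists, or weights only touches the helper functions and the `WEIGHTS` table.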
Keywords :
natural language processing; text analysis; Bangla text documents; English language; NLP; automated Bangla text summarization; cue words; document skeleton; document summarization process; extraction based summarization technique; sentence positional value; sentence ranking; sentence scoring; stemming; stop word removal; tokenization; word frequency; Accuracy; Educational institutions; Feature extraction; Frequency measurement; Information technology; Natural language processing; Skeleton;
Conference_Title :
Informatics, Electronics & Vision (ICIEV), 2013 International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4799-0397-9
DOI :
10.1109/ICIEV.2013.6572686