Title :
Opinion Summarization in Bengali: A Theme Network Model
Author :
Das, Amitava ; Bandyopadhyay, Sivaji
Author_Institution :
Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
Abstract :
A theme network is a semantic network of document-specific themes. So far, Natural Language Processing (NLP) research has largely favored topic-based summarization systems, which are unable to capture the thematic semantic affinity of a text; e.g., a news article containing the concepts "gun," "convenience store," "demand money" and "make getaway" might suggest the topics "robbery" and "crime". This paper describes the development of an opinion summarization system that works on a Bengali news corpus. The system identifies the sentiment information in each document, aggregates it, and presents the summary information as text. The system follows a topic-sentiment model for sentiment identification and aggregation. The topic-sentiment model is designed as discourse-level theme identification, and the topic-sentiment aggregation is achieved by theme clustering (k-means) and a Document Level Theme Relational Graph representation. The Document Level Theme Relational Graph is finally used for candidate summary sentence selection by the standard PageRank algorithms used in Information Retrieval (IR). As Bengali is a resource-constrained language, the building of an annotated gold standard corpus and the acquisition of linguistic tools for lexico-syntactic, syntactic and discourse-level feature extraction are also described in this paper. The reported accuracy of the theme detection technique is 83.60% (precision), 76.44% (recall) and 79.85% (F-measure). The summarization system has been evaluated with a precision of 72.15%, a recall of 67.32% and an F-measure of 69.65%.
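The graph-based sentence selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Document Level Theme Relational Graph is built or weighted, so everything below is an assumption made for illustration: sentences are nodes, two sentences are linked when they share at least one theme, edges are unweighted, and the names `build_theme_graph`, `pagerank` and the toy sentence ids are hypothetical. It is not the authors' implementation.

```python
from typing import Dict, List, Set


def build_theme_graph(sentence_themes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Link two sentences when they share at least one theme (assumed rule)."""
    graph: Dict[str, Set[str]] = {sid: set() for sid in sentence_themes}
    ids: List[str] = list(sentence_themes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if sentence_themes[a] & sentence_themes[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def pagerank(graph: Dict[str, Set[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on an undirected, unweighted graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy input: hypothetical sentence ids mapped to the themes detected in them.
sentence_themes = {
    "s1": {"robbery", "gun"},
    "s2": {"robbery", "store"},
    "s3": {"store", "money"},
    "s4": {"gun", "getaway"},
}

graph = build_theme_graph(sentence_themes)
ranks = pagerank(graph)

# Candidate summary sentences: the two highest-ranked nodes.
candidates = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(candidates)
```

In this toy graph the sentences that share themes with the most other sentences receive the highest rank and are picked as summary candidates. A fuller version would likely weight edges by theme similarity and use the k-means theme clusters to ensure coverage, but the abstract gives no detail on either.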
Keywords :
information retrieval; natural language processing; semantic networks; text analysis; Bengali news corpus; NLP; discourse level feature extraction; discourse level theme identification; document level theme relational graph representation; lexico-syntactic level feature extraction; opinion summarization system; page rank algorithms; resource constraint language; semantic network; syntactic level feature extraction; thematic semantic affinity; theme clustering; theme detection technique; theme network model; topic based summarizer system; topic sentiment identification; topic-sentiment aggregation; Clustering algorithms; Feature extraction; Frequency measurement; Gold; Machine learning; Organizations; Syntactics;
Conference_Title :
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-8439-3
Electronic_ISBN :
978-0-7695-4211-9
DOI :
10.1109/SocialCom.2010.104