Tamil Document Summarization Using Semantic Graph Method

Author

Banu, Mihai ; Karthika, C. ; Sudarmani, P. ; Geetha, T.V.

Author_Institution

Anna Univ., Chennai

Volume

2

fYear

2007

fDate

13-15 Dec. 2007

Firstpage

128

Lastpage

134

Abstract

Document summarization refers to the task of producing a shorter version of the original document by selecting important sentences from the text. Tamil Document Summarization using sub graph presents a method for extracting sentences from an individual document to serve as a document summary or as a precursor to creating a generic document abstract. Language-Neutral Syntax (LNS), a system of representation for natural language sentences, has been used to capture the semantics of the documents. Syntactic analysis that produces a logical form has been applied to each sentence. Subject-Object-Predicate (SOP) triples are extracted from individual sentences to create a semantic graph [2] of the original document and of the corresponding human-extracted summary. Semantic normalization is applied to the SOP triples to reduce the number of nodes in the semantic graph of the original document. Using the Support Vector Machine (SVM) learning algorithm, a classifier has been trained to identify the SOP triples in the document semantic graph that belong to the summary. The classifier is then used to extract summaries automatically from the test documents.
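
The graph-building step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `CANONICAL`, `normalize`, `build_semantic_graph`, and `triple_features` are hypothetical, the normalization table is a toy stand-in for semantic normalization, the degree-based features are an assumption, and English triples are used in place of Tamil for readability. The LNS parsing that would produce the triples and the SVM training step are not shown.

```python
from collections import defaultdict

# Hypothetical normalization table: maps surface forms to one canonical node
# label so that co-referring terms collapse into a single graph node.
CANONICAL = {"the minister": "minister", "he": "minister"}


def normalize(term):
    """Lowercase a term and map it to its canonical label if one is known."""
    key = term.strip().lower()
    return CANONICAL.get(key, key)


def build_semantic_graph(triples):
    """Build a directed graph: subject -> list of (predicate, object) edges.

    Each input triple is (subject, object, predicate), following the
    Subject-Object-Predicate order used in the abstract.
    """
    graph = defaultdict(list)
    for subj, obj, pred in triples:
        s, o = normalize(subj), normalize(obj)
        edge = (pred.strip().lower(), o)
        if edge not in graph[s]:  # skip duplicate edges after normalization
            graph[s].append(edge)
    return dict(graph)


def triple_features(graph, subj, obj):
    """Assumed per-triple features: subject out-degree and object in-degree."""
    s, o = normalize(subj), normalize(obj)
    out_degree = len(graph.get(s, []))
    in_degree = sum(
        1 for edges in graph.values() for _, target in edges if target == o
    )
    return [out_degree, in_degree]


# Toy input: three SOP triples, two of which share a normalized subject.
triples = [
    ("The minister", "school", "opened"),
    ("He", "students", "addressed"),
    ("students", "school", "attend"),
]
graph = build_semantic_graph(triples)
features = triple_features(graph, "He", "school")
```

In a full pipeline, feature vectors of this kind would be labeled according to whether the triple also appears in the human-extracted summary graph, and those labeled vectors would serve as training data for the SVM classifier.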

Keywords

abstracting; classification; computational linguistics; graph theory; learning (artificial intelligence); natural language processing; support vector machines; text analysis; Tamil document summarization; classifier training; document automatic summary extraction; document semantic graph method; language-neutral syntax; logical form analysis; natural language sentence representation system; semantic normalization; subject-object-predicate triple; support vector machine learning algorithm; text syntactic analysis; Application software; Computational intelligence; Computer science; Data mining; Databases; Guidelines; Information analysis; Natural languages; Support vector machine classification; Support vector machines;

fLanguage

English

Publisher

IEEE

Conference_Titel

International Conference on Computational Intelligence and Multimedia Applications, 2007

Conference_Location

Sivakasi, Tamil Nadu

Print_ISBN

0-7695-3050-8

Type

conf

DOI

10.1109/ICCIMA.2007.247

Filename

4426682