DocumentCode
469317
Title
Tamil Document Summarization Using Semantic Graph Method
Author
Banu, Mihai ; Karthika, C. ; Sudarmani, P. ; Geetha, T.V.
Author_Institution
Anna Univ., Chennai
Volume
2
fYear
2007
fDate
13-15 Dec. 2007
Firstpage
128
Lastpage
134
Abstract
Document summarization is the task of producing a shorter version of an original document by selecting important sentences from the text. This work presents a sub-graph-based method for Tamil document summarization that extracts sentences from an individual document to serve as a document summary or as a precursor to a generic document abstract. Language-Neutral Syntax (LNS), a system of representation for natural language sentences, is used to capture the semantics of the documents. Syntactic analysis that produces a logical form is applied to each sentence. Subject-Object-Predicate (SOP) triples are extracted from individual sentences to create a semantic graph [2] of the original document and of the corresponding human-extracted summary. Semantic normalization is applied to the SOP triples to reduce the number of nodes in the semantic graph of the original document. Using the Support Vector Machine (SVM) learning algorithm, a classifier is trained to identify SOP triples in the document semantic graph that belong to the summary. The classifier is then used to extract summaries automatically from the test documents.
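The graph-construction step in the abstract can be illustrated with a minimal sketch. It assumes the SOP triples are already extracted, and it stands in for semantic normalization with a small hand-written synonym table. The names `SYNONYMS`, `normalize`, `build_semantic_graph`, and `triple_features` are hypothetical and are not taken from the paper. Neither the linguistic analysis that produces the triples nor the SVM classifier is shown.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for semantic normalization.
# (Assumption: the paper's actual normalization resources are not given here.)
SYNONYMS = {"automobile": "car", "purchased": "buy", "bought": "buy"}


def normalize(term):
    """Lowercase a term and map it to a canonical form if one is known."""
    term = term.strip().lower()
    return SYNONYMS.get(term, term)


def build_semantic_graph(triples):
    """Build an undirected graph from (subject, object, predicate) triples.

    Every normalized triple element becomes a node; the predicate node is
    linked to both the subject node and the object node.
    """
    graph = defaultdict(set)
    for subj, obj, pred in triples:
        s, o, p = normalize(subj), normalize(obj), normalize(pred)
        graph[s].add(p)
        graph[p].add(s)
        graph[p].add(o)
        graph[o].add(p)
    return graph


def triple_features(triple, graph):
    """Return the degree of each element of a triple in the graph.

    Illustrative only: a real feature vector for the classifier would
    likely include more attributes than node degree.
    """
    return [len(graph[normalize(x)]) for x in triple]
```

With the triples `("Ravi", "automobile", "purchased")` and `("Ravi", "car", "bought")`, normalization merges both into the same three nodes, which is the node-reduction effect the abstract describes. Feature vectors like those from `triple_features` are the kind of input that could be passed to an SVM trained on triples labeled by whether they appear in the human-extracted summary.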
Keywords
abstracting; classification; computational linguistics; graph theory; learning (artificial intelligence); natural language processing; support vector machines; text analysis; Tamil document summarization; classifier training; document automatic summary extraction; document semantic graph method; language-neutral syntax; logical form analysis; natural language sentence representation system; semantic normalization; subject-object-predicate triple; support vector machine learning algorithm; text syntactic analysis; Application software; Computational intelligence; Computer science; Data mining; Databases; Guidelines; Information analysis; Natural languages; Support vector machine classification; Support vector machines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Conference on Computational Intelligence and Multimedia Applications, 2007. International Conference on
Conference_Location
Sivakasi, Tamil Nadu
Print_ISBN
0-7695-3050-8
Type
conf
DOI
10.1109/ICCIMA.2007.247
Filename
4426682