Title of article :
A multi-document summarization system based on statistics and linguistic treatment
Author/Authors :
Ferreira, Rafael; de Souza Cabral, Luciano R.; Freitas, Frederico; Lins, Rafael Dueire; de França Silva, Gabriel; Simske, Steven J.; Favaro, Luciano
Issue Information :
Journal with serial number, year 2014
Pages :
8
From page :
5780
To page :
5787
Abstract :
The volume of data available on the Internet is now so large that it is no longer humanly feasible to sift useful information from it efficiently. Text summarization techniques offer one solution to this problem. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents while trying to avoid the typical problems of this type of summarization: information redundancy and diversity. This is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarities and linguistic treatment. The performance of the proposed system was assessed on the DUC 2002 dataset, where it surpassed the DUC competitors by a 50% margin in F-measure in the best case.
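The abstract does not give the details of the clustering algorithm, so the following is only a minimal sketch of the general idea of graph-based sentence clustering, not the authors' method. It assumes bag-of-words cosine similarity as the statistical measure, connected components as the clustering rule, and the earliest sentence of each cluster as its representative. The function names and the `threshold` parameter are hypothetical, and no linguistic treatment is modelled.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Group sentence indices into connected components of a similarity graph.

    Assumption for illustration: two sentences share an edge when their
    cosine similarity is at least `threshold`.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Depth-first search to collect each connected component.
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


def summarize(sentences, threshold=0.5):
    """Keep one sentence per cluster (the earliest), which limits redundancy."""
    return [sentences[c[0]] for c in cluster_sentences(sentences, threshold)]
```

Calling `summarize` on sentences pooled from several documents returns one sentence per group of near-duplicates, which illustrates how clustering can reduce redundancy while keeping distinct topics represented.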
Keywords :
Extractive summarization, Sentence clustering, Multi-document summarization
Journal title :
Expert Systems with Applications
Serial Year :
2014
Record number :
2354995