DocumentCode :
1753718
Title :
Summarizing text by ranking text units according to shallow linguistic features
Author :
Gupta, Pankaj ; Pendluri, Vijay Shankar ; Vats, Ishant
Author_Institution :
Wipro Technol., Bangalore, India
fYear :
2011
fDate :
13-16 Feb. 2011
Firstpage :
1620
Lastpage :
1625
Abstract :
We present an approach to identifying the most prominent sentences in a text using various shallow linguistic features, taking the degree of connectedness among text units into account so as to minimize poorly linked sentences in the resulting summary. Summaries generated by current summarization systems contain poorly linked sentences and are not topically salient. The paper therefore highlights the effect of lexical chain scoring: nouns and compound nouns are chained by searching for lexically cohesive relationships between words in the text, using WordNet and lexicographical relationships such as synonymy and hyponymy. Our algorithm ranks sentences by the sum of the scores of the words in each sentence, combining term frequencies, the location of the sentence in the text, cue words and phrases, word occurrences, and lexical similarity (chain score, word score, and finally sentence score). We then identify and extract the high-scoring sentences and use the Vector Space approach to measure the similarity between each extracted sentence and the topic words, again drawing on WordNet lexical database relationships, to prioritise topically related sentences. A threshold angle between the two vectors, predefined experimentally, determines which ranked sentences are dropped and which significant sentences, those scoring above the threshold, are extracted. The threshold value is predetermined based on the percentage of the text that the output summary is required to cover.
Keywords :
text analysis; WordNet; human generated summary; lexical chain scoring; shallow linguistic features; summarizing systems; text summarization; text units ranking; vector space approach; Artificial neural networks; Books; Data mining; Databases; Frequency measurement; Humans; Pragmatics; Lexical Chains; Text Summarization; Vector Space Model; WordNet; lexicographical relationships; topically related;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Communication Technology (ICACT), 2011 13th International Conference on
Conference_Location :
Seoul
ISSN :
1738-9445
Print_ISBN :
978-1-4244-8830-8
Type :
conf
Filename :
5746114
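The ranking and filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain term-frequency word scores, a hypothetical location bonus for the first sentence, a hypothetical cue-word list, and a bag-of-words cosine angle against the topic words. The WordNet lexical chain scoring is omitted entirely, and all names and weights (`CUE_WORDS`, `top_k`, `max_angle`) are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical cue-word list; the paper's actual cue words are not given in the abstract.
CUE_WORDS = {"significant", "conclusion", "propose"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences):
    """Score each sentence as the sum of its word frequencies plus simple bonuses."""
    tf = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        score = float(sum(tf[w] for w in words))  # term-frequency word scores
        if i == 0:
            score += 1.0  # assumed location bonus for the opening sentence
        score += sum(1.0 for w in words if w in CUE_WORDS)  # assumed cue bonus
        scores.append(score)
    return scores


def angle_degrees(words_a, words_b):
    """Angle between two bag-of-words vectors, in degrees (90 if either is empty)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 90.0
    return math.degrees(math.acos(min(1.0, dot / norm)))


def summarize(sentences, topic_words, top_k, max_angle):
    """Keep the top_k scored sentences whose angle to the topic words is small enough."""
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept = [
        i
        for i in ranked[:top_k]
        if angle_degrees(tokenize(sentences[i]), topic_words) <= max_angle
    ]
    return [sentences[i] for i in sorted(kept)]  # restore document order
```

In this sketch `top_k` stands in for the required summary percentage and `max_angle` for the experimentally chosen threshold angle. A fuller version would replace the raw term counts with lexical chain scores derived from WordNet relations.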