DocumentCode :
2788957
Title :
Summarization- and learning-based approaches to information distillation
Author :
Toth, Boriska ; Hakkani-Tür, Dilek ; Yaman, Sibel
Author_Institution :
Univ. of California, Berkeley, CA, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5306
Lastpage :
5309
Abstract :
Information distillation aims to extract relevant passages of text from massive volumes of textual and audio sources, given a query. In this paper, we investigate two approaches that use shallow language processing to answer open-ended distillation queries, such as “List me facts about [event]”. The first is a summarization-based approach that uses the unsupervised maximum marginal relevance (MMR) technique to capture relevant but non-redundant information. The second is based on supervised classification and trains support vector machines (SVMs) to discriminate relevant snippets from irrelevant ones using a variety of features. We also investigate the merit of the ROUGE metric, which can account for redundancy, alongside the conventionally used F-measure for evaluating distillation systems. Our experimental results on textual data indicate that SVM and MMR perform similarly in terms of ROUGE-2 scores, while SVM outperforms MMR in terms of the F1 measure. Moreover, when speech recognizer output is used, SVM outperforms MMR on both measures.
Keywords :
audio databases; information retrieval systems; natural language processing; pattern classification; query processing; support vector machines; text analysis; F-measure; MMR technique; ROUGE metric; SVM training; audio sources; information distillation learning based approach; information distillation summarization approach; open ended distillation queries; relevant text extraction; shallow language processing; speech recognizer output; supervised classification; support vector machines; textual sources; unsupervised maximum marginal relevance; Automatic speech recognition; Computer science; Data mining; Humans; Nominations and elections; Performance evaluation; Speech processing; Speech recognition; Supervised learning; Support vector machines; information distillation; information extraction; speech processing; summarization; supervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5494971
Filename :
5494971