Title :
Statistical Sentence Extraction for Information Distillation
Author :
Hakkani-Tur, Dilek ; Tur, Gokhan
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA, USA
Abstract :
Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component in a distillation engine is detecting sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. We frame this task as a classification problem, where each candidate sentence in the documents is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. The results indicate an 11%-13% relative improvement over a baseline keyword-spotting-based approach. We also show the robustness of our method on the audio subset of the document sources using manual and automatic transcriptions.
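The classification framing in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not list the actual lexical and semantic features, so the features here (word presence plus a query-overlap flag), the toy sentences, and every function name are assumptions chosen for illustration. The boosting loop is plain discrete AdaBoost over one-feature decision stumps, written with the standard library only.

```python
import math

def featurize(sentence, query):
    """Hypothetical binary lexical features: word presence plus a query-overlap flag."""
    words = set(sentence.lower().split())
    feats = {"w=" + w for w in words}
    if words & set(query.lower().split()):
        feats.add("query_overlap")
    return feats

def stump(feature, polarity, feats):
    """Weak learner: predict `polarity` if the feature is present, else -polarity."""
    return polarity if feature in feats else -polarity

def train_adaboost(examples, labels, rounds=10):
    """Discrete AdaBoost. examples: list of feature sets; labels: +1 (relevant) or -1."""
    n = len(examples)
    weights = [1.0 / n] * n
    vocab = sorted(set().union(*examples))
    model = []  # list of (alpha, feature, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for f in vocab:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(examples, labels, weights)
                          if stump(f, polarity, x) != y)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        if err >= 0.5:
            break  # no stump is better than chance
        clamped = max(err, 1e-10)  # avoid log of infinity on a perfect stump
        alpha = 0.5 * math.log((1 - clamped) / clamped)
        model.append((alpha, f, polarity))
        if err < 1e-10:
            break  # training data already separated
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(f, polarity, x))
                   for x, y, w in zip(examples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def is_relevant(model, sentence, query):
    """Classify a candidate sentence by the sign of the weighted stump vote."""
    feats = featurize(sentence, query)
    return sum(a * stump(f, p, feats) for a, f, p in model) > 0

# Toy training data (invented for this sketch, not from the paper).
query = "earthquake casualties"
train = [
    ("The earthquake killed 200 people", 1),
    ("Casualties from the earthquake rose on Monday", 1),
    ("The stock market closed higher", -1),
    ("The weather was sunny in Honolulu", -1),
]
model = train_adaboost([featurize(s, query) for s, _ in train],
                       [y for _, y in train])
```

With this toy data, a call such as `is_relevant(model, "An earthquake struck the coast", query)` classifies a new candidate sentence. A keyword-spotting baseline would correspond roughly to using the `query_overlap` flag alone, whereas boosting can combine it with other features when the data require it.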
Keywords :
document handling; feature extraction; natural language processing; speech processing; AdaBoost; audio format; automatic transcriptions; baseline keyword-spotting-based approach; classification problem; discriminative classification method; document sources; information distillation; lexical features; manual transcriptions; non-English documents; semantic features; statistical sentence extraction; textual format; Biographies; Computer science; Data mining; Information retrieval; Natural languages; Robustness; Search engines; Strips; information extraction; language understanding
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367148