Title :
Integrating several annotation layers for statistical information distillation
Author :
Levit, Michael ; Hakkani-Tür, Dilek ; Tur, Gokhan ; Gillick, Daniel
Author_Institution :
Int. Comput. Sci. Inst., Berkeley
Abstract :
We present a sentence extraction algorithm for Information Distillation, a task where for a given templated query, relevant passages must be extracted from massive audio and textual document sources. For each sentence of the relevant documents (that are assumed to be known from the upstream stages) we employ statistical classification methods to estimate the extent of its relevance to the query, whereby two aspects of relevance are taken into account: the template (type) of the query and its slots (free-text descriptions of names, organizations, topic, events and so on, around which templates are centered). The idiosyncrasy of the presented method is in the choice of features used for classification. We extract our features from charts, compilations of elements from various annotation levels, such as word transcriptions, syntactic and semantic parses, and Information Extraction annotations. In our experiments we show that this integrated approach outperforms a purely lexical baseline by as much as 30% relative in terms of F-measure. We also investigate the algorithm's behavior under noisy conditions, by comparing its performance on ASR output and on corresponding manual transcriptions.
Keywords :
feature extraction; natural language processing; pattern classification; pattern clustering; query processing; statistical analysis; text analysis; information extraction annotation; massive audio; sentence extraction algorithm; statistical classification method; statistical information distillation; templated query; textual document; word clusters; Automatic speech recognition; Birds; Computer science; Data mining; Influenza; Machine learning; Natural language processing; Natural languages; Runtime; Speech processing; Information Distillation; Machine Learning; Question Answering; Statistical Natural Language Processing
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430192