Title :
Punctuating speech for information extraction
Author :
Favre, Benoit ; Grishman, Ralph ; Hillard, Dustin ; Ji, Heng ; Hakkani-Tür, Dilek ; Ostendorf, Mari
Author_Institution :
ICSI, Berkeley, CA
fDate :
March 31, 2008 - April 4, 2008
Abstract :
This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine-generated transcript according to the maximum F-measure of period and comma annotation results in suboptimal information extraction (IE). Specifically, period and comma decision thresholds can be chosen to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun phrases, which prevents the extraction of correct information.
Keywords :
grammars; information retrieval; natural language processing; speech recognition; automatic sentence boundary detection; comma prediction; error analysis; information extraction; machine generated transcript; maximum F-measure; noun-phrase splitting; speech entity extraction; speech punctuation; speech relation extraction; syntactic parser; Automatic speech recognition; Broadcasting; Data mining; Natural languages; Predictive models; Speech processing; Tagging; Extraction; Information; Punctuation Prediction; Speech
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518784