Author :
Suendermann, D. ; Liscombe, J. ; Evanini, K. ; Dayanidhi, K. ; Pieraccini, R.
Author_Institution :
SpeechCycle, Inc., New York, NY
Abstract :
The annotation of hundreds of thousands of utterances for the training of statistical utterance classifiers requires a careful quality assurance procedure to make the data consistent and reliable. In this paper, we present five methods to analyze different aspects of annotated data to ensure their Completeness, Consistency, Correlation, Congruence and to avoid Confusion, collectively referred to as C5.
Keywords :
computational linguistics; natural language processing; quality assurance; speech processing; statistical analysis; data consistency; natural language; quality assurance procedure; reliable data; statistical utterance classifier; utterance annotation; Automatic speech recognition; Cities and towns; Logic; Mood; Natural languages; Quality assurance; Speech processing; Speech recognition; Training data; Vocabulary; annotation; quality assurance; statistical utterance classification
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
DOI :
10.1109/SLT.2008.4777856