DocumentCode :
2659739
Title :
C5
Author :
Suendermann, D. ; Liscombe, J. ; Evanini, K. ; Dayanidhi, K. ; Pieraccini, R.
Author_Institution :
SpeechCycle, Inc., New York, NY
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
125
Lastpage :
128
Abstract :
The annotation of hundreds of thousands of utterances for the training of statistical utterance classifiers requires a careful quality assurance procedure to make the data consistent and reliable. In this paper, we present five methods that analyze different aspects of annotated data to ensure their Completeness, Consistency, Correlation, and Congruence, and to avoid Confusion; collectively, these are referred to as C5.
Keywords :
computational linguistics; natural language processing; quality assurance; speech processing; statistical analysis; data consistency; natural language; quality assurance procedure; reliable data; statistical utterance classifier; utterance annotation; Automatic speech recognition; Cities and towns; Logic; Mood; Natural languages; Quality assurance; Speech processing; Speech recognition; Training data; Vocabulary; annotation; quality assurance; statistical utterance classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
Type :
conf
DOI :
10.1109/SLT.2008.4777856
Filename :
4777856