Title :
Transcribing Southern Min speech corpora with a Web-Based language learning system
Author :
Cai, Jun ; Feldmar, Jacques ; Laprie, Yves ; Fohr, Dominique ; Haton, Jean-Paul
Author_Institution :
CNRS, INRIA, Vandoeuvre-les-Nancy
Abstract :
The paper proposes a human-computation-based scheme for transcribing Southern Min speech corpora. The core idea is to implement a Web-based language learning system that collects orthographic and phonetic labels from a large number of language learners and selects the most commonly entered labels as the transcriptions of the corpora. It is essentially a form of distributed knowledge acquisition. Computer-aided mechanisms are also used to verify the collected transcriptions. The benefit of the scheme is that it makes the transcription task neither tedious nor costly, so no significant budget is needed to transcribe large corpora. The design of a system for transcribing Southern Min (Min Nan) speech corpora is described in detail. Application of a prototype of the system shows that this scheme is an effective and economical way to generate orthographic and phonetic transcriptions.
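The label-selection idea in the abstract (keep the label that learners enter most often) can be illustrated with a minimal sketch. The abstract does not give the paper's actual selection rule or its verification mechanisms, so the function name `choose_transcription` and the thresholds `min_votes` and `min_share` are hypothetical assumptions made only for illustration.

```python
from collections import Counter


def choose_transcription(labels, min_votes=2, min_share=0.5):
    """Return the most commonly entered label, or None if agreement is too weak.

    Hypothetical rule (not taken from the paper): the winning label needs at
    least `min_votes` entries and strictly more than `min_share` of all entries.
    """
    # Collapse runs of whitespace so trivially different inputs count as one label.
    cleaned = [" ".join(label.split()) for label in labels if label.strip()]
    if not cleaned:
        return None
    label, votes = Counter(cleaned).most_common(1)[0]
    if votes < min_votes or votes / len(cleaned) <= min_share:
        return None  # Not enough agreement; the segment would need more input.
    return label
```

Under these assumptions, a segment whose learner inputs do not reach the agreement threshold stays untranscribed until more labels arrive, which is one simple way to trade coverage for reliability.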
Keywords :
Internet; knowledge acquisition; speech processing; Min Nan speech corpora; Southern Min speech corpora; Web-based language learning system; computer-aided mechanisms; distributed knowledge acquisition; human-computation-based scheme; orthographic labels; phonetic labels; Cognitive science; Electrostatic precipitators; Humans; Interactive systems; Knowledge acquisition; Labeling; Learning systems; Natural languages; Speech recognition; Vocabulary;
Conference_Title :
Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-1723-0
Electronic_ISBN :
978-1-4244-1724-7
DOI :
10.1109/ICALIP.2008.4590181