DocumentCode
3752235
Title
A framework of human-based speech transcription with a speech chunking front-end
Author
Takashi Saito
Author_Institution
Shonan Institute of Technology, Kanagawa, Japan
fYear
2015
Firstpage
125
Lastpage
128
Abstract
This paper presents a framework for "human-based" speech transcription in a crowdsourcing environment. The main purpose of the framework is to encourage a large population of volunteers to participate in speech transcription, creating caption data for hearing-impaired people. It allows volunteers to join the transcription task by working on a very short segment of speech, referred to here as a "speech chunk". This is realized by incorporating a speech chunking front-end ahead of the main transcription task. The front-end is intended to make the allocation of transcription tasks to participants more flexible and, more importantly, to reduce the burden of the task itself: the audio data is chopped in advance into utterances of appropriate length, which eases the repetitive playback operations. As an initial study, the performance of the speech chunking is investigated for various types of content, focusing on how appropriately speech chunks are extracted as transcription task units. The results show that the framework can be applied even to animation video content, which usually includes dynamic sound effects.
Keywords
"Speech","Speech processing","Silicon","Digital audio broadcasting","Internet","Speech recognition","Text processing"
Publisher
IEEE
Conference_Titel
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type
conf
DOI
10.1109/APSIPA.2015.7415486
Filename
7415486