DocumentCode :
3752235
Title :
A framework of human-based speech transcription with a speech chunking front-end
Author :
Takashi Saito
Author_Institution :
Shonan Institute of Technology, Kanagawa, Japan
Year :
2015
Firstpage :
125
Lastpage :
128
Abstract :
This paper presents a framework for "human-based" speech transcription in a crowdsourcing environment. The main purpose of the framework is to encourage a large population of volunteers to participate in speech transcription for creating caption data for hearing-impaired people. It allows volunteer participants to join the transcription task with a very short segment of speech, referred to here as a "speech chunk". This is realized by incorporating a speech chunking front-end prior to the main transcription task. The front-end is intended to increase the flexibility of allocating transcription tasks to participants and, more importantly, to reduce the burden of the task itself by segmenting the audio data in advance into utterances of appropriate length, thereby easing repetitive playback operations. As an initial study, the performance of the speech chunking front-end is investigated for various types of content, examining how appropriately speech chunks are extracted as transcription task units. The results show that the framework can be applied even to animated video content, which usually includes dynamic sound effects.
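To illustrate the role of the chunking front-end, the following is a minimal sketch of one plausible way to split audio into short transcription task units, assuming a 16-bit mono WAV input and a simple frame-energy pause criterion. The function name, thresholds, and segmentation rule are illustrative assumptions for this sketch, not the method described in the paper.

```python
import wave
import numpy as np

def chunk_speech(wav_path, frame_ms=30, energy_thresh=500.0,
                 min_silence_frames=10, min_chunk_s=1.0, max_chunk_s=10.0):
    """Split a WAV file into speech chunks at sustained low-energy regions."""
    with wave.open(wav_path, "rb") as wf:
        sr = wf.getframerate()
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Root-mean-square energy per fixed-length frame.
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))

    chunks, start, silence_run = [], 0, 0
    for i, e in enumerate(rms):
        silence_run = silence_run + 1 if e < energy_thresh else 0
        chunk_s = (i - start + 1) * frame_ms / 1000.0
        # Cut when a pause is long enough (or the chunk grows too long),
        # provided the chunk already meets the minimum length.
        if (silence_run >= min_silence_frames or chunk_s >= max_chunk_s) \
                and chunk_s >= min_chunk_s:
            chunks.append((start * frame_len / sr, (i + 1) * frame_len / sr))
            start, silence_run = i + 1, 0
    if start < n_frames:
        chunks.append((start * frame_len / sr, n_frames * frame_len / sr))
    return chunks  # list of (start_sec, end_sec) transcription task units
```

Each returned (start, end) interval would correspond to one short task unit that a volunteer can play back and transcribe independently.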
Keywords :
"Speech","Speech processing","Silicon","Digital audio broadcasting","Internet","Speech recognition","Text processing"
Publisher :
ieee
Conference_Title :
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Type :
conf
DOI :
10.1109/APSIPA.2015.7415486
Filename :
7415486