• DocumentCode
    3752235
  • Title
    A framework of human-based speech transcription with a speech chunking front-end
  • Author
    Takashi Saito
  • Author_Institution
    Shonan Institute of Technology, Kanagawa, Japan
  • fYear
    2015
  • Firstpage
    125
  • Lastpage
    128
  • Abstract
    This paper presents a framework for "human-based" speech transcription in a crowdsourcing environment. The main purpose of the framework is to encourage a large population of volunteers to take part in speech transcription, in order to create caption data for hearing-impaired people. It allows volunteers to join the transcription task by working on very short segments of speech, referred to here as "speech chunks". This is achieved by incorporating a speech chunking front-end prior to the main transcription task. The front-end is intended to increase the flexibility of allocating transcription tasks to participants and, more importantly, to reduce the burden of the task itself: the audio data are split in advance into utterances of appropriate length, which reduces the need for repetitive playback operations. As an initial study, the speech chunking performance is evaluated on various types of content in terms of how appropriately speech chunks are extracted as transcription task units. The results show that the framework can be applied even to animated video content, which usually includes dynamic sound effects.
  • Keywords
    "Speech","Speech processing","Silicon","Digital audio broadcasting","Internet","Speech recognition","Text processing"
  • Publisher
    IEEE
  • Conference_Title
    2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
  • Type
    conf
  • DOI
    10.1109/APSIPA.2015.7415486
  • Filename
    7415486