• DocumentCode
    86637
  • Title

    Can we Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech?—A Dataset, Insights, and Challenges

  • Author

    Mysore, Gautham J.

  • Author_Institution
    Adobe Res., San Francisco, CA, USA
  • Volume
    22
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 2015
  • Firstpage
    1006
  • Lastpage
    1010
  • Abstract
    The goal of speech enhancement is typically to recover clean speech from noisy, reverberant, and often bandlimited speech in order to yield improved intelligibility, clarity, or automatic speech recognition performance. However, the acoustic goal for a great deal of speech content such as voice overs, podcasts, demo videos, lecture videos, and audio stories is often not merely clean speech, but speech that is aesthetically pleasing. This is achieved in professional recording studios by having a skilled sound engineer record clean speech in an acoustically treated room and then edit and process it with audio effects (which we refer to as production). A growing amount of speech content is being recorded on common consumer devices such as tablets, smartphones, and laptops. Moreover, it is typically recorded in common but non-acoustically treated environments such as homes and offices. We argue that the goal of enhancing such recordings should not only be to make it sound cleaner as would be done using traditional speech enhancement techniques, but to make it sound like it was recorded and produced in a professional recording studio. In this paper, we show why this can be beneficial, describe a new data set (a great deal of which was recorded in a professional recording studio) that we prepared to help in developing algorithms for this purpose, and discuss some insights and challenges associated with this problem.
  • Keywords
    audio recording; speech enhancement; speech recognition; audio effects; clean speech; professional recording studios; speech content; speech enhancement; speech recognition; Acoustics; Performance evaluation; Production; Signal processing algorithms; Speech; Speech enhancement; Automatic production; speech enhancement;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    ieee
  • ISSN
    1070-9908
  • Type

    jour

  • DOI
    10.1109/LSP.2014.2379648
  • Filename
    6981922