  • DocumentCode
    3530268
  • Title
    Detecting bandlimited audio in broadcast television shows
  • Author
    Fuhs, Mark C. ; Jin, Qin ; Schultz, Tanja
  • Author_Institution
    Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4589
  • Lastpage
    4592
  • Abstract
    For TV and radio shows containing narrowband speech, speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To apply such a model selectively, one must first be able to accurately detect which audio segments are narrowband. The present paper explores two bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments.
  • Keywords
    pattern classification; speech recognition; speech synthesis; splines (mathematics); Gaussian mixture model; TV shows; acoustically matched data; audio segments; bandlimited audio detection; bandwidth classification; broadcast television shows; misclassified narrowband segments; narrowband audio; narrowband speech; radio shows; speech-to-text accuracy; spline-based classifier; Acoustic signal detection; Acoustic testing; Bandwidth; Decoding; Narrowband; Speech; Spline; TV broadcasting; Telephony; Wideband; Speech processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960652
  • Filename
    4960652