• DocumentCode
    2770150
  • Title
    Data selection for speech recognition
  • Author
    Wu, Yi; Zhang, Rong; Rudnicky, Alexander
  • Author_Institution
    Carnegie Mellon Univ., Pittsburgh
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    562
  • Lastpage
    565
  • Abstract
    This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc.). In our experiment, in contrast to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of data that produces recognition performance comparable to that of a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
  • Keywords
    maximum entropy methods; speech recognition; data selection; transcribed speech; Automatic speech recognition; Broadcasting; Decoding; Entropy; Impedance; Linear discriminant analysis; Management training; Natural languages; Training data; acoustic modeling; maximum entropy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
  • Conference_Location
    Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430173
  • Filename
    4430173