• DocumentCode
    2790075
  • Title
    Approaches to automatic lexicon learning with limited training examples
  • Author
    Goel, Nagendra ; Thomas, Samuel ; Agarwal, Mohit ; Akyazi, Pinar ; Burget, Lukas ; Feng, Kai ; Ghoshal, Arnab ; Glembek, Ondrej ; Karafiát, Martin ; Povey, Daniel ; Rastrow, Ariya ; Rose, Richard C. ; Schwarz, Petr
  • Author_Institution
    Go-Vivace Inc., McLean, VA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    5094
  • Lastpage
    5097
  • Abstract
    Preparing a lexicon for a speech recognition system can take significant effort in languages where the written form is not exactly phonetic. Conversely, in languages where the written form is quite phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is available for bootstrapping. We find that for a phonetic language such as Spanish, a lexicon learned this way can perform better than one derived from generic rules or hand-crafted pronunciations. For a more complex language such as English, learning a lexicon is still possible, but with some loss of accuracy.
  • Keywords
    learning (artificial intelligence); natural language processing; speech recognition; automatic lexicon learning technique; bootstrapping; phonetic language; speech recognition systems; Acoustics; Automatic speech recognition; Costs; Dictionaries; Humans; Loudspeakers; Natural languages; Speech recognition; Training data; Vocabulary; LVCSR; Lexicon Learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495037
  • Filename
    5495037