• DocumentCode
    3632023
  • Title

    Turkish broadcast news transcription with open-source software

  • Author

    Dogan Can;Murat Saraclar

  • Author_Institution
    Elektrik Elektronik Mühendisliği Bölümü, Boğaziçi Üniversitesi, 34342, Bebek, İstanbul, Türkiye
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    325
  • Lastpage
    328
  • Abstract
    In this paper, we present our Turkish large vocabulary continuous speech recognition (LVCSR) system, which is built on open-source software (HTK, SRILM) and uses 187 hours of Turkish broadcast news data as well as a 184-million-word text corpus collected from various Turkish news portals. Within this system, three acoustic models were trained under the ML, MMI, and MPE criteria, and the contribution of discriminative acoustic modeling to Turkish LVCSR was investigated. Recognition experiments with a trigram language model and a 50K-word vocabulary give word error rates of 25.8% with ML, 24.3% with MMI, and 23.7% with MPE.
  • Keywords
    "Open source software","Broadcasting","Vocabulary","Maximum likelihood estimation","Speech recognition","Portals","Error analysis","Mutual information","Lattices"
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communications Applications Conference, 2009. SIU 2009. IEEE 17th
  • ISSN
    2165-0608
  • Print_ISBN
    978-1-4244-4435-9
  • Type

    conf

  • DOI
    10.1109/SIU.2009.5136398
  • Filename
    5136398