Title :
Rapid bootstrapping of a Ukrainian large vocabulary continuous speech recognition system

Author :
Schlippe, Tim ; Volovyk, Mykola ; Yurchenko, Kateryna ; Schultz, Tanja

Author_Institution :
Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Abstract :
We report on our efforts toward a large vocabulary continuous speech recognition (LVCSR) system for the Slavic language Ukrainian. We describe the Ukrainian text and speech database recently collected as part of our GlobalPhone corpus [1] with our Rapid Language Adaptation Toolkit [2]. The data was complemented by a large collection of text crawled from various Ukrainian websites. To produce the pronunciation dictionary, we investigate strategies that use grapheme-to-phoneme (g2p) models derived from existing dictionaries of other languages, thereby severely reducing the necessary manual effort. Using Russian and Bulgarian g2p models even reduces the number of pronunciation rules to one fifth. We achieve significant improvements by applying state-of-the-art acoustic modeling techniques together with our day-wise text collection and language model interpolation strategy [3]. Our best system achieves a word error rate of 11.21% on the read newspaper speech test set.

Keywords :
database languages; dictionaries; interpolation; speech recognition; Bulgarian g2p model; GlobalPhone corpus; LVCSR system; Russian g2p model; Slavic language Ukrainian; Ukrainian speech database; Ukrainian text database; Ukrainian website; acoustic modeling technique; day-wise text collection; grapheme-to-phoneme model; language model interpolation strategy; large vocabulary continuous speech recognition system; pronunciation dictionary production; rapid bootstrapping; rapid language adaptation toolkit; read newspaper speech; word error rate; Abstracts; Adaptation models; Gold; Optimization; Speech; Speech recognition; Vocabulary; Slavic language; Ukrainian; pronunciation dictionary; rapid language adaptation

Conference_Title :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location :
Vancouver, BC

DOI :
10.1109/ICASSP.2013.6639086