Title : 
Iterative grapheme-to-phoneme alignment for the training of WFST-based phonetic conversion
         
        
            Author : 
Bohac, Marek ; Malek, Jiri ; Blavka, Karel
         
        
            Author_Institution : 
Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic
         
            Abstract : 
In this paper, we propose an algorithm for grapheme-to-phoneme (G2P) alignment. Such alignment is needed mainly for the data-driven training of G2P conversion tools. Our approach uses a given phonetic alphabet and a set of orthographic-phonetic word pairs as its source of prior knowledge. The development data are taken from a manually created pronunciation lexicon for a large-vocabulary speech recognition system for Czech. The alignment method is based on an extended Minimum Edit Distance algorithm. Moreover, we propose an approach that avoids the need to create reference alignments: we evaluate the improvements through a specially designed G2P converter, i.e., we compare its phonetic transcriptions directly against a set of test orthographic-phonetic word pairs. The results of our approach are comparable to, or even slightly better than, the state of the art.
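For orientation, here is a minimal sketch of a plain Minimum Edit Distance alignment between a grapheme sequence and a phoneme sequence. It is not the extended algorithm proposed in the paper, whose details the abstract does not give. The names `med_align` and `unit_cost` and the 0/1 cost scheme are assumptions made for illustration only; a real aligner would presumably derive its costs from the phonetic alphabet and the training pairs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One aligned position: (grapheme or None, phoneme or None).
# None marks an empty side (a deleted grapheme or an inserted phoneme).
Pair = Tuple[Optional[str], Optional[str]]


def unit_cost(g: Optional[str], p: Optional[str]) -> float:
    """Hypothetical cost: 0 for identical symbols, 1 for anything else."""
    return 0.0 if g == p else 1.0


def med_align(
    graphemes: Sequence[str],
    phonemes: Sequence[str],
    cost: Callable[[Optional[str], Optional[str]], float] = unit_cost,
) -> Tuple[float, List[Pair]]:
    """Return (total cost, alignment) using standard edit-distance DP."""
    n, m = len(graphemes), len(phonemes)
    # dist[i][j] = cheapest alignment of graphemes[:i] with phonemes[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + cost(graphemes[i - 1], None)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + cost(None, phonemes[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost(graphemes[i - 1], phonemes[j - 1]),
                dist[i - 1][j] + cost(graphemes[i - 1], None),
                dist[i][j - 1] + cost(None, phonemes[j - 1]),
            )

    # Backtrace from the bottom-right corner to recover one optimal path.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dist[i][j]
                == dist[i - 1][j - 1] + cost(graphemes[i - 1], phonemes[j - 1])):
            pairs.append((graphemes[i - 1], phonemes[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + cost(graphemes[i - 1], None):
            pairs.append((graphemes[i - 1], None))
            i -= 1
        else:
            pairs.append((None, phonemes[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


if __name__ == "__main__":
    # Toy symbols only; not taken from the Czech lexicon used in the paper.
    total, alignment = med_align(list("ab"), list("a"))
    print(total, alignment)
```

Passing a different `cost` function is the natural extension point in this sketch: an iterative scheme could, for example, re-estimate grapheme-phoneme costs from the alignments of the previous pass, although whether the paper does exactly this is not stated in the abstract.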
         
        
            Keywords : 
iterative methods; speech processing; speech recognition; Czech; data-driven training; iterative grapheme-to-phoneme alignment; minimum edit distance algorithm; orthographic-phonetic word pairs; phonetic conversion; phonetic transcription; vocabulary speech recognition system; weighted finite state transducers; Dictionaries; Educational institutions; Measurement; Speech recognition; Training; Training data; Vocabulary; Alignment; Grapheme-to-phoneme; Phonetisaurus; WFST; conversion
         
            Conference_Title : 
Telecommunications and Signal Processing (TSP), 2013 36th International Conference on
         
        
            Conference_Location : 
Rome
         
        
            Print_ISBN : 
978-1-4799-0402-0
         
        
        
            DOI : 
10.1109/TSP.2013.6613977