Title : 
Latent perceptual mapping with data-driven variable-length acoustic units for template-based speech recognition
         
        
            Author : 
Sundaram, Shiva ; Bellegarda, Jerome R.
         
        
            Author_Institution : 
Deutsche Telekom Labs., Berlin, Germany
         
        
            Abstract : 
In recent work, we introduced Latent Perceptual Mapping (LPM) [1], a new framework for acoustic modeling suitable for template-like speech recognition. The basic idea is to leverage a reduced dimensionality description of the observations to derive acoustic prototypes that are closely aligned with perceived acoustic events. Our initial work adopted a bag-of-frames strategy to represent relevant acoustic information within speech segments. In this paper, we extend this approach by better integrating temporal information into the LPM feature extraction. Specifically, we use variable-length units to represent acoustic events at the supra-frame level, in order to benefit from finer temporal alignments when deriving the acoustic prototypes. The outcome can be viewed as a generalization of both conventional template-based approaches and recently proposed sparse representation solutions. This extension is experimentally validated on a context-independent phoneme classification task using the TIMIT corpus.
         
        
            Keywords : 
sparse matrices; speech recognition; LPM feature extraction; TIMIT corpus; acoustic modeling; bag-of-frames strategy; context-independent phoneme classification task; data-driven variable-length acoustic units; latent perceptual mapping; perceived acoustic events; reduced dimensionality description; sparse representation solutions; speech segments; supraframe level; template-based speech recognition; temporal alignments; temporal information integration; Acoustics; Feature extraction; Hidden Markov models; Speech; Training; Vectors; data-driven speech units; dimensionality reduction
         
        
        
        
            Conference_Title : 
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
         
        
            Conference_Location : 
Kyoto
         
        
        
            Print_ISBN : 
978-1-4673-0045-2
         
        
            ISSN : 
1520-6149
         
        
        
            DOI : 
10.1109/ICASSP.2012.6288826