Title : 
Re-ranking of spoken term detections using CRF-based triphone detection models
         
        
            Author : 
Sawada, Naoki ; Natori, Satoshi ; Nishizaki, Hiromitsu
         
        
            Author_Institution : 
Dept. of Educ., Univ. of Yamanashi, Kofu, Japan
         
        
        
        
        
        
            Abstract : 
Conventional spoken term detection (STD) techniques, which rely on text-based matching over the output of automatic speech recognition (ASR) systems, are not robust to speech recognition errors. This paper proposes a conditional random fields (CRF)-based re-ranking approach that recomputes the detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models built on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework. As a result, they can detect each of the triphones composing a query term and assign it a detection probability. In an experimental evaluation on the Japanese OOV test collection, the CRF-based approach alone could not outperform the conventional DTW-based approach we previously proposed; however, it worked well as a re-ranking (second-pass) process for the detections from the DTW-based approach. The CRF-based re-ranking approach yielded a 2.4% improvement in F-measure for STD performance.
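
The two-pass idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the DTW score and the CRF triphone probabilities are combined, so the linear interpolation, the `weight` parameter, the function names, and the assumption that first-pass scores are normalized to [0, 1] with higher meaning better are all hypothetical.

```python
from typing import List, Sequence, Tuple


def triphones(phonemes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the overlapping triphones that compose a phoneme sequence."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


def rerank(
    candidates: Sequence[Tuple[str, float, Sequence[float]]],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank first-pass detections (hypothetical scoring rule).

    Each candidate is (id, first_pass_score, triphone_probabilities).
    ASSUMPTION: first_pass_score is in [0, 1], higher is better, and the
    second-pass score is a linear interpolation with the mean triphone
    detection probability. The source does not specify this formula.
    """
    rescored = []
    for cand_id, first_pass_score, probs in candidates:
        # Mean detection probability over the query's triphones.
        crf_score = sum(probs) / len(probs) if probs else 0.0
        combined = weight * first_pass_score + (1.0 - weight) * crf_score
        rescored.append((cand_id, combined))
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A four-phoneme query yields two overlapping triphones.
    print(triphones(["k", "o", "f", "u"]))
    # Two made-up detections: "a" wins the first pass, "b" has stronger
    # triphone evidence and moves ahead after re-ranking.
    print(rerank([("a", 0.8, [0.2, 0.2]), ("b", 0.7, [0.9, 0.9])]))
```

In this toy input the second pass reverses the first-pass order, which is the kind of correction a re-ranking stage is meant to make; the numbers are invented for illustration only.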
         
        
            Keywords : 
pattern matching; random processes; speech recognition; text analysis; CRF-based re-ranking approach; CRF-based triphone detection model; DTW-based approach; F-measure; Japanese OOV test collection; conditional random field; detection score recomputation; phoneme-based dynamic time warping; phoneme-based transcriptions; recognition error patterns; sequence labeling problem; spoken term detection; text-based matching approach; Feature extraction; Hidden Markov models; Indexes; Probability; Speech; Speech recognition; Training;
         
        
        
        
            Conference_Title : 
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
         
        
            Conference_Location : 
Siem Reap
         
        
        
            DOI : 
10.1109/APSIPA.2014.7041550