Title :
Bag of Arcs: New representation of speech segment features based on finite state machines
Author :
Watanabe, Shinji ; Kubo, Yotaro ; Oba, Tomohiro ; Hori, Takaaki ; Nakamura, Atsushi
Author_Institution :
NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
Abstract :
This paper proposes Bag Of Arcs (BOA), a new feature representation for speech segments. In BOA, a speech segment is represented simply as a set of counts of the unique arcs in a finite state machine. Like the Bag Of Words (BOW) model, BOA disregards the order of arcs and thus models speech segments efficiently. A strong motivation for BOA is that the representation is tightly connected to the output of a Weighted Finite State Transducer (WFST) based ASR decoder. BOA therefore directly represents elements of the decoder's search network and can include information about context-dependent HMM topologies, lexicons, and back-off smoothed n-gram networks. In addition, the BOA counts are accumulated directly from the WFST decoder output, so extracting the features requires neither additional overhead nor a change to the decoding algorithm. Consequently, the ASR decoder can be combined with post-processing without a step to extract word features from the decoder outputs and without re-compiling WFST networks. We show the effectiveness of the proposed approach for ASR post-processing applications in utterance classification experiments and in speaker adaptation experiments, achieving a 1% absolute improvement in WER over baseline results. We also show examples of latent semantic analysis for BOA using latent Dirichlet allocation.
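As an illustration of the counting idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the decoder output is available as a sequence of integer arc IDs along the best path; the function name `bag_of_arcs`, the parameter `num_arcs`, and the example path are all hypothetical.

```python
from collections import Counter


def bag_of_arcs(arc_path, num_arcs):
    """Turn a sequence of arc IDs into a fixed-length count vector.

    arc_path: iterable of integer arc IDs (assumed form of decoder output).
    num_arcs: total number of unique arcs in the network (assumed known).
    """
    counts = Counter(arc_path)
    # Reject IDs outside the assumed arc inventory rather than dropping them.
    for arc_id in counts:
        if not 0 <= arc_id < num_arcs:
            raise ValueError(f"arc ID {arc_id} is outside 0..{num_arcs - 1}")
    # Position i holds how often arc i was traversed; arc order is discarded.
    return [counts.get(arc_id, 0) for arc_id in range(num_arcs)]


# Hypothetical best-path arc sequence for one speech segment.
path = [3, 1, 3, 0, 3, 1]
features = bag_of_arcs(path, num_arcs=5)
print(features)  # [1, 2, 0, 3, 0]
```

Because only counts are kept, any reordering of the same arcs yields the same vector, which is the sense in which the representation parallels Bag Of Words.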
Keywords :
decoding; feature extraction; finite state machines; hidden Markov models; semantic networks; speech coding; speech recognition; transducers; ASR decoder; BOA model; BOW model; WER; WFST; back-off smoothed n-gram network; bag of arcs model; bag of words model; context-dependent HMM topology; finite state machine; latent Dirichlet allocation; latent semantic analysis; lexicon; speaker adaptation experiment; speech segment feature representation; utterance classification experiment; weighted finite state transducer; word feature extraction; Abstracts; Force; Indexes; Lifting equipment; Sun; Time frequency analysis; Bag Of Arcs (BOA); Speech segment feature; speaker recognition; utterance classification
Conference_Title :
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288845