DocumentCode
134334
Title
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors
Author
Yannan Wang ; Jun Du ; Lirong Dai ; Chin-Hui Lee
Author_Institution
Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
158
Lastpage
162
Abstract
We propose a fusion approach to spoken language recognition that combines multiple tokenizers, using phone and speech attribute models trained on a collection of multilingual corpora with different front-end features. The speech attribute models are trained with bottleneck features extracted from deep neural networks, while the phone models are trained with temporal patterns neural network features. By exploiting different combinations of front-end features, fundamental speech units, and tokenization models, we demonstrate that speech attribute units are complementary to phone units and improve performance when combined with conventional phone-based tokenizers. On the National Institute of Standards and Technology (NIST) 2009 language recognition evaluation task, leveraging diversity in system combination, we find that speech attribute recognition followed by language modeling achieves an additional average relative equal error rate reduction of more than 20% when fused with state-of-the-art systems based on phone recognition followed by language modeling.
Keywords
feature extraction; neural nets; speech recognition; bottleneck feature extraction; front-end features; fusion approach; language modeling; language recognition evaluation task; multilingual corpora; phone attribute models; phone based tokenizers; phone recognition; phone recognizers; phone units; speech attribute detectors; speech attribute models; speech attribute recognition; speech attribute units; spoken language identification; spoken language recognition; temporal pattern neural network features; tokenization models; Acoustics; Feature extraction; Hidden Markov models; NIST; Neural networks; Speech; Speech recognition; automatic speech attribute transcription; bottleneck features; deep neural network; manner and place of articulation; phone recognition followed by language modeling; phonetic features; spoken language recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936714
Filename
6936714