Title :
Self-improvement of voice interface with user-input spoken query at early stage of commercialization
Author :
Kwang-Ho Kim ; Donghyun Lee ; Namhyun Cho ; Hyung Jeon ; Ji-Hwan Kim
Author_Institution :
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul, South Korea
Abstract :
This paper addresses the self-improvement of a voice interface through acoustic model re-training with user-input spoken queries at an early stage of commercialization, when conventional confidence measure-based acoustic model re-training is not reliable. The paper analyzes error patterns in user-input spoken queries, categorizes them, defines a quantitative measurement for each category, and proposes a filter-based approach over these measurements. The proposed method comprises four distinct filters: a filter over the environmental noise level, a filter over the non-pitch ratio within an utterance, a filter over the average phoneme duration function score, and a filter over the clipped frame composition ratio. In the evaluation, the initial acoustic model achieved a speech recognition rate of 66.1%. When all of the proposed filters are applied for re-training the acoustic model, the recognition rate rises to 73.8%, which is 3.1% higher than that of a confidence measure-based acoustic model re-training method. The proposed method is also applicable to data-driven classification services of consumer electronic products in other media (e.g., images) at their early stage of commercialization.
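The filter-based selection described in the abstract can be sketched as a cascade of per-utterance predicates: an utterance is kept for re-training only if it passes every active filter. This is a minimal sketch under stated assumptions. The four measurements are assumed to be precomputed for each query. The field names, the dB unit for noise, the threshold values, and the pass directions (for example, that a higher duration score is better) are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Utterance:
    """Per-utterance measurements, assumed to be computed beforehand (hypothetical fields)."""
    utt_id: str
    noise_level_db: float   # estimated environmental noise level (assumed unit: dB)
    non_pitch_ratio: float  # fraction of frames with no detected pitch
    duration_score: float   # average phoneme duration function score
    clipped_ratio: float    # clipped frame composition ratio


# Hypothetical thresholds and pass directions; the paper's actual values are not given here.
FILTERS: Dict[str, Callable[[Utterance], bool]] = {
    "noise": lambda u: u.noise_level_db <= 40.0,
    "non_pitch": lambda u: u.non_pitch_ratio <= 0.5,
    "duration": lambda u: u.duration_score >= 0.6,
    "clipping": lambda u: u.clipped_ratio <= 0.05,
}


def select_for_retraining(
    utts: List[Utterance], active: Optional[Iterable[str]] = None
) -> List[Utterance]:
    """Keep utterances that pass every active filter (all four filters by default)."""
    names = list(FILTERS) if active is None else list(active)
    return [u for u in utts if all(FILTERS[n](u) for n in names)]


queries = [
    Utterance("q1", 32.0, 0.30, 0.80, 0.00),
    Utterance("q2", 55.0, 0.20, 0.90, 0.01),  # rejected by the noise filter
    Utterance("q3", 35.0, 0.70, 0.75, 0.00),  # rejected by the non-pitch filter
    Utterance("q4", 30.0, 0.25, 0.40, 0.00),  # rejected by the duration filter
    Utterance("q5", 28.0, 0.35, 0.70, 0.20),  # rejected by the clipping filter
]

print([u.utt_id for u in select_for_retraining(queries)])  # ['q1']
print([u.utt_id for u in select_for_retraining(queries, active=["noise"])])  # ['q1', 'q3', 'q4', 'q5']
```

The `active` parameter makes it easy to compare a single filter against the combination of all four, mirroring the abstract's distinction between individual filters and applying all of them. Whether the paper combines its filters as a strict conjunction, as assumed here, is not stated in the abstract.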
Keywords :
acoustic filters; noise (working environment); query processing; speech recognition; acoustic model re-training; average phoneme duration function score; clipped frame composition ratio; commercialization stage; consumer electronic products; data-driven classification services; distinctive filters; environmental noise level; error patterns; filter-based approach; nonpitch ratio; quantitative measurement; self-improvement; speech recognition rate; user-input spoken query; utterance; voice interface; Acoustic measurements; Data models; Speech; Speech recognition; Training data; Working environment noise;
Journal_Title :
Consumer Electronics, IEEE Transactions on
DOI :
10.1109/TCE.2013.6689699