A fast approach to spoken term detection based on prosodic dynamic features

Author

Xuejiao Tan; Lei Wang

Author_Institution

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China

fYear

2015

Firstpage

593

Lastpage

596

Abstract

Model-based spoken term detection usually requires a large amount of annotated training data. When training data are scarce, DTW-based methods are a better choice. However, both model-based and classical DTW-based methods rely on frame-by-frame template matching, so the computational load is heavy and the search efficiency is poor. We propose a fast two-stage approach to spoken term detection. In the first stage, prosodic dynamic features are used to rapidly locate hypothesized spoken term regions; in the second stage, Gaussian posteriorgrams are used to verify these local hypothesized regions more precisely. Since each prosodic feature vector has only three dimensions and represents several consecutive frames of speech at once, we can perform segment-based rather than frame-based template matching, which accelerates the whole keyword detection process. The two-stage method fully exploits both the long-time and short-time characteristics of speech. An experiment is conducted to demonstrate that our method improves speed while obtaining similar detection performance under the same conditions.
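
The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dtw_cost`, `two_stage_search`), the `top_k` parameter, the Euclidean distance for the coarse stage, the negative-log inner product for the posteriorgram stage, and the toy data are all assumptions made for illustration. Stage 1 runs DTW over short sequences of 3-dimensional segment-level vectors to shortlist candidate regions; stage 2 runs frame-level DTW on posteriorgram-like vectors only inside those regions.

```python
import numpy as np


def dtw_cost(query, segment, dist):
    """Length-normalized DTW alignment cost between two feature sequences."""
    n, m = len(query), len(segment)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], segment[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


def euclidean(a, b):
    # Assumed distance for the coarse, segment-level features.
    return float(np.linalg.norm(a - b))


def neg_log_dot(p, q):
    # Assumed distance for posterior vectors; floored to avoid log(0).
    return float(-np.log(max(np.dot(p, q), 1e-10)))


def two_stage_search(q_coarse, u_coarse, q_fine, u_fine, seg_len, top_k=3):
    """Return (cost, start_frame, end_frame) of the best verified region."""
    win = len(q_coarse)
    # Stage 1: slide over segment-level vectors and keep the top_k cheapest windows.
    coarse = [
        (dtw_cost(q_coarse, u_coarse[s:s + win], euclidean), s)
        for s in range(len(u_coarse) - win + 1)
    ]
    candidates = sorted(coarse)[:top_k]
    # Stage 2: frame-level verification restricted to the shortlisted regions.
    best = None
    for _, s in candidates:
        lo, hi = s * seg_len, (s + win) * seg_len
        cost = dtw_cost(q_fine, u_fine[lo:hi], neg_log_dot)
        if best is None or cost < best[0]:
            best = (cost, lo, hi)
    return best


# Toy data (hypothetical): 20 segments of 5 frames each.
seg_len = 5
u_coarse = np.array([[k, k * k, -k] for k in range(20)], dtype=float)
u_fine = np.eye(20)[np.repeat(np.arange(20), seg_len)]  # one-hot "posteriors"
q_coarse = u_coarse[8:12]                 # query spans segments 8..11
q_fine = u_fine[8 * seg_len:12 * seg_len]

result = two_stage_search(q_coarse, u_coarse, q_fine, u_fine, seg_len)
print(result)
```

In this toy setup the coarse stage compares sequences of 4 vectors instead of 20 frames, which is where the speed-up in a segment-based search would come from; the frame-level DTW is then paid only for `top_k` regions.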

Keywords

"Feature extraction","Computational modeling","Maximum likelihood detection","Matched filters","Nonlinear filters","Speech"

Publisher

IEEE

Conference_Titel

Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on

Print_ISBN

978-1-4673-8086-7

Type

conf

DOI

10.1109/PIC.2015.7489917

Filename

7489917