DocumentCode
3777096
Title
A fast approach to spoken term detection based on prosodic dynamic features
Author
Xuejiao Tan; Lei Wang
Author_Institution
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China
fYear
2015
Firstpage
593
Lastpage
596
Abstract
Model-based spoken term detection usually requires a large amount of annotated training data. When such data are scarce, DTW-based methods are a better choice. However, both model-based and classical DTW-based methods rely on frame-by-frame template matching, so the computational load is heavy and the search efficiency is poor. We propose a fast two-stage approach to spoken term detection. In the first stage, prosodic dynamic features are used to rapidly locate hypothesized spoken term regions; in the second stage, Gaussian posteriorgrams are used to verify those local hypothesized regions more precisely. Because each prosodic feature vector has only three dimensions and represents several consecutive speech frames at once, template matching can be performed per segment rather than per frame, which accelerates the whole keyword detection process. The two-stage method exploits both the long-time and short-time characteristics of speech. An experiment demonstrates that, under the same conditions, our method improves search speed while obtaining similar detection performance.
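The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the prosodic features are computed or which distance is used on Gaussian posteriorgrams, so this sketch assumes precomputed feature vectors, a plain Euclidean local distance, and a simple sliding-window comparison for the first stage. All function and parameter names (coarse_candidates, two_stage_search, frames_per_seg, top_k) are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed local distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(query, target, dist=euclidean):
    """Classical DTW alignment cost, normalized by len(query) + len(target)."""
    n, m = len(query), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def coarse_candidates(query_segs, utt_segs, top_k=3):
    """Stage 1 (assumed form): slide the query's segment-level vectors over the
    utterance and return the start indices of the top_k closest windows."""
    q = len(query_segs)
    scored = []
    for start in range(len(utt_segs) - q + 1):
        window = utt_segs[start:start + q]
        score = sum(euclidean(a, b) for a, b in zip(query_segs, window)) / q
        scored.append((score, start))
    scored.sort()
    return [start for _, start in scored[:top_k]]


def two_stage_search(query_segs, query_frames, utt_segs, utt_frames,
                     frames_per_seg, top_k=3):
    """Stage 2: run frame-level DTW only inside the regions proposed by stage 1.
    Returns (dtw_cost, first_frame, end_frame) tuples sorted by cost."""
    results = []
    for start in coarse_candidates(query_segs, utt_segs, top_k):
        lo = start * frames_per_seg
        hi = lo + len(query_segs) * frames_per_seg
        results.append((dtw_distance(query_frames, utt_frames[lo:hi]), lo, hi))
    return sorted(results)
```

In this sketch the first stage compares one low-dimensional vector per segment, so the number of comparisons shrinks roughly by a factor of frames_per_seg relative to a frame-level scan, and the more expensive DTW is confined to at most top_k short regions.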
Keywords
"Feature extraction","Computational modeling","Maximum likelihood detection","Matched filters","Nonlinear filters","Speech"
Publisher
ieee
Conference_Titel
Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
Print_ISBN
978-1-4673-8086-7
Type
conf
DOI
10.1109/PIC.2015.7489917
Filename
7489917