DocumentCode
2181140
Title
Integrating frame-based and segment-based dynamic time warping for unsupervised spoken term detection with spoken queries
Author
Chan, Chun-An ; Lee, Lin-shan
Author_Institution
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2011
fDate
22-27 May 2011
Firstpage
5652
Lastpage
5655
Abstract
Rapidly increasing quantities of multimedia and spoken content today demand fast and accurate retrieval approaches for convenient browsing. Spoken documents with a wide variety of acoustic and linguistic conditions make supervised training of well-matched acoustic/language models very difficult. Unsupervised methods using frame-based dynamic time warping (DTW) require no acoustic/language models but incur a high computation load. Segment-based DTW was therefore proposed to relieve the computation load at the cost of degraded detection performance. In this paper, we refine segment-based DTW by allowing deletion of the end segments of the query to improve detection performance. The search space is also reduced by segment similarity constraints. We also propose a two-pass framework: segment-based DTW is performed in the first pass to locate hypothesized spoken term regions, and frame-based DTW is used for precise rescoring in the second pass. Pseudo relevance feedback is then used to expand the acoustic variations of the query. We obtain significantly higher detection performance at significantly lower computation load compared to frame-based DTW.
Keywords
document handling; indexing; multimedia computing; query processing; relevance feedback; speech recognition; automatic speech recognizer; detection performance improvement; end segment deletion; frame-based dynamic time warping; pseudo relevance feedback; segment similarity constraints; segment-based dynamic time warping; spoken queries; supervised training; unsupervised spoken term detection; Acoustics; Clustering algorithms; Computational modeling; Multimedia communication; Pragmatics; Speech; Training; Spoken term detection; dynamic time warping;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location
Prague
ISSN
1520-6149
Print_ISBN
978-1-4577-0538-0
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2011.5947642
Filename
5947642