Integrating frame-based and segment-based dynamic time warping for unsupervised spoken term detection with spoken queries

Author

Chan, Chun-An ; Lee, Lin-shan

Author_Institution

Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan

fYear

2011

fDate

22-27 May 2011

Firstpage

5652

Lastpage

5655

Abstract

Rapidly increasing quantities of multimedia and spoken content today demand fast and accurate retrieval approaches for convenient browsing. Spoken documents cover a wide variety of acoustic and linguistic conditions, which makes supervised training of well-matched acoustic/language models very difficult. Unsupervised methods using frame-based dynamic time warping (DTW) require no acoustic/language models but incur a high computation load. Segment-based DTW was therefore proposed to relieve the computation load, at the cost of degraded detection performance. In this paper, we refine segment-based DTW by allowing deletion of the end segments of the query to improve detection performance. The search space is also reduced by segment similarity constraints. We also propose a two-pass framework: segment-based DTW is performed in the first pass to locate hypothesized spoken term regions, and frame-based DTW is used for precise rescoring in the second pass. Pseudo relevance feedback is then used to expand the acoustic variations of the query. We obtain significantly higher detection performance at significantly lower computation load compared to frame-based DTW.
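To make the frame-based matching step concrete, the following is a minimal sketch of subsequence DTW, in which a query may start and end anywhere inside a longer document. It is an illustration under stated assumptions, not the authors' implementation: the function name `subsequence_dtw`, the Euclidean frame distance, and the three-way step pattern are all hypothetical choices, and the segment-based pass, the end-segment deletion, and the pseudo relevance feedback described in the abstract are not modeled here.

```python
import math


def subsequence_dtw(query, doc):
    """Align every query frame to some contiguous region of doc.

    query, doc: sequences of equal-dimension feature vectors (tuples of floats).
    Returns (best_cost, end), where the best-matching region ends just
    before doc[end]. Assumes Euclidean frame distance (an illustrative choice).
    """
    n, m = len(query), len(doc)
    # cost[i][j]: cheapest alignment of query[:i] with a region ending at doc[j-1]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    # Free start: the match may begin at any document frame.
    cost[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], doc[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame advances, doc frame repeats
                cost[i][j - 1],      # doc frame advances, query frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    # Free end: pick the document position with the lowest total cost.
    end = min(range(1, m + 1), key=lambda j: cost[n][j])
    return cost[n][end], end


if __name__ == "__main__":
    query = [(0.0,), (1.0,), (2.0,)]
    doc = [(5.0,), (0.0,), (1.0,), (2.0,), (5.0,)]
    print(subsequence_dtw(query, doc))
```

The table has one cell per (query frame, document frame) pair, so the cost grows with the product of the two lengths. That is the computation load the abstract attributes to frame-based DTW, and it is why a cheaper first pass that narrows the candidate regions can pay off.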

Keywords

document handling; indexing; multimedia computing; query processing; relevance feedback; speech recognition; automatic speech recognizer; detection performance improvement; end segment deletion; frame-based dynamic time warping; pseudo relevance feedback; segment similarity constraints; segment-based dynamic time warping; spoken queries; supervised training; unsupervised spoken term detection; Acoustics; Clustering algorithms; Computational modeling; Multimedia communication; Pragmatics; Speech; Training; Spoken term detection; dynamic time warping;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location

Prague

ISSN

1520-6149

Print_ISBN

978-1-4577-0538-0

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2011.5947642

Filename

5947642