• DocumentCode
    2181140
  • Title

    Integrating frame-based and segment-based dynamic time warping for unsupervised spoken term detection with spoken queries

  • Author

    Chan, Chun-An ; Lee, Lin-shan

  • Author_Institution
    Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5652
  • Lastpage
    5655
  • Abstract
    Rapidly increasing quantities of multimedia and spoken content today demand fast and accurate retrieval approaches for convenient browsing. Spoken documents with a wide variety of acoustic and linguistic conditions make supervised training of well-matched acoustic/language models very difficult. Unsupervised methods using frame-based dynamic time warping (DTW) require no acoustic/language models but incur a high computation load. Therefore, segment-based DTW was proposed to relieve the computation load at the cost of degraded detection performance. In this paper, we refine the segment-based DTW by allowing deletion of end segments of the query to improve detection performance. The search space is also reduced by segment similarity constraints. We also propose a two-pass framework: segment-based DTW is performed in the first pass to locate hypothesized spoken term regions, and frame-based DTW is used for precise rescoring in the second pass. Pseudo relevance feedback is then used to expand the acoustic variations of the query. We obtain significantly higher detection performance at significantly lower computation load compared to frame-based DTW.
  • Keywords
    document handling; indexing; multimedia computing; query processing; relevance feedback; speech recognition; automatic speech recognizer; detection performance improvement; end segment deletion; frame-based dynamic time warping; pseudo relevance feedback; segment similarity constraints; segment-based dynamic time warping; spoken queries; supervised training; unsupervised spoken term detection; Acoustics; Clustering algorithms; Computational modeling; Multimedia communication; Pragmatics; Speech; Training; Spoken term detection; dynamic time warping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2011.5947642
  • Filename
    5947642
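
The frame-based DTW baseline mentioned in the abstract can be illustrated with a generic subsequence DTW. This is a minimal sketch and not the authors' implementation: the Euclidean frame distance, the three-way step pattern, the unnormalized accumulated cost, and the names `frame_distance` and `subsequence_dtw` are all assumptions made for illustration. It aligns every query frame against every document frame, which is the quadratic cost that segment-based DTW is meant to reduce.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, document):
    """Find the document region that best matches the whole query.

    query, document: non-empty lists of equal-length feature vectors.
    Returns (cost, start, end): the accumulated distance of the best
    alignment and the inclusive document frame range it covers.
    """
    n, m = len(query), len(document)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning query[0..i]
    # to a document region that ends at frame j.
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: document frame where that best alignment begins.
    start = [[0] * m for _ in range(n)]

    # The match may begin at any document frame.
    for j in range(m):
        cost[0][j] = frame_distance(query[0], document[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessor (i-1, j): the document frame is reused.
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                # Predecessor (i-1, j-1): both sequences advance.
                if cost[i - 1][j - 1] <= best:
                    best, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
                # Predecessor (i, j-1): the query frame is reused.
                if cost[i][j - 1] < best:
                    best, best_start = cost[i][j - 1], start[i][j - 1]
            cost[i][j] = frame_distance(query[i], document[j]) + best
            start[i][j] = best_start

    # The match may end at any document frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


# Toy one-dimensional features; real systems would use e.g. MFCC vectors.
query = [[1.0], [2.0], [3.0]]
document = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0], [0.0]]
result = subsequence_dtw(query, document)
```

In the two-pass scheme the abstract describes, a frame-level alignment of this kind would only be run on the regions hypothesized by the cheaper first pass, rather than on the full document as in this sketch.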