  • DocumentCode
    180175
  • Title
    Zero-resource spoken term detection using hierarchical graph-based similarity search
  • Author
    Aoyama, K. ; Ogawa, A. ; Hattori, T. ; Hori, T. ; Nakamura, A.
  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    7093
  • Lastpage
    7097
  • Abstract
    This paper presents fast zero-resource spoken term detection (STD) on a large-scale data set using a hierarchical graph-based similarity search method (HGSS). HGSS improves on the graph-based similarity search method (GSS) by reducing the search space, which yields faster search. Instead of the degree-reduced k-nearest neighbor (k-DR) graph used by GSS, HGSS uses as its index a hierarchical k-DR graph, which is constructed from the cluster structure of the corresponding k-DR graph. The search algorithm for the hierarchical k-DR graph exploits this cluster structure and thereby reduces the search space. HGSS inherits a useful property of GSS: because a graph is a general representation of relationships between objects, the method can be applied to any data set, with no restriction on the data type or on the dissimilarity defined over it. Each vertex in the hierarchical graph corresponds to a Gaussian mixture model (GMM) posteriorgram segment, and each edge corresponds to the relationship between a pair of GMM posteriorgram segments, measured by dynamic time warping (DTW). Experimental results demonstrate that HGSS reduces the computational cost by more than 40% compared to GSS, at nearly the same accuracy.
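    The abstract combines two ingredients: a DTW dissimilarity between GMM posteriorgram segments, and a similarity search that walks a neighborhood graph whose vertices are those segments. The Python sketch below illustrates only these generic ideas, under stated assumptions. The frame distance (negative log of the inner product of two posterior vectors), the length normalization, and the names `frame_distance`, `dtw` and `greedy_graph_search` are hypothetical choices, not taken from the paper. The search is a plain greedy descent on a flat adjacency list; it does not implement the k-DR graph construction, the hierarchical k-DR graph, or HGSS itself.

```python
import math


def frame_distance(p, q):
    # Assumed frame distance: -log of the inner product of two posterior
    # vectors. The clamp avoids log(0) when the posteriors do not overlap.
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, 1e-12))


def dtw(x, y):
    """Length-normalized DTW dissimilarity between two posteriorgram segments.

    x, y: lists of frames; each frame is a list of GMM posteriors.
    The normalization by (n + m) is an illustrative assumption.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            # Standard step pattern: vertical, horizontal, or diagonal move.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def greedy_graph_search(graph, objects, query, start, dissim=dtw):
    """Greedy descent on a neighborhood graph (hypothetical helper).

    graph: dict mapping a vertex to a list of neighbor vertices.
    objects: dict mapping a vertex to its posteriorgram segment.
    Moves to the neighbor closest to the query while that neighbor is closer
    than the current vertex; returns (vertex, dissimilarity) at a local minimum.
    """
    current = start
    best = dissim(objects[current], query)
    while True:
        candidates = [(dissim(objects[v], query), v) for v in graph[current]]
        if not candidates:
            return current, best
        d, v = min(candidates)
        if d >= best:
            return current, best
        current, best = v, d
```

    For example, with two-class posteriors A = [0.9, 0.1] and B = [0.1, 0.9], segments {0: [A, A, A], 1: [A, B, B], 2: [B, B, B]} on the path graph 0-1-2, a search for the query [B, B] starting at vertex 0 moves through vertex 1 and stops at vertex 2. Greedy descent can stop at a local minimum; how GSS and HGSS choose start points or avoid such minima is not described in this record.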
  • Keywords
    Gaussian processes; graph theory; mixture models; search problems; signal detection; speech processing; GMM posteriorgram segments; Gaussian mixture model posteriorgram segment; HGSS; STD; cluster structure; degree-reduced k-nearest neighbor graph; dynamic time warping; hierarchical graph-based similarity search; hierarchical k-DR graph; search algorithm; search space reduction; zero-resource spoken term detection; Accuracy; Acoustics; Clustering algorithms; Indexes; Search methods; Speech; Speech processing; Dynamic time warping; Neighborhood graph index; Query-by-example search; Spoken term detection; Zero resource;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854976
  • Filename
    6854976