• DocumentCode
    178397
  • Title
    A discriminatively trained Hough Transform for frame-level phoneme recognition
  • Author
    Jonathan Dennis; Huy Dat Tran; Haizhou Li; Eng Siong Chng
  • Author_Institution
    Inst. for Infocomm Res., A*STAR, Singapore, Singapore
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2514
  • Lastpage
    2518
  • Abstract
    Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to using feature inputs beyond MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features, using an approach based on the Generalised Hough Transform (GHT). The GHT is a common approach used in the field of image processing for the task of object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is commonly used to detect the presence of the target classes. Here we propose to learn the weighting discriminatively in an ANN, where the aim is to optimise the static phone classification error at the output of the network. As such an ANN is common to hybrid ASR architectures, the output activations from the GHT can be considered as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate the state-of-the-art performance of the approach.
  • Keywords
    Hough transforms; image processing; neural nets; object detection; speech recognition; ANN; Hough transform; MFCC; artificial neural network; automatic speech recognition; frame-level phoneme recognition; generalised Hough Transform; spatial distribution; static phone classification error; Artificial neural networks; Computer architecture; Feature extraction; Speech; Training; Transforms; Phoneme recognition; TIMIT
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854053
  • Filename
    6854053
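
The abstract describes scoring each target class as a weighted summation of codebook activations, with the weights learned discriminatively. The following is a minimal sketch of that idea only, not the authors' implementation: it swaps the paper's ANN for a single softmax layer trained with plain gradient descent, and the function names (`ght_scores`, `softmax`, `sgd_step`), the toy activations, and the learning rate are all hypothetical.

```python
import math


def ght_scores(activations, weights):
    """Weighted sum of codebook activations, one score per class."""
    return [sum(w * a for w, a in zip(class_weights, activations))
            for class_weights in weights]


def softmax(scores):
    """Numerically stable softmax over the class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(activations, weights, target, lr=0.1):
    """One cross-entropy gradient step for a single-layer softmax classifier.

    Assumption: this stands in for the discriminative training described in
    the abstract; the real network architecture is not given in this record.
    """
    probs = softmax(ght_scores(activations, weights))
    return [[w - lr * (p - (1.0 if c == target else 0.0)) * a
             for w, a in zip(class_weights, activations)]
            for c, (class_weights, p) in enumerate(zip(weights, probs))]


# Toy example: three codebook entries, two phone classes (made-up values).
activations = [1.0, 0.0, 0.5]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

# With uniform weights, every class receives the same vote total.
uniform = ght_scores(activations, weights)

# Discriminative training shifts the weights towards the labelled class.
trained = weights
for _ in range(50):
    trained = sgd_step(activations, trained, target=1)
posteriors = softmax(ght_scores(activations, trained))
```

With uniform weights the two classes tie, which corresponds to the unweighted voting case; after training on the labelled frame, the posterior for the target class dominates.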