• DocumentCode
    2132633
  • Title
    Learning distances to improve phoneme classification
  • Author
    Curtin, Ryan ; Vasiloglou, Nikolaos ; Anderson, David V.
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this work we aim to learn a Mahalanobis distance to improve the performance of phoneme classification using the standard 39-dimensional MFCC features. To learn and to evaluate the performance of our distance, we use the simple k-nearest-neighbors (k-NN) classifier. Although this classifier exhibits low performance relative to state-of-the-art phoneme classifiers, it can be used to determine a distance metric that is applicable to many other better-performing machine learning methods. We devise a novel optimization method that minimizes the error function of the k-NN classifier with respect to the covariance matrix of the Mahalanobis distance, based on finite-difference stochastic approximation (FDSA) gradient estimates combined with a random perturbation term to avoid local minima. We apply our method to the problem of phoneme classification with the k-NN classifier and show that our learned distance provides performance improvement of up to 8.19% over the standard k-NN classifier, and additionally outperforms other state-of-the-art distance learning methods by approximately 4 percentage points. We also find that the computational complexity of our method, while not optimal, is better than other distance learning methods. The performance improvements for individual phoneme classes are given. The distances learned are applicable to other scale-variant machine learning methods, such as support vector machines, multidimensional scaling, and maximum variance unfolding, as well as others.
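To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the Mahalanobis distance is parameterized by a linear map `L` (so that d(a, b) = ||L(a - b)||), uses leave-one-out k-NN error as the objective, estimates the gradient with central finite differences over each entry of `L`, and adds Gaussian noise to each update as the random perturbation. The function names, the step sizes `a` and `c`, the `noise` scale, and the choice to keep the best matrix seen so far are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def knn_loo_error(X, y, L, k=3):
    """Leave-one-out k-NN error rate under d(a, b) = ||L (a - b)||.

    y must hold non-negative integer class labels.
    """
    Z = X @ L.T                                   # map points into the learned space
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # pairwise squared distances
    np.fill_diagonal(D, np.inf)                   # a point is never its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]             # indices of the k nearest neighbors
    pred = np.array([np.bincount(v).argmax() for v in y[nn]])  # majority vote
    return float(np.mean(pred != y))


def fdsa_gradient(f, L, c):
    """Finite-difference estimate of df/dL: two evaluations of f per entry of L."""
    G = np.zeros_like(L)
    for idx in np.ndindex(*L.shape):
        E = np.zeros_like(L)
        E[idx] = c
        G[idx] = (f(L + E) - f(L - E)) / (2.0 * c)
    return G


def learn_distance(X, y, k=3, iters=30, a=0.5, c=0.5, noise=0.05, seed=0):
    """Descend the k-NN error with FDSA steps plus a random perturbation.

    Hypothetical settings: a (step size), c (difference width), and noise
    (perturbation scale) are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    def f(M):
        return knn_loo_error(X, y, M, k)

    L = np.eye(X.shape[1])                        # start from the Euclidean distance
    best_L, best_err = L.copy(), f(L)
    for _ in range(iters):
        G = fdsa_gradient(f, L, c)
        # Gradient step plus Gaussian noise; the noise lets the search move
        # even where the piecewise-constant error has a zero estimated gradient.
        L = L - a * G + noise * rng.standard_normal(L.shape)
        err = f(L)
        if err < best_err:                        # remember the best matrix seen
            best_L, best_err = L.copy(), err
    return best_L, best_err
```

Calling `learn_distance(X, y)` on a feature matrix `X` of shape (n, d) with integer labels `y` returns the best `L` found and its leave-one-out error; the corresponding Mahalanobis matrix is `L.T @ L`. Because each gradient estimate needs two error evaluations per matrix entry, this sketch is only practical for small `d` and `n`.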
  • Keywords
    computational complexity; distance learning; gradient methods; learning (artificial intelligence); pattern classification; support vector machines; FDSA; MFCC features; Mahalanobis distance; distance learning methods; distance metric; finite difference stochastic approximation; gradient estimation; k-NN; k-nearest-neighbors; learning distances; machine learning methods; optimization method; phoneme classification; Accuracy; Computer aided instruction; Learning systems; Machine learning; Measurement; Mel frequency cepstral coefficient; Optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on
  • Conference_Location
    Santander
  • ISSN
    1551-2541
  • Print_ISBN
    978-1-4577-1621-8
  • Electronic_ISBN
    1551-2541
  • Type
    conf
  • DOI
    10.1109/MLSP.2011.6064601
  • Filename
    6064601