• DocumentCode
    134267
  • Title
    Exploiting speech source information for vowel landmark detection for low resource language
  • Author
    Undhad, Ankur G. ; Patil, Hemant A. ; Madhavi, Maulik C.
  • Author_Institution
    Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    546
  • Lastpage
    550
  • Abstract
    Landmarks are time instants in a speech signal that mark important events (such as vowels, consonants and glides). This paper proposes a novel vowel landmark detection (VLD) algorithm for a low-resource language, viz., Gujarati, an Indian language. The proposed VLD method uses speech source information to detect vowel landmarks, which are points of high sonority. The excitation peaks in the Hilbert envelope of the Teager energy profile of the zero-frequency filtered (ZFF) speech signal can be interpreted as a perceptually significant feature that contributes to loudness. The performance of the proposed VLD method is compared with that of an existing loudness-based method. Results are reported on Gujarati speech recorded in three modes, viz., read, spontaneous and lecture, each manually phonetically transcribed by transcribers to serve as the ground truth. The proposed VLD algorithm outperforms the loudness-based method, achieving detection rates of 78.92 %, 76.40 % and 73.89 % in lecture, spontaneous and read modes, respectively, which are 8.79 %, 7.23 % and 7.17 % higher than those of the loudness-based method. The proposed algorithm is also shown to be robust against signal degradations such as white noise. In addition, the proposed algorithm is fast and requires no training.
  • Keywords
    filtering theory; speech processing; speech recognition; Gujarati language; Hilbert envelope; Indian language; Teager energy profile; VLD algorithm; ZFF speech signal; consonants; excitation peaks; glides; ground truth; high-sonority points; lecture mode; loudness; low-resource language; manual phonetic transcription; perceptually significant feature; read mode; signal degradation robustness; signal detection rate; speech source information; spontaneous mode; time instants; vowel landmark detection algorithm; white noise; zero-frequency filtered speech signal; Acoustics; Databases; Feature extraction; Signal to noise ratio; Speech; System-on-chip; Landmark; Teager energy operator (TEO); sonority; vowel-nucleus; zero-frequency resonator (ZFR)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936660
  • Filename
    6936660