• DocumentCode
    134342
  • Title

    Research on deep neural network's hidden layers in phoneme recognition

  • Author

    Yuan Ma ; Jianwu Dang ; Weifeng Li

  • Author_Institution
    Tianjin Key Lab. of Cognitive Comput. & Applic., Tianjin Univ., Tianjin, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    19
  • Lastpage
    23
  • Abstract
    In spite of the great success of the deep neural network (DNN) in speech processing, it is still unclear what kind of underlying mechanisms are involved in this achievement. This preliminary study attempts to find an answer by investigating the functions of DNN's hidden layers in representing speech articulations. Two sets of experiments are performed on the hidden layers in speech recognition. The layer removing experiment is conducted on the English TIMIT database, and the layer replacing experiment is to substitute a layer in an English DNN by the corresponding layer in a Japanese DNN. It is found that the different layers seem to be responsible for different phoneme groups according to the place of articulation. The lower layers are responsible for the back vowels, and the higher layers are responsible for the front vowels. The second layer (i.e. the first hidden layer) of the seven-layer network has major responsibility for more than half of the consonants with the constriction located in the front of the vocal tract, while the other consonants rely on the middle and higher layers. The layer replacing experiment demonstrated that the above relation was language independent. It is necessary to design elaborate studies to discover more details in the future.
  • Keywords
    neural nets; speech recognition; English DNN; English TIMIT database; Japanese DNN; consonants; deep neural network hidden layers; layer replacing experiment; phoneme recognition; seven-layer network; speech articulations; vocal tract; Acoustics; Biological neural networks; Error analysis; Production; Speech; Speech processing; articulation; deep neural network; hidden layers; speech production
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type

    conf

  • DOI
    10.1109/ISCSLP.2014.6936718
  • Filename
    6936718