  • DocumentCode
    706078
  • Title
    Simultaneous multispeaker segmentation for automatic meeting recognition
  • Author
    Laskowski, Kornel ; Fügen, Christian ; Schultz, Tanja
  • Author_Institution
    interACT, Univ. Karlsruhe, Karlsruhe, Germany
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    1294
  • Lastpage
    1298
  • Abstract
    Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have been shown to be ineffective. This is primarily due to crosstalk, in which a participant's speech appears on other participants' microphones, making it hard to attribute detected speech to its correct speaker. We describe an automatic multichannel segmentation system for meeting recognition that accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development-set word error rate, as measured with a state-of-the-art multi-pass speech recognition system, by 62% relative to manual segmentation. We also observe significant performance improvements on unseen data.
  • Keywords
    speaker recognition; automatic meeting recognition; automatic multichannel segmentation system; automatic speech recognition; automatic speech understanding; multispeaker segmentation; vocal activity detection; word error rate; Acoustics; Crosstalk; Manuals; Microphones; Silicon; Speech; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2007 15th European
  • Conference_Location
    Poznań
  • Print_ISBN
    978-839-2134-04-6
  • Type
    conf
  • Filename
    7099014