• DocumentCode
    454576
  • Title

    Speaker Overlaps and ASR Errors in Meetings: Effects Before, During, and After the Overlap

  • Author

    Çetin, Özgür ; Shriberg, Elizabeth

  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    We analyze automatic speech recognition (ASR) errors made by a state-of-the-art meeting recognizer, with respect to locations of overlapping speech. Our analysis focuses on recognition errors made both during an overlap and in the regions immediately preceding and following the location of overlapped speech. We devise an experimental paradigm to allow examination of the same foreground speech both with and without naturally occurring cross-talk. We then analyze ASR errors with respect to a number of factors, including the severity of the cross-talk and distance from the overlap region. In addition to reporting effects on ASR errors, we discover a number of interesting phenomena. First, we find that overlaps tend to occur at high-perplexity regions in the foreground talker's speech. Second, word sequences within overlaps have higher perplexity than those in nonoverlaps, if using trigrams or 4-grams, but the unigram perplexity within overlaps is considerably lower than that of nonoverlaps. An explanation for this behavior is proposed, based on the preponderance of multiple short dialog acts found in overlap regions. Third, we discover that the word error rate (WER) after overlaps is consistently lower than that before the overlap. This finding cannot be explained by the recognition process itself; rather, the foreground speaker appears to reduce perplexity shortly after being overlapped. Taken together, these observations suggest that the automatic modeling of meetings could benefit from a broader view of the relationship between speaker overlap and ASR in natural conversation.
  • Keywords
    speech recognition; automatic speech recognition; cross-talk; overlapping speech; speaker overlaps; state-of-the-art meeting recognizer; word error rate; word sequences; Automatic speech recognition; Computer errors; Computer science; Error analysis; Loudspeakers; Microphones; NIST; Speech analysis; Speech recognition; Telephony
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type

    conf

  • DOI
    10.1109/ICASSP.2006.1660031
  • Filename
    1660031