• DocumentCode
    2789454
  • Title
    Automatic disfluency removal for improving spoken language translation
  • Author
    Wang, Wen ; Tur, Gokhan ; Zheng, Jing ; Ayan, Necip Fazil
  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    5214
  • Lastpage
    5217
  • Abstract
    Statistical machine translation (SMT) systems for spoken languages suffer from conversational speech phenomena, in particular, the presence of speech disfluencies. We examine the impact of disfluencies from broadcast conversation data on our hierarchical phrase-based SMT system and implement automatic disfluency removal approaches for cleansing the MT input. We evaluate the efficacy of proposed approaches and investigate the impact of disfluency removal on SMT performance across different disfluency types. We show that for translating Mandarin broadcast conversational transcripts into English, our automatic disfluency removal approaches could produce significant improvement in BLEU and TER.
  • Keywords
    language translation; speech processing; automatic disfluency removal; conversational speech phenomena; speech disfluencies; spoken language translation; statistical machine translation systems; Automatic speech recognition; Decoding; Error analysis; Hidden Markov models; Natural languages; Radio broadcasting; Speech analysis; Surface-mount technology; System testing; TV broadcasting; automatic disfluency detection; broadcast conversation; statistical machine translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5494999
  • Filename
    5494999