• DocumentCode
    3636202
  • Title
    Automatic sentence boundary detection in conversational speech: A cross-lingual evaluation on English and Czech
  • Author
    Jáchym Kolář; Yang Liu

  • Author_Institution
    Faculty of Applied Sciences, Dept. of Cybernetics, Univ. of West Bohemia in Pilsen, Czech Republic
  • fYear
    2010
  • Firstpage
    5258
  • Lastpage
    5261
  • Abstract
    Automatic sentence segmentation of speech is important for enriching speech recognition output and aiding downstream language processing. This paper focuses on automatic sentence segmentation of speech in two languages: English and Czech. For this task, we compare and combine three statistical models: an HMM, a maximum entropy model, and the boosting-based model BoosTexter. All of these approaches rely on both textual and prosodic information. We evaluate these methods on a corpus of multiparty meetings in English and on a corpus of broadcast conversations in Czech, using both manual and speech recognition transcripts. The experiments show that the best results are achieved when all three models are combined via posterior probability interpolation. We observe differences in model performance between English and Czech, as well as differences in prosodic feature usage between the two languages. Overall, the analysis is important for porting sentence segmentation approaches from one language to another.
  • Keywords
    "Speech analysis","Hidden Markov models","Natural languages","Speech recognition","Speech processing","Entropy","Broadcasting","Interpolation","Morphology","Cybernetics"
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    2379-190X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5494976
  • Filename
    5494976