• DocumentCode
    3530609
  • Title
    Syntactically-informed models for comma prediction
  • Author
    Favre, Benoit ; Hakkani-Tür, Dilek ; Shriberg, Elizabeth
  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4697
  • Lastpage
    4700
  • Abstract
    Providing punctuation in speech transcripts not only improves readability but also helps downstream text processing such as information extraction or machine translation. In this paper, we improve the accuracy of comma prediction in English broadcast news by 7% by introducing syntactic features inspired by the role of commas as described in linguistic studies. We analyze the impact of those features on other subsets of features (prosody, words, ...) when combined through CRFs. The syntactic cues can help characterize large syntactic patterns, such as appositions and lists, which are not necessarily marked by prosody.
  • Keywords
    linguistics; natural language processing; speech recognition; text analysis; English broadcast news; automatic speech recognition systems; downstream text processing; information extraction; machine translation; speech transcription; Boosting; Broadcasting; Classification tree analysis; Computer science; Data mining; Decision trees; Neural networks; Predictive models; Speech processing; Testing; Machine Learning; Punctuation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960679
  • Filename
    4960679