  • DocumentCode
    3631369
  • Title
    Restoring punctuation and capitalization in transcribed speech
  • Author
    Agustin Gravano;Martin Jansche;Michiel Bacchiani
  • Author_Institution
    Department of Computer Science, Columbia University, New York, NY 10027, USA
  • fYear
    2009
  • Firstpage
    4741
  • Lastpage
    4744
  • Abstract
    Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
  • Keywords
    "Training data","Automatic speech recognition","Testing","Broadcasting","Mars","Volcanoes","Natural languages","Computer science","Speech recognition","Text recognition"
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    2379-190X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960690
  • Filename
    4960690