DocumentCode
3631369
Title
Restoring punctuation and capitalization in transcribed speech
Author
Agustin Gravano;Martin Jansche;Michiel Bacchiani
Author_Institution
Department of Computer Science, Columbia University, New York, 10027, USA
fYear
2009
Firstpage
4741
Lastpage
4744
Abstract
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
Keywords
"Training data","Automatic speech recognition","Testing","Broadcasting","Mars","Volcanoes","Natural languages","Computer science","Speech recognition","Text recognition"
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
ISSN
1520-6149
Print_ISBN
978-1-4244-2353-8
Electronic_ISBN
2379-190X
Type
conf
DOI
10.1109/ICASSP.2009.4960690
Filename
4960690