DocumentCode
3531263
Title
A factor automaton approach for the forced alignment of long speech recordings
Author
Moreno, Pedro J. ; Alberti, Christopher
Author_Institution
Speech Res. Group, Google Inc., New York, NY
fYear
2009
fDate
19-24 April 2009
Firstpage
4869
Lastpage
4872
Abstract
This paper addresses the problem of aligning long speech recordings to their transcripts. Previous work has focused on using highly tuned language models trained on the transcripts to reduce the search space. In this paper we propose the use of a factor automaton, a well-known method for representing all substrings of a string. This automaton encodes a highly constrained language model trained on the transcripts. We show results competitive with n-gram models in several testing scenarios. Preliminary experiments show perfect alignments at a reduced computational load and with a smaller memory footprint than n-gram models.
Keywords
automata theory; learning (artificial intelligence); speech coding; constrained language model; encoding; factor automaton approach; long speech forced recording alignment; transcript; Automata; Data mining; Dictionaries; Indexing; Natural languages; Search engines; Sequences; Speech recognition; Video sharing; Vocabulary; finite state transducers; speech alignment; speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location
Taipei
ISSN
1520-6149
Print_ISBN
978-1-4244-2353-8
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2009.4960722
Filename
4960722
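Note
The abstract describes a factor automaton as a structure that represents all substrings of a string. As an illustration only, the sketch below builds one over the word tokens of a transcript using the standard suffix-automaton construction with every state treated as accepting. This record does not describe the authors' construction, so the class name `FactorAutomaton`, its methods, and the sample transcript are hypothetical, and the sketch carries no weights or acoustic model.

```python
from typing import Dict, List, Sequence


class FactorAutomaton:
    """Deterministic automaton accepting every contiguous sub-sequence
    (factor) of a token sequence. Hypothetical illustration, built with
    the standard suffix-automaton construction; all states are accepting."""

    def __init__(self, tokens: Sequence[str]) -> None:
        self.next: List[Dict[str, int]] = [{}]  # transitions per state
        self.link: List[int] = [-1]             # suffix links
        self.length: List[int] = [0]            # longest factor reaching each state
        last = 0
        for tok in tokens:
            last = self._extend(last, tok)

    def _new_state(self, length: int, link: int, trans: Dict[str, int]) -> int:
        self.next.append(dict(trans))
        self.link.append(link)
        self.length.append(length)
        return len(self.next) - 1

    def _extend(self, last: int, tok: str) -> int:
        cur = self._new_state(self.length[last] + 1, -1, {})
        p = last
        # Add the new transition along the suffix-link chain.
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
            return cur
        q = self.next[p][tok]
        if self.length[p] + 1 == self.length[q]:
            self.link[cur] = q
            return cur
        # Split state q: the clone keeps only the shorter factors.
        clone = self._new_state(self.length[p] + 1, self.link[q], self.next[q])
        while p != -1 and self.next[p].get(tok) == q:
            self.next[p][tok] = clone
            p = self.link[p]
        self.link[q] = clone
        self.link[cur] = clone
        return cur

    def accepts(self, query: Sequence[str]) -> bool:
        """Return True if `query` occurs contiguously in the source tokens."""
        state = 0
        for tok in query:
            nxt = self.next[state].get(tok)
            if nxt is None:
                return False
            state = nxt
        return True


# Made-up transcript for illustration.
transcript = "the cat sat on the mat".split()
fa = FactorAutomaton(transcript)
print(fa.accepts(["sat", "on", "the"]))  # a contiguous run of the transcript
print(fa.accepts(["cat", "on"]))         # skips a word, so not a factor
```

Because any path from the start state spells a factor, a decoder constrained by such an automaton could begin and end anywhere in the transcript while still following its word order, which is one plausible reading of the "highly constrained language model" mentioned in the abstract.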