• DocumentCode
    3531263
  • Title
    A factor automaton approach for the forced alignment of long speech recordings
  • Author
    Moreno, Pedro J.; Alberti, Christopher
  • Author_Institution
    Speech Res. Group, Google Inc., New York, NY
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4869
  • Lastpage
    4872
  • Abstract
    This paper addresses the problem of aligning long speech recordings to their transcripts. Previous work has focused on using highly tuned language models trained on the transcripts to reduce the search space. In this paper, we propose the use of a factor automaton, a well-known method to represent all substrings of a string. This automaton encodes a highly constrained language model trained on the transcripts. We show competitive results with n-gram models in several testing scenarios. Preliminary experiments show perfect alignments at a reduced computational load and with a smaller memory footprint when compared to n-gram models.
  • Keywords
    automata theory; learning (artificial intelligence); speech coding; constrained language model; encoding; factor automaton approach; long speech forced recording alignment; transcript; Automata; Data mining; Dictionaries; Indexing; Natural languages; Search engines; Sequences; Speech recognition; Video sharing; Vocabulary; finite state transducers; speech alignment; speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2009.4960722
  • Filename
    4960722
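
  • Illustration
    The abstract describes a factor automaton as a structure that represents all substrings of a string. As a rough illustration only, and not the authors' implementation (the keywords suggest the paper works with finite state transducers), the sketch below builds an unweighted factor automaton over the words of a transcript using the standard suffix-automaton construction, treating every state as accepting. The function names build_factor_automaton and accepts, and the sample transcript, are hypothetical and chosen for this example.

```python
def build_factor_automaton(tokens):
    """Build a deterministic automaton accepting every contiguous
    sub-sequence (factor) of `tokens`.

    This is the classic online suffix-automaton construction; because
    every state is treated as accepting, the language is the set of all
    factors rather than only the suffixes.
    Returns a list of dicts: trans[state][token] -> next state.
    """
    trans = [{}]    # outgoing transitions per state; state 0 is the start
    link = [-1]     # suffix links used during construction
    length = [0]    # length of the longest factor reaching each state
    last = 0        # state reached by the whole prefix read so far

    for tok in tokens:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)

        # Add the new token as a transition along the suffix-link chain.
        p = last
        while p != -1 and tok not in trans[p]:
            trans[p][tok] = cur
            p = link[p]

        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][tok]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it so lengths stay consistent.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(tok) == q:
                    trans[p][tok] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    return trans


def accepts(trans, words):
    """Return True if `words` is a contiguous sub-sequence of the transcript."""
    state = 0
    for w in words:
        if w not in trans[state]:
            return False
        state = trans[state][w]
    return True  # every state is accepting in a factor automaton


if __name__ == "__main__":
    transcript = "the cat sat on the mat".split()
    fa = build_factor_automaton(transcript)
    print(accepts(fa, ["sat", "on", "the"]))  # a factor of the transcript
    print(accepts(fa, ["cat", "on"]))         # words present, but not contiguous
```

    Used as a language model, such an automaton lets a decoder start and stop anywhere in the transcript while only following word sequences that actually occur in it. The number of states grows linearly with the transcript length, which is consistent with the abstract's remark about a smaller memory footprint, though this sketch makes no claim about the figures reported in the paper.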