DocumentCode
672396
Title
Unsupervised word segmentation from noisy input
Author
Heymann, Jahn ; Walter, O. ; Haeb-Umbach, Reinhold ; Raj, Bhiksha
Author_Institution
Dept. of Commun. Eng., Univ. of Paderborn, Paderborn, Germany
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
458
Lastpage
463
Abstract
In this paper we present an algorithm for the unsupervised segmentation of a character or phoneme lattice into words. Using a lattice as input, rather than a single string, accounts for the uncertainty of the character/phoneme recognizer about the true label sequence. An example application is the discovery of lexical units from the output of an error-prone phoneme recognizer in a zero-resource setting, where neither the lexicon nor the language model is known. We show that a recently published approach based on Weighted Finite State Transducers (WFSTs) suffers from an issue: the language model probabilities of known words are computed incorrectly. Fixing this issue greatly improves precision and recall, but at the cost of increased computational complexity, so the corrected approach is practical only for single input strings. To allow for lattice input, and thus for errors in the character/phoneme recognizer, we propose a computationally efficient, suboptimal two-stage approach, which is shown to significantly improve word segmentation performance compared to the earlier WFST approach.
Keywords
probability; speech recognition; unsupervised learning; word processing; character recognizer; computationally efficient suboptimal two-stage approach; error-prone phoneme recognizer; label sequence; language model probabilities; lexical unit discovery; noisy input; phoneme lattice; unsupervised word segmentation algorithm; word segmentation performance; zero-resource setting; Acoustics; Computational modeling; Context; Lattices; Probability; Speech; Transducers; Automatic speech recognition; Unsupervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707773
Filename
6707773