DocumentCode
730817
Title
Improving n-gram probability estimates by compound-head clustering
Author
Pelemans, Joris ; Demuynck, Kris ; Van hamme, Hugo ; Wambacq, Patrick
Author_Institution
Dept. ESAT, Katholieke Univ. Leuven, Leuven, Belgium
fYear
2015
fDate
19-24 April 2015
Firstpage
5221
Lastpage
5225
Abstract
Compounding is one of the most productive word formation processes in many languages and is therefore a main source of data sparsity in language modeling. Many solutions have been suggested to model compound words, most of which break the compound into its constituents and train a new model with them. In earlier work, we argued that this approach is suboptimal and we presented a novel technique that clusters new, domain-specific compound words together with their semantic heads. The clusters were then used to build a class-based n-gram model that enabled a reliable estimation of n-gram probabilities, without the need for additional training data. In this paper, we investigate how this “semantic head mapping” can best be made an integral part of the language modeling strategy and find that, with some adaptations, our technique is capable of producing more accurate compound probability estimates than a baseline word-based n-gram language model, which lead to a significant word error rate reduction for Dutch read speech.
Keywords
pattern clustering; probability; semantic networks; speech recognition; Dutch read speech; class-based n-gram model; compound probability estimates; compounding; data sparsity; domain-specific compound words; language modeling; n-gram probabilities; semantic head mapping; word error rate reduction; Estimation; Pragmatics; Semantics; Training data; LVCSR; language models; n-grams; word clusters
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178967
Filename
7178967