Title :
Smoothlm: A language model compression library
Author :
Akin, Ahmet Afsin ; Demir, Cemil
Author_Institution :
BILGEM-BTE, Konusma ve Dogal Dil Isleme Teknol. Lab., TUBITAK, Kocaeli, Turkey
Abstract :
In this paper, we present SmoothLm, a language model compression and random access library. Like some previous work, this library uses Minimal Perfect Hash Functions (MPHF) to achieve high compression rates. We improved an existing MPHF algorithm in terms of generation and query speed and named the result Multi Level MPHF. We also present a mechanism that applies this MPHF structure to very large data sets quickly and with limited memory usage. SmoothLm generates lossy models and provides a quantization mechanism for probability values for extra compression. We use SmoothLm in our in-house speech recognition engine, and our experiments showed that, with appropriate parameters, neither the lossy model nor quantization hurts performance. The library is suitable for Java applications, and its source code is available under a free license.
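To make the abstract's ingredients concrete, here is a minimal, hypothetical sketch of an MPHF-backed lossy n-gram store. It is not SmoothLm's code or API, and the abstract does not describe how Multi Level MPHF works. As a stand-in it uses a BBHash-style multi-level scheme: keys that land alone in a slot are placed, and colliding keys move on to the next level. Three further choices are assumptions: the lossiness comes from an 8-bit fingerprint per slot (an unseen n-gram can be wrongly accepted with probability of about 1/256), probabilities are stored as 8-bit linearly quantized log-probabilities, and the class name `LossyMphfLm` and its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not SmoothLm's implementation: a BBHash-style
 * multi-level minimal perfect hash over n-gram strings, an 8-bit fingerprint
 * per slot (the lossy part), and 8-bit linearly quantized log-probabilities.
 * Keys must be distinct.
 */
final class LossyMphfLm {
    private static final int FINGERPRINT_SEED = 0x5EED; // assumed; differs from level seeds

    private final List<boolean[]> levels = new ArrayList<>(); // occupied slots per level
    private final List<int[]> ranks = new ArrayList<>();      // global MPHF index of each slot
    private final byte[] fingerprints;
    private final byte[] quantized;
    private final double min, step;

    LossyMphfLm(List<String> ngrams, double[] logProbs) {
        if (ngrams.size() != logProbs.length) throw new IllegalArgumentException("size mismatch");
        // 1) Multi-level MPHF: a key alone in its slot is placed at this level;
        //    keys that collide are retried at the next level with a new seed.
        List<String> remaining = new ArrayList<>(ngrams);
        int placed = 0;
        for (int level = 0; !remaining.isEmpty(); level++) {
            if (level > 64) throw new IllegalStateException("too many levels; are the keys distinct?");
            int size = 2 * remaining.size(); // 2 slots per remaining key
            int[] counts = new int[size];
            for (String k : remaining) counts[slot(k, level, size)]++;
            boolean[] bits = new boolean[size];
            int[] rank = new int[size];
            for (int p = 0; p < size; p++) {
                rank[p] = placed; // number of keys placed before this slot
                if (counts[p] == 1) { bits[p] = true; placed++; }
            }
            List<String> next = new ArrayList<>();
            for (String k : remaining) if (counts[slot(k, level, size)] != 1) next.add(k);
            levels.add(bits);
            ranks.add(rank);
            remaining = next;
        }
        // 2) Linear 8-bit quantization of log-probabilities over [lo, hi].
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (double v : logProbs) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
        min = lo;
        step = hi > lo ? (hi - lo) / 255 : 1.0;
        // 3) Store the fingerprint and quantized value at each key's MPHF index.
        fingerprints = new byte[ngrams.size()];
        quantized = new byte[ngrams.size()];
        for (int i = 0; i < ngrams.size(); i++) {
            int idx = index(ngrams.get(i));
            fingerprints[idx] = (byte) hash(ngrams.get(i), FINGERPRINT_SEED);
            quantized[idx] = (byte) Math.round((logProbs[i] - min) / step);
        }
    }

    /** Dequantized log-probability, or NaN if the n-gram is judged absent. */
    public double logProb(String ngram) {
        int idx = index(ngram);
        if (idx < 0 || fingerprints[idx] != (byte) hash(ngram, FINGERPRINT_SEED)) return Double.NaN;
        return min + (quantized[idx] & 0xFF) * step;
    }

    // MPHF lookup: index in [0, n) for stored keys; arbitrary or -1 for unseen keys.
    private int index(String key) {
        for (int level = 0; level < levels.size(); level++) {
            boolean[] bits = levels.get(level);
            int p = slot(key, level, bits.length);
            if (bits[p]) return ranks.get(level)[p];
        }
        return -1;
    }

    private static int slot(String key, int level, int size) {
        return (int) Math.floorMod(hash(key, level), (long) size);
    }

    // Seeded FNV-1a over UTF-16 chars, followed by a MurmurHash3-style finalizer.
    private static long hash(String key, int seed) {
        long h = 0xcbf29ce484222325L ^ (seed * 0x9E3779B97F4A7C15L);
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

For stored n-grams, `logProb` returns the original value to within half a quantization step. An unseen n-gram usually returns NaN, but it can occasionally be accepted with a wrong value. That false-positive rate is the trade-off that fingerprint width controls. A real implementation would replace the `boolean[]` and `int[]` arrays with packed bit vectors and a compact rank structure.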
Keywords :
Java; probability; query processing; software libraries; source code (software); speech coding; speech recognition; storage management; MPHF algorithm; SmoothLm; compression rates; in-house speech recognition engine; language model compression library; lossy models; memory usage; minimal perfect hash functions; multilevel MPHF structure; probability values; quantization mechanism; query speed; random access library; source code; very large data sets; Computational modeling; Conferences; Libraries; Quantization (signal); compression; language modelling;
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2014 22nd
Conference_Location :
Trabzon
DOI :
10.1109/SIU.2014.6830609