DocumentCode :
3738571
Title :
THUEE language modeling method for the OpenKWS 2015 evaluation
Author :
Zhuo Zhang;Wei-Qiang Zhang;Kai-Xiang Shen;Xu-Kui Yang;Yao Tian;Meng Cai;Jia Liu
Author_Institution :
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
fYear :
2015
Firstpage :
534
Lastpage :
538
Abstract :
In this paper, we describe the THUEE (Department of Electronic Engineering, Tsinghua University) team's method of building language models (LMs) for the OpenKWS 2015 Evaluation held by the National Institute of Standards and Technology (NIST). Because NIST provides only very limited in-domain data, most of our effort goes into making good use of the out-of-domain data. Our work consists of three main steps. First, the out-of-domain data is cleaned. Second, by comparing the cross-entropy difference between the in-domain and out-of-domain data, the part of the out-of-domain corpus that is well matched to the in-domain one is selected as training data. Third, the final n-gram LM is obtained by interpolating individual n-gram LMs trained on the different corpora, and all the training data is further combined to train a single feed-forward neural network LM (FNNLM). In this way, we reduce the perplexity on the development test data by 8.3% for the n-gram LM and 1.7% for the FNNLM, and the Actual Term-Weighted Value (ATWV) of the final result is 0.5391.
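A minimal sketch of the second step, assuming the standard cross-entropy difference selection criterion (Moore and Lewis, 2010), which the abstract's description matches. The toy unigram LM and all function names here are illustrative assumptions, not the authors' implementation (the paper's actual LMs are n-grams).

import math
from collections import Counter

def train_unigram(sentences):
    # Toy add-one-smoothed unigram LM; stands in for a real n-gram LM.
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one count for unseen words
    return lambda w: (counts.get(w, 0) + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    # Per-word cross-entropy (in bits) of a sentence under the LM.
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / max(len(words), 1)

def select_matched(out_domain, in_lm, out_lm, threshold=0.0):
    # Keep out-of-domain sentences that the in-domain LM scores better
    # than the out-of-domain LM: H_in(s) - H_out(s) below the threshold.
    return [s for s in out_domain
            if cross_entropy(in_lm, s) - cross_entropy(out_lm, s) < threshold]

In practice the two LMs would be proper n-gram models (e.g., trained with SRILM or KenLM), and the threshold would be tuned on development data.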
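The third step linearly interpolates the individual n-gram LMs. The paper does not say how the interpolation weights are chosen; the sketch below assumes the common approach of tuning them by expectation-maximization to maximize held-out likelihood. component_probs and heldout are hypothetical placeholders.

def em_interpolation_weights(component_probs, heldout, iters=20):
    # component_probs: list of functions, each mapping a prediction
    # event (a word given its history) to its probability under one LM.
    # heldout: list of such events from in-domain development data.
    k = len(component_probs)
    w = [1.0 / k] * k  # start from uniform mixture weights
    for _ in range(iters):
        resp = [0.0] * k
        for ev in heldout:
            # E-step: posterior responsibility of each component LM
            p = [w[i] * component_probs[i](ev) for i in range(k)]
            z = sum(p)
            resp = [r + pi / z for r, pi in zip(resp, p)]
        # M-step: new weights are the normalized responsibilities
        total = sum(resp)
        w = [r / total for r in resp]
    return w

The interpolated probability of an event is then the weighted sum of the component probabilities; each EM iteration is guaranteed not to decrease the held-out likelihood.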
Keywords :
"Training","Data models","Training data","NIST","Interpolation","Buildings","Cleaning"
Publisher :
ieee
Conference_Title :
2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)
Type :
conf
DOI :
10.1109/ISSPIT.2015.7394394
Filename :
7394394