Regional Information Center for Science and Technology - Joint Discriminative Decoding of Words and Semantic Tags for Spoken Language Understanding

DocumentCode :

7894

Title :

Joint Discriminative Decoding of Words and Semantic Tags for Spoken Language Understanding

Author :

Deoras, A. ; Tur, Gokhan ; Sarikaya, R. ; Hakkani-Tur, Dilek

Author_Institution :

Microsoft Corp., Mountain View, CA, USA

Volume :

Issue :

fYear :

2013

fDate :

Aug. 2013

Firstpage :

1612

Lastpage :

1621

Abstract :

Most spoken language understanding (SLU) systems today employ a cascade approach, in which the best hypothesis from the automatic speech recognizer (ASR) is fed into understanding modules such as slot sequence classifiers and intent detectors. The output of these modules is then fed into downstream components such as an interpreter and/or a knowledge broker. These statistical models are usually trained individually to optimize the error rate of their respective outputs. In such approaches, errors from one module propagate irreversibly into other modules, causing serious degradation in the overall performance of the SLU system. It is therefore desirable to optimize all the statistical models jointly. As a first step towards this, we propose a joint decoding framework in which we predict the optimal word sequence and slot sequence (semantic tag sequence) jointly, given the input acoustic stream. The improved recognition output is then used for an utterance classification task; specifically, we focus on intent detection. On an SLU task, we show a 1.5% absolute reduction (7.6% relative reduction) in word error rate (WER) and a 1.2% absolute improvement in F-measure for slot prediction compared to a very strong cascade baseline comprising a state-of-the-art large-vocabulary ASR followed by a conditional random field (CRF) based slot sequence tagger. Similarly, for intent detection, we show a 1.2% absolute reduction (12% relative reduction) in classification error rate.
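
The abstract quotes each error-rate gain both as an absolute and as a relative reduction. Since relative reduction = absolute reduction / baseline error rate, the two figures together imply an approximate baseline. The sketch below works that arithmetic out. The function name `implied_baseline` is hypothetical, and the resulting baselines (roughly 19.7% WER and 10% intent classification error) are back-calculated from the rounded numbers in the abstract; they are not values reported in this record.

```python
def implied_baseline(absolute_reduction: float, relative_reduction: float) -> float:
    """Return the baseline error rate (in percent) implied by
    relative_reduction = absolute_reduction / baseline.

    absolute_reduction is in percentage points; relative_reduction is a fraction.
    """
    return absolute_reduction / relative_reduction


# Figures taken from the abstract (rounded there, so results are approximate).
wer_baseline = implied_baseline(1.5, 0.076)      # word error rate
intent_baseline = implied_baseline(1.2, 0.12)    # intent classification error

# Error rates after joint decoding, under the same assumption.
wer_after = wer_baseline - 1.5
intent_after = intent_baseline - 1.2

print(f"Implied baseline WER: {wer_baseline:.1f}%, after: {wer_after:.1f}%")
print(f"Implied baseline intent error: {intent_baseline:.1f}%, after: {intent_after:.1f}%")
```

Because the published percentages are rounded, the implied baselines are only accurate to within a few tenths of a point.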

Keywords :

decoding; natural language processing; random processes; signal classification; speech coding; speech recognition; statistical analysis; ASR; CRF; F measure; SLU systems; WER; automatic speech recognizer; classification error rate; conditional random field; error rate optimization; intent detectors; interpreter; joint discriminative semantic tag decoding; joint discriminative word decoding; knowledge broker; natural language; slot sequence classifiers; slot sequence tagger; spoken language understanding systems; statistical models; utterance classification task; word error rate; Joint Decoding; MaxEnt; SLU; lattice decoding; speech and dialog understanding; spoken language processing

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2013.2256894

Filename :

6494264

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=7894