A Novel Method of Language Modeling for Automatic Captioning in TC Video Teleconferencing

Author

Zhang, Xiaojia ; Zhao, Yunxin ; Schopp, Laura

Author_Institution

Dept. of Comput. Sci., Missouri Univ., Columbia, MO

Volume

11

Issue

3

fYear

2007

fDate

5/1/2007

Firstpage

332

Lastpage

337

Abstract

We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech contains a large number of infrequently used medical terms in spontaneous styles. Due to insufficiency of data, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used for class definition on medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, as well as the weight-estimation algorithm of FWA are effective on the TC-VTC task. As compared with using mixtures of word LMs with weights estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements of captioning accuracy.
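The interpolation step in the abstract can be illustrated with a minimal Python sketch. It assumes each component LM has already assigned a probability to every held-out token, combines the components by linear interpolation, and scores the mixture by perplexity. The abstract does not spell out the FWA procedure, so `greedy_weights` is a hypothetical greedy search shown only to convey the general idea of adjusting weights step by step; it should not be read as the authors' algorithm. All function names, the step size, and the toy probabilities are assumptions made for illustration.

```python
import math


def mixture_prob(component_probs, weights):
    """Linear interpolation: P(w|h) = sum_k weights[k] * P_k(w|h)."""
    return sum(lam * p for lam, p in zip(weights, component_probs))


def perplexity(prob_rows, weights):
    """Perplexity of the mixture on held-out tokens.

    prob_rows[t][k] is the probability component k assigns to token t.
    """
    log_sum = sum(math.log(mixture_prob(row, weights)) for row in prob_rows)
    return math.exp(-log_sum / len(prob_rows))


def greedy_weights(prob_rows, step=0.05, max_iters=1000):
    """Hypothetical greedy weight search (an assumption, not the paper's FWA).

    Start from uniform weights. In each round, try moving `step` of weight
    from one component to another, keep the single move that lowers held-out
    perplexity the most, and stop when no move improves it.
    """
    k = len(prob_rows[0])
    weights = [1.0 / k] * k
    best = perplexity(prob_rows, weights)
    for _ in range(max_iters):
        best_move = None
        for src in range(k):
            if weights[src] < step:
                continue  # not enough weight left to give away
            for dst in range(k):
                if src == dst:
                    continue
                trial = list(weights)
                trial[src] -= step
                trial[dst] += step
                ppl = perplexity(prob_rows, trial)
                if ppl < best - 1e-12:
                    best, best_move = ppl, trial
        if best_move is None:
            break
        weights = best_move
    return weights, best


# Toy held-out data (made up): column 0 stands in for an in-domain LM,
# column 1 for an out-of-domain LM.
held_out = [
    [0.20, 0.01],
    [0.10, 0.02],
    [0.005, 0.08],
    [0.15, 0.01],
]
weights, ppl = greedy_weights(held_out)
print("weights:", weights, "perplexity:", ppl)
```

The sketch treats every component as a black box that returns token probabilities, so the distinction between class n-gram and word n-gram components described in the abstract is not modeled here.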

Keywords

expectation-maximisation algorithm; greedy algorithms; interpolation; linguistics; natural language processing; speech recognition; teleconferencing; telemedicine; automatic captioning system; captioning accuracy; expectation-maximization algorithm; forward weight adjustment; greedy algorithm; interpolation weights; large vocabulary conversational speech recognition; medical terms; mixture language modeling; semantic categories; semantic definitions; teleconsultation video teleconferencing; weight-estimation algorithm; word classes; Automatic speech recognition; Interpolation; Natural languages; Parameter estimation; Predictive models; Probability; Speech recognition; Teleconferencing; Telemedicine; Vocabulary; mixture language model (LM); teleconsultation (TC); video teleconferencing

fLanguage

English

Journal_Title

Information Technology in Biomedicine, IEEE Transactions on

Publisher

IEEE

ISSN

1089-7771

Type

jour

DOI

10.1109/TITB.2006.885549

Filename

4167905