• DocumentCode
    819400
  • Title

    A Novel Method of Language Modeling for Automatic Captioning in TC Video Teleconferencing

  • Author

    Zhang, Xiaojia ; Zhao, Yunxin ; Schopp, Laura

  • Author_Institution
    Dept. of Comput. Sci., Missouri Univ., Columbia, MO
  • Volume
    11
  • Issue
    3
  • fYear
    2007
  • fDate
    5/1/2007 12:00:00 AM
  • Firstpage
    332
  • Lastpage
    337
  • Abstract
    We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech contains a large number of infrequently used medical terms in spontaneous styles. Due to insufficiency of data, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used for class definition on medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, as well as the weight-estimation algorithm of FWA are effective on the TC-VTC task. As compared with using mixtures of word LMs with weights estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements of captioning accuracy.
  • Keywords
    expectation-maximisation algorithm; greedy algorithms; interpolation; linguistics; natural language processing; speech recognition; teleconferencing; telemedicine; automatic captioning system; captioning accuracy; expectation-maximization algorithm; forward weight adjustment; greedy algorithm; interpolation weights; large vocabulary conversational speech recognition; medical terms; mixture language modeling; semantic categories; semantic definitions; teleconsultation video teleconferencing; telemedicine; weight-estimation algorithm; word classes; Automatic speech recognition; Interpolation; Natural languages; Parameter estimation; Predictive models; Probability; Speech recognition; Teleconferencing; Telemedicine; Vocabulary; Automatic speech recognition; mixture language model (LM); teleconsultation (TC); telemedicine; video teleconferencing
  • fLanguage
    English
  • Journal_Title
    Information Technology in Biomedicine, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1089-7771
  • Type

    jour

  • DOI
    10.1109/TITB.2006.885549
  • Filename
    4167905
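
The abstract describes interpolating component LMs into a mixture LM and compares against weights estimated by the conventional expectation-maximization algorithm, with perplexity as the measure. The following is a minimal sketch of that baseline setup only: linear interpolation, the standard EM weight update, and perplexity. The abstract does not specify the FWA procedure, so it is not reproduced here. All function names and the toy held-out probabilities are assumptions made for illustration; they are not taken from the paper.

```python
import math


def mixture_prob(weights, component_probs):
    """Linear interpolation: P(w|h) = sum_k lambda_k * P_k(w|h)."""
    return sum(lam * p for lam, p in zip(weights, component_probs))


def perplexity(weights, stream):
    """Perplexity of the mixture over per-token component probabilities."""
    log_sum = sum(math.log(mixture_prob(weights, probs)) for probs in stream)
    return math.exp(-log_sum / len(stream))


def em_weights(stream, num_components, iterations=50):
    """Estimate interpolation weights with the standard EM update.

    Each element of `stream` holds the probabilities that the component
    LMs assign to one held-out token given its history.
    """
    weights = [1.0 / num_components] * num_components
    for _ in range(iterations):
        totals = [0.0] * num_components
        for probs in stream:
            denom = mixture_prob(weights, probs)
            for k in range(num_components):
                # E-step: posterior that component k generated this token
                totals[k] += weights[k] * probs[k] / denom
        # M-step: new weight is the average posterior over all tokens
        weights = [t / len(stream) for t in totals]
    return weights


# Toy held-out data (made up): each row is (P_1(w|h), P_2(w|h)) for two
# hypothetical component LMs, e.g. one in-domain and one out-of-domain.
heldout = [
    (0.20, 0.01),
    (0.05, 0.10),
    (0.30, 0.02),
    (0.01, 0.15),
]

uniform = [0.5, 0.5]
tuned = em_weights(heldout, num_components=2)

print("uniform-weight perplexity:", perplexity(uniform, heldout))
print("EM-weight perplexity:", perplexity(tuned, heldout))
```

Because each EM iteration cannot decrease the held-out likelihood, the tuned weights give a perplexity no higher than the uniform starting point on the same data. In practice the weights would be estimated on a held-out set separate from the test set.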