• DocumentCode
    1787085
  • Title

    Using topic models in domain adaptation

  • Author

    Zahabi, Samira Tofighi ; Bakhshaei, Somayeh ; Khadivi, Shahram

  • Author_Institution
    HLT Lab., Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2014
  • fDate
    9-11 Sept. 2014
  • Firstpage
    539
  • Lastpage
    543
  • Abstract
    An important property of a corpus is its domain. Usually, the quality of an SMT system trained on an in-domain corpus increases when out-of-domain sentences are added to its training corpus. In this paper we show that out-of-domain corpora may also contain sentences that are suitable for improving the quality of an in-domain corpus. These sentences contain words and phrases that occur in in-domain corpora, so their context is closer to that of the in-domain parallel corpus and farther from that of the out-of-domain parallel corpora. We propose a method based on topic models to extract from out-of-domain parallel corpora those sentences whose context is similar to the in-domain parallel corpus. We use the extracted sentences to train an SMT system. Finally, we show that the BLEU score of the system output increases by about 4.69% when this extra information is added to its training corpus.
  • Keywords
    language translation; natural language processing; BLEU score; SMT system; in-domain parallel corpus; natural language processing; out-of-domain sentences; sentence extraction; statistical machine translation; topic models; Adaptation models; Computational modeling; Context; Context modeling; Equations; Mathematical model; Training; Natural Language Processing; Topic Model; Translation Model; domain adaptation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications (IST), 2014 7th International Symposium on
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4799-5358-5
  • Type

    conf

  • DOI
    10.1109/ISTEL.2014.7000763
  • Filename
    7000763