• DocumentCode
    3744816
  • Title

    Different word representations and their combination for proper name retrieval from diachronic documents

  • Author

    Irina Illina;Dominique Fohr

  • Author_Institution
    MultiSpeech team, Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France; Inria, Villers-lès-Nancy, F-54600, France; CNRS, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper deals with the problem of high-quality transcription systems for very large vocabulary automatic speech recognition (ASR). We investigate the problem of automatic retrieval of out-of-vocabulary (OOV) proper names (PNs). We want to take into account the temporal, syntactic and semantic context of words. Nowadays, Artificial Neural Networks (NN) are widely used in natural language processing: continuous space representations of words are learned automatically from unstructured text data. To model the latent topics at document level, Latent Dirichlet Allocation (LDA) has been successful. In this paper, we propose OOV PN retrieval using (1) temporal versus topic context modeling; (2) different word representation spaces for word-level and document-level context modeling; (3) combinations of retrieval results. Experimental evaluation on broadcast news data shows that the proposed method combinations lead to better results. This confirms the complementarity of methods.
  • Keywords
    "Vocabulary","Semantics","Context","Context modeling","Artificial neural networks","Measurement"
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
  • Type

    conf

  • DOI
    10.1109/ASRU.2015.7404766
  • Filename
    7404766