• DocumentCode
    652082
  • Title
    Automatic Information Extraction in the Medical Domain by Cross-Lingual Projection
  • Author
    Ben Abacha, Asma ; Zweigenbaum, Pierre ; Max, Aurelien
  • Author_Institution
    Resource Centre for Health Care Technologies, Centre de Recherche Public Henri Tudor, Luxembourg, Luxembourg
  • fYear
    2013
  • fDate
    9-11 Sept. 2013
  • Firstpage
    82
  • Lastpage
    88
  • Abstract
    This research tackles the automatic annotation of texts written in a language L1 by exploiting resources and tools available for another language L2. Our approach uses a parallel corpus (L1-L2) aligned at the sentence and word levels. To address the lack of annotated French corpora in the medical field, we focus on the French-English language pair and annotate French medical texts automatically. This article concentrates on Medical Entity Recognition (MER). We evaluate our MER method on the English corpus, as well as the projection of the annotations onto the French corpus. We also discuss scalability, since the parallel corpus is extracted from the Web, and propose a statistical method to handle heterogeneous corpora.
  • Keywords
    medical computing; natural language processing; statistical analysis; text analysis; English corpus; French medical texts; French-English language; MER method; annotated French corpus; automatic annotation; automatic information extraction; cross-lingual projection; medical domain; medical entity recognition; medical field; parallel corpus; statistical method; Diseases; Feature extraction; Information retrieval; Manuals; Medical diagnostic imaging; Semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Healthcare Informatics (ICHI), 2013 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Type
    conf
  • DOI
    10.1109/ICHI.2013.25
  • Filename
    6680464
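
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_entities`, the `PROBLEM` label, the example sentences, and the hand-written word alignment are all assumptions made for illustration. The sketch assumes entities have already been recognized on the English side and that a word alignment between the two sentences is available; each English entity span is mapped to the smallest French span that covers all of its aligned tokens.

```python
from typing import Dict, List, Set, Tuple

# (start token index, end token index (exclusive), label)
Span = Tuple[int, int, str]


def project_entities(
    src_entities: List[Span],
    alignment: Dict[int, Set[int]],
    tgt_len: int,
) -> List[Span]:
    """Project entity spans from a source sentence onto a target sentence.

    alignment maps each source token index to the set of target token
    indices it is aligned with. Entities with no aligned token are dropped.
    """
    projected: List[Span] = []
    for start, end, label in src_entities:
        # Collect every target token aligned to a token inside the entity.
        tgt_idx = sorted(
            t
            for s in range(start, end)
            for t in alignment.get(s, set())
            if 0 <= t < tgt_len
        )
        if not tgt_idx:
            continue  # nothing to project for this entity
        # Use the smallest contiguous target span covering all aligned tokens.
        projected.append((tgt_idx[0], tgt_idx[-1] + 1, label))
    return projected


# Hypothetical sentence pair with a hand-written alignment (English -> French).
en = ["The", "patient", "has", "lung", "cancer", "."]
fr = ["Le", "patient", "a", "un", "cancer", "du", "poumon", "."]
alignment = {0: {0}, 1: {1}, 2: {2}, 3: {6}, 4: {4}, 5: {7}}

# "lung cancer" annotated on the English side with a hypothetical label.
en_entities = [(3, 5, "PROBLEM")]

fr_entities = project_entities(en_entities, alignment, len(fr))
for start, end, label in fr_entities:
    print(label, " ".join(fr[start:end]))
```

Taking the covering span is one simple design choice: it keeps function words such as "du" inside the projected entity even though they have no aligned source token, at the cost of over-extending the span when alignments are noisy.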