Automatic Information Extraction in the Medical Domain by Cross-Lingual Projection

Author

Ben Abacha, Asma ; Zweigenbaum, Pierre ; Max, Aurelien

Author_Institution

Resource Centre for Health Care Technol., Centre de Rech. Public Henri Tudor, Luxembourg, Luxembourg

fYear

2013

fDate

9-11 Sept. 2013

Firstpage

82

Lastpage

88

Abstract

This research tackles the automatic annotation of texts written in a language L1 by exploiting resources and tools available for another language L2. Our approach uses a parallel corpus (L1-L2) aligned at the sentence and word levels. To address the lack of annotated French corpora in the medical field, we focus on the French-English language pair to annotate French medical texts automatically. In this article, we focus on Medical Entity Recognition (MER). We evaluate our MER method on the English corpus, as well as the projection of the annotations onto the French corpus. We also discuss the problem of scalability, since we use a parallel corpus extracted from the Web, and propose a statistical method to handle heterogeneous corpora.
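The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_annotations`, the `PROBLEM` label, the example sentence pair, and the hand-written word alignment are all assumptions made for illustration. It only shows the general idea of copying entity labels from annotated English tokens to the French tokens aligned with them.

```python
from typing import List, Tuple


def project_annotations(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
    outside: str = "O",
) -> List[str]:
    """Copy each source token's entity label to the target tokens aligned with it.

    Hypothetical helper: `alignment` holds (source_index, target_index) pairs,
    and `outside` is the label for tokens that belong to no entity.
    """
    target_labels = [outside] * target_length
    for src_idx, tgt_idx in alignment:
        label = source_labels[src_idx]
        # Only fill target tokens that are still unlabeled, so the first
        # entity label projected onto a token is kept.
        if label != outside and target_labels[tgt_idx] == outside:
            target_labels[tgt_idx] = label
    return target_labels


# Illustrative sentence pair (assumed, not taken from the paper's corpus).
english = ["The", "patient", "has", "chronic", "asthma"]
english_labels = ["O", "O", "O", "PROBLEM", "PROBLEM"]
french = ["Le", "patient", "a", "un", "asthme", "chronique"]

# Hand-written (English index, French index) word alignment; "un" is unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 4)]

french_labels = project_annotations(english_labels, alignment, len(french))
print(list(zip(french, french_labels)))
```

In this sketch, unaligned target tokens keep the outside label, which is one simple way to handle alignment gaps; a real system working with automatically aligned Web corpora would also need to cope with alignment errors.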

Keywords

medical computing; natural language processing; statistical analysis; text analysis; English corpus; French medical texts; French-English language; MER method; annotated French corpus; automatic annotation; automatic information extraction; cross-lingual projection; medical domain; medical entity recognition; medical field; parallel corpus; statistical method; Diseases; Feature extraction; Information retrieval; Manuals; Medical diagnostic imaging; Semantics

fLanguage

English

Publisher

IEEE

Conference_Titel

Healthcare Informatics (ICHI), 2013 IEEE International Conference on

Conference_Location

Philadelphia, PA

Type

conf

DOI

10.1109/ICHI.2013.25

Filename

6680464