A framework for semi-automatic identification, disambiguation and storage of protein-related abbreviations in scientific literature

Author

Atzeni, Paolo ; Polticelli, Fabio ; Toti, Daniele

Author_Institution

Dipt. di Inf. e Autom., Univ. Roma Tre, Rome, Italy

fYear

2011

fDate

11-16 April 2011

Firstpage

59

Lastpage

61

Abstract

We propose a framework for identifying, disambiguating and storing protein-related abbreviations as found in the full texts of scientific papers, in order to build and maintain a publicly available abbreviation repository via a semi-automatic process. This process involves information extraction methods and techniques for acronym identification and resolution, based on lexical clues and syntactical, largely domain-independent criteria. A dictionary and an ontology for proteins provide the means for matching and disambiguating the biological entities. User feedback is gathered at the end of the process and the confirmed entries are then stored and made available to the scientific community for further reviewing.
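
As a rough illustration of what identification "based on lexical clues" can look like, the sketch below pairs a parenthesized short form with the words that precede it and accepts the pair when the initials match. This is a generic first-letter heuristic, not the method used by the authors. The names `SHORT_FORM`, `extract_abbreviations` and `PROTEIN_DICTIONARY` are hypothetical, and the two-entry dictionary is only a stand-in for the protein dictionary mentioned in the abstract.

```python
import re

# Lexical clue (assumed): a short form appears in parentheses right after its
# long form, e.g. "green fluorescent protein (GFP)".
SHORT_FORM = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")

# Hypothetical toy dictionary of protein names, for illustration only.
PROTEIN_DICTIONARY = {"green fluorescent protein", "heat shock protein"}


def extract_abbreviations(text):
    """Return (short form, long form, known protein?) triples found in text."""
    results = []
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        # Words that occur before the opening parenthesis.
        preceding = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        # Candidate long form: as many words as the short form has characters.
        candidate = preceding[-len(short):]
        if len(candidate) < len(short):
            continue
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == short.lower():
            long_form = " ".join(candidate)
            known = long_form.lower() in PROTEIN_DICTIONARY
            results.append((short, long_form, known))
    return results


sample = (
    "We tagged the receptor with green fluorescent protein (GFP) "
    "and measured heat shock protein (HSP) levels."
)
pairs = extract_abbreviations(sample)
```

In a semi-automatic workflow of the kind described above, pairs such as these would be shown to a user for confirmation before being stored. A realistic extractor would also need to handle short forms whose letters do not map one-to-one onto word initials.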

Keywords

biology computing; dictionaries; information retrieval; ontologies (artificial intelligence); proteins; abbreviation repository; acronym identification; acronym resolution; biological entities; dictionary; information extraction methods; lexical clues; ontology; protein-related abbreviation disambiguation; protein-related abbreviation semi-automatic identification; protein-related abbreviation storage; scientific literature; user feedback; Bioinformatics; Communities; Data mining; Dictionaries; Natural language processing; Proteins;

fLanguage

English

Publisher

IEEE

Conference_Title

Data Engineering Workshops (ICDEW), 2011 IEEE 27th International Conference on

Conference_Location

Hannover

Print_ISBN

978-1-4244-9195-7

Electronic_ISBN

978-1-4244-9194-0

Type

conf

DOI

10.1109/ICDEW.2011.5767646

Filename

5767646