DocumentCode
3142031
Title
A framework for semi-automatic identification, disambiguation and storage of protein-related abbreviations in scientific literature
Author
Atzeni, Paolo ; Polticelli, Fabio ; Toti, Daniele
Author_Institution
Dipt. di Inf. e Autom., Univ. Roma Tre, Rome, Italy
fYear
2011
fDate
11-16 April 2011
Firstpage
59
Lastpage
61
Abstract
We propose a framework for identifying, disambiguating and storing protein-related abbreviations found in the full texts of scientific papers, in order to build and maintain a publicly available abbreviation repository via a semi-automatic process. This process involves information extraction methods and techniques for acronym identification and resolution, based on lexical clues and syntactic, largely domain-independent criteria. A dictionary and an ontology for proteins provide the means for matching and disambiguating the biological entities. User feedback is gathered at the end of the process, and the confirmed entries are then stored and made available to the scientific community for further review.
Keywords
biology computing; dictionaries; information retrieval; ontologies (artificial intelligence); proteins; abbreviation repository; acronym identification; acronym resolution; biological entities; dictionary; information extraction methods; lexical clues; ontology; protein-related abbreviation disambiguation; protein-related abbreviation semi-automatic identification; protein-related abbreviation storage; scientific literature; user feedback; Bioinformatics; Communities; Data mining; Dictionaries; Natural language processing; Proteins
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering Workshops (ICDEW), 2011 IEEE 27th International Conference on
Conference_Location
Hannover
Print_ISBN
978-1-4244-9195-7
Electronic_ISBN
978-1-4244-9194-0
Type
conf
DOI
10.1109/ICDEW.2011.5767646
Filename
5767646