DocumentCode :
3243103
Title :
TMAC: An automated text mining tool for construction of an annotated corpus to support protein-protein interaction information extraction
Author :
Azzem, Rania Ahmed Abdel ; Seoud, Abul
Author_Institution :
Dept. of Electr. Eng., El Fayoum Univ., Fayoum, Egypt
fYear :
2010
fDate :
2-4 Nov. 2010
Firstpage :
75
Lastpage :
79
Abstract :
Extracting protein-protein interactions (PPIs) from the biomedical literature is an important topic in protein science. Annotated corpora are essential to the development and evaluation of PPI extraction systems, so a text mining tool is needed that can annotate any corpus with protein names and interaction events in order to identify interactions among proteins. In this paper we present a Java package called the TMAC system. TMAC tags protein names and interaction events in biomedical literature using a combination of carefully designed rules and a dictionary of protein names. TMAC normalizes the protein mentions and interaction events it finds by supplying the appropriate database reference. TMAC is divided into two modules. The first is the named entity identification and normalization module. The second is the interaction event tagger, which identifies words that indicate the occurrence of an interaction. TMAC achieved an average of 85.2% precision and 76.7% recall for protein identification, and an average of 88.2% precision and 71.8% recall for protein-protein interaction event identification. TMAC is a flexible system: it can be used as a standalone application or incorporated into the workflow of a more general text mining system.
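The two-module design described in the abstract (dictionary-based protein tagging with normalization, followed by interaction event tagging) can be illustrated with a minimal Java sketch. This is not the TMAC implementation: the class name `TmacSketch`, the tiny dictionary, the placeholder database references, and the trigger-word list are all assumptions made for illustration, and the hand-designed rules the abstract mentions are reduced here to simple token lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TmacSketch {
    // Hypothetical dictionary: lowercase protein name -> placeholder database reference.
    public static final Map<String, String> PROTEIN_DICT = new HashMap<>();
    // Hypothetical trigger words taken to signal an interaction event.
    public static final Set<String> INTERACTION_WORDS = new HashSet<>();

    static {
        PROTEIN_DICT.put("brca2", "EXAMPLE:0001"); // placeholder, not a real identifier
        PROTEIN_DICT.put("rad51", "EXAMPLE:0002"); // placeholder, not a real identifier
        INTERACTION_WORDS.add("binds");
        INTERACTION_WORDS.add("interacts");
        INTERACTION_WORDS.add("phosphorylates");
    }

    /** One annotation: the matched token, its type, and (for proteins) a reference. */
    public static final class Tag {
        public final String token;
        public final String type;
        public final String reference; // null for interaction words

        public Tag(String token, String type, String reference) {
            this.token = token;
            this.type = type;
            this.reference = reference;
        }

        @Override
        public String toString() {
            return type + "(" + token + (reference == null ? "" : " -> " + reference) + ")";
        }
    }

    /** Tags protein mentions and interaction words in a whitespace-tokenized sentence. */
    public static List<Tag> tag(String sentence) {
        List<Tag> tags = new ArrayList<>();
        for (String raw : sentence.split("\\s+")) {
            // Strip leading and trailing punctuation such as "." or ",".
            String token = raw.replaceAll("^\\W+|\\W+$", "");
            if (token.isEmpty()) {
                continue;
            }
            String key = token.toLowerCase(Locale.ROOT);
            if (PROTEIN_DICT.containsKey(key)) {
                // Module 1: identify the protein and normalize it to a reference.
                tags.add(new Tag(token, "PROTEIN", PROTEIN_DICT.get(key)));
            } else if (INTERACTION_WORDS.contains(key)) {
                // Module 2: mark a word that suggests an interaction event.
                tags.add(new Tag(token, "INTERACTION", null));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        for (Tag t : tag("BRCA2 binds RAD51.")) {
            System.out.println(t);
        }
    }
}
```

A real system of this kind would also need multi-word protein names, synonym handling, and disambiguation rules, none of which this sketch attempts.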
Keywords :
biology computing; data mining; proteins; text analysis; Java package; TMAC system; annotated corpora; annotated corpus; automated text mining; biomedical literatures; protein identification; protein science; protein-protein interaction extraction systems; protein-protein interaction information extraction; text mining system; Abstracts; Databases; Dictionaries; Protein engineering; Proteins; Text mining; named entity recognition; protein normalization; text-mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Technology and Development (ICCTD), 2010 2nd International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8844-5
Electronic_ISBN :
978-1-4244-8845-2
Type :
conf
DOI :
10.1109/ICCTD.2010.5646069
Filename :
5646069