DocumentCode :
3601174
Title :
Extracting Various Classes of Data From Biological Text Using the Concept of Existence Dependency
Author :
Taha, Kamal
Author_Institution :
Electr. & Comput. Eng. Dept., Khalifa Univ., Abu Dhabi, United Arab Emirates
Volume :
19
Issue :
6
Year :
2015
Firstpage :
1918
Lastpage :
1928
Abstract :
One of the key goals of biological natural language processing (NLP) is the automatic extraction of information from biomedical publications. Most current constituency and dependency parsers overlook the semantic relationships between the constituents comprising a sentence and may not be well suited for capturing complex long-distance dependences. We propose in this paper a hybrid constituency-dependency parser for biological NLP information extraction called EDCC. EDCC aims to advance the state of the art of biological text mining by applying novel linguistic computational techniques that overcome the limitations of current constituency and dependency parsers outlined above, as follows: 1) it determines the semantic relationship between each pair of constituents in a sentence using novel semantic rules; and 2) it applies a semantic relationship extraction model that extracts information from different structural forms of constituents in sentences. EDCC can be used to extract different types of data from biological texts for purposes such as protein function prediction, genetic network construction, and protein-protein interaction detection. We evaluated the quality of EDCC by comparing it experimentally with six systems. Results showed marked improvement.
Keywords :
bioinformatics; data mining; feature extraction; genetics; molecular biophysics; natural language processing; proteins; text detection; EDCC; automatic information extraction; biological NLP information extraction; biological natural language processing; biological text mining; biomedical publications; complex long-distance dependences; data extraction; existence dependency; genetic network construction; hybrid constituency-dependency parser; linguistic computational techniques; protein function prediction; protein-protein interaction detection; semantic relationship extraction model; Information retrieval; Natural language processing; Semantics; Text mining; biological NLP; biological natural language processing (NLP); biomedical literature; dependency parsers; information extraction
Language :
English
Journal_Title :
Biomedical and Health Informatics, IEEE Journal of
Publisher :
IEEE
ISSN :
2168-2194
Type :
jour
DOI :
10.1109/JBHI.2015.2392786
Filename :
7014223