• DocumentCode
    1796347
  • Title
    BioHCDP: A Hybrid Constituency-Dependency Parser for Biological NLP Information Extraction

  • Author

    Taha, Kamal ; Al Zaabi, Mohammed

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Khalifa Univ., Abu Dhabi, United Arab Emirates
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    78
  • Lastpage
    85
  • Abstract
    One of the key goals of biological Natural Language Processing (NLP) is the automatic extraction of information from biomedical publications. Most current constituency and dependency parsers overlook the semantic relationships between the constituents of a sentence and may not be well suited to capturing complex long-distance dependencies. In this paper, we propose a hybrid constituency-dependency parser for biological NLP information extraction, called BioHCDP. BioHCDP aims to advance the state of the art in biological text mining by applying novel linguistic computational techniques that overcome the limitations of current constituency and dependency parsers outlined above: (1) it determines the semantic relationship between each pair of constituents in a sentence using novel semantic rules, and (2) it applies semantic relationship extraction models that represent the relationships of different patterns of usage in different contexts. BioHCDP can be used to extract various classes of data from biological texts, including protein function assignments, genetic networks, and protein-protein interactions. We compared BioHCDP experimentally with three systems, and the results showed marked improvement.
  • Keywords
    bioinformatics; data mining; genetics; natural language processing; proteins; text analysis; BioHCDP; automatic information extraction; biological NLP information extraction; biological natural language processing; biological text mining; biomedical publications; complex long-distance dependencies; genetic networks; hybrid constituency-dependency parser; linguistic computational techniques; protein function assignments; protein-protein interactions; semantic relationship extraction models; semantic relationships; Abstracts; Protein engineering; Proteins; Semantics; Text mining; biological NLP; biomedical literature; dependency parsers; information extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on
  • Conference_Location
    Orlando, FL
  • Type
    conf
  • DOI
    10.1109/CIDM.2014.7008151
  • Filename
    7008151