• DocumentCode
    397262
  • Title
    A probabilistic model for identifying protein names and their name boundaries
  • Author
    Seki, Kazuhiro ; Mostafa, Javed
  • Author_Institution
    Lab. of Appl. Inf. Res., Indiana Univ., Bloomington, IN, USA
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    251
  • Lastpage
    258
  • Abstract
    This paper proposes a method for identifying protein names in biomedical texts with an emphasis on detecting protein name boundaries. We use a probabilistic model which exploits several surface clues characterizing protein names and incorporates word classes for generalization. In contrast to previously proposed methods, our approach does not rely on natural language processing tools such as part-of-speech taggers and syntactic parsers, so as to reduce processing overhead and the potential number of probabilistic parameters to be estimated. A notion of certainty is also proposed to improve precision for identification. We implemented a protein name identification system based on our proposed method, and evaluated the system on real-world biomedical texts in conjunction with the previous work. The results showed that overall our system performs comparably to the state-of-the-art protein name identification system and that higher performance is achieved for compound names. In addition, it is demonstrated that our system can further improve precision by restricting the system output to those names with high certainties.
  • Keywords
    biology computing; data mining; identification; probability; proteins; compound names; probabilistic model; probabilistic parameters; processing overhead; protein name boundaries detection; real-world biomedical texts; state-of-the-art protein name identification system; surface clues; word classes incorporation; Biomedical informatics; Cancer; Data mining; Dictionaries; Laboratories; Natural language processing; Parameter estimation; Protein engineering; System performance; System testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type
    conf
  • DOI
    10.1109/CSB.2003.1227325
  • Filename
    1227325