Title :
Semi-supervised Learning of the Hidden Vector State Model for Protein-Protein Interactions Extraction
Author :
Zhou, Deyu ; He, Yulan ; Kwoh, Chee Keong
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
Date :
March 1 2007-April 5 2007
Abstract :
A major challenge in text mining for biology and biomedicine is automatically extracting protein-protein interactions from the vast amount of biological literature, since most knowledge about them still lies hidden in biological publications. Existing approaches can be broadly categorized as rule-based or statistical. Rule-based approaches require heavy manual effort. Statistical approaches, on the other hand, require large-scale, richly annotated corpora in order to reliably estimate model parameters, and such corpora are normally difficult to obtain in practical applications. The hidden vector state (HVS) model, an extension of the basic discrete Markov model, has been successfully applied to extract protein-protein interactions. In this paper, we propose a novel approach to train the HVS model on both annotated and unannotated corpora. A sentence selection algorithm is designed to utilize the semantic parsing results of the unannotated corpus generated by the HVS model. Experimental results show that the performance of the initial HVS model trained on a small amount of annotated data can be improved by employing this approach.
Keywords :
biology computing; data mining; learning (artificial intelligence); medical information systems; text analysis; biological literature; biological publications; biology text mining; biomedicine text mining; discrete Markov model; hidden vector state model; protein-protein interactions extraction; semantic parsing; semisupervised learning; Biological system modeling; Biology computing; Computational intelligence; Data mining; Helium; Hidden Markov models; Large-scale systems; Pattern matching; Protein engineering; Semisupervised learning; hidden vector state model; information extraction; protein-protein interactions extraction; semi-supervised learning;
Conference_Title :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368941