DocumentCode :
1504519
Title :
Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features
Author :
Kolchinsky, Artemy ; Abi-Haidar, Alaa ; Kaur, Jasleen ; Hamed, Ahmed Abdeen ; Rocha, Luis M.
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
Volume :
7
Issue :
3
fYear :
2010
Firstpage :
400
Lastpage :
411
Abstract :
We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents as relevant or not relevant to protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier, which we successfully introduced in BioCreative 2 for binary classification of abstracts, and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated Precision and Recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The citation network classifier, which is novel to the biomedical text mining domain, was not a top-performing classifier in the challenge, but it performed above the central tendency of all submissions and therefore indicates a promising new avenue to investigate further in bibliome informatics.
Keywords :
bioinformatics; pattern classification; proteins; text analysis; BioCreative II.5 challenge; accuracy measurement; area-under-the-curve measurement; balanced F-score measurement; bibliome informatics; citation network features; correlation coefficient measurement; full-text documents; naive Bayes classifier; protein-protein interaction classification; text network features; variable trigonometric threshold linear classifier; Text mining; binary classification; citation network; literature mining; protein-protein interaction; Abstracting and Indexing as Topic; Algorithms; Computational Biology; Data Mining; Databases, Bibliographic; Neural Networks (Computer); Periodicals as Topic; Protein Interaction Mapping;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2010.55
Filename :
5473214