DocumentCode
2179369
Title
Active Learning Algorithm for Threshold of Decision Probability on Imbalanced Text Classification Based on Protein-Protein Interaction Documents
Author
Xu, Guixian ; Niu, Zhendong ; Gao, Xu ; Cao, Yujuan ; Zhao, Yumin
Author_Institution
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China
fYear
2010
fDate
9-10 Feb. 2010
Firstpage
78
Lastpage
82
Abstract
The study of host-pathogen protein-protein interactions (PPIs) is essential to understanding the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are reported in the biomedical literature. Building a document classification system can accelerate the mining and curation of PPI knowledge. As imbalanced data sets become increasingly common, handling the imbalanced classification problem is becoming a hot topic in the machine learning field. In this paper, we propose an Active Learning algorithm for Threshold of Decision Probability (ALTDP) to address the problem of misclassifying the minority class, based on an imbalanced host-pathogen PPI data set. The results demonstrate that the proposed approach significantly improves classification accuracy on the imbalanced data set.
Keywords
data mining; learning (artificial intelligence); pattern classification; proteins; active learning algorithm; decision probability threshold; document classification system; imbalanced host pathogen PPIs data set; imbalanced text classification; protein-protein interaction documents; Acceleration; Classification tree analysis; Costs; Humans; Machine learning; Machine learning algorithms; Pathogens; Protein engineering; Sampling methods; Text categorization; imbalanced text classification; machine learning; protein-protein interaction
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Storage and Data Engineering (DSDE), 2010 International Conference on
Conference_Location
Bangalore
Print_ISBN
978-1-4244-5678-9
Type
conf
DOI
10.1109/DSDE.2010.28
Filename
5452631