• DocumentCode
    2179369
  • Title
    Active Learning Algorithm for Threshold of Decision Probability on Imbalanced Text Classification Based on Protein-Protein Interaction Documents
  • Author
    Xu, Guixian ; Niu, Zhendong ; Gao, Xu ; Cao, Yujuan ; Zhao, Yumin
  • Author_Institution
    Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China
  • fYear
    2010
  • fDate
    9-10 Feb. 2010
  • Firstpage
    78
  • Lastpage
    82
  • Abstract
    The study of host-pathogen protein-protein interactions (PPIs) is essential for understanding the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are reported in the biomedical literature. Building a document classification system can accelerate the mining and curation of PPI knowledge. As imbalanced datasets become increasingly common, handling imbalanced classification has become an active topic in machine learning. In this paper, we propose an Active Learning algorithm for Threshold of Decision Probability (ALTDP) to address the misclassification of the minority class on an imbalanced host-pathogen PPI dataset. The results demonstrate that the proposed approach significantly improves classification accuracy on the imbalanced dataset.
  • Keywords
    data mining; learning (artificial intelligence); pattern classification; proteins; active learning algorithm; decision probability threshold; document classification system; imbalanced host pathogen PPIs data set; imbalanced text classification; protein-protein interaction documents; Acceleration; Classification tree analysis; Costs; Humans; Machine learning; Machine learning algorithms; Pathogens; Protein engineering; Sampling methods; Text categorization; protein-protein interaction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Storage and Data Engineering (DSDE), 2010 International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4244-5678-9
  • Type
    conf
  • DOI
    10.1109/DSDE.2010.28
  • Filename
    5452631