• DocumentCode
    53569
  • Title

    Generalized Query-Based Active Learning to Identify Differentially Methylated Regions in DNA

  • Author

    Haque, Md Mohaiminul ; Holder, Lawrence B. ; Skinner, Michael K. ; Cook, Diane J.

  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Washington State Univ., Pullman, WA, USA
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    May-June 2013
  • Firstpage
    632
  • Lastpage
    644
  • Abstract
    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, and it will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially methylated regions (DMRs) in DNA. DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that our method is better than another popular active learning technique.
  • Keywords
    DNA; biology computing; genomics; learning (artificial intelligence); query processing; DMR; DNA locations; GQAL; Oracle; biological domains; classifier; differentially methylated regions; disease; epigenetic regulation; generalized query-based active learning; genome; supervised learning technique; tissue differentiation; Accuracy; Bioinformatics; DNA; Learning systems; Training; Uncertainty; Active learning; DNA methylation; bioinformatics; generalized query;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.38
  • Filename
    6514874