• DocumentCode
    1754632
  • Title
    Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set
  • Author
    Leyi Wei ; Minghong Liao ; Yue Gao ; Rongrong Ji ; Zengyou He ; Quan Zou
  • Author_Institution
    Sch. of Inf. Sci. & Technol., Xiamen Univ., Xiamen, China
  • Volume
    11
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan.-Feb. 2014
  • Firstpage
    192
  • Lastpage
    201
  • Abstract
    MicroRNAs (miRNAs) play an important role as regulators in biological processes. Identification of (pre-)miRNAs helps in understanding regulatory processes. Machine learning methods have been designed for pre-miRNA identification. However, most of them cannot provide reliable predictive performance on independent testing data sets. We hypothesized that this is because the training sets, especially the negative training sets, are not sufficiently representative. To generate a representative negative set, we proposed a novel negative sample selection technique and successfully collected negative samples of improved quality. Two recent classifiers rebuilt with the proposed negative set achieved an improvement of ~6 percent in their predictive performance, which supported this hypothesis. Based on the proposed negative set, we constructed a training set and developed an online system called miRNApre specifically for human pre-miRNA identification. We showed that miRNApre achieved accuracies on updated human and non-human data sets that were 34.3 and 7.6 percent higher, respectively, than those achieved by current methods. The results suggest that miRNApre is an effective tool for pre-miRNA identification. Additionally, by integrating miRNApre, we developed a miRNA mining tool, mirnaDetect, which can be applied to find potential miRNAs in genome-scale data. On human chromosome 19 data, mirnaDetect achieved mining performance comparable to that of other existing methods.
  • Keywords
    RNA; biochemistry; bioinformatics; classification; data mining; genomics; information services; learning (artificial intelligence); molecular biophysics; MirnaDetect mining performance; biological process regulator; classifier predictive performance; genome-scale data; high-quality negative training set incorporation; human chromosome 19 data; human microRNA identification; human pre-miRNA identification; independent testing data sets; machine learning methods; miRNA mining tool; miRNApre accuracy; mirnaDetect; negative sample collection; negative sample quality; negative sample selection technique; nonhuman data sets; online system; representative negative set generation; training set construction; Biological processes; Data mining; Genetics; Machine learning; RNA; MicroRNA; high-quality negative set; microRNA identification; multi-level negative sample selection;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.146
  • Filename
    6661313