• DocumentCode
    570220
  • Title
    An extensive comparison of feature ranking aggregation techniques in bioinformatics
  • Author
    Wald, Randall ; Khoshgoftaar, Taghi M. ; Dittman, David ; Awada, Wael ; Napolitano, Amri
  • fYear
    2012
  • fDate
    8-10 Aug. 2012
  • Firstpage
    377
  • Lastpage
    384
  • Abstract
    Univariate feature rankers are frequently used to order genes (features) by their importance to a given bioinformatics problem. Unfortunately, the feature subsets they produce tend to differ when the rankers are applied to related (but distinct) datasets, or to datasets which have been varied or corrupted in some fashion. As a result, recent research has focused on methods to measure or improve the stability of these feature subsets. One such method is rank aggregation: the process of combining the information from several ranked lists (in this case, ordered gene lists) into a single, more stable list. While there has been work on creating these methods, very little work has gone into comparing the lists they generate. Such a comparison allows the techniques to be grouped into families, both to understand how the families affect rank aggregation and to use less computationally expensive members of a given family. This paper presents an extensive study of nine rank aggregation techniques across twenty-six bioinformatics datasets. Our results show that certain aggregation techniques produce very similar lists, while others are dissimilar to all of the other techniques. Additionally, as the size of the feature subset increases, the similarity between the techniques increases. To our knowledge, this is the first study to examine this many rank aggregation techniques within the domain of bioinformatics.
  • Keywords
    bioinformatics; genetics; set theory; statistical analysis; bioinformatics datasets; feature ranking aggregation techniques; feature subsets; gene feature ordering; gene list ordering; stability improvement; stability measurement; stable list; univariate feature rankers; Bioinformatics; Data mining; Genomics; Robustness; Stability analysis; Thermal stability; bioinformatics; feature ranking; rank aggregation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4673-2282-9
  • Electronic_ISBN
    978-1-4673-2283-6
  • Type
    conf
  • DOI
    10.1109/IRI.2012.6303034
  • Filename
    6303034
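
The abstract describes rank aggregation as combining several ranked gene lists into a single list, and compares techniques by how similar their resulting feature subsets are. The record does not name the nine techniques or the similarity measure used, so the following is only a minimal sketch under stated assumptions: it uses mean-position aggregation (a common simple scheme, not necessarily one of the paper's nine) and a top-k overlap fraction as a stand-in similarity measure. The function names `mean_rank_aggregate` and `top_k_overlap` and the gene labels are hypothetical.

```python
from statistics import mean


def mean_rank_aggregate(ranked_lists):
    """Combine several ranked gene lists into one list ordered by mean position.

    Assumption: every input list ranks exactly the same set of genes.
    This is an illustrative scheme, not the method of any specific paper.
    """
    genes = set(ranked_lists[0])
    for lst in ranked_lists:
        if set(lst) != genes:
            raise ValueError("all lists must rank the same genes")
    # Mean 0-based position of each gene across all input lists.
    positions = {g: mean(lst.index(g) for lst in ranked_lists) for g in genes}
    # Lower mean position ranks first; ties are broken by gene name.
    return sorted(genes, key=lambda g: (positions[g], g))


def top_k_overlap(list_a, list_b, k):
    """Fraction of genes shared by the top-k subsets of two ranked lists."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k


# Hypothetical rankings of four genes produced by three feature rankers.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]

aggregated = mean_rank_aggregate(rankings)
print(aggregated)                                  # ['g1', 'g2', 'g3', 'g4']
print(top_k_overlap(aggregated, rankings[2], 2))   # 0.5
print(top_k_overlap(aggregated, rankings[2], 4))   # 1.0
```

In this toy case the overlap rises from 0.5 at k = 2 to 1.0 at k = 4. That is trivially true once the subset covers every gene, so it should not be read as reproducing the paper's finding about subset size.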