• DocumentCode
    3125895
  • Title

    Finding Novel Diagnostic Gene Patterns Based on Interesting Non-redundant Contrast Sequence Rules

  • Author

    Zhao, Yuhai ; Wang, Guoren ; Li, Yuan ; Wang, Zhanghui

  • Author_Institution
    Dept. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    972
  • Lastpage
    981
  • Abstract
    Diagnostic genes are genes closely related to a specific disease phenotype, and they typically have high power to distinguish between classes. Most methods for discovering powerful diagnostic genes are based on either singleton discriminability or combination discriminability. However, both ignore the abundant interactions among genes, which are widespread in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which exploits the ordered expressions among genes based on the defined equivalent dimension group sequences, taking into account the "noise" that is universal in real data; (2) we devise a novel sequence rule, namely the interesting non-redundant contrast sequence rule, which is able to capture the difference between phenotypes with high accuracy using as few genes as possible; (3) we present an efficient algorithm called NRMINER to find such rules. Unlike the conventional column enumeration and the more recent row enumeration, it performs a novel template-driven enumeration by making use of the special characteristics of microarray data modeled by EWave. Extensive experiments conducted on various synthetic and real datasets show that: (1) NRMINER is faster than the competing algorithm by up to about one order of magnitude; (2) it provides higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER have been shown to be biologically related to some disease.
  • Keywords
    biology; pattern recognition; EWave model; NRMINER; column enumeration; disease phenotype; finding novel diagnostic gene patterns; interesting nonredundant contrast sequence rules; micro array data; powerful diagnostic genes; Accuracy; Data models; Diseases; Gene expression; Generators; Noise; data mining; diagnostic gene; sequence rule;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2011.68
  • Filename
    6137302