DocumentCode :
3125895
Title :
Finding Novel Diagnostic Gene Patterns Based on Interesting Non-redundant Contrast Sequence Rules
Author :
Zhao, Yuhai ; Wang, Guoren ; Li, Yuan ; Wang, Zhanghui
Author_Institution :
Dept. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
972
Lastpage :
981
Abstract :
Diagnostic genes are genes closely related to a specific disease phenotype, and their power to distinguish between different classes is often high. Most methods for discovering powerful diagnostic genes are based either on singleton discriminability or on combination discriminability. However, both approaches ignore the abundant interactions among genes that widely exist in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which exploits the ordered expressions among genes based on defined equivalent dimension group sequences while taking into account the noise that is ubiquitous in real data; (2) we devise a novel sequence rule, the interesting non-redundant contrast sequence rule, which captures the difference between phenotypes with high accuracy using as few genes as possible; and (3) we present an efficient algorithm, NRMINER, to find such rules. Unlike conventional column enumeration and the more recent row enumeration, NRMINER performs a novel template-driven enumeration that exploits a special characteristic of microarray data as modeled by EWave. Extensive experiments on various synthetic and real datasets show that (1) NRMINER is faster than the competing algorithm by up to about one order of magnitude, and (2) it achieves higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER have been shown to be biologically related to diseases.
Keywords :
biology; pattern recognition; EWave model; NRMINER; column enumeration; disease phenotype; finding novel diagnostic gene patterns; interesting nonredundant contrast sequence rules; micro array data; powerful diagnostic genes; Accuracy; Data models; Diseases; Gene expression; Generators; Noise; data mining; diagnostic gene; sequence rule;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.68
Filename :
6137302