Title :
Generating closed frequent gensets under constraints based on FP-Tree structure
Author :
Trabelsi, Chiraz ; Latiri, C. ; Ghedira, Khaled
Author_Institution :
Res. group SOIE, Tunisian High Sch. of Manage., Tunis
Abstract :
The mechanism of gene regulation is of great interest to biologists, especially in the field of genomics. Part of the mechanism controlling gene expression is provided by transcription factors, which are proteins that can either repress or stimulate the transcription of a gene. In this paper, we propose a new data mining algorithm, based on Boolean contexts, to extract a priori relevant frequent closed gensets, i.e., sets of tissues and associated sets of genes and transcription factors that are useful to the biologist. The key feature of our algorithm is a better trade-off between the size of the search space and the knowledge conveyed by the discovered patterns in bioinformatics. To this end, the proposed algorithm, called MC2G, for mining constraint closed gensets, uses the frequent pattern tree (FP-Tree) structure, which is an extended prefix-tree structure, to prune the search space. Moreover, MC2G allows statistical and syntactic constraints to be defined on the desired frequent closed gensets and uses them during the extraction process. Experimental comparisons with other algorithms are carried out on real-world datasets.
Keywords :
biology computing; data mining; genetics; proteins; statistics; tree searching; Boolean contexts; bioinformatics; closed frequent gensets; constraint closed genset mining; data mining; extended prefix-tree structure; frequent pattern-tree structure; gene regulation; pattern discovery; proteins; search space; statistical constraints; syntax constraints; Association rules; Bioinformatics; Biological information theory; Biology computing; Data mining; Drugs; Frequency; Genetic expression; Proteins; Systems engineering and theory; Closed frequent genset; Constraint-based data mining; FP-Tree structure; Gene expression; Pattern discovery; Transcription factor;
Conference_Title :
Computational Engineering in Systems Applications, IMACS Multiconference on
Conference_Location :
Beijing
Print_ISBN :
7-302-13922-9
Electronic_ISBN :
7-900718-14-1
DOI :
10.1109/CESA.2006.313557