DocumentCode :
2378478
Title :
TOPPER: An algorithm for mining top k patterns in biological sequences based on regularity measurement
Author :
Xiong, Yun ; He, Junhua ; Zhu, Yangyong
Author_Institution :
Res. Center for Dataology & Data Sci., Fudan Univ., Shanghai, China
fYear :
2010
fDate :
18 Dec. 2010
Firstpage :
283
Lastpage :
288
Abstract :
Biological sequential patterns usually exhibit significant functions in a set of sequences. Mining such patterns offers a key means of insight into transcription regulation mechanisms and has become a useful primitive task underlying much research and many applications. Recently, various methods have been developed to identify biological patterns. However, traditional approaches to mining sequential patterns produce a huge result set, which makes it difficult for biologists to decide which patterns are interesting and meaningful. In this paper, we study a variant of biological sequential pattern mining that addresses the huge result set, termed top k representative pattern mining based on regularity measurement. As a first attempt to tackle the problem, a new measure, 'regularity', is defined to evaluate the interestingness of each pattern, and an efficient algorithm with a pruning strategy is proposed that returns the top k representative patterns ranked by regularity. Experimental results demonstrate that the proposed method is more efficient than state-of-the-art methods on real datasets.
Keywords :
biology computing; data mining; pattern formation; TOPPER; biological sequential patterns; pruning; regularity measurement; top k pattern mining; transcription regulation; biological sequence; functional element; sequential pattern
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
Type :
conf
DOI :
10.1109/BIBMW.2010.5703813
Filename :
5703813