Title :
TOPPER: An algorithm for mining top k patterns in biological sequences based on regularity measurement
Author :
Xiong, Yun ; He, Junhua ; Zhu, Yangyong
Author_Institution :
Res. Center for Dataology & Data Sci., Fudan Univ., Shanghai, China
Abstract :
Biological sequential patterns usually exhibit significant functions in a set of sequences. Mining such patterns offers key insight into transcription regulation mechanisms and has become a useful primitive task underlying many research efforts and applications. Recently, various methods have been developed to identify biological patterns. However, traditional approaches to sequential pattern mining produce a huge result set, which makes it difficult for biologists to decide which patterns are interesting and meaningful. In this paper, we study a variant of biological sequential pattern mining that addresses the huge result set, termed top-k representative pattern mining based on regularity measurement. As the first attempt to tackle the problem, a new measure, "regularity", is defined to evaluate the interestingness of each pattern, and an efficient algorithm with a pruning strategy is proposed that returns the top-k representative patterns ranked by regularity. Experimental results demonstrate that the proposed method is more efficient than state-of-the-art methods on real datasets.
Keywords :
biology computing; data mining; pattern formation; TOPPER; biological sequential patterns; pruning; regularity measurement; top k pattern mining; transcription regulation; biological sequence; functional element; sequential pattern
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
DOI :
10.1109/BIBMW.2010.5703813