Author_Institution :
Dept. of Comput. Sci. & Eng., Waseda Univ., Tokyo, Japan
Abstract :
We propose an intensity constraint-based closed sequential pattern mining algorithm, called IC-BIDE, for coding pattern extraction. Source code often contains frequent patterns of function calls or control flows, i.e., "coding patterns." Previous studies used sequential pattern mining to extract coding patterns; however, these algorithms were not optimized for coding pattern extraction, which results in useless patterns as well as long execution times. We propose a new constraint, called the "intensity constraint," to enhance closed sequential pattern mining and extract coding patterns efficiently. Our proposed algorithm is based on BI-Directional Execution (BIDE), an algorithm proposed expressly for closed sequential pattern mining. The BIDE algorithm by itself cannot handle constraint-based closed sequential pattern mining. We extend the BIDE algorithm and prove that the extended algorithm supports intensity constraint-based closed sequential pattern mining. Our contributions are as follows: 1) we propose a new constraint, which we call "intensity"; and 2) we propose an intensity constraint-based closed sequential pattern mining algorithm, which we call "IC-BIDE." Experimental results with open source software (Bullet Physics, MySQL, and OpenCV) show that the IC-BIDE algorithm effectively excludes useless patterns. Moreover, our proposed method accelerates the extraction by a factor of 8.9 in comparison with the BIDE algorithm.
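To make the notion of a closed sequential pattern concrete, the following is a minimal brute-force sketch, not the BIDE or IC-BIDE algorithm. It treats a pattern as closed when no longer frequent pattern containing it has the same support. The toy database of function-call sequences, the helper names, and the support threshold are all assumptions chosen for illustration. The intensity constraint is not defined in this abstract and is therefore not modeled.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_patterns(database, min_support):
    """Enumerate every subsequence by brute force and keep the frequent ones.

    Exponential in sequence length; only suitable for tiny examples.
    """
    candidates = set()
    for seq in database:
        for r in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), r):
                candidates.add(tuple(seq[i] for i in idx))
    frequent = {}
    for pattern in candidates:
        s = support(pattern, database)
        if s >= min_support:
            frequent[pattern] = s
    return frequent


def closed_patterns(frequent):
    """Keep patterns that have no longer super-pattern with equal support."""
    closed = {}
    for p, s in frequent.items():
        absorbed = any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
        if not absorbed:
            closed[p] = s
    return closed


# Hypothetical call sequences, one per function body.
db = [
    ["open", "read", "close"],
    ["open", "read", "write", "close"],
    ["open", "write", "close"],
]
closed = closed_patterns(frequent_patterns(db, min_support=2))
for pattern, count in sorted(closed.items()):
    print(pattern, count)
```

In this toy database, the single-call pattern ("open",) is frequent but not closed, because ("open", "close") occurs in exactly the same sequences. Mining only closed patterns removes such redundant sub-patterns without losing support information.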
Keywords :
data mining; encoding; public domain software; Bullet Physics; IC-BIDE; MySQL; OpenCV; bidirectional execution; coding pattern extraction; intensity constraint-based closed sequential pattern mining; open source software; source code; Algorithm design and analysis; Bidirectional control; Data mining; Databases; Encoding; Software; Software algorithms; closed sequential pattern mining; coding pattern extraction; constraint-based pattern mining;