Title :
Frequent Approximate Substring Pattern Mining Using Locally Optimal Occurrence Counting
Author :
Nakamura, Atsuyoshi ; Tosaka, Hisashi ; Kudo, Mineichi
Author_Institution :
Hokkaido Univ., Sapporo, Japan
Abstract :
We propose a novel frequent approximate pattern mining in which occurrences themselves are valuable regions to extract. Given a string s, our mining task is to enumerate its sub strings that locally optimally match many sub strings of s. We show an algorithm for this problem whose time and space complexities are O(n3) for a string with length n. According to our experiments using synthetic data, our algorithm can correctly extract embedded frequent patterns as both patterns and their occurrences even if the embedded patterns are corrupted by random edit operations.
Keywords :
computational complexity; data mining; embedded frequent pattern extraction; frequent approximate substring pattern mining; locally optimal occurrence counting; mining task; random edit operation; space complexity; synthetic data; time complexity; Approximation algorithms; Complexity theory; Data mining; Electronic mail; Hamming distance; Optimized production technology; Pattern matching; data mining; frequent pattern; local alignment; string algorithm;
Conference_Titel :
Advanced Applied Informatics (IIAIAAI), 2012 IIAI International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4673-2719-0
DOI :
10.1109/IIAI-AAI.2012.20