Title :
String join using precedence count matrix
Author :
Cao, Xia ; Tung, Anthony K H ; Ooi, Beng Chin ; Tan, Kian-Lee ; Li, Shuai Cheng
Author_Institution :
Dept. of Comput. Sci., National Univ. of Singapore, Singapore
Abstract :
In this paper, we propose a filter-and-refine string join algorithm. The filtering phase rapidly prunes away strings that are not joinable, and the refinement phase then applies a comprehensive algorithm to remove the remaining false alarms. The efficiency of the proposed scheme lies in the use of the precedence count matrix (PCM) for computing the edit distance between two sequences. With PCM, sequence comparison takes constant time. We also evaluated the proposed sequence join algorithm, and our study shows that it outperforms the known techniques.
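The filter-and-refine pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' PCM method, whose details the abstract does not give. As a stand-in filter it uses a simple character-frequency lower bound on edit distance, and for refinement it uses the standard dynamic-programming edit distance. The function names and the threshold parameter k are assumptions made for illustration.

```python
from collections import Counter


def frequency_lower_bound(s, t):
    """Cheap lower bound on edit distance (stand-in filter, NOT the paper's PCM).

    One edit operation changes the surplus and the deficit of character
    counts by at most 1 each, so the larger of the two bounds the
    edit distance from below.
    """
    cs, ct = Counter(s), Counter(t)
    surplus = sum((cs - ct).values())  # characters s has in excess of t
    deficit = sum((ct - cs).values())  # characters t has in excess of s
    return max(surplus, deficit)


def edit_distance(s, t):
    """Standard dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]


def string_join(left, right, k):
    """Return all pairs (s, t) with edit distance at most k (hypothetical API)."""
    result = []
    for s in left:
        for t in right:
            # Filter phase: discard pairs that cannot be within distance k.
            if frequency_lower_bound(s, t) > k:
                continue
            # Refine phase: exact check removes the remaining false alarms.
            if edit_distance(s, t) <= k:
                result.append((s, t))
    return result
```

For example, with k = 1 the pair ("AC", "CA") passes the frequency filter, because both strings have identical character counts, and is only rejected in the refinement phase, since its edit distance is 2. This is the kind of false alarm the refinement step exists to remove.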
Keywords :
DNA; distributed databases; genetics; query languages; relational databases; scientific information systems; string matching; DNA sequences; constant time complexity; false alarm removal; filter-and-refine string join algorithm; genomic applications; precedence count matrix; sequence comparison; sequence edit distance computing; sequence join algorithm; string data manipulation; string pruning; string refinement; string similarity; Assembly; Bioinformatics; Computer science; Dynamic programming; Filtering algorithms; Filters; Finance; Genomics; Phase change materials;
Conference_Title :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-2146-0
DOI :
10.1109/SSDM.2004.1311228