Title :
Applying the branch and bound technique to document similarity search
Author :
Furuse, Kazutaka ; Miura, Takayuki ; Ishikawa, Masahiro ; Chen, Hanxiong ; Ohbo, Nobuo
Author_Institution :
Inst. of Inf. Sci. & Electron., Univ. of Tsukuba, Japan
Abstract :
This paper proposes a new mechanism for document similarity search that uses an indexing structure called signature tables. Signature tables were originally invented for similarity search over market basket data; in this paper we apply them to document data. Because the characteristics of document data differ markedly from those of market basket data, similarity search performs unsatisfactorily when the mechanism is applied naively to documents. We explain why the naive application reduces efficiency and propose several techniques for improving performance. Simulation results on a real document data set show that the proposed mechanism achieves good performance.
Keywords :
text analysis; tree searching; document similarity search; indexing structure; market basket data; signature tables; similarity search; Consumer electronics; Data mining; Indexing; Information science; Internet; Transaction databases; Web sites;
Conference_Title :
Communications, Computers and Signal Processing, 2001. PACRIM. 2001 IEEE Pacific Rim Conference on
Conference_Location :
Victoria, BC
Print_ISBN :
0-7803-7080-5
DOI :
10.1109/PACRIM.2001.953590