• DocumentCode
    375644
  • Title

    Applying the branch and bound technique to document similarity search

  • Author

    Furuse, Kazutaka ; Miura, Takayuki ; Ishikawa, Masahiro ; Chen, Hanxion ; Ohbo, Nobuo

  • Author_Institution
    Inst. of Inf. Sci. & Electron., Univ. of Tsukuba, Japan
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    331
  • Abstract
    This paper proposes a new mechanism for document similarity search that uses an indexing structure called signature tables. Signature tables were originally invented for similarity search over market basket data; in this paper we apply them to document data. Because the characteristics of document data differ markedly from those of market basket data, similarity search performs unsatisfactorily when the mechanism is applied naively to document data. We explain why the naive application reduces efficiency and propose techniques for improving performance. Simulation results on a real document data set show that the proposed mechanism achieves good performance.
  • Keywords
    text analysis; tree searching; document similarity search; indexing structure; market basket data; signature tables; similarity search; Consumer electronics; Data mining; Indexing; Information science; Internet; Transaction databases; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications, Computers and Signal Processing, 2001. PACRIM. 2001 IEEE Pacific Rim Conference on
  • Conference_Location
    Victoria, BC
  • Print_ISBN
    0-7803-7080-5
  • Type

    conf

  • DOI
    10.1109/PACRIM.2001.953590
  • Filename
    953590