• DocumentCode
    3398548
  • Title

    The bit vector intersection problem

  • Author

    Karp, Richard M. ; Waarts, Orli ; Zweig, Geoffrey

  • Author_Institution
    California Univ., Berkeley, CA, USA
  • fYear
    1995
  • fDate
    23-25 Oct 1995
  • Firstpage
    621
  • Lastpage
    630
  • Abstract
    This paper introduces the bit vector intersection problem: given a large collection of sparse bit vectors, find all the pairs with at least t ones in common for a given input parameter t. The assumption is that the number of ones common to any two vectors is significantly less than t, except for an unknown set of O(n) pairs. This problem has important applications in DNA physical mapping, clustering, and searching for approximate dictionary matches. We present two randomized algorithms that solve this problem with high probability and in sub-quadratic expected time. One of these algorithms is based on a recursive tree-searching procedure, and the other on hashing. We analyze the tree scheme in terms of branching processes, while our analysis of the hashing scheme is based on Markov chains. Since both algorithms have similar asymptotic performance, we also examine experimentally their relative merits in practical situations. We conclude by showing that a fundamental problem arising in the Human Genome Project is captured by the bit vector intersection problem described above and hence can be solved by our algorithms
  • Keywords
    algorithm theory; randomised algorithms; tree data structures; bit vector intersection; dictionary matches; hashing; high probability; randomized algorithms; recursive tree-searching; sparse bit vectors; Bioinformatics; Biological cells; Cloning; Clustering algorithms; DNA; Dictionaries; Electronic mail; Fingerprint recognition; Genomics; Humans;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on
  • Conference_Location
    Milwaukee, WI
  • ISSN
    0272-5428
  • Print_ISBN
    0-8186-7183-1
  • Type

    conf

  • DOI
    10.1109/SFCS.1995.492663
  • Filename
    492663