• DocumentCode
    1648259
  • Title

    Needles and Haystacks: a search engine for personal information collections

  • Author

    De Kretser, Owen ; Moffat, Alistair

  • Author_Institution
    Dept. of Comput. Sci. & Software Eng., Melbourne Univ., Vic., Australia
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    58
  • Lastpage
    65
  • Abstract
    Information retrieval systems can be partitioned into two main classes: large-scale systems that make use of an inverted index or some other auxiliary data structure, intended for massive volumes of data; and the small-scale systems based upon sequential pattern matching that most computer users employ when hunting for missing email and news items. In this paper we describe a hybrid approach that offers the ranked queries and similarity matching of a genuine information retrieval system, but does so without any need for an index to be precomputed. This software tool, which we call seft, offers performance that in a retrieval effectiveness sense matches conventional information retrieval systems, and in a resource efficiency sense, while considerably slower than grep-like tools, is fast enough to be useful on hundreds of megabytes of text.
  • Keywords
    pattern matching; search engines; Needles and Haystacks; auxiliary data structure; email; grep-like tools; hybrid approach; information retrieval systems; inverted index; large-scale systems; news items; personal information collections; ranked queries; search engine; seft; sequential pattern matching; similarity matching; small-scale systems; software tool; Computer science; Data structures; Frequency; Information retrieval; Large-scale systems; Needles; Pattern matching; Search engines; Software engineering; Tellurium
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science Conference, 2000. ACSC 2000. 23rd Australasian
  • Conference_Location
    Canberra, ACT
  • Print_ISBN
    0-7695-0518-X
  • Type

    conf

  • DOI
    10.1109/ACSC.2000.824381
  • Filename
    824381