• DocumentCode
    2734143
  • Title
    Approximate Keyword Search in Web Search Engines
  • Author
    Wu, Sun ; Chang, Hsien-Tsung ; Hsu, Ting-Chao ; Liu, Pei-Shin
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Chung Cheng Univ., Min-Hsiung
  • fYear
    2006
  • fDate
    6-6 Dec. 2006
  • Firstpage
    404
  • Lastpage
    411
  • Abstract
    We present a new indexing method that provides approximate keyword search in search engines. Our approximate keyword matching adopts a new similarity measure called the Listance model, which is a variation of the LCS (longest common subsequence) model. Two keywords are considered approximately matched if their Listance is no more than a predefined parameter k. If keywords A and B have lengths m and n, respectively, the Listance between A and B is defined as max(m, n) - LCS(A, B), where LCS(A, B) denotes the length of their longest common subsequence. The indexing method uses a new data structure called the LBS index (listance bounded subsequence index), which is designed to allow very fast approximate keyword matching. In the indexing phase, a collection of keywords is used as a reference dictionary. Keywords in the Web pages are transformed into a special form and indexed if they approximately match one of the keywords in the reference dictionary. During query processing, a similar keyword transformation is applied to search the approximate index. The experimental results show that our approach is efficient and provides an approximate keyword search capability that could be of practical interest.
  • Keywords
    data structures; search engines; Web search engines; approximate keyword search; data structure; longest common subsequence model; query processing; similarity measurement; Computer science; Dictionaries; Keyword search; Los Angeles Council; Pattern matching; Query processing; Search engines; Sun; Web pages; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2006 1st International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    1-4244-0682-X
  • Type
    conf
  • DOI
    10.1109/ICDIM.2007.369229
  • Filename
    4221921
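
The Listance definition given in the abstract, max(m, n) - LCS(A, B), can be illustrated with a minimal sketch. This is not the paper's LBS index, which the record does not describe in detail; it is a plain dynamic-programming computation of the measure for one pair of keywords. The function names `lcs_length`, `listance`, and `approx_match` are hypothetical and chosen only for this illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def listance(a: str, b: str) -> int:
    """Listance as defined in the abstract: max(m, n) - LCS(A, B)."""
    return max(len(a), len(b)) - lcs_length(a, b)


def approx_match(a: str, b: str, k: int) -> bool:
    """Two keywords match approximately if their Listance is at most k."""
    return listance(a, b) <= k


if __name__ == "__main__":
    # Illustrative keyword pairs (assumed examples, not taken from the paper).
    print(listance("search", "serach"))
    print(approx_match("engine", "engines", k=1))
```

This pairwise check costs O(mn) time per comparison, so it only illustrates the similarity measure; the abstract's point is that the LBS index avoids comparing a query against every dictionary keyword this way.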