• DocumentCode
    2288405
  • Title

    WIRE-a WWW-based information retrieval and extraction system

  • Author

    Aggarwal, Sudhir ; Hung, Fuyung ; Meng, Weiyi

  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
  • fYear
    1998
  • fDate
    25-28 Aug 1998
  • Firstpage
    887
  • Lastpage
    892
  • Abstract
    Locating and retrieving specific data from the World Wide Web (WWW) is an important problem. Existing search engines often return too much useless data and are generally incapable of automatically extracting specific information such as names and email addresses. We describe WIRE, a WWW-based information retrieval and extraction system whose goal is to accurately retrieve and organize specific information from the World Wide Web. WIRE employs several innovative techniques. First, queries of WIRE are tree-structured. This not only provides an order in which Web pages are to be searched/retrieved but also provides a context for more accurate retrieval. Second, WIRE employs a library of search templates based on the structure of HTML files to extract specific information. These templates can be complemented by user-provided search examples and patterns for better results. Third, WIRE has a filter mechanism to filter out undesired information to further improve retrieval accuracy.
  • Keywords
    Internet; hypermedia; information retrieval; online front-ends; tree data structures; HTML files; WIRE; Web pages; World Wide Web; email; information extraction system; information filtering; information retrieval system; search engines; search templates; tree structured queries; user-provided search examples; Data mining; Information filtering; Information filters; Information retrieval; Libraries; Search engines; Web pages; Web sites; Wire; World Wide Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on
  • Conference_Location
    Vienna
  • Print_ISBN
    0-8186-8353-8
  • Type

    conf

  • DOI
    10.1109/DEXA.1998.707510
  • Filename
    707510
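
The three techniques named in the abstract (tree-structured queries, HTML-based search templates, and a filter) can be illustrated with a minimal sketch. It is not WIRE's actual implementation, which this record does not describe. Everything in it is an assumption made for illustration: the in-memory `PAGES` table standing in for the Web, the `QueryNode` class, the two regular-expression templates, and the `run_query` function are all hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory "web" (URL -> HTML); a real system would fetch pages.
PAGES = {
    "dept": '<a href="faculty">Faculty</a> <a href="news">News</a>',
    "faculty": '<a href="mailto:alice@example.edu">Alice</a> '
               '<a href="mailto:webmaster@example.edu">Webmaster</a>',
    "news": '<a href="mailto:press@example.edu">Press office</a>',
}

# Assumed search templates keyed to HTML structure:
# the address inside a mailto anchor, and ordinary (non-mailto) hyperlinks.
EMAIL_TEMPLATE = re.compile(r'<a\s+href="mailto:([^"]+)"', re.I)
LINK_TEMPLATE = re.compile(r'<a\s+href="([^":]+)">([^<]*)</a>', re.I)


@dataclass
class QueryNode:
    """One node of a tree-structured query (illustrative only)."""
    link_text: str                      # anchor text to follow from the parent page
    children: list = field(default_factory=list)


def run_query(node, url, reject):
    """Walk the query tree starting at `url`; extract emails at leaf nodes."""
    html = PAGES.get(url, "")
    if not node.children:
        # Leaf: apply the extraction template, then the filter.
        found = EMAIL_TEMPLATE.findall(html)
        return [e for e in found if not reject.search(e)]
    results = []
    for child in node.children:
        # Follow only links whose anchor text matches the child node,
        # so the tree supplies both search order and context.
        for href, text in LINK_TEMPLATE.findall(html):
            if child.link_text.lower() in text.lower():
                results.extend(run_query(child, href, reject))
    return results


# Query: from the department page, follow "faculty" and collect emails,
# filtering out role addresses.
query = QueryNode("root", [QueryNode("faculty")])
role_filter = re.compile(r"^(webmaster|postmaster)@", re.I)
emails = run_query(query, "dept", role_filter)
print(emails)
```

In this sketch the query tree restricts the search to the "Faculty" link, so the address on the "News" page is never visited, and the filter drops the webmaster address from the page that is visited.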