• DocumentCode
    3393691
  • Title

    Data collection for evaluating automatic filtering of hazardous WWW information

  • Author

    Hoashi, Keiichiro ; Inoue, Naomi ; Hashimoto, Kazuo

  • Author_Institution
    KDD R&D Labs. Inc., Saitama, Japan
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    157
  • Lastpage
    164
  • Abstract
    We describe our data collection constructed for the evaluation of automatic filtering of hazardous WWW information. Currently, there are three types of filtering systems: self rating, individual rating, and automatic filtering. We propose an ideal system architecture for effective filtering based on an analysis of existing systems. For the development of our filtering system, we have collected a massive amount of hazardous WWW data. We presumed that WWW pages with few words are difficult to filter automatically, but analysis of our data collection has shown that effective automatic filtering can be achieved by using the hierarchy of HTML data. We have also confirmed this hypothesis in evaluation experiments using an experimental automatic filtering algorithm.
  • Keywords
    Internet; hypermedia markup languages; information resources; information retrieval; HTML; Web pages; World Wide Web; automatic information filtering; data collection; hazardous Web information; individual rating; self rating; Data analysis; Filtering algorithms; Information filtering; Information filters; Laboratories; Research and development
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Internet Workshop, 1999. IWS 99
  • Conference_Location
    Osaka
  • Print_ISBN
    0-7803-5925-9
  • Type

    conf

  • DOI
    10.1109/IWS.1999.811008
  • Filename
    811008