DocumentCode
3393691
Title
Data collection for evaluating automatic filtering of hazardous WWW information
Author
Hoashi, Keiichiro ; Inoue, Naomi ; Hashimoto, Kazuo
Author_Institution
KDD R&D Labs. Inc., Saitama, Japan
fYear
1999
fDate
1999
Firstpage
157
Lastpage
164
Abstract
We describe our data collection, constructed for the evaluation of automatic filtering of hazardous WWW information. Currently, there are three types of filtering systems: self rating, individual rating and automatic filtering. We propose an ideal system architecture for effective filtering based on an analysis of existing systems. For the development of our filtering system, we have collected a massive amount of hazardous WWW data. We presumed that WWW pages with few words are difficult to filter automatically, but analysis of our data collection has shown that effective automatic filtering can be achieved by applying the hierarchy of HTML data. We have also confirmed this in practice through evaluation experiments using an experimental automatic filtering algorithm.
Keywords
Internet; hypermedia markup languages; information resources; information retrieval; HTML; Web pages; World Wide Web; automatic information filtering; data collection; hazardous Web information; individual rating; self rating; Data analysis; Filtering algorithms; Information filtering; Information filters; Laboratories; Research and development
fLanguage
English
Publisher
IEEE
Conference_Titel
Internet Workshop, 1999. IWS 99
Conference_Location
Osaka
Print_ISBN
0-7803-5925-9
Type
conf
DOI
10.1109/IWS.1999.811008
Filename
811008