Title :
Extracting new topic contents from hidden Web sites
Author :
Mouri, Takanori ; Kitagawa, Hiroyuki
Author_Institution :
Graduate Sch. of Syst. & Inf. Eng., Tsukuba Univ., Ibaraki, Japan
Abstract :
Many information sources provide their database contents through query interfaces; hidden Web sites are typical examples. In most cases, the database contents change dynamically, with new documents on emerging topics being appended. Applications such as topic detection and trend analysis need to discover these newly emerging contents. It is very difficult, however, for ordinary users to detect them through the query interface alone, without support from the administrators of the database contents. In this paper, we propose a novel method to discover such contents automatically. The method generates biased query probes that are issued to a hidden Web site with a keyword-based query interface, and the probes focus on extracting documents on newly emerging topics. We evaluate the effectiveness of the method through experiments.
Keywords :
Web sites; content management; content-based retrieval; data mining; automatic content discovery; biased query probes; change detection; database contents; document database; document extraction; emerging topic; hidden Web sites; information sources; keyword-based query interface; query interfaces; topic content extraction; topic detection; trend analysis; Cellular neural networks; Computer interfaces; Data analysis; Data engineering; Databases; Information technology; Internet; Probes; Systems engineering and theory
Conference_Title :
Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
Print_ISBN :
0-7695-2108-8
DOI :
10.1109/ITCC.2004.1286472