Title :
TARENTe: an experimental tool for extracting and exploring Web aggregates
Author :
Ghitalla, Franck ; Diemert, Eustache ; Maussang, Camille ; Pfaender, Fabien
Author_Institution :
Univ. de Technol. de Compiegne, France
Abstract :
How can we extract and visually explore the topology of an open, large-scale hypertext system such as the Web? We address this issue by developing an experimental tool for extracting, exploring and analyzing Aggregates of Web documents. This tool, called TARENTe, includes a crawling technology and algorithms for both content analysis and authority graph calculations (such as Kleinberg's HITS), linked with visualization solutions. We provide a series of experimental results on different topics that allow us to describe the Web's structure in terms of topic Aggregates. The TARENTe system was designed to provide multiple services, including Web crawling, network analysis, data mining and information visualization tools. For these purposes we chose to build it using an ad hoc modular Java framework, which allows the integration of open-source code for each task. For simplicity, we organized the information gathering and analysis process around a MySQL database, which can be accessed by different crawlers as well as by multiple infoviz tools and analysis plug-ins.
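The abstract names Kleinberg's HITS as one of the authority graph calculations. As a rough illustration of what such a calculation involves, the following is a minimal Java sketch of the standard HITS iteration on a small link graph. It is not TARENTe's code: the class name `HitsSketch`, the adjacency-list input format, the fixed iteration count and the toy graph are all assumptions made for this example.

```java
// Hypothetical illustration of the standard HITS iteration; not TARENTe's actual code.
final class HitsSketch {

    /**
     * Runs HITS on a directed graph given as adjacency lists.
     * links[i] holds the indices of the pages that page i links to.
     * Returns {hubScores, authorityScores}, each normalized to unit L2 length.
     */
    static double[][] hits(int[][] links, int iterations) {
        int n = links.length;
        double[] hub = new double[n];
        double[] auth = new double[n];
        java.util.Arrays.fill(hub, 1.0);
        java.util.Arrays.fill(auth, 1.0);

        for (int it = 0; it < iterations; it++) {
            // Authority update: sum the hub scores of all pages linking to a page.
            double[] newAuth = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links[i]) {
                    newAuth[j] += hub[i];
                }
            }
            normalize(newAuth);

            // Hub update: sum the updated authority scores of all linked pages.
            double[] newHub = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links[i]) {
                    newHub[i] += newAuth[j];
                }
            }
            normalize(newHub);

            auth = newAuth;
            hub = newHub;
        }
        return new double[][] { hub, auth };
    }

    // Scales the vector to unit L2 norm; leaves an all-zero vector unchanged.
    private static void normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
    }

    public static void main(String[] args) {
        // Toy graph (assumed): pages 0 and 1 both link to page 2; page 2 links to page 3.
        int[][] links = { {2}, {2}, {3}, {} };
        double[][] scores = hits(links, 20);
        System.out.println("hubs:        " + java.util.Arrays.toString(scores[0]));
        System.out.println("authorities: " + java.util.Arrays.toString(scores[1]));
    }
}
```

In the toy graph, page 2 ends up with the highest authority score because two pages point to it, and pages 0 and 1 end up as the strongest hubs. A crawler-backed system would presumably read the link graph from its database rather than from a hard-coded array.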
Keywords :
Internet; Java; SQL; content-based retrieval; data visualisation; hypermedia; information retrieval; TARENTe; Web aggregate extraction; Web crawling; Web document; ad hoc modular Java framework; authority graphs calculation; content analysis; data mining; experimental tool; hypertext system; information visualization tool; infoviz tool; mySQL database; network analysis; open-source code integration; plug-ins analysis; Aggregates; Algorithm design and analysis; Data analysis; Data mining; Data visualization; Hypertext systems; Information analysis; Java; Large-scale systems; Topology;
Conference_Title :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
DOI :
10.1109/ICTTA.2004.1307921