DocumentCode :
3103260
Title :
TARENTe: an experimental tool for extracting and exploring Web aggregates
Author :
Ghitalla, Franck ; Diemert, Eustache ; Maussang, Camille ; Pfaender, Fabien
Author_Institution :
Univ. de Technol. de Compiegne, France
fYear :
2004
fDate :
19-23 April 2004
Firstpage :
627
Lastpage :
628
Abstract :
How can one extract and visually explore the topology of an open, large-scale hypertext system such as the Web? We address this issue by developing an experimental tool for extracting, exploring, and analyzing aggregates of Web documents. This tool, called TARENTe, includes a crawling technology and algorithms for both content analysis and authority graph calculation (such as Kleinberg's HITS), linked with visualization solutions. We provide a series of experimental results on different topics that allow us to describe the Web's structure in terms of topic aggregates. The TARENTe system was designed to provide multiple services, including Web crawling, network analysis, data mining, and information visualization tools. For these purposes, we chose to build it using an ad hoc modular Java framework, which allows the integration of open-source code for each task. For simplicity, we organized the information gathering and analysis process around a MySQL database, which can be accessed by different crawlers as well as by multiple infoviz tools and analysis plug-ins.
Keywords :
Internet; Java; SQL; content-based retrieval; data visualisation; hypermedia; information retrieval; TARENTe; Web aggregate extraction; Web crawling; Web document; ad hoc modular Java framework; authority graphs calculation; content analysis; data mining; experimental tool; hypertext system; information visualization tool; infoviz tool; mySQL database; network analysis; open-source code integration; plug-ins analysis; Aggregates; Algorithm design and analysis; Data analysis; Data mining; Data visualization; Hypertext systems; Information analysis; Java; Large-scale systems; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
Type :
conf
DOI :
10.1109/ICTTA.2004.1307921
Filename :
1307921