Title :
Multilingual Web Documents: the system Hyperling
Author :
Nguyen, Tuan-Dang ; Zreik, Khaldoun
Author_Institution :
GREYC, Caen Univ.
Abstract :
Hyperling is a formal, language-independent system for processing hyperdocuments (Web sites). It builds on the observation that link structure and link context embed crucial information for both hyperdocument retrieval and hyperdocument mining. On this basis we propose a clustering system, Hyperling, that handles multilingual hyperdocuments. To determine how many languages are used and where the boundaries between them lie, we adopt a distributional approach to preprocess the hyperdocument structure before clustering it. Our main hypothesis is that links related to the same language are grouped together in the same cluster. From this we conclude that the most important generated clusters represent the dominant languages.
Keywords :
Web sites; document handling; natural languages; Hyperling; hyperdocuments; multilingual Web documents; Clustering algorithms; Data mining; Frequency; Information retrieval; Laboratories; Machine learning; Magnetohydrodynamics; Research and development; Statistics; Text analysis
Conference_Title :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
DOI :
10.1109/ICTTA.2006.1684435