Title :
Experimental results on the alignment of multilingual Web sites
Author :
Ricca, Filippo ; Tonella, Paolo ; Pianta, Emanuele ; Girardi, Christian
Author_Institution :
ITC, Povo, Italy
Abstract :
Institutions and companies based in countries where the main language is not English typically publish Web sites that offer the same information in at least the local language and English. However, evolving these Web sites can be troublesome if the same pages are replicated for every supported language, because each change must be propagated to all translations of the modified page. Algorithms that help ensure the consistency of multilingual Web pages use natural language processing (NLP) methods to compare the content of the pages to be aligned. Since such methods are expensive in terms of both the linguistic resources involved and computation time, a trade-off must be considered between the benefits of more advanced techniques and the cost of implementing them. In this paper, an empirical evaluation is conducted to establish which NLP methods, combined with structural comparison methods, are appropriate for Web page alignment.
Keywords :
Web sites; content management; natural languages; Web page alignment; linguistic resources; multilingual Web sites; natural language processing; Content based retrieval; Costs; Databases; Information retrieval; Portals; Web pages; XML
Conference_Title :
Software Maintenance and Reengineering, 2004. CSMR 2004. Proceedings. Eighth European Conference on
Print_ISBN :
0-7695-2107-X
DOI :
10.1109/CSMR.2004.1281431