DocumentCode :
3582788
Title :
Multivariate analysis of Web content changes
Author :
Calzarossa, Maria Carla ; Tessera, Daniele
Author_Institution :
Dipt. di Ing. Ind. e Inf., Univ. di Pavia, Pavia, Italy
fYear :
2014
Firstpage :
699
Lastpage :
706
Abstract :
News websites are expected to deliver the latest stories, and their subsequent developments, in a timely manner. Consequently, tools such as search engines need to cope with these rapid and frequent content changes by adjusting their crawling activities accordingly. In this paper we explore and model the properties and temporal behavior of the content changes of three major news websites. The dynamics of the changes are characterized by large fluctuations and significant differences from day to day and from hour to hour. However, the overall change patterns of each website exhibit a certain degree of similarity. In particular, the application of multivariate analysis techniques allows us to identify groups of days with similar change patterns, thus allowing for the customization of the crawling policies adopted by search engines.
Keywords :
Internet; Web sites; search engines; Web content changes; Web site; change patterns; crawling policies; multivariate analysis techniques; search engines; Analytical models; Correlation; Eigenvalues and eigenfunctions; Loading; Principal component analysis; Search engines; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
Type :
conf
DOI :
10.1109/AICCSA.2014.7073268
Filename :
7073268