Title :
The Evolution of Link-Attributes for Pages and Its Implications on Web Crawling
Author :
Meng, Tao ; Yan, Hongfei ; Wang, Jimin ; Li, Xiaoming
Author_Institution :
Peking University, Beijing, China
Abstract :
An incremental crawler needs to know how web pages evolve and how their change frequencies relate to link-attributes such as indegree. This paper proposes a model for incremental crawling and reports an experiment that tests this correlation by monitoring the evolution of all link-attributes of the web pages within one website. In particular, we look closely at one special kind of page, named Index-pages. From the experiment, we draw four conclusions: (1) Pages with larger indegrees, outdegrees, or PageRank values change more often, and these link-attributes all approximately obey a power-law distribution. (2) The link-attributes of pages seldom change even though the pages themselves change. (3) A small proportion of the pages link to most of the vertices in the web graph. (4) The Index-pages link to a sizeable number of new pages in a website. These conclusions can be used to greatly enhance the performance of an incremental crawler, which is the foremost component of general search engines and web information stores.
Keywords :
Computer networks; Computer science; Crawlers; Frequency; Internet; Laboratories; Monitoring; Search engines; Web pages; World Wide Web;
Conference_Title :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
DOI :
10.1109/WI.2004.10097