Abstract:
In this article, I investigate the reliability, in the social
science sense, of collecting informetric data about the
World Wide Web by Web crawling. The investigation
includes a critical examination of the practice of Web
crawling and contrasts the results of content crawling
with the results of link crawling. It is shown that Web
crawling by search engines is intentionally biased and
selective. I also report the results of a large-scale experimental
simulation of Web crawling that illustrates the
effects of different crawling policies on data collection. It
is concluded that the reliability of Web crawling as a data
collection technique is improved by fuller reporting of
relevant crawling policies.