DocumentCode :
2117181
Title :
On the Uniform Sampling of the Web: An Improvement on Bucket Based Sampling
Author :
Heidari, Sanaz ; Mousavi, Hamid ; Movaghar, Ali
Author_Institution :
CE Dept., Qazvin Univ. of Tech., Tehran
fYear :
2009
fDate :
27-28 Feb. 2009
Firstpage :
205
Lastpage :
209
Abstract :
The Web is one of the biggest sources of information. Its tremendous size, dynamic nature, and structure make information retrieval on the Web a challenging problem. Web search engines (WSEs) were introduced to help users with this task. However, to perform effectively, these applications need up-to-date information about many characteristics of the Web. One way to determine these characteristics is statistical sampling of Web pages: instead of analyzing a very large number of Web pages, a much smaller, more uniform set of pages is used. This research analyzes existing methods for generating uniform samples of pages from the World Wide Web, focusing specifically on a new method called bucket based sampling (BBS). In brief, we improved the uniformity of the samples produced by BBS by at least 4.45%. Using this improved BBS, we estimated the size of the public indexable Web at 27.4 billion pages. The index sizes of some commercial WSEs are also estimated and compared.
Keywords :
Internet; information retrieval; search engines; Web pages; Web search engines; World Wide Web; bucket based sampling; information retrieval process; uniform sampling; Content based retrieval; Equal opportunities; Information resources; Information retrieval; Sampling methods; Search engines; Testing; Web pages; Web search; Web sites; Uniform Sampling; Web; Web Search Engine
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communication Software and Networks, 2009. ICCSN '09. International Conference on
Conference_Location :
Macau
Print_ISBN :
978-0-7695-3522-7
Type :
conf
DOI :
10.1109/ICCSN.2009.164
Filename :
5076840