DocumentCode :
3004676
Title :
Characterizing the Web Using a New Uniform Sampling Approach
Author :
Mousavi, Hojjat ; Rafiei, Mohammad E. ; Movaghar, A.
Author_Institution :
Dept. of Comput. Eng., Sharif Univ. of Tech., Tehran, Iran
fYear :
2007
fDate :
7-12 Jan. 2007
Firstpage :
1
Lastpage :
5
Abstract :
The Web is one of the biggest sources of information for many people, and it continues to grow. Web search engines (WSEs) are widely used to make the Web easier to use. However, little is known about the characteristics of the Web or of WSEs. A common way to analyze these characteristics is to use a uniform sample: instead of working on the entire Web, we work on a small subset that represents it. In this paper, we propose a new method, called bucket-based sampling (BBS), to gather such a small but uniform subset of the Web. The analyses show that BBS improves sample uniformity by at least 6.95% relative to PAGERANK-SMP, one of the best existing methods. Using samples gathered by BBS, we compare the relative sizes of seven well-known WSEs. We also estimate some important characteristics of the Web. For example, we estimate that the size of the indexable Web is around 20.14 billion pages.
Keywords :
Internet; estimation theory; sampling methods; search engines; BBS method; Web characterization; Web search engines; bucket-based sampling method; uniform sampling approach; Information resources; Sampling methods; Search engines; Statistics; Web pages; Web search; Web sites; Uniform Sampling; Web; Web Search Engine;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communication Systems Software and Middleware, 2007. COMSWARE 2007. 2nd International Conference on
Conference_Location :
Bangalore
Print_ISBN :
1-4244-0613-7
Type :
conf
DOI :
10.1109/COMSWA.2007.382558
Filename :
4267982