Title :
Stochastic simulations of rejected World Wide Web pages
Author :
Meghabghab, George
Author_Institution :
Dept. of Math. & Comput. Sci., Valdosta State Univ., GA, USA
Abstract :
Studies the use of neural networks in a stochastic simulation of the number of rejected Web pages per search query. The evaluation of the quality of search engines should involve not only the resulting set of Web pages but also an estimate of the rejected set of pages. The iterative radial basis function (RBF) neural network developed by G. Meghabghab and G. Nasr (1999) was adapted to an actual evaluation of the number of rejected Web pages on four search engines, viz. Yahoo, Alta Vista, Google and Northern Light. Nine input variables were selected for the simulation. Typical stochastic simulation meta-models use regression models in response surface methods. Here, an RBF divides the resulting set of responses to a query into accepted and rejected Web pages. The RBF meta-model was trained on 937 examples from a set of 9,000 different simulation runs on the nine input variables. The results show that the number of rejected Web pages for a specific set of search queries on these four engines is very high. A goodness measure of a search engine for a given set of queries can also be designed as a function of the coverage of the search engine and the normalized age of a new document in the resulting set for the query. The study concludes that, unless search engine designers address the issues of rejected Web pages, indexing and crawling, the use of the Web as a research tool for academic and educational purposes will remain hindered.
Keywords :
digital simulation; indexing; information resources; information retrieval system evaluation; radial basis function networks; relevance feedback; search engines; statistical analysis; stochastic systems; Alta Vista; Google; Northern Light; Web crawling; Yahoo; academic research tool; coverage; educational research tool; goodness measure; input variables; iterative radial basis function neural network; meta-models; normalized document age; regression models; rejected World Wide Web pages; response surface methods; search engine quality evaluation; search queries; stochastic simulation; Artificial neural networks; Computational modeling; Information retrieval; Input variables; Neural networks; Search engines; Stochastic processes; Web pages; Web sites; World Wide Web
Conference_Title :
Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2000. Proceedings. 8th International Symposium on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7695-0728-X
DOI :
10.1109/MASCOT.2000.876575