DocumentCode :
2761482
Title :
Application of ensemble models in web ranking
Author :
Hashemi, Homa Baradaran ; Yazdani, Nasser ; Shakery, Azadeh ; Naeini, Mahdi Pakdaman
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
fYear :
2010
fDate :
4-6 Dec. 2010
Firstpage :
726
Lastpage :
731
Abstract :
One of the most important parts of a search engine is the ranking unit. Many classical ranking algorithms based on content (such as TF-IDF and BM25) and connectivity (such as HITS and PageRank) have been used in web search engines to find pages in response to a user query. Although these algorithms were developed to improve retrieval results, none of them exploits both the power of page content and the useful link structure. Thus, how to effectively combine this available information to maximize search accuracy remains a challenging research question. In this study, we investigate the application of different ensemble models to ranking algorithms. Some are simple, such as the Sum, Product, and Borda rules; others are more complex. We present three complex ensemble approaches. The first uses the ordered weighted averaging (OWA) operator to merge the results of various ranking algorithms. In the second approach, a state-of-the-art method, simulated click-through data, is used to learn how to combine many content and connectivity features of web pages. The third complex fusion approach is a modified version of the SVM classifier customized for ranking problems. The proposed methods are evaluated on the LETOR and dotIR benchmark data sets. The experimental results show that in most cases ensemble methods give better results, and the improvements are very encouraging. These results also show that the OWA and SVM fusion methods are promising compared with the other ensemble models.
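As an illustration of the simple fusion rules named in the abstract (Sum, Product, Borda) and of the OWA operator, the following is a minimal sketch, not the authors' implementation. It assumes each ranker returns a score per document, already normalized to [0, 1]; the function names, the example scores, and the choice to score a missing document as 0 are all assumptions made for this example.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> normalized score from one ranker


def sum_rule(runs: List[Scores]) -> Scores:
    """Sum rule: add the scores a document receives from every ranker."""
    docs = set().union(*runs)
    return {d: sum(r.get(d, 0.0) for r in runs) for d in docs}


def product_rule(runs: List[Scores]) -> Scores:
    """Product rule: multiply the scores a document receives from every ranker."""
    docs = set().union(*runs)
    fused = {}
    for d in docs:
        p = 1.0
        for r in runs:
            p *= r.get(d, 0.0)  # assumption: a missing document scores 0
        fused[d] = p
    return fused


def borda_rule(runs: List[Scores]) -> Scores:
    """Borda rule: each ranker awards n points to its top document, n-1 to the next, ..."""
    docs = set().union(*runs)
    n = len(docs)
    points = {d: 0.0 for d in docs}
    for r in runs:
        ranked = sorted(r, key=lambda d: (-r[d], d))
        for pos, d in enumerate(ranked):
            points[d] += n - pos
    return points


def owa(runs: List[Scores], weights: List[float]) -> Scores:
    """OWA: weights attach to sorted score positions (largest first), not to rankers."""
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per ranker")
    docs = set().union(*runs)
    fused = {}
    for d in docs:
        ordered = sorted((r.get(d, 0.0) for r in runs), reverse=True)
        fused[d] = sum(w * s for w, s in zip(weights, ordered))
    return fused


def rank(scores: Scores) -> List[str]:
    """Order documents by descending fused score, breaking ties by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical scores from one content-based and one link-based ranker.
content = {"a": 0.9, "b": 0.5, "c": 0.1}
links = {"a": 0.2, "b": 0.8, "c": 0.4}
runs = [content, links]

for name, fused in [
    ("sum", sum_rule(runs)),
    ("product", product_rule(runs)),
    ("borda", borda_rule(runs)),
    ("owa-max", owa(runs, [1.0, 0.0])),
]:
    print(name, rank(fused))
```

With weights [1.0, 0.0] the OWA operator reduces to the maximum score, with [0.0, 1.0] to the minimum, and with equal weights to the arithmetic mean, which is why it can interpolate between optimistic and pessimistic fusion.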
Keywords :
pattern classification; search engines; sensor fusion; support vector machines; BM25 ranking algorithm; Borda rule ensemble model; HITS ranking algorithm; PageRank ranking algorithm; Product ensemble model; SVM classifier; SVM fusion methods; Sum ensemble model; TF-IDF ranking algorithm; Web pages; Web ranking; ordered weighted averaging operator; search engines; simulated click-through data approach; support vector machine; user query; Accuracy; Benchmark testing; Data models; Feature extraction; Open wireless architecture; Support vector machines; Web pages; Document Feature Combination; Ordered Weighted Averaging; Simulated Click-through Data; Support Vector Machine; Web Ranking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Telecommunications (IST), 2010 5th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4244-8183-5
Type :
conf
DOI :
10.1109/ISTEL.2010.5734118
Filename :
5734118