DocumentCode
2761482
Title
Application of ensemble models in web ranking
Author
Hashemi, Homa Baradaran ; Yazdani, Nasser ; Shakery, Azadeh ; Naeini, Mahdi Pakdaman
Author_Institution
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
fYear
2010
fDate
4-6 Dec. 2010
Firstpage
726
Lastpage
731
Abstract
The ranking unit is one of the most important parts of a search engine. Many classical ranking algorithms based on content (such as TF-IDF and BM25) and connectivity (such as HITS and PageRank) have been used in web search engines to find pages in response to a user query. Although these algorithms were developed to improve retrieval results, none of them exploits both the power of page content and the useful link structure. How to combine the available information effectively to maximize search accuracy therefore remains a challenging research question. In this study, we investigate the application of different ensemble models to ranking algorithms. Some of them are simple, such as the Sum, Product, and Borda rules; the others are more complex. We present three complex ensemble approaches. The first uses the ordered weighted averaging (OWA) operator to merge the results of various ranking algorithms. The second uses a state-of-the-art method, simulated click-through data, to learn how to combine many content and connectivity features of web pages. The third is a modified version of the SVM classifier customized for ranking problems. The proposed methods are evaluated on the LETOR and dotIR benchmark data sets. The experimental results show that in most cases the ensemble methods give better results, and the improvements are very encouraging. The results also show that the OWA and SVM fusion methods are promising compared with the other ensemble models.
Keywords
pattern classification; search engines; sensor fusion; support vector machines; BM25 ranking algorithm; Borda rule ensemble model; HITS ranking algorithm; PageRank ranking algorithm; Product ensemble model; SVM classifier; SVM fusion methods; Sum ensemble model; TF-IDF ranking algorithm; Web pages; Web ranking; ordered weighted averaging operator; search engines; simulated click-through data approach; support vector machine; user query; Accuracy; Benchmark testing; Data models; Feature extraction; Open wireless architecture; Support vector machines; Web pages; Document Feature Combination; Ordered Weighted Averaging; Simulated Click-through Data; Support Vector Machine; Web Ranking;
fLanguage
English
Publisher
IEEE
Conference_Titel
Telecommunications (IST), 2010 5th International Symposium on
Conference_Location
Tehran
Print_ISBN
978-1-4244-8183-5
Type
conf
DOI
10.1109/ISTEL.2010.5734118
Filename
5734118