• DocumentCode
    2761482
  • Title

    Application of ensemble models in web ranking

  • Author

    Hashemi, Homa Baradaran ; Yazdani, Nasser ; Shakery, Azadeh ; Naeini, Mahdi Pakdaman

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
  • fYear
    2010
  • fDate
    4-6 Dec. 2010
  • Firstpage
    726
  • Lastpage
    731
  • Abstract
    One of the most important parts of a search engine is the ranking unit. Many classical ranking algorithms based on content (such as TF-IDF and BM25) and connectivity (such as HITS and PageRank) have been used in web search engines to find pages in response to a user query. Although these algorithms were developed to improve retrieval results, none of them exploits both the power of page content and the useful link structure. How to combine this available information effectively to maximize search accuracy therefore remains a challenging research question. In this study, we investigate the application of different ensemble models to ranking algorithms. Some are simple, such as the Sum, Product, and Borda rules; others are more complex. We present three complex ensemble approaches. The first uses the ordered weighted averaging (OWA) operator to merge the results of various ranking algorithms. The second, a state-of-the-art method, uses simulated click-through data to learn how to combine many content and connectivity features of web pages. The third is a modified version of the SVM classifier customized for ranking problems. The proposed methods are evaluated on the LETOR and dotIR benchmark data sets. The experimental results show that in most cases the ensemble methods give better results, and the improvements are very encouraging. The results also show that the OWA and SVM fusion methods are promising relative to the other ensemble models.
  • Keywords
    pattern classification; search engines; sensor fusion; support vector machines; BM25 ranking algorithm; Borda rule ensemble model; HITS ranking algorithm; PageRank ranking algorithm; Product ensemble model; SVM classifier; SVM fusion methods; Sum ensemble model; TF-IDF ranking algorithm; Web pages; Web ranking; ordered weighted averaging operator; search engines; simulated click-through data approach; support vector machine; user query; Accuracy; Benchmark testing; Data models; Feature extraction; Open wireless architecture; Support vector machines; Web pages; Document Feature Combination; Ordered Weighted Averaging; Simulated Click-through Data; Support Vector Machine; Web Ranking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications (IST), 2010 5th International Symposium on
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4244-8183-5
  • Type

    conf

  • DOI
    10.1109/ISTEL.2010.5734118
  • Filename
    5734118
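
The abstract names the Sum, Product, and Borda rules and the OWA operator as ways to merge the outputs of several ranking algorithms. The following is a minimal Python sketch of those four textbook fusion rules, not the authors' implementation. It assumes every ranker scores the same set of documents and that scores are already normalized to a common range such as [0, 1]. The function names, the `Scores` type, and the OWA weight vector are hypothetical choices made for illustration; the record does not say how the paper sets the OWA weights.

```python
import math
from typing import Dict, List, Mapping, Sequence

# Assumed representation: one dict per ranker, mapping document id -> score.
Scores = Dict[str, float]


def sum_rule(rankers: Sequence[Scores]) -> Scores:
    """Sum rule: add each document's scores across all rankers."""
    return {doc: sum(r[doc] for r in rankers) for doc in rankers[0]}


def product_rule(rankers: Sequence[Scores]) -> Scores:
    """Product rule: multiply each document's scores across all rankers."""
    return {doc: math.prod(r[doc] for r in rankers) for doc in rankers[0]}


def borda_rule(rankers: Sequence[Scores]) -> Dict[str, int]:
    """Borda rule: each ranker gives n-1 points to its top document,
    n-2 to the next, and so on; the points are then summed."""
    n = len(rankers[0])
    points = {doc: 0 for doc in rankers[0]}
    for r in rankers:
        ordered = sorted(r, key=r.get, reverse=True)
        for position, doc in enumerate(ordered):
            points[doc] += n - 1 - position
    return points


def owa(rankers: Sequence[Scores], weights: Sequence[float]) -> Scores:
    """Ordered weighted averaging: for each document, sort its scores in
    descending order and take the weighted sum by position, so a weight
    belongs to a rank position rather than to a particular ranker."""
    if len(weights) != len(rankers) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per ranker, summing to 1")
    fused = {}
    for doc in rankers[0]:
        ordered = sorted((r[doc] for r in rankers), reverse=True)
        fused[doc] = sum(w * s for w, s in zip(weights, ordered))
    return fused


def rank(fused: Mapping[str, float]) -> List[str]:
    """Return document ids ordered from highest to lowest fused score."""
    return sorted(fused, key=fused.get, reverse=True)
```

With three rankers, the OWA weights `(1, 0, 0)` reduce to taking each document's maximum score, `(0, 0, 1)` to its minimum, and `(1/3, 1/3, 1/3)` to the plain mean, which is why the operator can interpolate between optimistic and pessimistic fusion.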