DocumentCode :
3537521
Title :
Greedy and Randomized Feature Selection for Web Search Ranking
Author :
Pan, Feng ; Converse, Tim ; Ahn, David ; Salvetti, Franco ; Donato, Gianluca
Author_Institution :
Bing SF, Microsoft Corp., San Francisco, CA, USA
fYear :
2011
fDate :
Aug. 31 2011-Sept. 2 2011
Firstpage :
436
Lastpage :
442
Abstract :
Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (i.e., selecting the features with the highest relative influence as computed by boosted trees, discounting importance by feature similarity) and randomized approaches (i.e., best-only genetic algorithm, a proposed more efficient randomized method with feature-importance-based backward elimination). We evaluate and compare these approaches using two data sets, one from a commercial Wikipedia search engine and the other from a commercial Web search engine. The experimental results show that the greedy approach that selects top features with the highest relative influence performs close to the full-feature model, and the randomized feature selection with feature-importance-based backward elimination outperforms all other randomized and greedy approaches, especially on the Wikipedia data.
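One way to picture the greedy variant mentioned in the abstract (top features by relative influence, with importance discounted by feature similarity) is the minimal Python sketch below. It is not the authors' method: the discount rule (multiplying importance by one minus the highest absolute Pearson correlation with any already-selected feature), the function names `abs_correlation` and `greedy_select`, and the toy inputs are all assumptions made for illustration. The importance scores are assumed to come from an already-trained boosted-tree model.

```python
from math import sqrt


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov) / sqrt(var_x * var_y)


def greedy_select(importance, columns, k):
    """Pick up to k features greedily.

    importance: dict of feature name -> relative influence (assumed given).
    columns:    dict of feature name -> list of feature values.
    Assumed discount rule (hypothetical, not from the paper): a candidate's
    score is its importance times (1 - max similarity to selected features).
    """
    selected = []
    remaining = set(importance)
    while remaining and len(selected) < k:
        def score(name):
            penalty = max(
                (abs_correlation(columns[name], columns[s]) for s in selected),
                default=0.0,
            )
            return importance[name] * (1.0 - penalty)

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" is a scaled copy of "a", so it is fully discounted once "a" is chosen.
toy_importance = {"a": 0.5, "b": 0.45, "c": 0.2}
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
chosen = greedy_select(toy_importance, toy_columns, k=2)
```

With the toy data, the redundant feature `b` loses to the less important but less correlated feature `c`, which is the behavior a similarity discount is meant to produce. Setting the penalty to zero would reduce the sketch to plain top-k selection by importance.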
Keywords :
Web sites; greedy algorithms; information retrieval; random processes; regression analysis; search engines; tree data structures; Web search engine; Wikipedia; backend latency requirements; boosted regression trees; data sets; full-feature model; greedy approach; random feature selection; search ranking functions; Data models; Electronic publishing; Encyclopedias; Feature extraction; Genetic algorithms; Internet; Feature Selection; Learning to Rank; Web Search;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on
Conference_Location :
Pafos
Print_ISBN :
978-1-4577-0383-6
Electronic_ISBN :
978-0-7695-4388-8
Type :
conf
DOI :
10.1109/CIT.2011.16
Filename :
6036806