Regional Information Center for Science and Technology - Greedy and Randomized Feature Selection for Web Search Ranking

DocumentCode :

3537521

Title :

Greedy and Randomized Feature Selection for Web Search Ranking

Author :

Pan, Feng ; Converse, Tim ; Ahn, David ; Salvetti, Franco ; Donato, Gianluca

Author_Institution :

Bing SF, Microsoft Corp., San Francisco, CA, USA

fYear :

2011

fDate :

Aug. 31 2011-Sept. 2 2011

Firstpage :

436

Lastpage :

442

Abstract :

Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (i.e., selecting the features with the highest relative influence as computed by boosted trees, discounting importance by feature similarity) and randomized approaches (i.e., best-only genetic algorithm, a proposed more efficient randomized method with feature-importance-based backward elimination). We evaluate and compare these approaches using two data sets, one from a commercial Wikipedia search engine and the other from a commercial Web search engine. The experimental results show that the greedy approach that selects top features with the highest relative influence performs close to the full-feature model, and the randomized feature selection with feature-importance-based backward elimination outperforms all other randomized and greedy approaches, especially on the Wikipedia data.
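The greedy approach named in the abstract (rank features by relative influence, optionally discounting importance by feature similarity) can be illustrated with a minimal sketch. The abstract does not give the discounting formula, so the rule used here, importance multiplied by one minus the maximum similarity to the features already chosen, is an assumption, as are the function name `greedy_select`, the `discount` parameter, and the toy numbers. In the paper the importances would come from boosted regression trees; here they are supplied directly.

```python
from typing import List, Sequence


def greedy_select(
    importance: Sequence[float],
    similarity: Sequence[Sequence[float]],
    k: int,
    discount: float = 1.0,
) -> List[int]:
    """Greedily pick up to k feature indices.

    Each round scores every unselected feature as
        importance * (1 - discount * max similarity to the selected set)
    and takes the highest-scoring one. This discount rule is an
    illustrative assumption, not the formula from the paper.
    With discount=0 the procedure reduces to plain top-k by importance.
    """
    selected: List[int] = []
    remaining = set(range(len(importance)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            # Similarity to the closest already-selected feature (0 if none yet).
            max_sim = max((similarity[j][s] for s in selected), default=0.0)
            return importance[j] * (1.0 - discount * max_sim)

        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: features 0 and 1 are near-duplicates.
importance = [0.5, 0.3, 0.15, 0.05]
similarity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

top_k = greedy_select(importance, similarity, k=2, discount=0.0)
discounted = greedy_select(importance, similarity, k=2, discount=1.0)
print(top_k, discounted)
```

With these toy values, plain top-k keeps both near-duplicate features, while the discounted variant replaces the second one with a less similar feature. Under a latency budget, `k` would be set by how many features can be computed in time.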

Keywords :

Web sites; greedy algorithms; information retrieval; random processes; regression analysis; search engines; tree data structures; Web search engine; Wikipedia; backend latency requirements; boosted regression trees; data sets; full-feature model; greedy approach; random feature selection; search ranking functions; Data models; Electronic publishing; Encyclopedias; Feature extraction; Genetic algorithms; Internet; Feature Selection; Learning to Rank; Web Search;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on

Conference_Location :

Pafos

Print_ISBN :

978-1-4577-0383-6

Electronic_ISBN :

978-0-7695-4388-8

Type :

conf

DOI :

10.1109/CIT.2011.16

Filename :

6036806

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3537521