Title :
Using Web Search Logs to Identify Query Classification Terms
Author :
Taksa, Isak ; Zelikovitz, Sarah ; Spink, Amanda
Author_Institution :
Baruch Coll., City Univ. of New York, NY
Abstract :
Classification of search queries is a complex and computationally challenging task. Search queries are typically short and reveal very few features per query, which makes them a weak source for traditional machine learning. In this paper, we present a method that combines limited manual labeling, computational linguistics and information retrieval to classify a large collection of Web search queries. A short set of manually chosen terms that are known a priori to be of interest to a particular class is used to cull a small number of actual queries from a commercial search engine log. These queries are then submitted to a commercial search engine, and the returned search results are used to find more class-related terms. We examine the classification proficiency of the proposed method on a large Web search engine query log and show that up to 48% of the unlabeled set could be classified using this method. We discuss the results of this research and its implications for the advancement of short text classification.
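The pipeline described in the abstract (seed terms, culled queries, search results, expanded terms) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the tiny stopword list, the frequency-based ranking of candidate terms, the toy query log, and `fake_search`, which is a hypothetical stand-in for submitting a query to a commercial search engine.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def select_seed_queries(query_log, seed_terms):
    """Cull the queries that contain at least one manually chosen seed term."""
    seeds = set(seed_terms)
    return [q for q in query_log if seeds & set(tokenize(q))]

def expand_terms(seed_queries, search_fn, seed_terms, top_k=3):
    """Submit each culled query and rank new terms by how many result
    snippets they appear in (a simple frequency heuristic)."""
    seeds = set(seed_terms)
    counts = Counter()
    for query in seed_queries:
        for snippet in search_fn(query):
            counts.update(
                t for t in set(tokenize(snippet))
                if t not in STOPWORDS and t not in seeds
            )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]

def classify(query_log, class_terms):
    """Return the queries that contain at least one class term."""
    terms = set(class_terms)
    return [q for q in query_log if terms & set(tokenize(q))]

# Toy data, invented for this sketch.
QUERY_LOG = [
    "cheap laptop deals",
    "laptop battery replacement",
    "notebook charger",
    "ssd upgrade guide",
    "chocolate cake recipe",
    "weather tomorrow",
]
SEED_TERMS = {"laptop"}
CANNED_RESULTS = {
    "cheap laptop deals": [
        "Best laptop and notebook deals",
        "Notebook with fast ssd",
    ],
    "laptop battery replacement": [
        "Replace your notebook battery",
        "Laptop ssd and battery tips",
    ],
}

def fake_search(query):
    """Hypothetical stand-in for a search engine call; returns canned snippets."""
    return CANNED_RESULTS.get(query, [])

seed_queries = select_seed_queries(QUERY_LOG, SEED_TERMS)
new_terms = expand_terms(seed_queries, fake_search, SEED_TERMS)
classified = classify(QUERY_LOG, SEED_TERMS | set(new_terms))
print(f"{len(classified)} of {len(QUERY_LOG)} queries matched the class")
```

In this toy run the single seed term matches only two queries directly, while the terms harvested from the search results let the class reach queries that never mention the seed term, which is the effect the abstract describes.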
Keywords :
Internet; classification; computational linguistics; query processing; search engines; Web search engine; Web search logs; Web search queries; information retrieval; limited manual labeling; query classification terms; short text classification; Australia; Educational institutions; Information technology; Labeling; Machine learning; Machine learning algorithms; Text categorization; Web search; labeled sets
Conference_Title :
Information Technology, 2007. ITNG '07. Fourth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-2776-0
DOI :
10.1109/ITNG.2007.202