Title :
Web Host Access Tool: A Support Vector Machine Approach
Author :
Banerjee, Satarupa ; Cassel, Lillian
Author_Institution :
Villanova Univ., Villanova
Abstract :
Search engines are an integral part of the web information gathering and retrieval process, but they depend heavily on the words or phrases entered by the end user. The search engine usually contributes no additional semantic data to the information acquisition process. This paper presents an intelligent search agent, the Web host access tool (WHAT), based on support vector machines (SVMs), which introduces the notion of queries conducted within a specific context. Given a context and associated keywords that capture the user's search history and preferences, WHAT filters resources more intelligently than conventional search engines, returning more relevant results while filtering out irrelevant references. Text search results obtained from different search engines are processed by an SVM-based word classifier that arranges them according to user preferences learned from previous searches. The text is processed with latent semantic indexing (LSI) to create a document matrix that gives the probability of a word occurring in a specified context. For simplicity, the paper considers five contexts: business, education, entertainment, news and information, and tourism. The SVM uses the LSI coefficients to assign a confidence level to each search result, and the results are sorted by confidence before being presented to the end user. Least squares support vector machines (LS-SVMs) are also studied as an alternative. The user updates the system by giving feedback on search relevance, and the LSI coefficients are updated accordingly to increase relevance in future searches. Three kernel functions are considered: linear, radial basis function (RBF), and polynomial.
The results indicate that the linear SVM performs better than the others in terms of both classification accuracy and training speed.
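The classify-and-rank step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: LSI is approximated by a truncated SVD of a raw term-count matrix (the paper's probability-weighted document matrix may differ), the classifier is the LS-SVM variant (chosen here because training reduces to a single linear system, so no quadratic-programming solver is needed), each context is handled one-versus-rest, and the relevance-feedback update is not modeled. The vocabulary, training snippets, and all function names (`term_doc_matrix`, `lsi_fit`, `lssvm_train`, `rank_results`, etc.) are hypothetical.

```python
import numpy as np

# The five contexts named in the abstract; the toy demo below uses only two of them.
CONTEXTS = ["business", "education", "entertainment", "news and information", "tourism"]


def term_doc_matrix(docs, vocab):
    """Raw term counts (assumed weighting): rows are vocabulary terms, columns are documents."""
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            if w in index:  # out-of-vocabulary words are ignored
                A[index[w], j] += 1.0
    return A


def lsi_fit(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T, keeping at most k nonzero singular values."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))
    return U[:, :k], s[:k]


def lsi_project(A, Uk, sk):
    """Fold documents (columns of A) into LSI space: d_hat = S_k^-1 U_k^T d."""
    return (Uk.T @ A / sk[:, None]).T  # shape (n_docs, k)


def kernel(X, Y, kind="linear", gamma=1.0, degree=2, coef0=1.0):
    """The three kernel families named in the abstract (parameter defaults are assumptions)."""
    if kind == "linear":
        return X @ Y.T
    if kind == "poly":
        return (X @ Y.T + coef0) ** degree
    if kind == "rbf":
        sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-gamma * np.maximum(sq, 0.0))
    raise ValueError(f"unknown kernel: {kind}")


def lssvm_train(X, y, reg=10.0, **kw):
    """LS-SVM classifier: solve [[0, y^T], [y, Omega + I/reg]] [b; alpha] = [0; 1]."""
    n = len(y)
    Omega = np.outer(y, y) * kernel(X, X, **kw)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = Omega + np.eye(n) / reg
    sol = np.linalg.solve(M, np.concatenate(([0.0], np.ones(n))))
    return sol[0], sol[1:]  # bias b, multipliers alpha


def rank_results(results, train_docs, train_labels, context, vocab, k=2, reg=10.0, **kw):
    """Score each result for `context` (one-vs-rest) and sort by decreasing confidence."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context}")
    A_train = term_doc_matrix(train_docs, vocab)
    Uk, sk = lsi_fit(A_train, k)
    X_train = lsi_project(A_train, Uk, sk)
    y = np.where(np.array(train_labels) == context, 1.0, -1.0)
    b, alpha = lssvm_train(X_train, y, reg, **kw)
    X_new = lsi_project(term_doc_matrix(results, vocab), Uk, sk)
    # Decision value f(x) = sum_i alpha_i y_i K(x, x_i) + b serves as the confidence level.
    scores = kernel(X_new, X_train, **kw) @ (alpha * y) + b
    order = np.argsort(-scores)
    return [(results[i], float(scores[i])) for i in order]


# Hypothetical toy data standing in for the user's past, labeled searches.
TRAIN_DOCS = [
    "stock market shares profit",
    "company profit revenue market",
    "hotel beach travel flight",
    "travel tour hotel resort",
]
TRAIN_LABELS = ["business", "business", "tourism", "tourism"]
VOCAB = sorted({w for d in TRAIN_DOCS for w in d.split()})
RESULTS = ["quarterly profit beats market forecast", "cheap flight and hotel deals"]

if __name__ == "__main__":
    for kind in ("linear", "rbf", "poly"):
        print(kind, rank_results(RESULTS, TRAIN_DOCS, TRAIN_LABELS, "tourism", VOCAB, kind=kind))
```

With this toy data the travel-related result is ranked first for the "tourism" context under all three kernels. On real search results, the regularization constant `reg`, the LSI rank `k`, and the kernel parameters would need to be tuned, for example by cross-validation.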
Keywords :
document handling; search engines; support vector machines; Web host access tool; Web information gathering; Web information retrieval process; document matrix; intelligent resource filtering; intelligent search agent; latent semantic indexing; least square support vector machine; linear radial basis function; polynomial kernel; Filtering; History; Information retrieval; Intelligent agent; Kernel; Large scale integration; Machine intelligence; Search engines; Support vector machine classification; Support vector machines;
Conference_Title :
International Joint Conference on Neural Networks, 2006 (IJCNN '06)
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
DOI :
10.1109/IJCNN.2006.246824