DocumentCode :
3120903
Title :
A novel post-classifier for search engine using hidden Markov model
Author :
Liao, Zhi-wu ; Lam, Ernest C. M. ; Tang, Yuan Y.
Author_Institution :
Dept. of Comput. Sci., Hong Kong Baptist Univ., China
Volume :
4
fYear :
2002
fDate :
4-5 Nov. 2002
Firstpage :
1926
Abstract :
Hidden Markov models (HMMs), while widely applied in fields such as speech recognition and optical character recognition, have not been used for post-classification of search engine results. We explore the use of HMMs to optimize search engine tasks, focusing on how to construct a new model structure that improves the classification of web pages. We show that a manually constructed model containing only two states and two classes of observations per field can produce good classification results, and we discuss strategies for learning the model structure automatically from data. We also demonstrate that using the new model structure to classify search results, across several search engines and several search keywords, provides a significant improvement in search accuracy. Our models are applied to the task of post-classifying the web pages selected by the search engine Google and achieve a classification accuracy of 93.4%.
Keywords :
hidden Markov models; pattern classification; search engines; HMMs; World Wide Web; classification; classification accuracy; model structure; web pages; Computer science; Data mining; Electronic mail; HTML; Speech recognition; Web sites
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1175373
Filename :
1175373