DocumentCode
3120903
Title
A novel post-classifier for search engine using hidden Markov model
Author
Liao, Zhi-wu ; Lam, Ernest C. M. ; Tang, Yuan Y.
Author_Institution
Dept. of Comput. Sci., Hong Kong Baptist Univ., China
Volume
4
fYear
2002
fDate
4-5 Nov. 2002
Firstpage
1926
Abstract
Hidden Markov models (HMMs), while widely applied in fields such as speech recognition and optical character recognition, have not been used for post-classification of search engine results. We explore the use of HMMs to optimize search engine tasks, focusing specifically on how to construct a new model structure that improves the classification of web pages. We show that a manually constructed model with only two states and two classes of observations per field can produce good classification results, and we discuss strategies for learning the model structure automatically from data. We also demonstrate that using the new model structure to classify the results returned by several search engines for different search keywords provides a significant improvement in search accuracy. Our models are applied to the task of post-classifying the web pages selected by the search engine Google, and achieve a classification accuracy of 93.4%.
Keywords
hidden Markov models; pattern classification; search engines; HMMs; Hidden Markov models; World Wide Web; classification; classification accuracy; model structure; search engine; web pages; Computer science; Data mining; Electronic mail; HTML; Hidden Markov models; Search engines; Speech recognition; Web pages; Web sites; World Wide Web;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN
0-7803-7508-4
Type
conf
DOI
10.1109/ICMLC.2002.1175373
Filename
1175373