DocumentCode :
2989120
Title :
An improved topic relevance algorithm for vertical search engines
Author :
Lv, Lin-tao ; Chen, Li-ping ; Zhou, Hong-fang
Author_Institution :
Inst. of Comput. Sci. & Eng., Xi'an Univ. of Technol., Xi'an
Volume :
2
fYear :
2008
fDate :
30-31 Aug. 2008
Firstpage :
753
Lastpage :
757
Abstract :
HITS is a well-known topic distillation algorithm, but it suffers from topic drift. To address this problem, an improved HITS algorithm is proposed that assigns weights to links according to their link value and topic similarity. Based on an analysis of Web link structure, link value is calculated from the authority degree of Web pages; topic similarity between Web pages is calculated by combining analysis of page content with HTML structure characteristics. By combining link value with topic similarity, the improved algorithm distinguishes among links and assigns them different weights. Experimental results indicate that the proposed algorithm improves the relevance ratio by 13% to 42%. It also controls topic drift well and improves the accuracy of information collection. The proposed algorithm can be applied in vertical search engines, for which it lays an important theoretical foundation.
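The abstract does not give the update equations, so the following is only a minimal sketch of how a link-weighted HITS iteration might look, not the authors' implementation. The function names `link_weight` and `weighted_hits` are hypothetical, and combining link value and topic similarity by a simple product is an assumption made for illustration; the paper may combine them differently.

```python
import math


def link_weight(link_value, topic_similarity):
    """Combine the two per-link signals into one weight.

    ASSUMPTION: a plain product is used here for illustration only;
    the abstract does not state the actual combination rule.
    """
    return link_value * topic_similarity


def _normalize(scores):
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(edges, iterations=50):
    """Run HITS with per-link weights.

    edges: dict mapping (source, target) -> non-negative weight.
    Returns (hub, authority) dicts keyed by page.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub: weighted sum of authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

In this sketch, a link to an off-topic page receives a small weight and therefore passes little authority to its target, which is one plausible way the weighting described in the abstract could limit topic drift.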
Keywords :
Web sites; hypermedia markup languages; relevance feedback; search engines; HITS algorithm; HTML structure characteristics; Web link structure; Web pages; hyperlink induced topic search; improved topic relevance algorithm; vertical search engines; Algorithm design and analysis; Computer science; Electronic mail; HTML; Information resources; Pattern analysis; Pattern recognition; Search engines; Wavelet analysis; Web pages; HITS; Hyperlink; Link Value; Topic Drift; Topic Similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Wavelet Analysis and Pattern Recognition, 2008. ICWAPR '08. International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-2238-8
Electronic_ISBN :
978-1-4244-2239-5
Type :
conf
DOI :
10.1109/ICWAPR.2008.4635878
Filename :
4635878