DocumentCode :
2226293
Title :
Topic continuity for Web document categorization and ranking
Author :
Narayan, B.L. ; Murthy, C.A. ; Pal, Sankar K.
Author_Institution :
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
310
Lastpage :
315
Abstract :
PageRank is primarily based on link structure analysis. Recently, it has been shown that content information can be utilized to improve link analysis. We propose a novel algorithm that harnesses the information contained in the history of a surfer to determine the topic of interest on a given page. As the history is unavailable until query time, we guess it probabilistically so that the operations can be performed offline. This leads to a better Web page categorization and, thereby, to a better ranking of Web pages.
Keywords :
Web sites; citation analysis; search engines; PageRank; Web document categorization; Web page ranking; content information; link structure analysis; Content based retrieval; Frequency; History; Information analysis; Information retrieval; Machine intelligence; Text analysis; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241209
Filename :
1241209