DocumentCode :
3558947
Title :
A Parameterized Approach to Spam-Resilient Link Analysis of the Web
Author :
Caverlee, James ; Webb, Steve ; Liu, Ling ; Rouse, William B.
Author_Institution :
Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
Volume :
20
Issue :
10
fYear :
2009
Firstpage :
1422
Lastpage :
1438
Abstract :
Link-based analysis of the Web provides the basis for many important applications (such as Web search, Web-based data mining, and Web page categorization) that bring order to the massive amount of distributed Web content. Due to the overwhelming reliance on these important applications, there is a rise in efforts to manipulate (or spam) the link structure of the Web. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the set of critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering the influence of link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings in comparison with existing Web algorithms such as PageRank.
Keywords :
Internet; Web sites; data mining; stability; unsolicited e-mail; PageRank; Web algorithms; Web link analysis; Web page categorization; Web search; Web-based data mining; World Wide Web; distributed Web content; formal analysis; link-based manipulation; source-centric link analysis; spam resilience; spam-resilient link analysis; time complexity; distributed systems; Internet search; Web-based services; general; information search and retrieval; information storage and retrieval; information technology and systems; online information services; systems and software
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
Conference_Location :
10/17/2008 12:00:00 AM
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2008.227
Filename :
4653482