DocumentCode :
3308881
Title :
Clustering Based URL Normalization Technique for Web Mining
Author :
Nagwani, Naresh Kumar
Author_Institution :
Dept. of CSE, NIT Raipur, Raipur, India
fYear :
2010
fDate :
20-21 June 2010
Firstpage :
349
Lastpage :
351
Abstract :
URL (Uniform Resource Locator) normalization is an important activity in web mining. Effective URL normalization makes web data retrieval smoother and reduces the amount of computation required in web mining activities. This paper proposes a web mining technique for URL normalization. The proposed technique is based on the content, structure, and semantic similarity of a given set of URLs, together with their web page redirection and forwarding similarity. Web page redirection and forward graphs can be used to measure the similarity between URLs and to form URL clusters, which can then be used for URL normalization. A data structure is also suggested for storing the forward and redirect URL information.
Keywords :
Access protocols; Crawlers; Data structures; Indexing; Information retrieval; Search engines; Uniform resource locators; Web mining; Web pages; World Wide Web; Clustering; URL Normalization; Web Page Forward and Redirect Similarity Tree;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Computer Engineering (ACE), 2010 International Conference on
Conference_Location :
Bangalore, Karnataka, India
Print_ISBN :
978-1-4244-7154-6
Type :
conf
DOI :
10.1109/ACE.2010.47
Filename :
5532806