DocumentCode
3253044
Title
Algorithm for detecting dynamic webpage and its importance
Author
Sultania, A.K.
Author_Institution
Freescale Semicond. Pvt Ltd., Noida, India
fYear
2012
fDate
21-22 Dec. 2012
Firstpage
257
Lastpage
259
Abstract
During web search (crawling, indexing, and relevance), many duplicate web-pages are found under different URLs; crawlers normalize these URLs before using them. Many web-pages are also dynamic: the same URL returns different content at different search instances. In this paper, we discuss why these dynamic web-pages need to be detected and propose an algorithm to identify this dynamism. URLs can be normalized using the methods explained in [1], [2], and [7], or using the DUST algorithm [3], but dynamic web-pages must be identified before normalization. Combining the proposed algorithm with DUST rules is expected to improve the detection rate of dynamic web-pages, reducing the time spent on crawling, indexing, and related tasks.
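The abstract does not describe the proposed algorithm itself, so the following is only a minimal sketch of the general idea it states: a page is treated as dynamic when the same URL yields different content at different fetch times. The function names (`fingerprint`, `is_dynamic`, `split_by_dynamism`), the whitespace-collapsing step, and the use of SHA-256 are illustrative assumptions, not the author's method.

```python
import hashlib
import re


def fingerprint(html: str) -> str:
    """Hash page content after collapsing whitespace (an assumed, simplistic cleanup)."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_dynamic(snapshots: list[str]) -> bool:
    """Return True if snapshots fetched from one URL at different times differ."""
    return len({fingerprint(s) for s in snapshots}) > 1


def split_by_dynamism(fetches: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Partition URLs into (static, dynamic) before any URL normalization step."""
    static, dynamic = [], []
    for url, snapshots in fetches.items():
        (dynamic if is_dynamic(snapshots) else static).append(url)
    return static, dynamic
```

In a crawler, only the URLs in the static list would then be passed on to a normalization or de-duplication stage. A real detector would likely need to ignore incidental differences such as timestamps or session tokens, which this sketch does not attempt.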
Keywords
Web sites; indexing; information retrieval; DUST algorithm; URL; Web search; crawling; duplicate Web-pages; dynamic Webpage detection; Conferences; Heuristic algorithms; Radar tracking; Search engines; World Wide Web; Search engine; URL normalization; Webpage de-duplication; duplicate detection; dynamic webpage
fLanguage
English
Publisher
IEEE
Conference_Titel
Radar, Communication and Computing (ICRCC), 2012 International Conference on
Conference_Location
Tiruvannamalai
Print_ISBN
978-1-4673-2756-5
Type
conf
DOI
10.1109/ICRCC.2012.6450590
Filename
6450590