DocumentCode :
3592105
Title :
Data Reconstruction of Abandoned Websites
Author :
Fister, Iztok ; Fister, Iztok ; Fong, Simon ; Yan Zhuang
Author_Institution :
Fac. of Electr. Eng. & Comput. Sci., Univ. of Maribor, Maribor, Slovenia
fYear :
2014
Firstpage :
67
Lastpage :
72
Abstract :
Nowadays, the Internet offers data to anyone at any time. Websites on the Internet have been warehousing data for many years, i.e., for 10 years or more. In the meantime, many websites have become obsolete. This means they no longer have an owner, either because no one maintains them or because they have become unavailable for indexing by spiders that retrieve information about documents to be referenced. As a result, these websites can no longer be accessed from Internet browsers and are therefore referred to as abandoned websites. This paper focuses on the problem of how to identify abandoned websites and how to preserve and reconstruct the data they hold. We have mainly concentrated on abandoned sport websites that, in general, contain very important data about the results achieved at various sporting competitions in the past. The proposed solution consists of four steps: analyzing the abandoned servers that held these websites, identifying the structure of the abandoned web page sets, web scraping, and preserving and visualizing these page sets. To test the prototype solution, some of these steps were applied to reconstruct and preserve the data on abandoned web servers that tracked running results. Additionally, opportunities and challenges of applying data mining techniques to the reconstructed website are listed.
Keywords :
Internet; Web sites; data handling; data mining; Internet browsers; Web scraping; Web site reconstruction; abandoned Web page sets; abandoned servers; abandoned sport Web sites; data reconstruction; data warehousing; indexing; sporting competitions; Browsers; HTML; Indexes; Servers; Web pages; abandoned websites; reconstruction
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational and Business Intelligence (ISCBI), 2014 2nd International Symposium on
Print_ISBN :
978-1-4799-7551-8
Type :
conf
DOI :
10.1109/ISCBI.2014.22
Filename :
7119536