Title :
Web usage mining: A survey on preprocessing of web log file
Author :
Hussain, Tasawar ; Asghar, Sohail ; Masood, Nayyer
Author_Institution :
Dept. of Comput. Sci., Muhammad Ali Jinnah Univ., Islamabad, Pakistan
Abstract :
The number of web applications is growing rapidly, and the number of their users is growing exponentially. Advances in technology have made it possible to capture users' behavior and their interactions with web applications through the web server log file, which is stored as a plain-text file. Because the web log contains a large amount of "irrelevant information", the original log file cannot be used directly in the web usage mining (WUM) process; preprocessing of the web log file is therefore imperative. Proper analysis of the web log file helps manage websites effectively from both the administrators' and the users' perspectives. Web log preprocessing is a necessary first step for improving the quality and efficiency of the later steps of WUM. A number of techniques are applied at the preprocessing level, such as data cleaning, data filtering, and data integration. In this paper, we survey these preprocessing techniques to identify open issues and ways in which WUM preprocessing can be improved for pattern mining and analysis.
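The abstract names data cleaning as a preprocessing technique but does not fix a log format or a rule set. The Python sketch below illustrates one typical cleaning pass under stated assumptions: entries are in the Apache Combined Log Format, and an entry is discarded if it is not a successful (2xx) GET request, if it requests an embedded resource such as an image, stylesheet, or script, or if it appears to come from a robot. The rule lists (`IRRELEVANT_SUFFIXES`, `ROBOT_MARKERS`), the functions `parse`, `is_relevant`, and `clean`, and the sample log lines are all hypothetical, chosen for illustration; they are not taken from the surveyed paper.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumption: log entries use the Apache/NCSA Combined Log Format:
# host ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rules, chosen only for illustration.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")
ROBOT_MARKERS = ("bot", "crawler", "spider")


class LogEntry(NamedTuple):
    host: str
    time: str
    method: str
    url: str
    status: int
    referrer: str
    agent: str


def parse(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per well-formed line; malformed lines are skipped."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        yield LogEntry(
            host=match["host"],
            time=match["time"],
            method=match["method"],
            url=match["url"],
            status=int(match["status"]),
            referrer=match["referrer"],
            agent=match["agent"],
        )


def is_relevant(entry: LogEntry) -> bool:
    """Return False for entries that the cleaning rules treat as noise."""
    path = entry.url.split("?", 1)[0].lower()  # ignore the query string
    if entry.method != "GET":  # keep only GET requests
        return False
    if not 200 <= entry.status < 300:  # drop failed and redirected requests
        return False
    if path.endswith(IRRELEVANT_SUFFIXES):  # drop embedded resources
        return False
    agent = entry.agent.lower()
    if path == "/robots.txt" or any(marker in agent for marker in ROBOT_MARKERS):
        return False  # drop apparent robot traffic
    return True


def clean(lines: Iterable[str]) -> List[LogEntry]:
    """Parse raw log lines and keep only the entries that pass is_relevant."""
    return [entry for entry in parse(lines) if is_relevant(entry)]


# Synthetic sample lines, invented for this illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2010:13:55:36 +0500] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2010:13:55:37 +0500] "GET /logo.png HTTP/1.1" 200 512 "/index.html" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2010:13:56:01 +0500] "GET /products.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '66.249.0.1 - - [10/Oct/2010:13:57:12 +0500] "GET /about.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for entry in clean(SAMPLE_LOG):
        print(entry.host, entry.url)  # prints: 10.0.0.1 /index.html
```

Of the four sample lines, only the first survives: the image request, the 404 response, and the crawler request are all removed. Matching substrings of the user-agent string is only a heuristic, since a robot that reports a browser-like agent string passes through unchanged.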
Keywords :
Internet; data mining; data cleaning; data filtering; data integration; exponential speed; irrelevant information; pattern analysis; pattern mining; web applications; web log file; web log file preprocessing; web usage mining; Browsers; Cleaning; Filtering; IP networks; Servers; Web sites; Preprocessing
Conference_Title :
Information and Emerging Technologies (ICIET), 2010 International Conference on
Conference_Location :
Karachi
Print_ISBN :
978-1-4244-8001-2
DOI :
10.1109/ICIET.2010.5625730