Title :
Preprocessing web logs: A critical phase in web usage mining
Author :
Goel, Neha ; Jha, C.K.
Author_Institution :
Banasthali Vidyapith, Banasthali, India
Abstract :
Web usage mining is the discovery of user access patterns from the web logs of a website. Raw web logs are highly unstructured, which makes them unsuitable for direct mining. They therefore go through a preprocessing stage, which not only makes them suitable for analysis but also reduces the file size significantly. This paper explores the preprocessing phase in detail and proposes a "total and absolute" tool for it, which reduces irrelevant and noisy data and transforms the remainder into a form that can be readily used for analysis. The tool is called total and absolute because, after cleaning the data, it presents summary statistics for the preprocessed records. The summary statistics report the number of records fed as input, the number of elements obtained after preprocessing, and the time taken to complete the task. Finally, the tool exports the preprocessed data to a .log file, which can be easily imported into any data mining utility. The summary statistics and data export features distinguish this tool from those proposed earlier.
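The pipeline described in the abstract (clean, summarize, export) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' tool: the abstract does not name a log format or the cleaning rules, so the Common Log Format pattern, the ignored file suffixes, the robots.txt and status-code filters, and the function names `is_relevant`, `preprocess`, and `export` are all assumptions.

```python
import re
import time

# Assumed input: Common Log Format lines (the abstract does not name a format).
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical cleaning rules, chosen for illustration only.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def is_relevant(rec):
    """Return True if a parsed record should survive cleaning."""
    url = rec["url"].split("?", 1)[0].lower()
    if url.endswith(IGNORED_SUFFIXES):   # embedded resources, not page views
        return False
    if url == "/robots.txt":             # typical crawler request
        return False
    if not rec["status"].startswith("2"):  # keep successful requests only
        return False
    return rec["method"] == "GET"


def preprocess(lines):
    """Clean the log lines and return (kept_lines, summary_statistics)."""
    start = time.perf_counter()
    total = 0
    kept = []
    for line in lines:
        total += 1
        match = CLF.match(line)
        if match and is_relevant(match.groupdict()):
            kept.append(line.rstrip("\n"))
    stats = {
        "records_in": total,
        "records_out": len(kept),
        "seconds": time.perf_counter() - start,
    }
    return kept, stats


def export(kept, path):
    """Write the cleaned records to a .log file, one record per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in kept:
            fh.write(line + "\n")
```

Calling `preprocess` on an iterable of raw log lines returns the cleaned lines together with a dictionary holding the three figures the abstract mentions: records read, records kept, and elapsed time. `export` then writes the cleaned lines to a path such as `cleaned.log` (a hypothetical file name).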
Keywords :
Web sites; data mining; Web logs preprocessing; Web usage mining; Website; summary statistics; Algorithm design and analysis; Cleaning; Computers; Feature extraction; IP networks; Robots; data; preprocessing; users; web logs
Conference_Title :
2015 International Conference on Advances in Computer Engineering and Applications (ICACEA)
Conference_Location :
Ghaziabad
DOI :
10.1109/ICACEA.2015.7164776