DocumentCode :
2974034
Title :
Novel pre-processing technique for web log mining by removing global noise and web robots
Author :
Nithya, P. ; Sumathi, P.
Author_Institution :
Manonmaniam Sundaranar Univ., Tirunelveli, India
fYear :
2012
fDate :
21-22 Nov. 2012
Firstpage :
1
Lastpage :
5
Abstract :
Today, people depend on the Internet in daily life, and almost anything can be searched for online. Web pages usually contain a large amount of information that may not interest the user because it is not part of the page's main content. Web Usage Mining (WUM) is one of the main applications of data mining, artificial intelligence, and related techniques to web data; it forecasts users' visiting behavior and infers their interests by examining usage samples. WUM is directly involved in applications such as e-commerce, e-learning, Web analytics, and information retrieval. Web log data is one of the major sources of information about the links users visit, their browsing patterns, and the time they spend on a particular page or link. This information can be used in applications such as adaptive web sites, modified services, customer summaries, pre-fetching, and the generation of attractive web sites. Existing web usage mining approaches have a variety of problems, and existing algorithms are difficult to apply in practice. This paper continues the line of research on Web access log analysis, which aims to analyze patterns of web site usage and features of user behavior. Raw log data is characteristically noisy and unclear, so preprocessing is vital for an efficient and effective web usage mining process. Preprocessing comprises three phases: data cleaning, user identification, and pattern discovery and pattern analysis. This paper proposes a novel preprocessing technique that removes local and global noise and web robots. Preprocessing is an important step because the Web architecture is highly complex and 80% of the mining process is done in this phase. The Anonymous Microsoft Web Dataset and the MSNBC.com Anonymous Web Dataset are used to evaluate the proposed preprocessing technique.
Keywords :
Internet; Web sites; data analysis; data mining; human computer interaction; user interfaces; Internet; MSNBC.com anonymous Web dataset; WUM; Web access log analysis; Web analytics; Web architecture; Web data; Web log mining; Web page; Web robots removal; Web site usage pattern analysis; Web usage mining algorithms; Weblog data; adaptive Web sites; anonymous Microsoft Web dataset; artificial intelligence; attractive Web site generation; customer summary; data cleaning; data mining; e-commerce; e-learning; global noise removal; information retrieval; modified services; pattern analysis; pattern discovery; prefetching; preprocessing technique; user identification; users behavior feature analysis; Cleaning; Data mining; Graphics; Noise; Robots; Web pages; Content Path Set; Data Cleaning; Path Completion; Preprocessing; Travel Path set
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing and Communication Systems (NCCCS), 2012 National Conference on
Conference_Location :
Durgapur
Print_ISBN :
978-1-4673-1952-2
Type :
conf
DOI :
10.1109/NCCCS.2012.6412976
Filename :
6412976