Title :
Optimized data preprocessing technology for web log mining
Author :
Zheng, Ling ; Gui, Hui ; Li, Feng
Author_Institution :
Sch. of Control & Comput. Eng., North China Electr. Power Univ., Beijing, China
Abstract :
To address shortcomings of traditional data preprocessing for web log mining, this article presents an improved data preprocessing technology. In the user identification stage, an identification strategy based on the referring web page is adopted, which is more effective than the traditional strategy based on website topology. In the session identification stage, a strategy that combines a fixed a priori threshold with session reconstruction is introduced: an initial session set is first built using the fixed a priori threshold, and this set is then optimized through session reconstruction. Experiments show that the improved data preprocessing technology enhances the quality of the preprocessing results.
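The two-step session identification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the threshold value nor the reconstruction rule, so the 30-minute `SESSION_TIMEOUT`, the `(timestamp, url, referrer)` record layout, and the referrer-based merge in `reconstruct` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed value; the abstract does not state the threshold that was used.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_by_threshold(requests, timeout=SESSION_TIMEOUT):
    """Split one user's time-ordered requests into initial sessions.

    Each request is a (timestamp, url, referrer) tuple. A new session starts
    whenever the gap to the previous request exceeds `timeout`.
    """
    sessions = []
    for req in requests:
        if sessions and req[0] - sessions[-1][-1][0] <= timeout:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def reconstruct(sessions):
    """Hypothetical reconstruction rule (not taken from the paper): merge a
    session into the previous one when its first request was referred by a
    page already visited in that previous session."""
    merged = []
    for session in sessions:
        first_referrer = session[0][2]
        if merged and first_referrer in {url for _, url, _ in merged[-1]}:
            merged[-1].extend(session)
        else:
            merged.append(list(session))
    return merged


t0 = datetime(2010, 1, 1, 9, 0)
log = [
    (t0, "/index", None),
    (t0 + timedelta(minutes=5), "/a", "/index"),
    (t0 + timedelta(minutes=50), "/b", "/a"),   # long pause, same visit
    (t0 + timedelta(hours=5), "/index", None),  # separate visit
]
initial = split_by_threshold(log)  # threshold alone yields three sessions
final = reconstruct(initial)       # the merge step rejoins the first two
```

In this sample log the fixed threshold cuts the first visit in two because of a 45-minute pause; the assumed merge rule rejoins the pieces because the request for `/b` was referred by `/a`, a page in the preceding session.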
Keywords :
Internet; data mining; Web log mining; fixed priori threshold; optimized data preprocessing technology; session identification; session reconstruction; user identification; Cleaning; Control engineering computing; Data engineering; Data preprocessing; Design engineering; Power engineering and energy; Power engineering computing; Topology; Web server; session; threshold
Conference_Title :
Computer Design and Applications (ICCDA), 2010 International Conference on
Conference_Location :
Qinhuangdao
Print_ISBN :
978-1-4244-7164-5
Electronic_ISBN :
978-1-4244-7164-5
DOI :
10.1109/ICCDA.2010.5540924