DocumentCode
2138774
Title
An improved referrer-based session identification algorithm using MapReduce
Author
Peng Huang ; Dehua Chen ; Jiajin Le
Author_Institution
Sch. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
fYear
2013
fDate
23-25 July 2013
Firstpage
1072
Lastpage
1076
Abstract
Session identification is an important step in web log mining for predictive prefetching of users' next requests based on their navigation behavior. However, it faces two main challenges: how to process huge datasets effectively, and how to identify users' session boundaries accurately. To meet these challenges, we propose a novel session identification algorithm that combines the time-based algorithm with the referrer-based algorithm, and we implement it in the popular MapReduce framework on the Hadoop platform to achieve higher performance. Experiments on real-world data show that, compared with traditional session identification methods, the proposed algorithm is more effective and identifies more long sessions, which gives it higher accuracy.
Keywords
Internet; data mining; distributed processing; Hadoop platform; MapReduce; Web log mining; improved referrer-based session identification algorithm; navigation behavior; referrer based algorithm; time based algorithm; Accuracy; Algorithm design and analysis; Clustering algorithms; Computers; Data preprocessing; Educational institutions; Hadoop; Session identification
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location
Shenyang
Type
conf
DOI
10.1109/ICNC.2013.6818136
Filename
6818136