• DocumentCode
    2138774
  • Title

    An improved referrer-based session identification algorithm using MapReduce

  • Author

    Peng Huang; Dehua Chen; Jiajin Le

  • Author_Institution
    School of Computer Science and Technology, Donghua University, Shanghai, China
  • fYear
    2013
  • fDate
    23-25 July 2013
  • Firstpage
    1072
  • Lastpage
    1076
  • Abstract
    Session identification is an important step in web log mining for predictive prefetching of a user's next request based on navigation behavior. The problem poses two main challenges: handling very large datasets efficiently, and identifying users' session boundaries accurately. To address them, we propose a novel session identification algorithm that combines the time-based algorithm with the referrer-based algorithm, and implement it in the MapReduce framework on the Hadoop platform to achieve higher performance. Experiments on real-world data show that, compared with traditional session identification methods, the proposed algorithm is more effective and identifies more long sessions, which gives it higher accuracy.
  • Keywords
    Internet; data mining; distributed processing; Hadoop platform; MapReduce; Web log mining; improved referrer-based session identification algorithm; navigation behavior; referrer based algorithm; time based algorithm; Accuracy; Algorithm design and analysis; Clustering algorithms; Computers; Data preprocessing; Educational institutions; Hadoop; Session identification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2013 Ninth International Conference on
  • Conference_Location
    Shenyang
  • Type

    conf

  • DOI
    10.1109/ICNC.2013.6818136
  • Filename
    6818136
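
The abstract names the combination of a time-based and a referrer-based rule but gives no details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a 30-minute timeout and assumes that a request continues the current session only if its referrer is a URL already visited in that session; the names `Request`, `split_sessions`, and `sessionize` are hypothetical. The MapReduce structure is imitated in-process: grouping by user stands in for the map and shuffle steps, and per-user session splitting stands in for the reduce step.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional

TIMEOUT = 30 * 60  # seconds; an assumed threshold, not taken from the paper


class Request(NamedTuple):
    user: str
    time: int                 # request time in seconds
    url: str
    referrer: Optional[str]   # None when the log entry has no referrer


def split_sessions(requests: List[Request], timeout: int = TIMEOUT) -> List[List[Request]]:
    """Split one user's requests into sessions (assumed combined rule)."""
    sessions: List[List[Request]] = []
    for req in sorted(requests, key=lambda r: r.time):
        current = sessions[-1] if sessions else None
        continues = (
            current is not None
            # time-based rule: gap to the previous request is within the timeout
            and req.time - current[-1].time <= timeout
            # referrer-based rule: the referrer was already visited in this session
            and req.referrer in {r.url for r in current}
        )
        if continues:
            current.append(req)
        else:
            sessions.append([req])
    return sessions


def sessionize(log: List[Request]) -> Dict[str, List[List[Request]]]:
    """Group requests by user ("map"), then split each group ("reduce")."""
    by_user: Dict[str, List[Request]] = defaultdict(list)
    for req in log:
        by_user[req.user].append(req)
    return {user: split_sessions(reqs) for user, reqs in by_user.items()}


if __name__ == "__main__":
    log = [
        Request("u1", 0, "/a", None),
        Request("u1", 60, "/b", "/a"),
        Request("u1", 120, "/c", None),    # no referrer: starts a new session
        Request("u1", 5000, "/d", "/c"),   # gap exceeds the timeout: new session
        Request("u2", 10, "/a", None),
    ]
    for user, sessions in sessionize(log).items():
        print(user, [[r.url for r in s] for s in sessions])
```

On a real Hadoop cluster the grouping would be done by the framework's shuffle on the user key, and only the per-user splitting logic would run inside the reducer.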