• DocumentCode
    1561787
  • Title
    Characterizing Web user accesses: a transactional approach to Web log clustering
  • Author
    Giannotti, Fosca ; Gozzi, Cristian ; Manco, Giuseppe
  • Author_Institution
    Ist. CNUCE, CNR, Pisa, Italy
  • fYear
    2002
  • Firstpage
    312
  • Lastpage
    317
  • Abstract
    We present a partitioning method able to handle Web log sessions. Sessions can be treated as transactions, i.e., variable-size tuples of categorical data. We adapt the standard definition of mathematical distance used in the K-Means algorithm to represent transaction dissimilarity, and redefine the notion of cluster centroid. The cluster centroid serves as the representative of the properties common to the cluster's elements. We show that our notion of cluster centroid, combined with Jaccard distance, yields results comparable to those of standard approaches while substantially improving efficiency.
  • Keywords
    Internet; information resources; information retrieval; Jaccard distance; K-Means algorithm; Web log clustering; Web log sessions; Web user access; categorical data; cluster centroid; partitioning method; transactions; Clustering algorithms; Data analysis; Databases; Information technology; Iterative algorithms; Partitioning algorithms; Scalability; Standards development; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: Coding and Computing, 2002. Proceedings. International Conference on
  • Print_ISBN
    0-7695-1506-1
  • Type
    conf
  • DOI
    10.1109/ITCC.2002.1000408
  • Filename
    1000408
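
The idea summarized in the abstract, a K-Means-style partitioning of sessions treated as sets of categorical items and compared with Jaccard distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the centroid definition, so the sketch assumes a centroid made of the items that occur in at least half of a cluster's sessions. The function names, the `min_share` parameter, and the sample page paths are all hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Session = FrozenSet[str]


def jaccard_distance(a: Session, b: Session) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def frequent_item_centroid(cluster: Sequence[Session], min_share: float = 0.5) -> Session:
    """ASSUMED centroid: items present in at least `min_share` of the sessions."""
    counts = Counter(item for session in cluster for item in session)
    threshold = min_share * len(cluster)
    return frozenset(item for item, n in counts.items() if n >= threshold)


def cluster_sessions(
    sessions: Sequence[Session],
    initial_centroids: Sequence[Session],
    max_iter: int = 20,
) -> Tuple[List[int], List[Session]]:
    """K-Means-style loop: assign each session to its nearest centroid, then update."""
    centroids = list(initial_centroids)
    assignment: List[int] = []
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: jaccard_distance(s, centroids[k]))
            for s in sessions
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the partition has converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [s for s, a in zip(sessions, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = frequent_item_centroid(members)
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical sessions: each one is the set of pages visited.
    sessions = [
        frozenset({"/home", "/news"}),
        frozenset({"/home", "/news", "/sport"}),
        frozenset({"/shop", "/cart"}),
        frozenset({"/shop", "/cart", "/pay"}),
    ]
    labels, centroids = cluster_sessions(sessions, [sessions[0], sessions[2]])
    print(labels)
    print(centroids)
```

Because the centroid here is itself a set of items, it can be compared to any session with the same Jaccard distance, which is what lets the usual assign-and-update loop run on variable-size categorical data.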