• DocumentCode
    3251737
  • Title
    webSPADE: a parallel sequence mining algorithm to analyze web log data
  • Author
    Demiriz, Ayhan
  • Author_Institution
    Inf. Technol., Verizon Inc., Irving, TX, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    755
  • Lastpage
    758
  • Abstract
    Enterprise-class web sites receive a large amount of traffic, from both registered and anonymous users. Data warehouses are built to store and help analyze the click streams within this traffic to provide companies with valuable insights into the behavior of their customers. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in site web logs. In this process, raw web logs are first cleaned and inserted into a data warehouse. The click streams are then mined by webSPADE. An innovative web-based front-end is used to visualize and query the sequence mining results. The webSPADE algorithm is currently used by Verizon to analyze the daily traffic of the Verizon.com web site.
  • Keywords
    Web sites; data mining; Web log data; data warehouses; enterprise-class web sites; parallel sequence mining algorithm; raw web logs; sequence mining; web-based front-end; webSPADE; Algorithm design and analysis; Appropriate technology; Companies; Data analysis; Data visualization; Data warehouses; Frequency; Information technology; Relational databases; Service oriented architecture
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1184046
  • Filename
    1184046