DocumentCode
3251737
Title
webSPADE: a parallel sequence mining algorithm to analyze web log data
Author
Demiriz, Ayhan
Author_Institution
Inf. Technol., Verizon Inc., Irving, TX, USA
fYear
2002
fDate
2002
Firstpage
755
Lastpage
758
Abstract
Enterprise-class web sites receive a large amount of traffic, from both registered and anonymous users. Data warehouses are built to store and help analyze the click streams within this traffic to provide companies with valuable insights into the behavior of their customers. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in site web logs. In this process, raw web logs are first cleaned and inserted into a data warehouse. The click streams are then mined by webSPADE. An innovative web-based front-end is used to visualize and query the sequence mining results. The webSPADE algorithm is currently used by Verizon to analyze the daily traffic of the Verizon.com web site.
Keywords
Web sites; data mining; Web log data; data warehouses; enterprise-class web sites; parallel sequence mining algorithm; raw web logs; sequence mining; web-based front-end; webSPADE; Algorithm design and analysis; Appropriate technology; Companies; Data analysis; Data visualization; Data warehouses; Frequency; Information technology; Relational databases; Service oriented architecture;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1184046
Filename
1184046