DocumentCode :
3251737
Title :
webSPADE: a parallel sequence mining algorithm to analyze web log data
Author :
Demiriz, Ayhan
Author_Institution :
Inf. Technol., Verizon Inc., Irving, TX, USA
fYear :
2002
fDate :
2002
Firstpage :
755
Lastpage :
758
Abstract :
Enterprise-class web sites receive a large amount of traffic, from both registered and anonymous users. Data warehouses are built to store and help analyze the click streams within this traffic to provide companies with valuable insights into the behavior of their customers. This article proposes a parallel sequence mining algorithm, webSPADE, to analyze the click streams found in site web logs. In this process, raw web logs are first cleaned and inserted into a data warehouse. The click streams are then mined by webSPADE. An innovative web-based front-end is used to visualize and query the sequence mining results. The webSPADE algorithm is currently used by Verizon to analyze the daily traffic of the Verizon.com web site.
Keywords :
Web sites; data mining; Web log data; data warehouses; enterprise-class web sites; parallel sequence mining algorithm; raw web logs; sequence mining; web-based front-end; webSPADE; Algorithm design and analysis; Appropriate technology; Companies; Data analysis; Data visualization; Data warehouses; Frequency; Information technology; Relational databases; Service oriented architecture;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1184046
Filename :
1184046