DocumentCode :
2864456
Title :
Summarization - compressing data into an informative representation
Author :
Chandola, Varun ; Kumar, Vipin
Author_Institution :
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
fYear :
2005
fDate :
27-30 Nov. 2005
Abstract :
In this paper, we formulate the problem of summarization of a dataset of transactions with categorical attributes as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation of clustering and the second approach makes use of frequent item sets from the association analysis domain. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998 DARPA Off-line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance program.
Keywords :
data compression; data mining; optimisation; transaction processing; association analysis; compaction gain; data summarization; frequent item sets; information loss; informative representation; objective function; optimization problem; transaction data; Clustering algorithms; Compaction; Computer science; Data analysis; Data mining; Data visualization; Intrusion detection; Itemsets; Monitoring; Telecommunication traffic;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, Fifth IEEE International Conference on
ISSN :
1550-4786
Print_ISBN :
0-7695-2278-5
Type :
conf
DOI :
10.1109/ICDM.2005.137
Filename :
1565667