Title :
A framework towards efficient and effective sequence clustering
Author :
Wang, Wei ; Yang, Jiong
Author_Institution :
IBM Thomas J. Watson Res. Center, NY, USA
Abstract :
Analyzing sequence data (particularly in categorical domains) has become increasingly important, partly due to significant advances in biology and other fields. Examples of sequence data include DNA sequences, unfolded protein sequences, text documents, Web usage data, and system traces. Previous work on mining sequence data has mainly focused on frequent pattern discovery. In this project, we focus on the problem of clustering sequence data.
Keywords :
data analysis; pattern clustering; sequences; DNA sequences; Web usage data; categorical domains; sequence data analysis; sequence data clustering; system traces; text documents; unfolded protein sequences; Amino acids; Biological information theory; Clustering algorithms; DNA; Data analysis; Data mining; Extraterrestrial measurements; Probability distribution; Protein sequence; Tree data structures
Conference_Title :
Proceedings of the 18th International Conference on Data Engineering, 2002
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1531-2
DOI :
10.1109/ICDE.2002.994736