Title :
Finding structurally different medical data
Author :
Lin, Jessica ; Li, Yuan
Author_Institution :
Comput. Sci. Dept., George Mason Univ., Fairfax, VA, USA
Abstract :
For more than a decade, time series similarity search has received a great deal of attention from data mining researchers, and many time series representations and distance measures have been proposed as a result. However, most existing work on time series similarity search focuses on shape-based similarity. While some existing approaches work well for short time series, they typically fail to produce satisfactory results on long sequences. For long sequences, it is more appropriate to measure similarity based on higher-level structure. This is particularly true for medical time series, which are often not perfectly aligned. In this work, we present a histogram-based representation for time series data, similar to the "bag of words" approach that is widely accepted in the text mining and information retrieval communities. We show that our approach outperforms existing methods in clustering and classification on medical time series obtained from PhysioBank.
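The histogram idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "words" are extracted from a series, so everything specific here is an assumption made for illustration: the sliding window, the z-normalization, the four-letter discretization with fixed breakpoints, the parameter values, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

# Hypothetical parameters, chosen only for illustration (not from the paper).
WINDOW = 8                            # sliding-window length
SEGMENTS = 4                          # letters per word
BREAKPOINTS = (-0.6745, 0.0, 0.6745)  # quartiles of N(0, 1), giving 4 letters
ALPHABET = "abcd"


def window_to_word(window):
    """Z-normalize a window, average it into SEGMENTS pieces, map each to a letter."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        normed = [0.0] * len(window)  # flat window: no shape information
    else:
        normed = [(x - mu) / sigma for x in window]
    step = len(normed) // SEGMENTS
    letters = []
    for i in range(SEGMENTS):
        seg_mean = mean(normed[i * step:(i + 1) * step])
        index = sum(seg_mean > b for b in BREAKPOINTS)  # 0..3
        letters.append(ALPHABET[index])
    return "".join(letters)


def bag_of_words(series):
    """Histogram of words over all sliding windows; word positions are discarded."""
    return Counter(
        window_to_word(series[i:i + WINDOW])
        for i in range(len(series) - WINDOW + 1)
    )


def histogram_distance(h1, h2):
    """Euclidean distance between two word-count histograms."""
    return sqrt(sum((h1[w] - h2[w]) ** 2 for w in set(h1) | set(h2)))


# Usage: a periodic series, a phase-shifted copy, and a structurally different ramp.
periodic = [0, 1, 2, 3, 2, 1] * 10
shifted = periodic[3:] + periodic[:3]
ramp = list(range(60))

d_same = histogram_distance(bag_of_words(periodic), bag_of_words(shifted))
d_diff = histogram_distance(bag_of_words(periodic), bag_of_words(ramp))
print(d_same < d_diff)
```

Because the histogram ignores where each word occurs, the phase-shifted copy stays close to the original while the ramp ends up far away. This is the kind of alignment tolerance the abstract motivates for long medical recordings.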
Keywords :
data mining; data structures; medical information systems; pattern classification; pattern clustering; time series; PhysioBank; distance measure; histogram-based time series data representation; information retrieval; medical time series; shape-based similarity search; text mining; Computer science; Dynamic programming; Electrocardiography; Euclidean distance; Robustness; Shape; Time measurement
Conference_Title :
Computer-Based Medical Systems, 2009. CBMS 2009. 22nd IEEE International Symposium on
Conference_Location :
Albuquerque, NM
Print_ISBN :
978-1-4244-4879-1
ISSN :
1063-7125
DOI :
10.1109/CBMS.2009.5255269