Title :
Unsupervised segmentation of categorical time series into episodes
Author :
Cohen, Paul ; Heeringa, Brent ; Adams, Niall
Author_Institution :
Dept. of Comput. Sci., Massachusetts Univ., Amherst, MA, USA
Abstract :
This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The VOTING-EXPERTS algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two "expert methods" decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages. The algorithm also segments time series of robot sensor data into subsequences that represent episodes in the life of the robot. We claim that VOTING-EXPERTS finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
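The abstract's pipeline (collect ngram statistics, slide a window, let two experts vote on boundaries) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `window` and `threshold` parameters, the use of raw (unstandardized) scores, and the local-maximum rule for turning votes into cuts are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_len):
    """Count every ngram up to max_len and the symbols that follow each one."""
    freq = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_len + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(gram, followers):
    """Entropy (in bits) of the distribution of symbols that follow gram."""
    counts = followers.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def segment(seq, window=3, threshold=1):
    """Return (votes, pieces) for a string of categorical symbols.

    votes[k] is the number of votes for a boundary just before seq[k].
    Assumption: raw scores are compared directly; no standardization
    across ngram lengths is applied in this sketch.
    """
    freq, followers = ngram_stats(seq, window)
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        splits = range(1, window)
        # Frequency expert: prefer the split whose two parts are most frequent.
        best_f = max(splits, key=lambda k: freq[w[:k]] + freq[w[k:]])
        # Entropy expert: prefer the split after the least predictable prefix.
        best_h = max(splits, key=lambda k: boundary_entropy(w[:k], followers))
        votes[start + best_f] += 1
        votes[start + best_h] += 1
    # Assumed cut rule: vote count above threshold and a local maximum.
    cuts = [k for k in range(1, len(seq))
            if votes[k] > threshold
            and votes[k] >= votes[k - 1] and votes[k] >= votes[k + 1]]
    pieces, prev = [], 0
    for k in cuts:
        pieces.append(seq[prev:k])
        prev = k
    pieces.append(seq[prev:])
    return votes, pieces
```

Calling `segment("itwasabrightcolddayinapril" * 3, window=4)` returns the per-position vote counts and the resulting pieces. On such a short input the pieces will not reliably match words; the sketch only shows how the two kinds of statistics feed a shared vote.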
Keywords :
data mining; document handling; entropy; time series; unsupervised learning; VOTING-EXPERTS algorithm; boundary entropy; categorical time series; episodes; expert methods; frequency; languages; ngrams; robot sensor data; statistics; subsequences; text segmentation; unsupervised segmentation algorithm; words; Computer science; DNA; Educational institutions; Frequency; Inference algorithms; Mathematics; Robot sensing systems; Sensor phenomena and characterization; Sequences; Writing;
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
DOI :
10.1109/ICDM.2002.1183891