Title :
Hydrological Time Series Anomaly Mining Based on Symbolization and Distance Measure
Author :
Dingsheng Wan ; Yan Xiao ; Pengcheng Zhang ; Jun Feng ; Yuelong Zhu ; Qian Liu
Author_Institution :
Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
fDate :
June 27 2014-July 2 2014
Abstract :
Large hydrological data sets are a kind of big data that contain much hidden and potentially useful knowledge. Extracting this knowledge is necessary, because it can provide more valuable hydrological information and support future hydrological forecasting. Data mining based on time series is now widely used, and several time-series techniques exist for extracting anomalies. However, most of these techniques are not suited to large, unstable data such as hydrological big data sets. Important problems include high fitting error after dimension reduction and low accuracy of the mining results. In this work we propose a new approach to hydrological anomaly mining based on time series. The approach combines time series symbolization with a distance measure. It uses Feature Points Symbolic Aggregate Approximation (FP_SAX) to improve the selection of feature points, and then measures the distance between the resulting strings with Symbol Distance based Dynamic Time Warping (SD_DTW). Finally, the distances obtained are sorted. A set of dedicated experiments is performed to validate our approach. The experimental data set is the water level data set obtained from the Xiaomeikou gauge station on Taihu Lake from 1956 to 2005. The experimental results show that our approach has lower fitting error and higher accuracy.
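The symbolize, measure, and sort pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define FP_SAX or SD_DTW, so the sketch substitutes classic SAX (z-normalization, piecewise aggregate approximation, Gaussian breakpoints) for FP_SAX and plain dynamic time warping with an assumed symbol distance (the absolute difference of symbol indices) for SD_DTW. The function names, the default parameters, and the use of the nearest-neighbor distance as the anomaly score are likewise assumptions made for illustration.

```python
import bisect
import math
from statistics import NormalDist


def znormalize(series):
    """Scale a series to zero mean and unit variance (constant series map to zeros)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` chunks.

    Assumes len(series) >= segments so that no chunk is empty.
    """
    n = len(series)
    bounds = [i * n // segments for i in range(segments + 1)]
    return [sum(series[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def sax_symbols(series, segments, alphabet_size):
    """Classic SAX (stand-in for FP_SAX): integer symbols in 0..alphabet_size-1."""
    # Breakpoints split the standard normal distribution into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    reduced = paa(znormalize(series), segments)
    return [bisect.bisect_left(breakpoints, value) for value in reduced]


def symbol_dtw(a, b):
    """DTW over two symbol sequences (stand-in for SD_DTW).

    Assumed symbol distance: absolute difference of symbol indices.
    """
    cost = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def rank_anomalies(windows, segments=4, alphabet_size=4):
    """Return (score, window_index) pairs sorted from most to least anomalous.

    Assumed score: distance from each window to its nearest other window.
    Requires at least two windows.
    """
    words = [sax_symbols(w, segments, alphabet_size) for w in windows]
    scores = []
    for i, word in enumerate(words):
        nearest = min(symbol_dtw(word, other)
                      for j, other in enumerate(words) if j != i)
        scores.append((nearest, i))
    return sorted(scores, reverse=True)
```

Calling `rank_anomalies` on a list of equal-length water level windows (for example, one window per year) returns the windows ordered by how far each one lies from its closest match, so the most unusual window comes first. Because each window is z-normalized separately, this sketch compares shapes rather than absolute water levels.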
Keywords :
Big Data; data mining; geophysics computing; hydrology; time series; FP_SAX; SD_DTW; Taihu Lake; Xiaomeikou gauge station; distance measure; feature point selection; feature points symbolic aggregate approximation; hydrological time series anomaly mining; symbol distance based dynamic time warping; time series symbolization; water level data set; Accuracy; Data compression; Euclidean distance; Time measurement; Time series analysis; Hydrological Time Series; Pattern Representation
Conference_Titel :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
DOI :
10.1109/BigData.Congress.2014.56