DocumentCode :
65081
Title :
Integrated Oversampling for Imbalanced Time Series Classification
Author :
Hong Cao ; Xiao-Li Li ; David Yew-Kwong Woon ; See-Kiong Ng
Volume :
25
Issue :
12
fYear :
2013
fDate :
Dec. 2013
Firstpage :
2809
Lastpage :
2822
Abstract :
This paper proposes a novel Integrated Oversampling (INOS) method for highly imbalanced time series classification. We introduce an enhanced structure preserving oversampling (ESPO) technique and combine it with interpolation-based oversampling. ESPO generates a large percentage of the synthetic minority samples from a multivariate Gaussian distribution, by estimating the covariance structure of the minority-class samples and regularizing the unreliable eigen spectrum. To protect the key original minority samples, an interpolation-based technique is used to generate a small percentage of the synthetic population. By preserving the main covariance structure and creating protective variances in the trivial eigen dimensions, ESPO expands the synthetic samples into the void area of the data space without tying them too closely to existing minority-class samples. Preserving the main covariance structure also addresses a key challenge in applying oversampling to imbalanced time series classification: maintaining the correlation between consecutive values. Extensive experiments on seven public time series data sets demonstrate that INOS, used with support vector machines (SVM), outperforms existing oversampling methods as well as state-of-the-art time series classification methods.
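The two-part scheme described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the 70/30 split (gauss_share), the number of retained eigen dimensions (n_main), the constant variance floor (floor_frac) and the nearest-neighbour interpolation are all assumptions chosen for illustration. In particular, the constant floor stands in for the paper's eigen spectrum regularization, which the abstract does not specify.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, n_main, floor_frac, rng):
    """Draw samples from a Gaussian fitted to the minority class."""
    mean = X_min.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    # Assumption: keep the n_main leading eigenvalues and replace the
    # unreliable trailing ones with a small constant "protective" variance.
    reg = eigvals.copy()
    reg[n_main:] = floor_frac * eigvals[n_main - 1]
    z = rng.standard_normal((n_new, X_min.shape[1]))
    return mean + (z * np.sqrt(reg)) @ eigvecs.T

def interpolation_oversample(X_min, n_new, k, rng):
    """Interpolate between a minority sample and one of its k nearest
    minority neighbours (a generic stand-in for the interpolation step)."""
    n = X_min.shape[0]
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :min(k, n - 1)]
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, neighbours.shape[1], size=n_new)
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])

def integrated_oversample(X_min, n_new, gauss_share=0.7, n_main=2,
                          floor_frac=0.1, k=3, seed=0):
    """Return n_new synthetic rows: Gaussian samples first, then interpolated."""
    X_min = np.asarray(X_min, dtype=float)
    if X_min.ndim != 2 or X_min.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two minority samples")
    rng = np.random.default_rng(seed)
    n_main = max(1, min(n_main, X_min.shape[1]))
    n_gauss = int(round(gauss_share * n_new))
    part_a = gaussian_oversample(X_min, n_gauss, n_main, floor_frac, rng)
    part_b = interpolation_oversample(X_min, n_new - n_gauss, k, rng)
    return np.vstack([part_a, part_b])
```

Each row of X_min is treated as one fixed-length minority-class time series. The synthetic rows would be appended to the training set before fitting a classifier such as an SVM. The split between the two generators is exposed as a parameter because the abstract gives only "large" and "small" percentages, not exact values.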
Keywords :
Gaussian distribution; data mining; interpolation; learning (artificial intelligence); pattern classification; sampling methods; support vector machines; time series; ESPO; INOS; SVM; covariance structure estimation; data space; eigen spectrum; enhanced structure preserving oversampling technique; imbalanced time series classification; integrated oversampling method; interpolation-based oversampling; machine learning; minority-class samples; multivariate Gaussian distribution; time series data sets; Covariance matrix; Eigenvalues and eigenfunctions; Null space; Time series analysis; Oversampling; classification; imbalanced data; learning; structure preserving
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2013.37
Filename :
6468038