Regional Information Center for Science and Technology - A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multimodal Time-Series Classification

DocumentCode :

32200

Title :

A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multimodal Time-Series Classification

Author :

Cao, Hong ; Tan, Vincent Y. F. ; Pang, John Z. F.

Author_Institution :

Inst. for Infocomm Res., Agency for Sci. Technol. & Res., Singapore, Singapore

Volume :

Issue :

fYear :

2014

fDate :

Dec. 2014

Firstpage :

2226

Lastpage :

2239

Abstract :

We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multimodal minority class and thereby address the problem of imbalanced time-series classification. By exploiting the fact that nearby time points are highly correlated due to the smoothness of the time series, our model significantly reduces the number of covariance parameters to be estimated from O(d²) to O(Ld), where L is the number of mixture components and d is the dimensionality. Thus, our model is particularly effective for modeling high-dimensional time series with a limited number of instances in the minority positive class. In addition, the computational complexity of learning the model is only of the order O(L·n₊·d²), where n₊ is the number of positively labeled samples. We conduct extensive classification experiments on several well-known time-series data sets (both single- and multimodal) by first randomly generating synthetic instances from our learned mixture model to correct the imbalance. We then compare our results with several state-of-the-art oversampling techniques. The results demonstrate that when our proposed model is used for oversampling, the same support vector machine classifier achieves much better classification accuracy across the range of data sets. In fact, the proposed method achieves the best average performance on 30 of the 36 multimodal data sets according to the F-value metric. Our results are also highly competitive with non-oversampling-based classifiers for dealing with imbalanced time-series data sets.
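The abstract does not spell out the learning procedure, so the following is only an illustrative sketch of the general idea for a single component (L = 1), not the authors' implementation. It assumes a Chow-Liu style fit: pairwise Gaussian mutual information, a maximum-weight spanning tree, and ancestral sampling along the tree to generate synthetic minority-class instances. The function names `fit_gaussian_tree` and `sample_gaussian_tree`, the ridge term, and the choice of root are all assumptions made for illustration. A tree over d time points has d - 1 edges, so each component needs only d variances and d - 1 covariances, which is where the O(Ld) parameter count comes from.

```python
import numpy as np

def fit_gaussian_tree(X):
    """Fit one tree-structured Gaussian to X (n samples x d time points, d >= 2).

    Assumed Chow-Liu style procedure, not taken from the paper.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    # Small ridge (an assumption) keeps variances strictly positive.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
    std = np.sqrt(np.diag(cov))
    rho = cov / np.outer(std, std)
    # Mutual information of a bivariate Gaussian: -0.5 * log(1 - rho^2).
    mi = -0.5 * np.log(np.clip(1.0 - rho ** 2, 1e-12, None))
    np.fill_diagonal(mi, -np.inf)

    # Prim's algorithm for a maximum-weight spanning tree rooted at node 0.
    parent = np.full(d, -1)
    in_tree = np.zeros(d, dtype=bool)
    in_tree[0] = True
    best = mi[0].copy()
    best_parent = np.zeros(d, dtype=int)
    order = [0]
    for _ in range(d - 1):
        j = int(np.argmax(np.where(in_tree, -np.inf, best)))
        parent[j] = best_parent[j]
        in_tree[j] = True
        order.append(j)
        improved = mi[j] > best
        best = np.where(improved, mi[j], best)
        best_parent = np.where(improved, j, best_parent)
    return mu, cov, parent, order

def sample_gaussian_tree(mu, cov, parent, order, m, rng):
    """Draw m synthetic series by ancestral sampling along the tree."""
    d = mu.shape[0]
    S = np.empty((m, d))
    root = order[0]
    S[:, root] = rng.normal(mu[root], np.sqrt(cov[root, root]), size=m)
    for j in order[1:]:
        p = parent[j]
        gain = cov[j, p] / cov[p, p]
        # Conditional variance of node j given its parent p.
        cond_var = max(cov[j, j] - gain * cov[j, p], 0.0)
        cond_mean = mu[j] + gain * (S[:, p] - mu[p])
        S[:, j] = cond_mean + np.sqrt(cond_var) * rng.standard_normal(m)
    return S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy minority class: 30 smooth series of length 12 (random walks).
    X_pos = np.cumsum(rng.standard_normal((30, 12)), axis=1)
    model = fit_gaussian_tree(X_pos)
    X_syn = sample_gaussian_tree(*model, m=100, rng=rng)
    print(X_syn.shape)
```

Extending this sketch to a mixture would require fitting L such trees, for example with an EM loop over component responsibilities; that step is omitted here because the abstract gives no details about it.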

Keywords :

Gaussian processes; computational complexity; learning (artificial intelligence); pattern classification; sampling methods; time series; trees (mathematics); F-value metric; Gaussian trees model; classification accuracy; classification experiments; covariance parameters; imbalanced time-series classification; multimodal minority class; multimodal time-series classification; oversampling; parsimonious statistical model; support vector machines classifier; Computational modeling; Correlation; Covariance matrices; Data models; Graphical models; Markov processes; Random variables; Gaussian graphical models; imbalanced data set; mixture models; multimodality; time-series

fLanguage :

English

Journal_Title :

Neural Networks and Learning Systems, IEEE Transactions on

Publisher :

IEEE

ISSN :

2162-237X

Type :

jour

DOI :

10.1109/TNNLS.2014.2308321

Filename :

6766252

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=32200