Title :
Maintaining imbalance highly dependent medical data using Dirichlet process data generation
Author :
Antaresti, Tieta ; Fanany, Mohamad Ivan ; Arymurthy, Aniati Murni
Author_Institution :
Fac. of Comput. Sci., Pattern Recognition, Image Process. & Content-Based Image Retrieval Lab., Univ. Indonesia, Depok, Indonesia
Abstract :
Class imbalance, where one class has far more samples than another, is an important issue in classification problems. One well-known data balancing technique is artificial oversampling, which increases the size of the dataset. In this research, multinomial classification was applied to classify recorded features obtained from a single ECG (electrocardiograph) sensor. A Dirichlet process, a Dirichlet distribution over the cumulative distribution function of each data partition, was therefore needed to model the distribution of the newly generated data while also taking into account the statistical properties of the existing data. After data balancing, the classification achieved 77.21% classification accuracy (CA) and 90.9% area under the ROC curve (AUC).
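As an illustration only, the idea of generating new minority-class samples from a Dirichlet process can be sketched with the Polya urn (posterior predictive) scheme. This is not the authors' implementation: the function name `polya_urn_oversample`, the Gaussian base measure fitted to the minority class, the concentration parameter `alpha`, and the toy data are all assumptions made for this sketch.

```python
import numpy as np

def polya_urn_oversample(minority, n_new, alpha=1.0, seed=0):
    """Draw n_new synthetic rows via a Dirichlet process Polya urn.

    Assumptions (not from the paper): the base measure G0 is a Gaussian
    fitted to the minority class, and alpha is chosen by hand.
    `minority` must be a 2-D array of shape (n_samples, n_features).
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    mean = minority.mean(axis=0)
    cov = np.atleast_2d(np.cov(minority, rowvar=False))
    pool = [row for row in minority]  # urn contents; grows as we sample
    new_rows = []
    for _ in range(n_new):
        n = len(pool)
        if rng.random() < alpha / (alpha + n):
            # With probability alpha / (alpha + n): fresh draw from G0.
            draw = rng.multivariate_normal(mean, cov)
        else:
            # Otherwise: repeat a value already in the urn.
            draw = pool[rng.integers(n)]
        pool.append(draw)
        new_rows.append(draw)
    return np.array(new_rows)

# Toy usage with made-up data: 20 minority rows, 3 features each.
toy_rng = np.random.default_rng(1)
minority = toy_rng.normal(size=(20, 3))
synthetic = polya_urn_oversample(minority, n_new=30, alpha=5.0)
balanced_minority = np.vstack([minority, synthetic])
```

A larger `alpha` makes fresh draws from the base measure more likely, while a smaller `alpha` mostly repeats existing rows, so the parameter trades novelty against fidelity to the observed data.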
Keywords :
electrocardiography; feature extraction; medical administrative data processing; medical computing; pattern classification; statistical distributions; Dirichlet process data generation; artificial oversampling; classification problem; data balancing technique; highly dependent medical data maintenance; multinomial classification; single ECG sensor; Accuracy; Bayesian methods; Data models; Diseases; Electrocardiography; Machine learning; Sleep apnea;
Conference_Title :
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location :
Melbourne, QLD
Print_ISBN :
978-1-4577-1538-9
DOI :
10.1109/ICDIM.2011.6093359