Title :
Source-Aware Partitioning for Robust Cross-Validation
Author :
Ozsel Kilinc; Ismail Uysal
Author_Institution :
Electr. Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
One of the most critical components of engineering a machine learning algorithm for a live application is robust performance assessment prior to its implementation. Cross-validation is used to forecast a specific algorithm's classification or prediction accuracy on new input data, given a finite dataset for training and testing the algorithm. The two most widely used cross-validation techniques, random subsampling (RSS) and K-fold, generalize the assessment results of machine learning algorithms in a non-exhaustive, random manner. In this work, we first show that for an inertia-based activity recognition problem, where data is collected from different users of a wrist-worn wireless accelerometer, random partitioning of the data, regardless of cross-validation technique, results in statistically similar average accuracies for a standard feed-forward neural network classifier. We propose a novel source-aware partitioning technique in which samples from specific users are completely left out of the training/validation sets in rotation. The average error for the proposed cross-validation method is significantly higher with lower standard deviation, which is a major indicator of cross-validation robustness. An approximately 30% increase in average error rate implies that source-aware cross-validation could be a better indicator of live algorithm performance, where test data statistics would differ significantly from training data due to the source (or user)-sensitive nature of process data.
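The rotation described in the abstract amounts to a leave-one-source-out split: each fold holds out every sample from one user for testing and trains on the rest. A minimal sketch in plain Python follows; the function name and the source-label input are illustrative assumptions, not from the paper (scikit-learn's `LeaveOneGroupOut` provides an equivalent ready-made splitter).

```python
from collections import defaultdict

def source_aware_folds(sample_sources):
    """Yield (train_idx, test_idx) pairs, one fold per source (user).

    sample_sources: sequence mapping sample index -> source/user id.
    In each fold, every sample from one source is held out for testing,
    so test statistics come from a user unseen during training.
    """
    # Group sample indices by the source (user) they came from.
    by_source = defaultdict(list)
    for i, src in enumerate(sample_sources):
        by_source[src].append(i)
    # Rotate: hold each source out in turn.
    for held_out, test_idx in sorted(by_source.items()):
        train_idx = [i for i, src in enumerate(sample_sources)
                     if src != held_out]
        yield train_idx, test_idx
```

For example, with samples labeled `["u1", "u1", "u2", "u3", "u3"]` the generator produces three folds, each testing on exactly one user's samples; unlike RSS or K-fold, no user ever contributes to both sides of a split.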
Keywords :
"Training","Testing","Time-domain analysis","Algorithm design and analysis","Partitioning algorithms","Robustness","Feature extraction"
Conference_Titel :
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
DOI :
10.1109/ICMLA.2015.216