Regional Information Center for Science and Technology - Avoiding Bias in Classification Accuracy

Abstract:

The number of studies on classifying human characteristics from measured individual signals has increased rapidly. In wearable-sensor-based activity recognition, a common practice is to report person-independent recognition results using a leave-one-person-out cross-validation scheme. This is a suitable solution when feature or model parameter selection is not needed, or when it is done outside the validation scheme. Unfortunately, this is not always the case. This article therefore studies how the train-validate-test approach changes recognition rates compared to the basic leave-one-out cross-validation approach. Results are presented for three ways of performing train-validate-test: 1) a single division into training and testing data, 2) a 10-fold division into training and testing data, and 3) double leave-one-person-out cross-validation. The article shows that the classifier or feature set that performs best on the training and validation data under the basic leave-one-out approach does not always perform best on independent testing data. Nevertheless, a single division, or even a 10-fold division, into separate training and testing data can introduce a larger bias into the results. The article therefore concludes that double leave-one-person-out cross-validation is the most robust approach for reporting classification rates in future studies of activity recognition, as well as in other areas where human signals are used.
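The double leave-one-person-out scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the nearest-centroid classifier, the candidate feature sets, and all function names (`make_data`, `lopo_accuracy`, `double_lopo`) are assumptions made for illustration only. The outer loop holds out one person for testing; the inner loop runs an ordinary leave-one-person-out over the remaining persons to select a feature set, so the test person never influences the selection.

```python
import random
from collections import defaultdict

def make_data(n_persons=6, n_per_class=20, seed=0):
    """Hypothetical synthetic data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    rows = []  # each row: (person_id, feature_tuple, label)
    for person in range(n_persons):
        offset = rng.gauss(0, 0.3)  # person-specific shift in the signal
        for label in (0, 1):
            for _ in range(n_per_class):
                x = (2.0 * label + offset + rng.gauss(0, 0.5), rng.gauss(0, 1.0))
                rows.append((person, x, label))
    return rows

def fit_centroids(rows, feats):
    """Mean vector per class, restricted to the selected feature indices."""
    sums = defaultdict(lambda: [0.0] * len(feats))
    counts = defaultdict(int)
    for _, x, y in rows:
        counts[y] += 1
        for j, f in enumerate(feats):
            sums[y][j] += x[f]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x[f] - c[j]) ** 2 for j, f in enumerate(feats))
    return min(centroids, key=lambda y: dist(centroids[y]))

def accuracy(train, test, feats):
    centroids = fit_centroids(train, feats)
    hits = sum(predict(centroids, x, feats) == y for _, x, y in test)
    return hits / len(test)

def lopo_accuracy(rows, feats):
    """Basic leave-one-person-out: mean accuracy over held-out persons."""
    persons = sorted({p for p, _, _ in rows})
    scores = []
    for held_out in persons:
        train = [r for r in rows if r[0] != held_out]
        valid = [r for r in rows if r[0] == held_out]
        scores.append(accuracy(train, valid, feats))
    return sum(scores) / len(scores)

def double_lopo(rows, candidates):
    """Outer loop tests on one person; inner LOPO on the rest selects features."""
    persons = sorted({p for p, _, _ in rows})
    outer_scores, chosen = [], []
    for test_person in persons:
        rest = [r for r in rows if r[0] != test_person]
        test = [r for r in rows if r[0] == test_person]
        # Selection uses only training/validation persons, never the test person.
        best = max(candidates, key=lambda feats: lopo_accuracy(rest, feats))
        chosen.append(best)
        outer_scores.append(accuracy(rest, test, best))
    return sum(outer_scores) / len(outer_scores), chosen

data = make_data()
candidates = [(0,), (1,), (0, 1)]  # hypothetical candidate feature sets
score, chosen = double_lopo(data, candidates)
print("double LOPO accuracy:", round(score, 3))
print("feature set chosen per outer fold:", chosen)
```

By contrast, calling `lopo_accuracy(data, feats)` for each candidate and reporting the best score mixes selection and evaluation on the same persons, which is the source of the optimistic bias the abstract refers to.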