Author_Institution :
Dept. of Electr. & Electron. Eng., Univ. of Cagliari, Cagliari, Italy
Abstract :
The performance of a classification system depends on various aspects, including the encoding technique. Encoding plays a primary role in tuning a classifier/predictor, as the choice of encoder may greatly affect performance. At present, evaluating the impact of an encoding technique on a classification system typically requires training the system and testing it with a performance metric deemed relevant (e.g., precision, recall, or the Matthews correlation coefficient). For this reason, assessing even a single encoding technique is a time-consuming activity, and it introduces additional degrees of freedom (e.g., the parameters of the training algorithm) that may be unrelated to the encoding technique being assessed. In this paper, we propose a family of methods to measure the performance of encoding techniques used in classification tasks, based on the correlation between the encoded input data and the corresponding output. The proposed approach provides correlation-based metrics devised with the primary goal of focusing on the encoding technique, leaving other unrelated aspects aside. Notably, the proposed technique saves a great deal of computational time, as it needs only a tiny fraction of the time required by standard methods.
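To make the idea of scoring an encoding without training a classifier concrete, the following is a minimal sketch only. The abstract does not define the proposed metrics, so everything here is an assumption made for illustration: the names `pearson` and `encoding_score` are hypothetical, and the score (the mean, over encoded features, of the largest absolute Pearson correlation with any one-hot class indicator) is a simple stand-in, not the paper's method.

```python
import math
from typing import Hashable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def encoding_score(encoded: Sequence[Sequence[float]],
                   labels: Sequence[Hashable]) -> float:
    """Hypothetical correlation-based score for an encoding (illustrative assumption).

    For each encoded feature, take the largest absolute correlation with any
    one-hot class indicator, then average over features. No classifier is trained.
    """
    classes = sorted(set(labels))
    n_features = len(encoded[0])
    total = 0.0
    for j in range(n_features):
        column = [row[j] for row in encoded]
        best = 0.0
        for c in classes:
            indicator = [1.0 if label == c else 0.0 for label in labels]
            best = max(best, abs(pearson(column, indicator)))
        total += best
    return total / n_features


# Toy data: four samples with two classes ("H" and "E" are placeholder labels).
labels = ["H", "E", "H", "E"]
informative = [[1.0], [0.0], [1.0], [0.0]]    # feature tracks the class exactly
uninformative = [[1.0], [1.0], [0.0], [0.0]]  # feature is unrelated to the class

print(encoding_score(informative, labels))
print(encoding_score(uninformative, labels))
```

Under these assumptions, an encoding whose features follow the class labels scores higher than one whose features are unrelated to them, and the computation is a single pass over the data rather than a training run. Note that Pearson correlation captures only linear dependence, so a sketch like this would miss encodings that are informative in a nonlinear way.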
Keywords :
correlation theory; encoding; learning (artificial intelligence); matrix algebra; pattern classification; classification system; classifier-predictor tuning; correlation-based metrics; encoding technique assessment; learning algorithm; performance metric; protein secondary structure prediction; Correlation; Encoding; Measurement; Proteins; Random variables; Standards; Vectors; classification; correlation; encoding techniques; metrics; performance; prediction; supervised learning;