Title :
Parametric classification over multiple samples
Author_Institution :
Fac. of Comput. Sci., Free Univ. of Bozen-Bolzano, Bolzano, Italy
Abstract :
This pattern was originally designed to classify sequences of events in log files by error-proneness. Sequences of events trace application use in real contexts, so identifying error-prone sequences helps to understand and predict application use. The classification problem we describe is typical of supervised machine learning, but the composite pattern we propose approaches it with several techniques to control for data brittleness. Data pre-processing, feature selection, parametric classification, and cross-validation are the main instruments that give a good degree of control over this classification problem. In particular, the pattern includes a solution for typical problems that occur when data come from several samples of different populations with different degrees of sparsity.
Keywords :
learning (artificial intelligence); pattern classification; classification problem; cross-validation; data pre-processing; error-prone sequences; feature selection; parametric classification; supervised machine learning; Accuracy; Correlation; Sociology; Software; Training; Vectors;
Conference_Title :
Data Analysis Patterns in Software Engineering (DAPSE), 2013 1st International Workshop on
Conference_Location :
San Francisco, CA
DOI :
10.1109/DAPSE.2013.6603805