DocumentCode :
260144
Title :
Improving textual data classification and discrimination using an ad-hoc metric: Application to a famous text discrimination challenge
Author :
Lamirel, Jean-Charles ; Cuxac, Pascal
Author_Institution :
Synalp, LORIA, Villers-lès-Nancy, France
fYear :
2014
fDate :
9-10 Nov. 2014
Firstpage :
1
Lastpage :
6
Abstract :
Labelling maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering); it favours clusters with a maximum value of feature F-measure. In this paper, we show that adapting this metric to supervised classification makes it possible to select features and to compute a contrast function for each of them. The method is tested on the well-known Mitterrand-Chirac speech dataset of the DEFT 2005 challenge, which is considered difficult and is strongly imbalanced. We show that it yields very large improvements in classification performance on this dataset while clearly isolating the discriminating characteristics of the different classes (i.e. the Chirac and Mitterrand profiles).
Keywords :
feature selection; pattern classification; DEFT; F-max; F-measure; ad-hoc metric; discriminative analysis; ill-balanced Mitterrand-Chirac data; imbalanced data; labelling maximization; nonsupervised classification; textual data classification; textual data discrimination; unbiased metric; Accuracy; Bayes methods; Classification algorithms; Context; Hidden Markov models; Measurement; Patents; classification; feature maximization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
ISKO-Maghreb: Concepts and Tools for Knowledge Management (ISKO-Maghreb), 2014 4th International Symposium
Conference_Location :
Algiers
Print_ISBN :
978-1-4799-7507-5
Type :
conf
DOI :
10.1109/ISKO-Maghreb.2014.7033480
Filename :
7033480