Title :
Dealing with highly imbalanced textual data gathered into similar classes
Author :
Lamirel, Jean-Charles
Author_Institution :
Synalp Team, LORIA, Nancy, France
Abstract :
This paper presents a new feature selection and feature contrasting approach for classifying highly imbalanced textual data whose classes are highly similar to one another. One example of such a classification context is the task of classifying bibliographic references into a patent classification scheme. This task is one of the domains of investigation of the QUAERO project, with the final goal of helping experts evaluate upcoming patents through the use of related research.
Keywords :
feature selection; learning (artificial intelligence); patents; pattern classification; text analysis; QUAERO project; bibliographic reference classification; degree of similarity; feature contrasting approach; highly imbalanced textual data; patent classification scheme; Accuracy; Context; Feature extraction; Labeling; Measurement; Patents; Principal component analysis
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6707044