DocumentCode
671702
Title
Dealing with highly imbalanced textual data gathered into similar classes
Author
Lamirel, Jean-Charles
Author_Institution
Synalp Team, LORIA, Nancy, France
fYear
2013
fDate
4-9 Aug. 2013
Firstpage
1
Lastpage
7
Abstract
This paper presents a new feature selection and feature contrasting approach for classifying highly imbalanced textual data in which the associated classes are highly similar to one another. One example of such a classification context is the task of classifying bibliographic references into a patent classification scheme. This task is one of the domains investigated in the QUAERO project, whose final goal is to help experts evaluate upcoming patents by drawing on related research.
Keywords
feature selection; learning (artificial intelligence); patents; pattern classification; text analysis; QUAERO project; bibliographic reference classification; degree of similarity; feature contrasting approach; feature selection; highly imbalanced textual data; patent classification scheme; Accuracy; Context; Feature extraction; Labeling; Measurement; Patents; Principal component analysis
fLanguage
English
Publisher
IEEE
Conference_Title
The 2013 International Joint Conference on Neural Networks (IJCNN)
Conference_Location
Dallas, TX
ISSN
2161-4393
Print_ISBN
978-1-4673-6128-6
Type
conf
DOI
10.1109/IJCNN.2013.6707044
Filename
6707044