• DocumentCode
    671702
  • Title
    Dealing with highly imbalanced textual data gathered into similar classes
  • Author
    Lamirel, Jean-Charles
  • Author_Institution
    Synalp Team, LORIA, Nancy, France
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper presents a new feature selection and feature contrasting approach for classifying highly imbalanced textual data whose classes are highly similar to one another. One example of such a classification context is the task of classifying bibliographic references into a patent classification scheme. This task is one of the areas of investigation of the QUAERO project, with the final goal of helping experts evaluate upcoming patents through the use of related research.
  • Keywords
    feature selection; learning (artificial intelligence); patents; pattern classification; text analysis; QUAERO project; bibliographic reference classification; degree of similarity; feature contrasting approach; highly imbalanced textual data; patent classification scheme; Accuracy; Context; Feature extraction; Labeling; Measurement; Patents; Principal component analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2013.6707044
  • Filename
    6707044