• DocumentCode
    3227662
  • Title
    Classifying Documents within Multiple Hierarchical Datasets Using Multi-task Learning
  • Author
    Naik, Anima ; Charuvaka, Anveshi ; Rangwala, Huzefa
  • Author_Institution
    Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
  • fYear
    2013
  • fDate
    4-6 Nov. 2013
  • Firstpage
    390
  • Lastpage
    397
  • Abstract
    Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms traditional single-task learning (STL) in terms of prediction accuracy. In this work we develop an MTL-based approach for classifying documents that are archived within dual concept hierarchies, namely, DMOZ and Wikipedia. We solve the multi-class classification problem by defining one-versus-rest binary classification tasks for each of the different classes across the two hierarchical datasets. Instead of learning a linear discriminant for each task independently, we use an MTL approach in which the relationships between tasks across the two datasets are established using the non-parametric, lazy, nearest neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL and TL methods against standard single-task learning and semi-supervised learning approaches. Our empirical results demonstrate the strength of the developed methods, which show an improvement especially when there are fewer training examples per classification task.
  • Keywords
    document handling; generalisation (artificial intelligence); learning (artificial intelligence); nonparametric statistics; pattern classification; DMOZ; MTL based approach; Wikipedia; document classification; dual concept hierarchies; generalization performance; linear discriminant learning; multiclass classification problem; multiple hierarchical datasets; multitask learning; nonparametric lazy nearest neighbor approach; transfer learning approach; one-versus-rest binary classification task; prediction model; semisupervised learning; single task learning comparison; task relationships; Electronic publishing; Encyclopedias; Internet; Semisupervised learning; Training; Vectors; multi-task learning; semi-supervised learning; text classification; transfer learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
  • Conference_Location
    Herndon, VA
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-2971-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2013.65
  • Filename
    6735276
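The abstract describes one-versus-rest binary tasks for every class in both hierarchies, with cross-dataset task relationships found by a nearest neighbor approach and the linear discriminants learned jointly rather than independently. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a logistic loss, links each class to the single most similar class in the other hierarchy by cosine similarity of class centroids (one possible reading of "lazy, nearest neighbor"), and couples linked tasks with a quadratic penalty (lam/2)*||w_t - w_link(t)||^2. The helper names (centroids, nearest_neighbor_links, train_mtl, predict), the hyperparameters, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def centroids(X, y):
    """Mean feature vector of each class label."""
    sums, counts = {}, defaultdict(int)
    for x, c in zip(X, y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[c] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_neighbor_links(cents_a, cents_b):
    """Assumed linking rule: each class (task) maps to its 1-NN class in the other hierarchy."""
    links = {}
    for ca, va in cents_a.items():
        links[("A", ca)] = ("B", max(cents_b, key=lambda cb: cosine(va, cents_b[cb])))
    for cb, vb in cents_b.items():
        links[("B", cb)] = ("A", max(cents_a, key=lambda ca: cosine(vb, cents_a[ca])))
    return links


def train_mtl(data, links, dim, lam=0.5, mu=0.01, lr=0.1, epochs=200):
    """Jointly fit one-vs-rest logistic-regression tasks by gradient descent.

    Objective (an assumed form): sum over tasks t of the mean logistic loss
    + (mu/2)*||w_t||^2 + (lam/2)*||w_t - w_link(t)||^2.
    """
    W = {t: [0.0] * dim for t in links}
    for _ in range(epochs):
        grads = {}
        for t, w in W.items():
            ds, cls = t
            X, y = data[ds]
            g = [mu * wi for wi in w]  # gradient of the L2 penalty
            for x, label in zip(X, y):
                s = 1.0 if label == cls else -1.0  # one-vs-rest target
                coef = -s / (1.0 + math.exp(s * dot(w, x))) / len(X)
                for i, xi in enumerate(x):
                    g[i] += coef * xi
            grads[t] = g
        # Coupling term: its gradient flows to both ends of each task link.
        for t, nb in links.items():
            for i in range(dim):
                d = lam * (W[t][i] - W[nb][i])
                grads[t][i] += d
                grads[nb][i] -= d
        for t in W:
            W[t] = [wi - lr * gi for wi, gi in zip(W[t], grads[t])]
    return W


def predict(W, dataset, x):
    """Multi-class decision: the class whose one-vs-rest score w.x is largest."""
    return max((dot(w, x), cls) for (ds, cls), w in W.items() if ds == dataset)[1]


# Hypothetical toy term-count vectors over [sport, ball, stock, market].
A_X = [[2.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0]]  # "A": DMOZ-like hierarchy
A_y = ["Sports", "Business"]
B_X = [[1.0, 2.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
B_y = ["Athletics", "Athletics", "Economics", "Economics"]  # "B": Wikipedia-like

data = {"A": (A_X, A_y), "B": (B_X, B_y)}
links = nearest_neighbor_links(centroids(A_X, A_y), centroids(B_X, B_y))
W = train_mtl(data, links, dim=4)
print(links)
print(predict(W, "A", [1.0, 1.0, 0.0, 0.0]), predict(W, "B", [0.0, 0.0, 1.0, 1.0]))
```

Setting lam=0 removes the coupling, so each task is fit independently and the sketch reduces to a single-task-learning baseline of separate one-vs-rest logistic regressions; a larger lam lets a class with few examples borrow strength from its linked class in the other hierarchy.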