• DocumentCode
    671568
  • Title
    Active learning in the real-world design and analysis of the Nomao challenge
  • Author
    Candillier, Laurent ; Lemaire, Vincent
  • Author_Institution
    Nomao - Ebuzzing Group, Toulouse, France
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Active Learning is an active area of research in the Machine Learning and Data Mining communities. In parallel, real-world applications raise the need for efficient active learning methods. As an illustration, we present in this paper an active learning challenge applied to a real-world application named Nomao. Nomao is a search engine for places. It aggregates information coming from multiple sources on the web to propose complete information related to a place. In this context, active learning is used to efficiently detect data that refer to the same place. The process is called data deduplication. Since it is a real-world application, some additional constraints have to be handled. The main ones are scalability of the proposed method, representativeness of the training dataset, and practicality of the labeling process. The website of the challenge remains open beyond the termination of the challenge as a resource for students and researchers (http://www.nomao.com/labs/challenge). To share the problem with the community, the whole labeled dataset has been released publicly in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Nomao).
  • Keywords
    Internet; learning (artificial intelligence); search engines; Nomao; UCI Machine Learning Repository; active learning; data deduplication; data detection; information aggregation; place search engine; real-world design; Adaptation models; Boosting; Data models; Labeling; Space exploration; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2013.6706908
  • Filename
    6706908