Title :
Active learning in the real-world: design and analysis of the Nomao challenge
Author :
Candillier, Laurent ; Lemaire, Vincent
Author_Institution :
Nomao - Ebuzzing Group, Toulouse, France
Abstract :
Active learning is an active area of research in the machine learning and data mining communities. In parallel, real-world applications raise the need for efficient active learning methods. As an illustration, this paper presents an active learning challenge applied to a real-world application named Nomao. Nomao is a search engine for places. It aggregates information from multiple web sources to provide complete information about a place. In this context, active learning is used to efficiently detect data that refer to the same place, a process called data deduplication. Since it is a real-world application, some additional constraints have to be handled. The main ones are the scalability of the proposed method, the representativeness of the training dataset, and the practicality of the labeling process. The website of the challenge remains open beyond the termination of the challenge as a resource for students and researchers (http://www.nomao.com/labs/challenge). To share the problem with the community, the whole labeled dataset has been made publicly available in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Nomao).
Keywords :
Internet; learning (artificial intelligence); search engines; Nomao; UCI Machine Learning Repository; active learning; data deduplication; data detection; information aggregation; place search engine; real-world design; Adaptation models; Boosting; Data models; Labeling; Space exploration; Training;
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6706908