Title :
A Cooperative Co-learning Approach for Concept Detection in Documents
Author :
Li, Jianqiang ; Liu, Chunchen
Author_Institution :
NEC Labs. China, Beijing, China
Abstract :
Learning-based approaches play a dominant role in detecting concept instances in documents. They mainly use fully labeled documents (i.e., documents in which all contained concepts are labeled) as training data to build the concept instance recognizer. However, in many cases the available training data is only sparsely labeled (only some of the contained concepts are labeled), which makes existing learning-based approaches inapplicable. To address this issue, this paper proposes a novel co-learning approach for highly accurate concept instance detection in documents. The large pool of sparsely labeled data is split into multiple subsets. Multiple sequence learning models are then trained on these subsets iteratively, with ensemble learning and co-training mechanisms embedded in the process. Empirical experiments show that our approach outperforms the best baselines by 10% in F1 measure while requiring much less running time, demonstrating the effectiveness of the proposed approach.
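The abstract describes the approach only at a high level: split the sparsely labeled pool into subsets, train one sequence model per subset, and let the models label each other's data over several rounds. The sketch below illustrates that general split / ensemble-vote / co-training pattern under strong assumptions; it is not the authors' algorithm. `TokenLabelModel` is a hypothetical toy stand-in for a real sequence learner (such as an HMM or CRF), and `co_learn` with its parameters `k`, `rounds`, and `min_votes` are illustrative names, not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# A sentence is a list of (token, label) pairs; label is None when the
# concept is not annotated (the "sparsely labeled" case).
Sentence = List[Tuple[str, Optional[str]]]


class TokenLabelModel:
    """Hypothetical toy stand-in for a sequence learner: it predicts the
    most frequent label observed for each token in its training subset."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def fit(self, sentences: List[Sentence]) -> "TokenLabelModel":
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sent in sentences:
            for tok, lab in sent:
                if lab is not None:  # learn only from annotated tokens
                    counts[tok][lab] += 1
        self.table = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        return self

    def predict(self, tok: str) -> Optional[str]:
        return self.table.get(tok)


def co_learn(data: List[Sentence], k: int = 3, rounds: int = 5,
             min_votes: int = 2) -> List[Sentence]:
    """Assumed co-learning loop: split data into k subsets, train one model
    per subset, and fill unlabeled tokens of each subset only when at least
    min_votes models trained on *other* subsets agree on the label."""
    data = [list(sent) for sent in data]  # copy so the input is not mutated
    folds = [list(range(i, len(data), k)) for i in range(k)]
    for _ in range(rounds):
        models = [TokenLabelModel().fit([data[idx] for idx in fold])
                  for fold in folds]
        updates = []
        for i, fold in enumerate(folds):
            peers = [m for j, m in enumerate(models) if j != i]
            for idx in fold:
                for pos, (tok, lab) in enumerate(data[idx]):
                    if lab is not None:
                        continue
                    votes = Counter(p for p in (m.predict(tok) for m in peers)
                                    if p is not None)
                    if votes:
                        best, n = votes.most_common(1)[0]
                        if n >= min_votes:
                            updates.append((idx, pos, tok, best))
        if not updates:  # no new agreed labels: stop early
            break
        # Apply all updates after every model has voted (synchronous round).
        for idx, pos, tok, best in updates:
            data[idx][pos] = (tok, best)
    return data


corpus = [
    [("NEC", "ORG"), ("Beijing", None), ("Palermo", None)],
    [("NEC", "ORG"), ("Beijing", "LOC")],
    [("Beijing", "LOC"), ("NEC", None)],
]
labeled = co_learn(corpus)
```

Requiring agreement from several peer models (`min_votes`) is one simple way to combine ensemble voting with co-training: a model never labels its own subset, and a token receives a label only when independently trained models agree. In the usage above, "Beijing" and "NEC" get filled in, while "Palermo", which no model has seen labeled, stays unlabeled.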
Keywords :
document handling; learning (artificial intelligence); F1 measure; concept instance detection; concept instance recognizer; cooperative co-learning approach; ensemble co-training mechanism; ensemble learning mechanism; fully labeled documents; multiple sequence learning models; sparsely labeled dataset; training data; Hidden Markov models; Joining processes; Knowledge based systems; Labeling; Training; Training data; Vehicles; Named entity recognition; co-training; concept identification; ensemble learning; entity linking;
Conference_Title :
Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on
Conference_Location :
Palermo
Print_ISBN :
978-1-4673-4433-3
DOI :
10.1109/ICSC.2012.32