Title :
Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic
Author :
Shangpu Jiang; Daniel Lowd; Dejing Dou
Author_Institution :
Department of Computer and Information Science, University of Oregon, Eugene, OR, USA
Abstract :
A number of text mining and information extraction projects, such as TextRunner and NELL, seek to automatically build knowledge bases from the rapidly growing amount of information on the web. In order to scale to the size of the web, these projects often employ ad hoc heuristics to reason about uncertain and contradictory information rather than reasoning jointly about all candidate facts. In this paper, we present a Markov logic-based system for cleaning an extracted knowledge base. This allows a scalable system such as NELL to take advantage of joint probabilistic inference, or, conversely, allows Markov logic to be applied to a web-scale problem. Our system uses only the ontological constraints and confidence values of the original system, along with human-labeled data if available. The labeled data can be used to calibrate the confidence scores from the original system or to learn the effectiveness of individual extraction patterns. To achieve scalability, we introduce a neighborhood grounding method that instantiates only the part of the network most relevant to the given query. This allows us to partition the knowledge cleaning task into tractable pieces that can be solved individually. In experiments on NELL's knowledge base, we evaluate several variants of our approach and find that they improve both F1 and area under the precision-recall curve.
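The neighborhood grounding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that two candidate facts are "neighbors" when they share an entity, and that the relevant part of the network is everything within a fixed number of hops of the query fact. The function name `neighborhood`, the `max_hops` parameter, and the predicate names in the sample data are all hypothetical.

```python
from collections import defaultdict, deque


def neighborhood(query, facts, max_hops=1):
    """Return the candidate facts within max_hops of the query fact.

    Assumption for this sketch: a fact is a (predicate, args) tuple, and
    two facts are neighbors if they mention at least one common entity.
    """
    # Index every fact by the entities it mentions.
    by_entity = defaultdict(set)
    for fact in facts:
        for entity in fact[1]:
            by_entity[entity].add(fact)

    # Breadth-first search outward from the query, bounded by max_hops.
    seen = {query}
    frontier = deque([(query, 0)])
    while frontier:
        fact, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for entity in fact[1]:
            for other in by_entity[entity]:
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return seen


# Hypothetical candidate facts; the predicate names are made up.
facts = [
    ("CityInCountry", ("Eugene", "USA")),
    ("Category_City", ("Eugene",)),
    ("Category_Country", ("USA",)),
    ("Category_City", ("Brussels",)),
    ("CityInCountry", ("Brussels", "Belgium")),
]

local = neighborhood(facts[0], facts, max_hops=1)
```

Only the facts in `local` would then be grounded and passed to inference for that query, which is what lets the cleaning task be split into small, independently solvable pieces. How the paper actually defines relevance and neighborhood size is not stated in the abstract.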
Keywords :
Internet; Markov processes; data mining; formal logic; inference mechanisms; knowledge based systems; ontologies (artificial intelligence); query processing; text analysis; Markov logic-based system; NELL; Web information; Web scale problem; ad hoc heuristics; confidence score; confidence value; extraction pattern; human-labeled data; information extraction; knowledge base extraction; knowledge cleaning task; neighborhood grounding method; ontological constraint; precision-recall curve; probabilistic inference; query; text mining; TextRunner; Data mining; Joints; Knowledge based systems; Logistics; Markov processes; Ontologies; Training data; Information extraction; Markov logic; knowledge base; ontology; text mining
Conference_Title :
2012 IEEE 12th International Conference on Data Mining (ICDM)
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.156