DocumentCode :
140893
Title :
Subgraph pattern matching over uncertain graphs with identity linkage uncertainty
Author :
Moustafa, Walaa Eldin ; Kimmig, Angelika ; Deshpande, A. ; Getoor, Lise
Author_Institution :
Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
fYear :
2014
fDate :
March 31 2014-April 4 2014
Firstpage :
904
Lastpage :
915
Abstract :
There is a growing need for methods that can represent and query uncertain graphs. These uncertain graphs are often the result of an information extraction and integration system that attempts to extract an entity graph or a knowledge graph from multiple unstructured sources [25], [7]. Such an integration typically leads to identity uncertainty, as different data sources may use different references to the same underlying real-world entities. Integration usually also introduces additional uncertainty on node attributes and edge existence. In this paper, we propose the notion of a probabilistic entity graph (PEG), a formal model that uniformly and systematically addresses these three types of uncertainty. A PEG is a probabilistic graph model that defines a distribution over possible graphs at the entity level. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude.
Keywords :
database indexing; graph theory; pattern matching; probability; query processing; ubiquitous computing; PEG; context-aware path indexing; entity graph extraction; formal probabilistic graph model; identity linkage uncertainty; information extraction system; information integration system; knowledge graph extraction; probabilistic entity graph; query search space reduction; reduction by join-candidates; subgraph pattern matching queries; uncertain graphs; Equations; Markov random fields; Pattern matching; Probabilistic logic; Probability distribution; Random variables; Uncertainty;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816710
Filename :
6816710
Link To Document :
بازگشت