DocumentCode :
659636
Title :
Provenance comparison for large-scale knowledge discovery
Author :
Xiang Zhao ; Bin Ge ; Jiuyang Tang ; Weidong Xiao ; Haichuan Shang
Author_Institution :
Nat. Univ. of Defense Technol., Changsha, China
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
68
Lastpage :
75
Abstract :
Provenance is a record that describes the entities and processes involved in producing, delivering and influencing a resource. Provenance management and reuse can enable interesting applications for knowledge discovery and analytics. One crucial component of a provenance management system is the comparison between provenances. In the era of big data, provenance management systems need a scalable algorithmic solution for efficient comparison. Existing solutions to the problem have a large memory footprint and long system response times. In this paper, we present a new solution to threshold-based provenance comparison. We model provenance directly as a graph and propose to measure provenance similarity using provenance edit distance. Following the depth-first search paradigm, we design an algorithm, PEDSim, based on an encoding technique specific to provenance graphs and on quantifiable heuristics. Extensive experiments on real data demonstrate the superiority of our method over the alternatives.
Keywords :
Big Data; data analysis; data mining; graph theory; tree searching; PEDSim algorithm; big data; data analytics; depth-first search paradigm; encoding technique; large-scale knowledge discovery; memory footprint; provenance edit distance; provenance management; provenance reuse; scalable algorithmic solution; system response time; threshold-based provenance comparison; Algorithm design and analysis; Encoding; Heuristic algorithms; Information management; Knowledge discovery; Memory management; Space exploration;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691785
Filename :
6691785