Title :
Profiling and mining RDF data with ProLOD++
Author :
Abedjan, Ziawasch ; Gruetze, Toni ; Jentzsch, Anja ; Naumann, Felix
Author_Institution :
Hasso Plattner Inst. (HPI), Potsdam, Germany
Date :
March 31, 2014 - April 4, 2014
Abstract :
Before reaping the benefits of open data to add value to an organization's internal data, such new, external datasets must be analyzed and understood, starting at the basic level of data types, constraints, value patterns, etc. Such data profiling, already difficult for large relational data sources, is even more challenging for RDF datasets, the preferred data model for linked open data. We present ProLOD++, a novel tool for various profiling and mining tasks to understand and ultimately improve open RDF data. ProLOD++ comprises various traditional data profiling tasks, adapted to the RDF data model. In addition, it features many profiling results specific to open data, such as schema discovery for user-generated attributes, association rule discovery to uncover synonymous predicates, and uniqueness discovery along ontology hierarchies. ProLOD++ is highly efficient, allowing interactive profiling for users interested in exploring the properties and structure of as yet unknown datasets.
Keywords :
data analysis; data mining; data models; ProLOD++; RDF data mining; RDF data model; RDF data profiling; association rule discovery; interactive profiling; ontology hierarchies; open RDF data; schema discovery; synonymous predicates; uniqueness discovery; user-generated attributes; Association rules; Data models; Data visualization; Ontologies; Pattern analysis; Resource description framework
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
DOI :
10.1109/ICDE.2014.6816740