DocumentCode :
3036428
Title :
Serializing RDF in Compressed Space
Author :
Hernandez-Illera, Antonio ; Martinez-Prieto, Miguel A. ; Fernandez, Javier D.
Author_Institution :
Dept. of Comput. Sci., Univ. de Valladolid, Valladolid, Spain
fYear :
2015
fDate :
7-9 April 2015
Firstpage :
363
Lastpage :
372
Abstract :
The amount of generated RDF data has grown impressively over the last decade, promoting compression as an essential tool for storage and exchange. RDF compression techniques leverage syntactic and semantic redundancies, but structural repetitions are not always addressed effectively. This paper first shows two schema-based sources of redundancy underlying the schema-relaxed nature of RDF. Then, we revisit the W3C HDT binary format to further compact its graph structure encoding. Our HDT++ approach reduces the original HDT Triples requirements by up to a factor of 2 for more structured datasets, and reports significant improvements even for highly semi-structured datasets like DBpedia. In general, HDT++ competes with the current state of the art for structural RDF compression, leading the comparison for three of the four analyzed datasets.
Keywords :
Internet; Web sites; data compression; data structures; graph theory; redundancy; DBpedia; HDT++ approach; RDF compression technique; W3C HDT binary format; compressed space; generated RDF data; graph structure encoding; schema-based source; schema-relaxed nature; semantic redundancy; semistructured dataset; syntactic redundancy; Dictionaries; Encoding; Redundancy; Resource description framework; Semantics; Syntactics; Vegetation; HDT; RDF compression; semantic web;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2015
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2015.16
Filename :
7149293