Title :
CINTIA: A distributed, low-latency index for big interval data
Author :
Ruslan Mavlyutov;Philippe Cudre-Mauroux
Author_Institution :
eXascale Infolab, U. of Fribourg-Switzerland
Abstract :
Intervals have become prominent in data management because they are the main data structure used to represent a number of key data types, such as temporal or genomic data. Yet there exists no solution to compactly store and efficiently query big interval data. In this paper we introduce CINTIA (the Checkpoint INTerval Index Array), an efficient data structure to store and query interval data, which achieves high memory locality and outperforms state-of-the-art solutions. We also propose a low-latency Big Data system that implements CINTIA on top of a popular distributed file system and efficiently manages large interval data on clusters of commodity machines. Our system can easily be scaled out and was designed to accommodate large delays between the various components of a distributed infrastructure. We experimentally evaluate the performance of our approach on several datasets and show that it outperforms current solutions by several orders of magnitude in distributed settings.
Keywords :
"Arrays","Indexes","Distributed databases","Complexity theory","Bioinformatics","Genomics"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363806