DocumentCode :
3717188
Title :
CINTIA: A distributed, low-latency index for big interval data
Author :
Ruslan Mavlyutov;Philippe Cudre-Mauroux
Author_Institution :
eXascale Infolab, University of Fribourg, Switzerland
Year :
2015
Firstpage :
619
Lastpage :
628
Abstract :
Intervals have become prominent in data management as they are the main data structure to represent a number of key data types such as temporal or genomic data. Yet, there exists no solution to compactly store and efficiently query big interval data. In this paper we introduce CINTIA (the Checkpoint INTerval Index Array), an efficient data structure to store and query interval data, which achieves high memory locality and outperforms state-of-the-art solutions. We also propose a low-latency Big Data system that implements CINTIA on top of a popular distributed file system and efficiently manages large interval data on clusters of commodity machines. Our system can easily be scaled out and was designed to accommodate large delays between the various components of a distributed infrastructure. We experimentally evaluate the performance of our approach on several datasets and show that it outperforms current solutions by several orders of magnitude in distributed settings.
Keywords :
"Arrays","Indexes","Distributed databases","Complexity theory","Bioinformatics","Genomics"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363806
Filename :
7363806