DocumentCode :
1917322
Title :
Indexing Sequences of IEEE 754 Double Precision Numbers
Author :
Fariña, Antonio ; Ordóñez, Alberto ; Parama, J.R.
Author_Institution :
Database Lab., Univ. of A Coruña, A Coruña, Spain
fYear :
2012
fDate :
10-12 April 2012
Firstpage :
367
Lastpage :
376
Abstract :
In recent decades, much attention has been paid to the development of succinct data structures to store and/or index text, biological collections, source code, etc. Their success was in most cases due to handling data with a relatively small alphabet and to exploiting either a rather skewed distribution (text) or simply the repetitiveness within the source data (source code repositories, biological sequences of similar individuals). In this work, we face the problem of dealing with collections of floating point data, which typically have a large alphabet (a real number hardly ever repeats) and a less biased distribution. We present two solutions to store and index such collections. The first one is based on the well-known inverted index. It consumes space around the size of the original collection while providing appealing search times. The second one uses a wavelet tree, which, at the expense of slower search times, obtains slightly better space consumption.
Keywords :
data structures; storage management; IEEE 754 double precision numbers; biological collections; biological sequences; floating point data; indexing sequences; inverted index; skewed distribution; source code repositories; succinct data structures; Arrays; Compressors; Electricity; Entropy; Indexing; Production
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference (DCC), 2012
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4673-0715-4
Type :
conf
DOI :
10.1109/DCC.2012.43
Filename :
6189268