DocumentCode :
659543
Title :
Hash in a flash: Hash tables for flash devices
Author :
Clemons, Tyler ; Faisal, S.M. ; Tatikonda, Sekhar ; Aggarwal, Charu ; Parthasarathy, Srinivasan
Author_Institution :
Ohio State Univ., Columbus, OH, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
7
Lastpage :
14
Abstract :
Conservative estimates place the amount of data expected to be created by mankind this year at more than several thousand exabytes. Given this enormous data deluge, and in spite of recent advances in main-memory capacities, there is a clear and present need to move beyond algorithms that assume in-core (main-memory) computation. One fundamental task in information retrieval and text analytics is maintaining local and global term frequencies over large enterprise document corpora. This can be done with a counting hash table, which associates keys with frequencies. In this paper, we study the design landscape for the development of such an out-of-core counting hash table targeted at flash storage devices. Flash devices have clear benefits over traditional hard drives in terms of access latency and energy efficiency. However, due to intricacies in their design, random writes can be relatively expensive and can shorten the life of the flash device. Counting hash tables are a challenging case for flash drives because the data structure is inherently dependent on the randomness of the hash function: frequency updates land at random locations and may incur expensive random writes. We demonstrate how to overcome this challenge by designing a hash table with two related hash functions, one of which exhibits a data placement property with respect to the other. Specifically, we focus on three designs and evaluate the trade-offs among them along the axes of query performance, insert and update times, and I/O time, using real-world data and an implementation of TF-IDF.
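The idea of two related hash functions can be illustrated with a minimal sketch. This is not the paper's implementation, and it does not correspond to any of the three designs the abstract mentions. It assumes one possible reading of the "data placement property": the in-memory hash is derived from the on-flash hash, so every key buffered in one RAM buffer belongs to the same flash block and a flush touches each block once. All names and sizes (`disk_hash`, `ram_hash`, `BufferedCountingTable`, `NUM_BLOCKS`, `BUCKETS_PER_BLOCK`) are hypothetical, and the flash device is simulated with in-memory lists.

```python
import zlib
from collections import Counter, defaultdict

NUM_BLOCKS = 8          # hypothetical number of flash blocks
BUCKETS_PER_BLOCK = 4   # hypothetical number of hash buckets per block
NUM_BUCKETS = NUM_BLOCKS * BUCKETS_PER_BLOCK


def disk_hash(key: str) -> int:
    """Bucket index of a key in the (simulated) on-flash table."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def ram_hash(key: str) -> int:
    """RAM buffer index, derived from disk_hash.

    Assumed placement property: keys that share a RAM buffer
    also share a flash block.
    """
    return disk_hash(key) // BUCKETS_PER_BLOCK


class BufferedCountingTable:
    """Counting hash table that batches updates per flash block."""

    def __init__(self, buffer_limit: int = 1000):
        self.buffer_limit = buffer_limit
        self.buffers = defaultdict(Counter)  # block id -> pending counts
        self.pending = 0                     # buffered updates so far
        self.block_writes = 0                # simulated block writes
        # Simulated flash: one list of bucket Counters per block.
        self.flash = [
            [Counter() for _ in range(BUCKETS_PER_BLOCK)]
            for _ in range(NUM_BLOCKS)
        ]

    def add(self, key: str, count: int = 1) -> None:
        """Record an update in RAM; flush when the buffer is full."""
        self.buffers[ram_hash(key)][key] += count
        self.pending += 1
        if self.pending >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        """Apply pending updates with one write per touched block."""
        for block_id in sorted(self.buffers):
            block = self.flash[block_id]
            for key, count in self.buffers[block_id].items():
                block[disk_hash(key) % BUCKETS_PER_BLOCK][key] += count
            self.block_writes += 1
        self.buffers.clear()
        self.pending = 0

    def count(self, key: str) -> int:
        """Frequency of key: flushed count plus pending updates."""
        block_id = ram_hash(key)
        slot = disk_hash(key) % BUCKETS_PER_BLOCK
        pending = self.buffers.get(block_id, Counter())[key]
        return self.flash[block_id][slot][key] + pending
```

In this sketch, a flush issues at most `NUM_BLOCKS` block writes no matter how many updates were buffered, whereas applying each update directly would issue one random write per update. Queries must consult both the flushed table and the pending buffer, which hints at the kind of trade-off between query and update cost that the abstract refers to.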
Keywords :
data structures; flash memories; TF-IDF; data deluge; data placement property; data structure; energy efficiency; enterprise document corpora; flash storage devices; global term frequencies maintenance; in-core main-memory computation; information retrieval; local term frequencies maintenance; memory capacities; out-of-core counting hash table; query performance; text analytics; Ash; Context; Encyclopedias; Internet; Performance evaluation; Random access memory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691692
Filename :
6691692