DocumentCode :
1863473
Title :
High-Performance Distributed Indexing and Retrieval for Large Volume Traffic Log Datasets on the Cloud
Author :
Wen Yang ; Yinan Dou
Author_Institution :
Beijing Key Lab. of Network Syst. Archit. & Convergence, Beijing Univ. of Posts & Telecommun., Beijing, China
Volume :
1
fYear :
2013
fDate :
26-27 Aug. 2013
Firstpage :
185
Lastpage :
189
Abstract :
In this paper, we present a high-performance distributed system for the storage, indexing, and retrieval of large-volume web traffic log datasets. The system is based on Hadoop, the open-source MapReduce framework, and extends its functionality. We mainly focus on three noteworthy aspects of our work: an approach to storing large datasets on the Hadoop Distributed File System (HDFS), an indexing algorithm suited to large distributed datasets, and a distributed retrieval architecture built on Hadoop. We show that our system is efficient and that its query response latency approaches real time, compared with HBase, a distributed, sparse NoSQL database.
Keywords :
cloud computing; database indexing; distributed databases; query processing; HDFS; Hadoop distributed file system; distributed retrieval architecture; high-performance distributed indexing system; high-performance distributed retrieval system; high-performance distributed storage system; large volume web traffic log datasets; open source MapReduce framework; query response latency approach; Arrays; Distributed databases; IP networks; Indexing; Time factors; Web servers; distributed indexing; MapReduce; retrieval; traffic log;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2013 5th International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-0-7695-5011-4
Type :
conf
DOI :
10.1109/IHMSC.2013.51
Filename :
6643863