DocumentCode :
3582738
Title :
Efficient information retrieval using Lucene, LIndex and HIndex in Hadoop
Author :
Mathew, Anita Brigit ; Pattnaik, Priyabrat ; Madhu Kumar, S.D.
fYear :
2014
Firstpage :
333
Lastpage :
340
Abstract :
The growth of unstructured and partially structured data in biological networks, social media, geographical information and other web-based applications presents an open challenge to the cloud database community. Hence, an approach to exhaustive Big Data analysis that integrates structured and unstructured data processing has become increasingly critical. MapReduce has recently emerged as a popular framework for extensive data analytics. Powerful indexing techniques would allow users to significantly speed up query processing among MapReduce jobs. Currently, there are a number of indexing techniques, such as Hadoop++, HAIL, LIAH and Adaptive Indexing, but none of them provides an optimized technique for text-based selection operations. This paper proposes two indexing approaches in HDFS, namely LIndex and HIndex. These approaches are found to perform selection operations better than the existing Lucene index approach. A fast retrieval technique is suggested in the MapReduce framework with the new LIndex and HIndex approaches. LIndex provides a complete-text index and informs the Hadoop implementation engine to scan only those data blocks that contain the terms of interest. LIndex also enhances throughput (minimizes response time) and overcomes drawbacks such as the upfront cost and long idle time of index creation. LIndex performed better than Lucene but still fell short in response and computation time. Hence a new index named HIndex is suggested. This scheme is found to perform better than LIndex in response and computation time.
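The block-skipping idea described in the abstract (scan only the data blocks that contain the terms of interest) can be illustrated with a small sketch. This is not the paper's LIndex implementation and it does not use Hadoop or Lucene. It is a plain-Python illustration that assumes an in-memory term-to-block inverted index, whitespace tokenization, and that a block qualifies only if it contains every query term. The names `build_block_index` and `blocks_to_scan` are hypothetical.

```python
from collections import defaultdict


def build_block_index(blocks):
    """Map each lowercase term to the set of block ids whose text contains it.

    `blocks` is a dict of block id -> block text (an assumed stand-in for
    HDFS data blocks; real blocks would be read from the file system).
    """
    index = defaultdict(set)
    for block_id, text in blocks.items():
        for term in text.lower().split():
            index[term].add(block_id)
    return index


def blocks_to_scan(index, terms):
    """Return the sorted ids of blocks that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Toy data standing in for three data blocks.
blocks = {
    0: "protein interaction network",
    1: "social media graph",
    2: "protein folding data",
}
index = build_block_index(blocks)

# Only the returned blocks would need to be handed to the map tasks.
print(blocks_to_scan(index, ["protein"]))
print(blocks_to_scan(index, ["protein", "network"]))
```

In this sketch a query for a term that appears in no block returns an empty list, so no block would be scanned at all.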
Keywords :
Big Data; indexing; parallel processing; query processing; text analysis; Big Data analysis; HAIL; HDFS; HIndex; Hadoop implementation engine; Hadoop++; LIAH; LIndex; Lucene index approach; MapReduce framework; Web-based application; adaptive indexing; biological network; cloud database community; computation time; data analytics; data block; geographical information; indexing technique; information retrieval; query processing; retrieval technique; social media; text based selection operation; unstructured data processing; Ecosystems; Engines; Indexing; Time factors; Trojan horses; Complete-text indexing; HIndex; Hadoop; LIndex; Lucene; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
Type :
conf
DOI :
10.1109/AICCSA.2014.7073217
Filename :
7073217