Regional Information Center for Science and Technology - MapReduce Implementation of XML Keyword Search Algorithm

Abstract:

Keyword search based on smallest lowest common ancestors (SLCAs) is an important approach to identifying interesting data nodes in XML documents. With the rapid growth of XML data on the Internet, processing massive XML data effectively has become an important research topic. Hadoop, an open-source cloud computing platform developed in recent years, has become a popular choice for processing large-scale data, and it makes massive storage and efficient search of XML data possible. In this paper, we first present two properties that improve the classical ILE algorithm. We then propose a parallel XML keyword search algorithm and implement it on the MapReduce programming model. Two experiments are performed on a cluster using four datasets of different sizes. The results show that the proposed algorithm is applicable to keyword search over massive XML data.
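
To make the notion of an SLCA concrete, the following is a minimal brute-force sketch in Python. It is not the ILE algorithm and not the parallel MapReduce algorithm proposed in the paper, neither of which is described in this abstract. It assumes that each XML node is identified by a Dewey label (a tuple of child positions from the root) and that each keyword comes with the list of labels of the nodes containing it; the function names `lca` and `slca` are hypothetical and chosen only for illustration.

```python
from itertools import product


def lca(a, b):
    """Return the lowest common ancestor of two Dewey labels,
    i.e. their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_proper_ancestor(a, d):
    """True if label a is a strict prefix of label d."""
    return len(a) < len(d) and d[:len(a)] == a


def slca(keyword_lists):
    """Brute-force SLCA computation (illustrative only).

    keyword_lists: one list of Dewey labels per keyword.
    Returns the sorted labels of nodes whose subtree contains every
    keyword and that have no proper descendant with the same property.
    """
    if not keyword_lists:
        return []
    # Step 1: LCA of every combination that picks one node per keyword.
    lcas = set()
    for combo in product(*keyword_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        lcas.add(node)
    # Step 2: keep only the "smallest" LCAs, those with no descendant LCA.
    return sorted(
        n for n in lcas
        if not any(is_proper_ancestor(n, m) for m in lcas)
    )


# Hypothetical example: two keywords, each matching two nodes.
matches_kw1 = [(0, 0, 0), (0, 1, 0)]
matches_kw2 = [(0, 0, 1), (0, 2)]
print(slca([matches_kw1, matches_kw2]))  # prints [(0, 0)]
```

The sketch enumerates every combination of keyword matches, so its cost grows with the product of the list sizes. That cost is the kind of inefficiency that indexed algorithms and parallel processing on large XML data aim to avoid.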