Title :
Applying MapReduce algorithm to performance testing in lexical analysis on HDFS
Author :
Joldzic, Ognjen V.
Author_Institution :
Fac. of Electr. Eng., Univ. of Banja Luka, Banja Luka, Bosnia-Herzegovina
Abstract :
This paper presents an overview of distributed data processing technology and explores the possibilities and advantages of using this technology in lexical analysis of Cyrillic text. A detailed overview of Apache Hadoop, one of the most widely used frameworks for processing large datasets, is presented, along with recommendations for planning and deploying such systems. The paper also analyzes results obtained by running lexical analysis programs on a small Hadoop cluster and the effect of various configuration parameters on the total execution times of the test programs.
Keywords :
Big Data; distributed processing; software performance evaluation; text analysis; Apache Hadoop cluster; Cyrillic text lexical analysis program; HDFS; MapReduce algorithm; configuration parameters; distributed data processing technology; large dataset processing; performance testing; test programs; Algorithm design and analysis; Clustering algorithms; Data handling; Data storage systems; File systems; Information management; Testing; MapReduce; lexical analysis; performance
Conference_Title :
Telecommunications Forum (TELFOR), 2013 21st
Conference_Location :
Belgrade
Print_ISBN :
978-1-4799-1419-7
DOI :
10.1109/TELFOR.2013.6716361