Applying MapReduce algorithm to performance testing in lexical analysis on HDFS

Author

Joldzic, Ognjen V.

Author_Institution

Fac. of Electr. Eng., Univ. of Banja Luka, Banja Luka, Bosnia-Herzegovina

fYear

2013

fDate

26-28 Nov. 2013

Firstpage

841

Lastpage

844

Abstract

This paper presents an overview of distributed data processing technology and explores the possibilities and advantages of using this technology for lexical analysis of Cyrillic text. A detailed overview of one of the most widely used frameworks for processing large datasets, Apache Hadoop, is presented, along with recommendations for planning and deploying such systems. The paper also analyzes the results obtained by running lexical analysis programs on a small Hadoop cluster and the effect of various configuration parameters on the total execution times of the test programs.
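The abstract refers to lexical analysis programs run as MapReduce jobs on a Hadoop cluster, but the programs themselves are not reproduced in this record. As a minimal sketch only, assuming the analysis is a token-frequency count and that a token is a maximal run of characters from the Unicode Cyrillic block (U+0400 to U+04FF), the map, shuffle, and reduce phases can be simulated in a single Python process. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and are not part of the Hadoop API; on a real cluster the framework runs mappers over HDFS blocks in parallel and performs the shuffle between nodes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: a token is a maximal run of characters from the basic
# Cyrillic block (U+0400-U+04FF); the paper's actual tokenization
# rules are not given in this record.
CYRILLIC_WORD = re.compile(r"[\u0400-\u04FF]+")


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (lower-cased token, 1) per Cyrillic word."""
    for match in CYRILLIC_WORD.finditer(line):
        yield match.group(0).lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key and sort by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the counts emitted for one token."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in one process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    # Latin words and digits are ignored by the Cyrillic-only tokenizer.
    sample = ["Бања Лука и Београд", "Београд, Telfor 2013"]
    print(run_job(sample))
```

Because each reducer receives every value emitted for its key, the summed counts do not depend on how the input lines are divided among mappers; this is the property that allows the same job to be spread across cluster nodes.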

Keywords

Big Data; distributed processing; software performance evaluation; text analysis; Apache Hadoop cluster; Cyrillic text lexical analysis program; HDFS; MapReduce algorithm; configuration parameters; distributed data processing technology; large dataset processing; performance testing; test programs; Algorithm design and analysis; Clustering algorithms; Data handling; Data storage systems; File systems; Information management; Testing; HDFS; MapReduce; big data; distributed processing; lexical analysis; performance;

fLanguage

English

Publisher

IEEE

Conference_Titel

Telecommunications Forum (TELFOR), 2013 21st

Conference_Location

Belgrade

Print_ISBN

978-1-4799-1419-7

Type

conf

DOI

10.1109/TELFOR.2013.6716361

Filename

6716361