• DocumentCode
    2949067
  • Title

    Applying MapReduce algorithm to performance testing in lexical analysis on HDFS

  • Author

    Joldzic, Ognjen V.

  • Author_Institution
    Fac. of Electr. Eng., Univ. of Banja Luka, Banja Luka, Bosnia-Herzegovina
  • fYear
    2013
  • fDate
    26-28 Nov. 2013
  • Firstpage
    841
  • Lastpage
    844
  • Abstract
    This paper presents an overview of distributed data processing technology, and explores the possibilities and advantages of using this technology in lexical analysis of Cyrillic text. A detailed overview of one of the most widely used frameworks for processing large datasets, Apache Hadoop, is presented, along with a recommendation for planning and deployment of such systems. The paper also analyzes results obtained by running lexical analysis programs on a small Hadoop cluster and the effect of various configuration parameters on total execution times of the test programs.
  • Keywords
    Big Data; distributed processing; software performance evaluation; text analysis; Apache Hadoop cluster; Cyrillic text lexical analysis program; HDFS; MapReduce algorithm; configuration parameters; distributed data processing technology; large dataset processing; performance testing; test programs; Algorithm design and analysis; Clustering algorithms; Data handling; Data storage systems; File systems; Information management; Testing; HDFS; MapReduce; big data; distributed processing; lexical analysis; performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications Forum (TELFOR), 2013 21st
  • Conference_Location
    Belgrade
  • Print_ISBN
    978-1-4799-1419-7
  • Type

    conf

  • DOI
    10.1109/TELFOR.2013.6716361
  • Filename
    6716361