A system of job log analyzing for Hadoop

Author

Zhao Xiaogang ; Ma Zhiqiang ; Ding Ling ; Liu Xu

Author_Institution

Dept. of Software Eng., Wuhan Univ., Wuhan, China

Volume

3

fYear

2012

fDate

20-21 Oct. 2012

Firstpage

238

Lastpage

243

Abstract

Handling the huge volume of history logs produced by the Hadoop distributed computing platform is a troublesome task, and these history files often appear useless. However, to assess the health of a cluster we must analyze the history logs produced by its running jobs. A single-machine analysis program is inadequate for this because it is slow and demands a great deal of memory and CPU. In this paper we address the problem in a distributed way using the Map/Reduce computation model. We also built a data platform (Hive and MySQL) to store the data. Our experiments show that the distributed approach to processing log files performs well when the log files are huge.

Keywords

SQL; distributed processing; program diagnostics; storage management; CPU; Hadoop distributed computing platform; Map/Reduce calculation model; MySQL; a data platform; cluster platform; data log files; data storage; distributed problem solving; health degree; history files; history logs; hive; job log analyzing system; memory; process log files; running jobs; single-machine analyzing program; Blogs; Databases; History; Hadoop; Map/Reduce; distribute computing; history log;

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Management, Innovation Management and Industrial Engineering (ICIII), 2012 International Conference on

Conference_Location

Sanya

Print_ISBN

978-1-4673-1932-4

Type

conf

DOI

10.1109/ICIII.2012.6339963

Filename

6339963