DocumentCode :
734236
Title :
WebScalding: A Framework for Big Data Web Services
Author :
Jacob, Ferosh ; Johnson, Aaron ; Javed, Faizan ; Zhao, Meng ; McNair, Matt
Author_Institution :
DataScience R&D, Norcross, GA, USA
fYear :
2015
fDate :
March 30 2015-April 2 2015
Firstpage :
493
Lastpage :
498
Abstract :
CareerBuilder (CB) currently has 50 million active resumes and 2 million active job postings. Our team has been working to provide the most relevant jobs for job seekers and the most relevant resumes for employers and recruiters. These goals often lead to Big Data problems. In this paper, we introduce WebScalding, a Big Data framework designed and developed to solve some of the common large-scale data challenges at CB. The WebScalding framework raises the level of abstraction of Twitter's Scalding framework to adapt it to CB's unique challenges. The WebScalding framework helps users by ensuring that: 1) all internal web services are available as Cascading pipe operations, 2) these pipe operations can read from our common data sources and create a pipe assembly, and 3) the pipe assembly so created can be executed on the CB Hadoop cluster as well as on local machines without any changes. We describe WebScalding using three case studies taken from actual internal projects, which show how data scientists at CB who are not well versed in Big Data tools and methodologies leverage WebScalding to design, implement, and test Big Data applications. We also compare the execution time of a WebScalding program with that of its sequential Python counterpart to illustrate the superlinear speedup of WebScalding programs.
Keywords :
Big Data; Internet; Web services; data handling; parallel processing; social networking (online); Big Data Web services; CB Hadoop cluster; CareerBuilder; Twitter Scalding framework; WebScalding framework; Cascading pipe operations; pipe assembly; sequential Python; Encyclopedias; Libraries; Resumes; XML
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
Conference_Location :
Redwood City, CA
Type :
conf
DOI :
10.1109/BigDataService.2015.53
Filename :
7184921