DocumentCode :
3758052
Title :
DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis
Author :
Chris A. Mattmann;Ji-Hyun Oh;Tyler Palsulich;Lewis John McGibbney;Yolanda Gil;Varun Ratnakar
Author_Institution :
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
fYear :
2015
Firstpage :
97
Lastpage :
101
Abstract :
The Apache Release Audit Tool (RAT) performs open source software license auditing and checking; however, RAT fails to successfully audit today's large code bases. Being a natural language processing (NLP) tool and a crawler, RAT marches through a code base, but it uses rudimentary black lists and white lists to navigate source code repositories and often does a poor job of distinguishing source code from binary files. In addition, RAT produces no incremental output, so on code bases that are themselves "Big Data", RAT could run for, e.g., a month and still not provide any status report. We introduce Distributed "RAT", or the Distributed Release Audit Tool (DRAT). DRAT overcomes RAT's limitations by leveraging: (1) Apache Tika to automatically detect and classify files in source code repositories and determine what is a binary file, what is source code, what are notes that need skipping, etc.; (2) Apache Solr to interactively perform analytics on a code repository and to extract metadata using Apache Tika; and finally (3) Apache OODT to run RAT per MIME type (e.g., C/C++, Java, JavaScript) and per configurable K-file-sized chunk in a MapReduce workflow. Each Mapper task is an instance of RAT running on a K-file-sized chunk of a single Multipurpose Internet Mail Extensions (MIME) type (split using Tika), and each Mapper produces an incremental, intermediate log file; the Reducer aggregates the individual log files.
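The partition-then-aggregate workflow described in the abstract can be illustrated with a minimal Python sketch. This is not DRAT's implementation: the standard-library `mimetypes` module stands in for Apache Tika's detection, the function names (`group_by_mime`, `chunk`, `map_audit`, `reduce_logs`, `run`) are hypothetical, and `audit_file` is a caller-supplied placeholder for a per-file RAT check. Everything runs in one process, with no OODT or Solr involved.

```python
import mimetypes
from collections import Counter, defaultdict
from itertools import islice


def group_by_mime(paths):
    """Group file paths by guessed MIME type (stdlib stand-in for Tika)."""
    groups = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        # Files with no recognized type fall into a generic binary bucket.
        groups[mime or "application/octet-stream"].append(path)
    return dict(groups)


def chunk(items, k):
    """Yield consecutive chunks of at most k items."""
    it = iter(items)
    while True:
        block = list(islice(it, k))
        if not block:
            return
        yield block


def map_audit(mime, paths, audit_file):
    """Mapper: audit one K-file chunk and return an intermediate log record."""
    return {"mime": mime, "results": {p: audit_file(p) for p in paths}}


def reduce_logs(logs):
    """Reducer: aggregate per-chunk logs into license counts per MIME type."""
    totals = defaultdict(Counter)
    for log in logs:
        totals[log["mime"]].update(log["results"].values())
    return {mime: dict(counts) for mime, counts in totals.items()}


def run(paths, k, audit_file):
    """Run every mapper sequentially, then reduce; returns (logs, summary)."""
    logs = [
        map_audit(mime, block, audit_file)
        for mime, files in group_by_mime(paths).items()
        for block in chunk(files, k)
    ]
    return logs, reduce_logs(logs)
```

Because each mapper returns its own log record, a caller could persist or inspect the records as chunks finish, which is the kind of incremental output the abstract says RAT alone does not provide.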
Keywords :
"Licenses","Metadata","Open source software","Java","Open systems","Government"
Publisher :
IEEE
Conference_Title :
Automated Software Engineering Workshop (ASEW), 2015 30th IEEE/ACM International Conference on
Type :
conf
DOI :
10.1109/ASEW.2015.14
Filename :
7426645