ApacheCon 2016 has ended
Thursday, May 12 • 4:40pm - 5:30pm
DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis - Karanjeet Singh, University of Southern California


The Apache Release Audit Tool (RAT) audits open source software licenses, but it fails to audit large code bases successfully. As a natural language processing tool and a crawler, RAT marches through a code base using only rudimentary blacklists and whitelists to navigate source code repositories, and it often does a poor job of distinguishing source code from binary files. We introduce Distributed "RAT" (DRAT). DRAT overcomes RAT's limitations by leveraging: (1) Apache Tika to automatically detect and classify files in source code repositories, determining which are binary files, which are source code, which are notes that should be skipped, and so on; (2) Apache Solr to interactively perform analytics on a code repository and to extract metadata using Apache Tika; and (3) Apache OODT to run RAT on a per-MIME-type basis, in configurable chunks of K files each, as a MapReduce workflow.
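
The partitioning idea in step (3) can be illustrated with a minimal Python sketch. It is not DRAT's implementation: the function names (`guess_mime`, `chunk`, `plan_audit_tasks`) are hypothetical, and the standard-library `mimetypes` module, which only looks at file extensions, stands in for Apache Tika's content-based detection. The sketch groups file paths by MIME type and splits each group into chunks of at most K files, so that each chunk could be handed to a separate RAT run.

```python
import mimetypes
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def guess_mime(path: str) -> str:
    """Extension-based MIME guess; an assumed stand-in for Apache Tika."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def chunk(items: List[str], k: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most k items."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [items[i:i + k] for i in range(0, len(items), k)]


def plan_audit_tasks(
    paths: Iterable[str],
    k: int,
    detect: Callable[[str], str] = guess_mime,
) -> List[Tuple[str, List[str]]]:
    """Return (mime_type, files) tasks: one per MIME type per K-file chunk."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[detect(path)].append(path)

    tasks: List[Tuple[str, List[str]]] = []
    for mime in sorted(groups):  # sorted for a deterministic task order
        for part in chunk(groups[mime], k):
            tasks.append((mime, part))
    return tasks


if __name__ == "__main__":
    files = ["README.txt", "NOTICE.txt", "LICENSE.txt", "index.html", "logo.png"]
    for mime, part in plan_audit_tasks(files, k=2):
        print(mime, part)
```

Passing the detector in as a parameter keeps the chunking logic independent of how files are classified, so a Tika-backed detector could be swapped in without changing the rest of the sketch.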

Karanjeet Singh

Research Assistant, University of Southern California
He is pursuing his Master's degree in Computer Science at the University of Southern California (USC). His projects and research are mostly in the areas of Information Retrieval and Data Science. He is also affiliated with NASA's Jet Propulsion Laboratory. Prior to this, he was working...

Thursday May 12, 2016 4:40pm - 5:30pm PDT
Regency C