Over the past few months I’ve been researching an interesting problem. It all began when I read this article about a developer who accidentally committed his keys to a public GitHub repository. That meant full access to his AWS account was available to anyone who stumbled upon them. After further research, I found this repository owned by Samar Acharya, which contained GitHub search queries that could be used to find sensitive things like passwords, emails, and even the previously mentioned API keys. This inspired me to write this blog post for GitPrime, where I detailed an experiment I ran. From that experiment, I estimated that over 300 sensitive files were modified or created on a daily basis.
Since then, I have been working almost nonstop on a project. Until now, I’ve kept my project relatively secret. Today, I’m excited to share it with you. I feel it’s time to announce the development of GitShield. GitShield is a tool designed for two primary purposes.
Functions
- Public safety
- This is the primary function of GitShield. All other aspects of the project are less important in my mind. GitShield has begun scanning GitHub’s public event stream for accidentally disclosed data and notifying commit authors of the breach.
- Stasis
- Unfortunately, it’s important that this project generates enough revenue to survive. To do this, we plan to provide more extensive scanning to paid users. This includes scanning every commit, as well as scanning for common exploits such as SQL injection.
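To give a rough idea of what pattern-based scanning for disclosed data can look like, here is a minimal Python sketch. The pattern set and the `scan_text` helper are assumptions made up for illustration; they are not GitShield's actual rules, which this post does not describe.

```python
import re

# Hypothetical patterns for illustration only; the real pattern set is not
# described in this post.
PATTERNS = {
    # AWS access key IDs are 20 characters and commonly start with "AKIA".
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-style private key header.
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |DSA |EC )?PRIVATE KEY-----"),
    # A quoted value assigned to something named "password".
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
}


def scan_text(text):
    """Return a (pattern_name, line_number) pair for every matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line_number))
    return findings


if __name__ == "__main__":
    sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"\nprint("hi")\npassword = "hunter22"\n'
    for name, line_number in scan_text(sample):
        print(f"line {line_number}: possible {name}")
```

A real scanner would run something like this over the files touched by each commit, which is why a later filtering step matters: simple patterns like these match plenty of harmless text.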
Some cool stats
So far, we’ve notified roughly 150 project authors of breaches. A total of 923 files have matched our patterns, but 649 of these files have been removed by heuristics, leaving 274 remaining files. We’ve received one false positive report, and two of our notifications have failed due to invalid addresses. We’ve scanned over 800,000 commits, consisting of over 6 million files!
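For readers wondering how heuristics can discard most pattern matches, here is a small Python sketch of the idea. The rules below (placeholder words and ignored directories) and the function names are purely hypothetical; the heuristics GitShield actually uses are not listed in this post.

```python
# Hypothetical heuristics for illustration; not GitShield's actual rules.
PLACEHOLDER_WORDS = ("example", "changeme", "your_key_here", "xxxx")
IGNORED_PATH_PARTS = ("test", "tests", "docs", "examples")


def looks_like_false_positive(path, matched_value):
    """Guess whether a match is a placeholder or lives in non-production files."""
    value = matched_value.lower()
    if any(word in value for word in PLACEHOLDER_WORDS):
        return True
    parts = path.lower().split("/")
    return any(part in IGNORED_PATH_PARTS for part in parts)


def filter_matches(matches):
    """Keep only the (path, matched_value) pairs that no heuristic rejects."""
    return [m for m in matches if not looks_like_false_positive(*m)]


if __name__ == "__main__":
    matches = [
        ("config/prod.py", "AKIAABCDEFGHIJKLMNOP"),
        ("docs/setup.md", "AKIAABCDEFGHIJKLMNOP"),
        ("app/settings.py", "AKIAIOSFODNN7EXAMPLE"),
    ]
    print(filter_matches(matches))
```

Filters like these trade some missed leaks for far fewer bogus notifications, which matters when every match could turn into an email to a stranger.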
Rant
I’m absolutely ecstatic about this project. I had my doubts originally, but I’m really excited to reach alpha. Currently, the GitShield team consists of me, another application developer, and a web developer. I’m hoping to collaborate with the open source community to build our database, and to make as much data public as possible. It’s been a long time since I’ve had a project this close to a public stage. I can’t wait to see what happens!