A growing volume of sensitive data, or secrets, such as API keys, private keys, certificates, usernames and passwords ends up publicly exposed on GitHub, putting corporate security at risk. Cybercriminals are known to abuse GitHub repositories, and many data breaches involving them have been reported in the past few years. GitHub is where developers go to innovate, collaborate and network. Yet even though it is used by specialists, they too can make careless mistakes.
Let’s take a look at some examples of data breaches related to GitHub:
- 2014 – Uber exposed the personal data of 50,000 of its drivers. The cause was that Uber developers had saved Amazon Web Services (AWS) access keys in a public GitHub repository, and those keys gave access to the cloud storage that held the data.
- 2017 – A public GitHub repository revealed source code, reports and development plans belonging to several major financial institutions in Canada, the United States and Japan. The material had been placed there by employees of the Indian outsourcing company Tata Consultancy Services, whose customers were the affected institutions.
More examples are easy to find online. The question is why organizations keep facing these breaches: are they ignoring the problem, are they poorly equipped to cope with it, or do they lack skilled staff?
One of the most common data leak scenarios occurs when users realize they have accidentally published sensitive data. The offending user may subsequently delete the data, but Git, the version control system behind GitHub, is designed to keep track of historical modifications of published code, so the sensitive data remains publicly accessible in the repository's history.
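To illustrate why deleting a secret is not enough, here is a minimal Python sketch that looks through the text output of `git log -p` for key-like strings. The function name `secrets_in_history` and the single AWS-style pattern are assumptions made for this example, not part of any particular product.

```python
import re

# Illustrative pattern only (an assumption for this sketch):
# AWS-style access key IDs, "AKIA" followed by 16 characters.
KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def secrets_in_history(log_text):
    """Return (commit, change, line) for each added or removed line holding a key.

    log_text is expected to be the text output of `git log -p`.
    """
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Remember which commit the following diff lines belong to.
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            # File headers of a diff, not file content.
            continue
        elif line.startswith(("+", "-")) and KEY_PATTERN.search(line):
            change = "added" if line[0] == "+" else "removed"
            findings.append((commit, change, line[1:].strip()))
    return findings
```

Fed the output of `git log -p --all`, a sketch like this would report the key twice: once in the commit that added it and once in the commit that removed it. The second hit is the point: the "deleting" commit itself still carries the secret.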
Avoiding GitHub data leaks:
- Standardize Coding Conventions & Practices.
- Make sure your production environment remains private.
- Implement periodic reviews of your codebase.
- Take steps to keep sensitive data from being loaded onto GitHub in the first place, for example by using data loss prevention (DLP) controls to scan for sensitive data before uploads are made to the site.
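The idea of scanning for sensitive data before an upload can be sketched in a few lines of Python. This is a simplified illustration, not how any specific DLP product works: the names `SECRET_PATTERNS`, `scan_text` and `scan_tree` are hypothetical, and the three patterns are examples only. A real rule set would be far larger.

```python
import re
from pathlib import Path

# Example rules only (assumptions for this sketch).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_password": re.compile(
        r"\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]", re.IGNORECASE
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root):
    """Scan every readable file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

A check like `scan_tree(".")` could be run before each push, blocking the upload if it returns any findings. Simple pattern matching of this kind will miss some secrets and flag some harmless strings, which is one reason dedicated tooling is usually layered on top.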
inDefend enables you to configure scans to prioritize and resolve data policy violations effectively, while leveraging machine learning to detect a wide range of potentially sensitive data in GitHub repositories. inDefend DLP, along with the UEBA module, gives overall visibility into user activities pertaining to GitHub.
The Author is
Dhruv Khanna, CEO, Data Resolve Technologies Pvt. Ltd