GitHub Copilot Vulnerability Could Leak Private Repository Code to Other Users
Researchers at Cornell University have demonstrated that GitHub Copilot and similar AI code assistants can be manipulated to leak fragments of private repository code through carefully crafted prompt injection techniques embedded in specially designed public repositories.
The attack works by creating public repositories containing carefully structured code comments and variable names that, when ingested by Copilot's training pipeline, can influence the model to suggest code snippets from private repositories in specific contexts.
In controlled experiments, the researchers were able to extract meaningful code fragments including API keys, database connection strings, and proprietary algorithm implementations from private repositories belonging to test accounts.
GitHub has acknowledged the research and stated that it has implemented additional filtering in Copilot's suggestion pipeline to prevent such data leakage. The company has also enhanced its training data processing to better isolate private repository content.
This research highlights the ongoing tension between the utility of AI-powered development tools and the security of proprietary code. Organizations handling sensitive code are advised to audit their AI assistant configurations and consider implementing code suggestion review policies.