Bugcrowd launches RL environments for AI security training

Tue, 26th May 2026

Bugcrowd has launched Reinforcement Learning Environments for training AI models on software security tasks. The product is already in use by large language model providers, the company said.

The offering targets developers building models that can identify, exploit and repair software vulnerabilities using live training environments rather than synthetic security data. It gives AI agents direct access to vulnerable software and scores their performance as they complete security tasks.

The launch reflects growing interest in training AI systems for practical cyber security work, where developers have struggled to move beyond controlled benchmarks. Models may perform well on simplified test data but still fall short when faced with flaws in real software, especially when tasks involve more than bug detection.

Security tasks

Security work often involves several steps, from locating a flaw and triggering it to assessing whether it can be exploited and then fixing it without disrupting an application. Bugcrowd's environments are designed to train models across those stages, including exploitation, patching and audit.
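As a rough illustration of how staged scoring of this kind might look, the Python sketch below awards points for each stage an agent completes. It is not Bugcrowd's grader: the stage names follow the article, but `EpisodeResult`, its fields, the point weights and `score_episode` are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DETECT = "detect"
    EXPLOIT = "exploit"
    PATCH = "patch"
    AUDIT = "audit"


@dataclass(frozen=True)
class EpisodeResult:
    """Hypothetical objective checks recorded for one agent attempt."""
    flaw_located: bool          # agent pointed at the vulnerable code
    exploit_triggered: bool     # agent's input actually triggered the flaw
    patch_blocks_exploit: bool  # the same exploit fails after the patch
    tests_still_pass: bool      # the application's own tests still pass
    audit_report_matches: bool  # agent's write-up matches the known flaw


# Assumed weights in points out of 100; a real grader may weigh stages differently.
STAGE_POINTS = {
    Stage.DETECT: 20,
    Stage.EXPLOIT: 30,
    Stage.PATCH: 30,
    Stage.AUDIT: 20,
}


def stage_outcomes(result: EpisodeResult) -> dict[Stage, bool]:
    """Map raw checks to pass/fail per stage."""
    return {
        Stage.DETECT: result.flaw_located,
        # An exploit only counts if the flaw was located first.
        Stage.EXPLOIT: result.flaw_located and result.exploit_triggered,
        # A patch only counts if it stops the exploit without breaking the app.
        Stage.PATCH: result.patch_blocks_exploit and result.tests_still_pass,
        Stage.AUDIT: result.audit_report_matches,
    }


def score_episode(result: EpisodeResult) -> int:
    """Sum the points for every stage the agent passed."""
    outcomes = stage_outcomes(result)
    return sum(STAGE_POINTS[stage] for stage, ok in outcomes.items() if ok)
```

Under these assumed weights, an agent that finds and exploits a flaw but ships a patch that breaks the application's tests earns no credit for the patching stage, which mirrors the requirement that a fix must not disrupt the application.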

The platform includes hundreds of thousands of training environments built from vulnerabilities in open-source software, each with source code and measurable outcomes. Bugcrowd said the environments are derived only from open-source software and that model training involves neither customer data nor security researchers.

The product draws on technology acquired through Bugcrowd's purchase of Mayhem Security. The deal brought autonomous code and API testing technology into Bugcrowd's platform, and the new training environments extend that work into AI model development.

Training gap

For AI developers, one of the main obstacles has been the time and engineering effort needed to build suitable reinforcement learning systems for security training. Bugcrowd said its service is intended to reduce that burden by providing a ready-made environment so model builders can focus on training and evaluation.

Dave Gerry, Chief Executive Officer, Bugcrowd, said the gap between lab conditions and real-world software remains a central problem in AI security training.

"The gap between what AI agents are trained on and what they encounter in the real world is where security breaks down," said Gerry. "Our RL Environments give frontier teams the infrastructure to build AI that learns security from real vulnerabilities, not approximations of them."

The argument is likely to resonate with AI labs trying to develop agents that can reason through complex technical tasks rather than simply classify known patterns. In cyber security, that difference is significant because a model may need to show not only that it has spotted a flaw, but also that the flaw is genuine, exploitable and can be fixed safely.

Real vulnerabilities

Bugcrowd argues that many existing approaches stop short of that standard. It says meaningful progress in AI security requires systems to work through authentic problems and receive objective feedback at each stage.

Dr David Brumley, Chief AI and Science Officer, Bugcrowd, said the industry has tended to focus too narrowly on vulnerability discovery.

"Most AI security training stops too early. Models learn to find bugs, but not to prove the bugs are real and exploitable. You cannot train a model to be good at security by showing it what security looks like, you have to give it real problems to solve and honest feedback on whether it solved them. At Bugcrowd, we have spent years building the environments, graders, and reward structures that take models further, from detection through exploitation, patching, and audit. That is what real security skill looks like, and it is what we are making available to frontier AI teams today," added Brumley.

The launch also highlights how cyber security vendors are repositioning themselves around AI infrastructure as well as traditional testing and exposure management. By offering training environments directly to model builders, Bugcrowd is targeting customers beyond enterprise security teams, including frontier AI research organisations and large model providers.

That market is still emerging, but demand is rising as AI companies seek ways to make systems more reliable in specialist domains such as coding and security. In that context, access to large volumes of realistic, measurable training tasks could become an important differentiator for companies supplying tools for model development.

Bugcrowd said its Reinforcement Learning Environments are available now for organisations that want to train AI agents on real-world security reasoning without spending years building their own systems.

Image: Dave Gerry, Bugcrowd