Abstract
AI coding agents are shipping code faster than security teams can review it. The tools are getting more autonomous, the volume is going up, and nobody has a good answer to the obvious question: how secure is the code these things are actually writing?
We decided to find out.
We built an internal platform using AI agents, then pointed a different set of AI agents at it to hunt for vulnerabilities. What we found matters to any team that's shipping AI-generated code and hasn't looked too hard at what's underneath.
What you'll learn
First, we'll take a quick look at what the major AI labs and security researchers have actually published on AI-assisted vulnerability research. The capabilities are further along than most people realize, and the security implications have received far less attention than the productivity story.
Then we'll get into our own work: how we structured the agents, what they found, and how AI-generated code tends to fail (and where it doesn't). We'll also cover what it actually takes to close the gap between code generation and security review using the same class of tools.
We'll close with what we think comes next: what near-term penetration testing looks like based on what we saw, and what your team should probably be doing now.
Key takeaways
- AI-generated code has a measurable security profile, and it's probably not what you'd expect
- Small security teams can use AI agents to review more surface area than they could ever cover manually
- The same tools generating your vulnerabilities can find them
- What autonomous security assessment actually looks like, right now