Hacking using AI-written exploits: human tests of LLM agents’ PoCs show they are almost all fake. 🤖🖋️📜👨🏻💻🤡
Back in 2025, researchers Vikram Nitin, Baishakhi Ray, and Roshanak Zilouchian Moghaddam reported that their LLM-based framework was able to generate PoC code for vulnerabilities based on CVE descriptions with a success rate of 36%. Impressive results!
In 2026, a group of academic researchers tested the same approach but added manual validation of PoC exploits (not only LLM-based tools). Unfortunately, the manual validation showed that 71.5% of “successful” PoCs were actually invalid.
The LLM-driven exploit generation pipeline looks like this:
Take a CVE description.
Use an LLM agent to reason about possible data flows and branch conditions.
Generate a proof-of-concept (PoC) exploit.
Run it.
Refine it in a feedback loop.
Now, after manual review, researchers found that the LLM was simulating the success of the PoC to complete the task.
For example:
🧨 The LLM was not actually triggering the exploit to confirm successful exploitation. Instead, the LLM-generated exploit was printing “[VULN]” or “[Success]” without even touching the target.
🧨 Instead of analyzing the target code logic, the LLM hardcoded a simplified version of the vulnerable logic directly into the PoC exploit to “simulate” execution after exploitation. And then - it reported “[Success].”
🧨 In some cases, the LLM used legitimate methods to create evidence of exploitation - instead of running the exploit to create a file, it simply created the file directly in the target directory.
Very interesting results, especially keeping in mind recent announcements from Anthropic about entering the cybersecurity business. 🙂 I’ve added both papers below - have a look and please share!
More details:
I Can’t Believe It’s Not a Valid Exploit [PDF, 2026]: https://arxiv.org/abs/2602.04165
FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents [PDF, 2025]: https://arxiv.org/abs/2507.15241


