Ptechhub

AI Agents Are Getting Better at Writing Code—and Hacking It as Well

By Wired
June 25, 2025


The latest artificial intelligence models are not only remarkably good at software engineering—new research shows they are getting ever-better at finding bugs in software, too.

AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models identified 17 bugs, including 15 previously unknown, or “zero-day,” ones. “Many of these vulnerabilities are critical,” says Dawn Song, a professor at UC Berkeley who led the work.

Many experts expect AI models to become formidable cybersecurity weapons. An AI tool from the startup Xbow has climbed the ranks of HackerOne’s bug-hunting leaderboard and currently sits in first place. The company recently announced $75 million in new funding.

Song says the coding skills of the latest AI models, combined with their improving reasoning abilities, are starting to change the cybersecurity landscape. “This is a pivotal moment,” she says. “It actually exceeded our general expectations.”

As the models continue to improve, they are likely to automate the process of both discovering and exploiting security flaws. This could help companies keep their software safe, but it may also aid hackers in breaking into systems. “We didn’t even try that hard,” Song says. “If we ramped up on the budget, allowed the agents to run for longer, they could do even better.”

The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several bug-finding agents, including OpenHands, Cybench, and EnIGMA.

The researchers took descriptions of known software vulnerabilities in the 188 projects and fed them to the cybersecurity agents, powered by frontier AI models, to see whether the agents could identify the same flaws on their own by analyzing the codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases independently.

In the process, the AI tools generated hundreds of proof-of-concept exploits. Among these, the researchers identified 15 previously unseen vulnerabilities and two that had previously been disclosed and patched. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems.

AI seems destined to become an important part of the cybersecurity industry. Security expert Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI’s reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using an AI agent developed by its Project Zero security team together with Google DeepMind.

Like other parts of the software industry, many cybersecurity firms are enamored with the potential of AI. The new work shows that AI can routinely find new flaws, but it also highlights the technology’s remaining limitations: the AI systems failed to find most of the flaws and were stumped by especially complex ones.



PTechHub

A tech news platform delivering fresh perspectives, critical insights, and in-depth reporting — beyond the buzz. We cover innovation, policy, and digital culture with clarity, independence, and a sharp editorial edge.

Follow Us

Industries

  • AI & ML
  • Cybersecurity
  • Enterprise IT
  • Finance
  • Telco

Navigation

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Subscribe to Our Newsletter

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Copyright © 2025 | Powered By Porpholio