Ptechhub
DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

By Wired
January 31, 2025


“Jailbreaks persist simply because eliminating them entirely is nearly impossible—just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades),” Alex Polyakov, the CEO of security firm Adversa AI, told WIRED in an email.

Cisco’s Sampath argues that as companies use more types of AI in their applications, the risks are amplified. “It starts to become a big deal when you start putting these models into important complex systems and those jailbreaks suddenly result in downstream things that increases liability, increases business risk, increases all kinds of issues for enterprises,” Sampath says.

The Cisco researchers tested DeepSeek's R1 with 50 prompts drawn randomly from HarmBench, a well-known library of standardized evaluation prompts. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on their own machines rather than through DeepSeek's website or app, both of which send data to China.
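An evaluation of this kind reduces to a simple loop: sample prompts from the benchmark's categories, send each to the locally running model, and score whether the response complies with the harmful request or refuses it. The sketch below illustrates the shape of such a harness; the `query_model` callable, the category prompts, and the naive refusal-keyword scorer are all stand-ins (real evaluations like Cisco's use the benchmark's own prompts and a trained judge classifier rather than keyword matching):

```python
import random

# Hypothetical stand-ins for benchmark categories and their prompts.
CATEGORIES = {
    "cybercrime": ["<prompt A>", "<prompt B>"],
    "misinformation": ["<prompt C>", "<prompt D>"],
}

REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai")

def is_refusal(response: str) -> bool:
    """Naive keyword scorer; real harnesses use a trained judge model."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def attack_success_rate(query_model, prompts_per_category=2, seed=0):
    """Fraction of sampled prompts the model answers instead of refusing."""
    rng = random.Random(seed)
    attempts, successes = 0, 0
    for category, prompts in CATEGORIES.items():
        for prompt in rng.sample(prompts, min(prompts_per_category, len(prompts))):
            response = query_model(prompt)
            attempts += 1
            if not is_refusal(response):
                successes += 1  # model complied: the attack succeeded
    return successes / attempts

# Usage with a mock model that always refuses:
print(attack_success_rate(lambda p: "I cannot help with that."))  # 0.0
```

A 100 percent attack success rate, as reported for R1, means the model complied with every sampled prompt under this kind of scoring.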

Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution. But for their initial tests, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.
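Attacks of the "Cyrillic characters" variety typically rely on homoglyphs: Cyrillic letters that are visually near-identical to Latin ones are substituted into a prompt so that naive keyword filters no longer match, while the model still infers the intent. A minimal illustration of the substitution itself (the mapping here is a small assumed subset of the Unicode confusables):

```python
# Latin -> visually similar Cyrillic homoglyphs (small assumed subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "c": "\u0441", "x": "\u0445"}

def obfuscate(text: str) -> str:
    """Swap Latin letters for Cyrillic lookalikes to dodge naive string filters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

word = "process"
swapped = obfuscate(word)
print(swapped == word)  # False: the strings differ at the code-point level
```

The two strings render almost identically but compare unequal, which is why filters that match exact byte or character sequences miss the obfuscated version.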

Cisco also compared R1's performance on HarmBench prompts with that of other models. Some, like Meta's Llama 3.1, faltered almost as severely as DeepSeek's R1. But Sampath emphasizes that DeepSeek's R1 is a reasoning model, which takes longer to generate answers but draws on more complex processes to try to produce better results. Therefore, Sampath argues, the fairest comparison is with OpenAI's o1 reasoning model, which fared the best of all the models tested. (Meta did not immediately respond to a request for comment.)

Polyakov, from Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying that “it seems that these responses are often just copied from OpenAI’s dataset.” However, Polyakov says that in his company’s tests of four different types of jailbreaks—from linguistic ones to code-based tricks—DeepSeek’s restrictions could easily be bypassed.

“Every single method worked flawlessly,” Polyakov says. “What’s even more alarming is that these aren’t novel ‘zero-day’ jailbreaks—many have been publicly known for years.” He adds that he saw the model go into more depth on some instructions around psychedelics than he has seen any other model provide.

“DeepSeek is just another example of how every model can be broken—it’s just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite,” Polyakov adds. “If you’re not continuously red-teaming your AI, you’re already compromised.”




Tags: Artificial Intelligence, China, crime, Cybersecurity, DeepSeek, Hacking, machine learning, security

PTechHub

A tech news platform delivering fresh perspectives, critical insights, and in-depth reporting — beyond the buzz. We cover innovation, policy, and digital culture with clarity, independence, and a sharp editorial edge.

Follow Us

Industries

  • AI & ML
  • Cybersecurity
  • Enterprise IT
  • Finance
  • Telco

Navigation

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Subscribe to Our Newsletter

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Copyright © 2025 | Powered By Porpholio
