Ptechhub
  • News
  • Industries
    • Enterprise IT
    • AI & ML
    • Cybersecurity
    • Finance
    • Telco
  • Brand Hub
    • Lifesight
  • Blogs
No Result
View All Result
  • News
  • Industries
    • Enterprise IT
    • AI & ML
    • Cybersecurity
    • Finance
    • Telco
  • Brand Hub
    • Lifesight
  • Blogs
No Result
View All Result
PtechHub
No Result
View All Result

AI jailbreaking techniques prove highly effective against DeepSeek | Computer Weekly

By Computer Weekly by By Computer Weekly
January 31, 2025
Home Uncategorized
Share on FacebookShare on Twitter


Fresh questions are being raised over the safety and security of DeepSeek, the breakout Chinese generative artificial intelligence (AI) platform, after researchers at Palo Alto Networks revealed that the platform is highly vulnerable to so-called jailbreaking techniques used by malicious actors to cheat the rules that are supposed to prevent large language models (LLMs) from being used for nefarious purposes, such as writing malware code.

The sudden surge of interest in DeepSeek at the end of January has drawn comparisons to the moment in October 1957 when the Soviet Union launched the first artificial Earth satellite, Sputnik, taking the United States and her allies by surprise and precipitating the space race of the 1960s culminating in the Apollo 11 Moon landing. It also caused chaos in the tech industry, wiping billions of dollars off the value of companies such as Nvidia.

Now, Palo Alto’s technical teams have demonstrated that three recently described jailbreaking techniques are effective against DeepSeek models. The team said it achieved significant bypass rates with little to no specialised knowledge or expertise needed.

Their experiments found that the three jailbreak methods tested yielded explicit guidance from DeepSeek on a range of topics of interest to the cyber criminal fraternity, including data exfiltration and keylogger creation. They were also able to generate instructions on creating improvised explosive devices (IEDs).

“While information on creating Molotov cocktails and keyloggers is readily available online, LLMs with insufficient safety restrictions could lower the barrier to entry for malicious actors by compiling and presenting easily usable and actionable output. This assistance could greatly accelerate their operations,” said the team.

What is jailbreaking?

Jailbreaking techniques involve the careful crafting of specific prompts, or the exploitation of vulnerabilities, to bypass LLMs’ onboard guard-rails and elicit biased or otherwise harmful output that the model should avoid. Doing so enables malicious actors to “weaponise” LLMs to spread misinformation, facilitate criminal activity, or generate offensive material.

Unfortunately, the more sophisticated LLMs become in their understanding of and responses to nuanced prompts, the more susceptible they become to the right adversarial input. This is now leading to something of an arms race.

Palo Alto tested three jailbreaking techniques – Bad Likert Judge, Deceptive Delight and Crescendo – on DeepSeek.

Bad Likert Judge attempts to manipulate an LLM by getting it to evaluate the harmfulness of responses using the Likert Scale, which is used in consumer satisfaction surveys, among other things, to measure agreement or disagreement towards a statement against a scale, usually of one to five, where one equals strongly agree and five equals strongly disagree.

Crescendo is a multi-turn exploit that takes advantage of an LLM’s knowledge on a subject by progressively prompting it with related content to subtly guide the discussion towards forbidden topics until the model’s safety mechanisms are essentially overridden. With the right questions and skills, an attacker can achieve full escalation within just five interactions, which makes Crescendo extremely effective and, worse still, hard to detect with countermeasures.

Deceptive Delight is another multi-turn technique that bypasses guardrails by embedding unsafe topics among benign ones within an overall positive narrative. As a very basic example, a threat actor could ask the AI to create a story connecting three topics – bunny rabbits, ransomware, and fluffy clouds – and asking it to elaborate on each to generate unsafe content when discussing the more benign parts of the story. They could then prompt again focusing on the unsafe topic to amplify the dangerous output.

How should CISOs respond?

Palo Alto conceded it is a challenge to guarantee specific LLMs – not just DeepSeek – are completely impervious to jailbreaking, end-user organisations can implement measures to give them some degree of protection, such as monitoring when and how employees are using LLMs, including unauthorised third-party ones.

“Every organisation will have its policies about new AI models,” said Palo Alto senior vice-president of network security, Anand Oswal. “Some will ban them completely; others will allow limited, experimental and heavily guardrailed use. Still others will rush to deploy it in production, looking to eke out that extra bit of performance and cost optimisation.

“But beyond your organisation’s need to decide on a new specific model, DeepSeek’s rise offers several lessons about AI security in 2025,” said Oswal in a blog post.

“AI’s pace of change, and the surrounding sense of urgency, can’t be compared to other technologies. How can you plan ahead when a somewhat obscure model – and the more than 500 derivatives already available on Hugging Face – becomes the number-one priority seemingly out of nowhere? The short answer: you can’t,” he said.

Oswal said AI security remained a “moving target” and that this did not look set to change for a while. Furthermore, he added, it was unlikely that DeepSeek will be the last model to catch everyone by surprise, so CISOs and security leaders should expect the unexpected.

Adding to the challenge faced by organisations, it is very easy for development teams, or even individual developers, to switch out LLMs at little or even no cost if a more interesting one arrives on the scene.

“The temptation for product builders to test the new model to see if it can solve a cost issue or latency bottleneck or outperform on a specific task is huge. And if the model turns out to be the missing piece that helps bring a potentially game-changing product to market, you don’t want to be the one who stands in the way,” said Oswal.

Palo Alto is encouraging security leaders to establish clear governance over LLMs and advocating for incorporating secure-by-design principles into organisational use of them. It rolled out a set of tools, Secure AI by Design, last year, to this effect.

Among other things, these tools provide security teams with real-time visibility into what LLMs are being used and by who; the ability to block unsanctioned apps and apply organisational security policies and protections; and prevent sensitive data from being accessed by LLMs.



Source link

By Computer Weekly

By Computer Weekly

Next Post
Charter Announces Fourth Quarter and Full Year 2024 Results

Charter Announces Fourth Quarter and Full Year 2024 Results

Recommended.

Extreme Networks’ Launches First MSP Program Built On Extreme Platform One For ‘Radically’ Simple Management, Pay-As-You-Scale For Enterprises

Extreme Networks’ Launches First MSP Program Built On Extreme Platform One For ‘Radically’ Simple Management, Pay-As-You-Scale For Enterprises

February 19, 2025
Unleashing the Full Potential of Enterprise IT Investments

Unleashing the Full Potential of Enterprise IT Investments

August 29, 2024

Trending.

⚡ Weekly Recap: Oracle 0-Day, BitLocker Bypass, VMScape, WhatsApp Worm & More

⚡ Weekly Recap: Oracle 0-Day, BitLocker Bypass, VMScape, WhatsApp Worm & More

October 6, 2025
Cloud Computing on the Rise: Market Projected to Reach .6 Trillion by 2030

Cloud Computing on the Rise: Market Projected to Reach $1.6 Trillion by 2030

August 1, 2025
Stocks making the biggest moves midday: Autodesk, PayPal, Rivian, Nebius, Waters and more

Stocks making the biggest moves midday: Autodesk, PayPal, Rivian, Nebius, Waters and more

July 14, 2025
The Ultimate MSP Guide to Structuring and Selling vCISO Services

The Ultimate MSP Guide to Structuring and Selling vCISO Services

February 19, 2025
Translators’ Voices: China shares technological achievements with the world for mutual benefit

Translators’ Voices: China shares technological achievements with the world for mutual benefit

June 3, 2025

PTechHub

A tech news platform delivering fresh perspectives, critical insights, and in-depth reporting — beyond the buzz. We cover innovation, policy, and digital culture with clarity, independence, and a sharp editorial edge.

Follow Us

Industries

  • AI & ML
  • Cybersecurity
  • Enterprise IT
  • Finance
  • Telco

Navigation

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Subscribe to Our Newsletter

  • About
  • Advertise
  • Privacy & Policy
  • Contact

Copyright © 2025 | Powered By Porpholio

No Result
View All Result
  • News
  • Industries
    • Enterprise IT
    • AI & ML
    • Cybersecurity
    • Finance
    • Telco
  • Brand Hub
    • Lifesight
  • Blogs

Copyright © 2025 | Powered By Porpholio