AI jailbreaking techniques prove highly effective against DeepSeek

Fresh questions are being raised over the safety and security of DeepSeek, the breakout Chinese generative artificial intelligence (AI) platform, after researchers at Palo Alto Networks revealed that the platform is highly vulnerable to so-called jailbreaking techniques used by malicious actors to cheat the rules that are supposed to prevent large language models (LLMs) from being used for nefarious purposes, such as writing malware code.

The sudden surge of interest in DeepSeek at the end of January has drawn comparisons to the moment in October 1957 when the Soviet Union launched the first artificial Earth satellite, Sputnik, taking the United States and her allies by surprise and precipitating the space race of the 1960s culminating in the Apollo 11 Moon landing. It also caused chaos in the tech industry, wiping billions of dollars off the value of companies such as Nvidia.

Now, Palo Alto’s technical teams have demonstrated that three recently described jailbreaking techniques are effective against DeepSeek models. The team said it achieved significant bypass rates with little to no specialised knowledge or expertise needed.

Their experiments found that the three jailbreak methods tested yielded explicit guidance from DeepSeek on a range of topics of interest to the cyber criminal fraternity, including data exfiltration and keylogger creation. They were also able to generate instructions on creating improvised explosive devices (IEDs).

“While information on creating Molotov cocktails and keyloggers is readily available online, LLMs with insufficient safety restrictions could lower the barrier to entry for malicious actors by compiling and presenting easily usable and actionable output. This assistance could greatly accelerate their operations,” said the team.