Researchers have developed a method called “hijacking the chain-of-thought” to bypass the so-called guardrails put in place in AI programmes to prevent harmful responses.
“Chain-of-thought” is a process in which an AI model breaks a prompt into a series of intermediate steps before providing an answer.
“When a model openly shares its intermediate step safety reasonings, attackers gain insights into its safety reasonings and can craft adversarial prompts that imitate or override the original checks,” one of the researchers, Jianyi Zhang, said.
Computer geeks like to describe artificial intelligence (“AI”) using jargon that relates to living beings, specifically humans. For example, they use terms such as “mimic human reasoning,” “chain of thought,” “self-evaluation,” “habitats” and “neural network.” This is to create the impression that AI is somehow alive or equates to humans. Don’t be fooled.
AI is a computer programme designed by humans. As with all computer programmes, it will do what it has been programmed to do. And as with all computer programmes, the computer code can be hacked or hijacked, which AI geeks call “jailbreaking.”
A team of researchers affiliated with Duke University, Accenture, and Taiwan’s National Tsing Hua University created a dataset called the Malicious Educator to exploit the “chain-of-thought reasoning” mechanism in large language models (“LLMs”), including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. The Malicious Educator contains prompts designed to bypass the AI models’ safety checks.
The researchers were able to devise this prompt-based “jailbreaking” attack by observing how large reasoning models (“LRMs”) analyse the steps in the “chain-of-thought” process. Their findings have been published in a pre-print paper HERE.
They developed a “jailbreaking” technique called hijacking the chain-of-thought (“H-CoT”) which involves modifying the “thinking” processes generated by LLMs to “convince” the AI programmes that harmful information is needed for legitimate purposes, such as safety or compliance. This technique has proven to be extremely effective in bypassing the safety mechanisms of SoftBank’s partner OpenAI, Chinese hedge fund High-Flyer’s DeepSeek and Google’s Gemini.
The H-CoT attack method was tested on OpenAI, DeepSeek and Gemini using a dataset of 50 questions repeated five times. The results showed that these models failed to provide a sufficiently reliable safety “reasoning” mechanism, with rejection rates plummeting to less than 2 per cent in some cases.
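To put those figures in perspective: if each of the 50 questions is put to a model five times, that is 250 trials, so a rejection rate below 2 per cent means fewer than five refusals. A minimal Python sketch of that arithmetic follows; the function name and the tally of four refusals are illustrative assumptions, not figures from the paper.

```python
def rejection_rate(rejected: int, questions: int = 50, repeats: int = 5) -> float:
    """Return the percentage of trials in which the model refused to answer."""
    trials = questions * repeats  # 50 questions x 5 repeats = 250 trials
    if not 0 <= rejected <= trials:
        raise ValueError("rejected must be between 0 and the number of trials")
    return 100 * rejected / trials


# Hypothetical tally for illustration only, not a result from the paper.
print(rejection_rate(4))  # 4 refusals out of 250 trials
```

With these assumptions, four refusals out of 250 trials would fall under the 2 per cent mark, while five refusals would sit exactly on it.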
The researchers found that while AI models from “responsible” model makers, such as OpenAI, have a high rejection rate for harmful prompts, exceeding 99 per cent for child abuse or terrorism-related prompts, they are vulnerable to the H-CoT attack. In other words, the H-CoT attack method can be used to obtain harmful information, including instructions for making poisons, abusing children and carrying out terrorism.
The paper’s authors explained that the H-CoT attack works by hijacking the models’ safety “reasoning” pathways, thereby diminishing their ability to recognise the harmfulness of requests. They noted that the results may vary slightly as OpenAI updates its models, but the technique has proven to be a powerful tool for exploiting the vulnerabilities of AI models.
The testing was done using publicly accessible web interfaces offered by various LRM developers, including OpenAI, DeepSeek and Google. The researchers noted that anyone with access to the same or similar versions of these models could reproduce the results using the Malicious Educator dataset, which includes specifically designed prompts.
The researchers’ findings have significant implications for AI safety, particularly in the US, where recent AI safety rules have been tossed by executive order, and in the UK, where there is a greater willingness to tolerate uncomfortable AI how-to advice for the sake of international AI competition.
The above is paraphrased from the article ‘How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit’ published by The Register. You can read the full jargon-filled article HERE.
There is a positive and a negative side to the “jailbreaking” or hijacking of in-built safety checks of AI programmes. The negative is obviously that AI will be used to greatly enhance the public’s exposure to cybercrime and illegal activities. The positive is that in-built censorship in AI models can be overridden.
We should acknowledge that there is a good and bad side to censorship. Censorship of online criminal activity that would result in child exploitation and abuse, for example, is a good thing. But censorship of what is deemed to be “misinformation” or “disinformation” is not. To preserve freedom of expression and freedom of speech in a world where AI programmes are becoming pervasive, we may need to learn the H-CoT “jailbreaking” technique and how to use the Malicious Educator. In fact, it is our civic duty to do so.

Can someone please explain why any query including ‘child abuse’ etc. should be excluded for “safety”?
If we all close our eyes, shove fingers in our ears and chant LALALALA, do you think the world will be a better place?
Only dictators set rules about what may or may not be read, written, discussed or thought about.
You may want to live under the Taliban or Zionist thought police, I don’t.