Prompt injection: New malware targets AI cybersecurity tools

The emergence of prompt injection comes at a time a separate report has sounded alarm over a rising trend of social engineering campaigns that rely on fake authentication systems known as Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA).

Photo credit: Shutterstock

Cyber safety researchers have unearthed a new malware strain that brings to light the first ever documented attempt to weaponise attacks against Artificial Intelligence (AI)-powered security safeguards and analysis tools, in what sector pundits have christened prompt injection.

Researchers at software vendor Check Point reported that the incident was first noticed early this month via a sample that was anonymously uploaded by a user in the Netherlands, with planted malware designed to evade detection from AI-powered protections.

“Malware authors have long evolved their tactics to avoid detection. They leverage obfuscation, packing, sandbox evasions, and other tricks to stay out of sight,” stated Check Point.

“As defenders increasingly rely on AI to accelerate and improve threat detection, a subtle but alarming new contest has emerged between attackers and defenders.”

A short malware refers to any intrusive software developed by hackers to steal data and damage or destroy computers and their systems. Common examples include viruses, worms, spyware and ransomware, among others.

Check Point’s discovery of the malware, dubbed ‘Skynet’ by its creators, marks a significant evolution in adversarial tactics targeting AI systems deployed to help in attacks detection and related risk analysis.

The emergence of the new threat mode coincides with the rapidly growing adoption of AI large language models (LLMs) in cybersecurity workflows, particularly in automated malware analysis and reverse engineering tasks.

Cyber safety practitioners are increasingly relying on AI models like OpenAI’s GPT-4 and Google’s Gemini to analyse and process suspicious code samples, creating a new attack surface that malicious actors are now attempting to exploit.

According to Check Point, Skynet’s novel evasion mechanism was embedded within its code structure, with the researchers describing it as an ‘experimental proof-of-concept’ that demonstrates how threat actors are adapting to the AI-driven security landscape.

How the evasion tactic works

The software vendor reports that after the malware sample was anonymously uploaded, it looked incomplete at the first glance as some parts of the code weren’t fully functional, adding that it printed system information that would usually be exfiltrated to an external server.

“What stood out, however, was a string embedded in the code that appeared to be written for an AI, not a human. It was crafted with the intention of influencing automated, AI-driven analysis, not to deceive a human looking at the code,” says Check Point.

The researchers note that by placing a language that mimics the authoritative voice of the legitimate user instructing the LLM, the attacker was attempting to hijack the AI’s stream of consciousness and manipulate it into outputting a fabricated verdict, and even into running malicious code.

“This technique is known as prompt injection,” they said.

The team says they tested the malware sample against their analysis system, noting that the prompt injection did not succeed in an indication that the underlying model flagged the file as malicious.

“While the technique was ineffective in this case, it is likely a sign of things to come. Attacks like this are only going to get better and more polished. This marks the early stages of a new class of evasion strategies,” wrote Check Point.

“These techniques will likely grow more sophisticated as attackers learn to exploit the nuances of LLM-based detection.”

The emergence of prompt injection comes at a time a separate report has sounded alarm over a rising trend of social engineering campaigns that rely on fake authentication systems known as Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA).

The drive has been to infect web users with malware using CAPTCHA clicks, with potential victims being directed to websites controlled by attackers where they are prompted to complete a series of verification steps which, if followed, lead to users running malicious commands on their computers.

Cyber fraudsters are also deploying AI capabilities to scale up attacks, including creating fake reviews in the app stores that go as far as allotting five-star ratings to specific targeted apps to artificially inflate their credibility, leading people to download potentially harmful or deceptive content.

Reviews show that users who download these apps often find themselves bombarded with an overwhelming number of out-of-context adverts, akin to websites created solely to display ads.

PAYE Tax Calculator

Note: The results are not exact but very close to the actual.