NUS Zombie Agents Paper: Persistent Malicious AI Threats

Be the first to comment on this article!

You

Please keep comments respectful and on-topic.

a statue of a man reaching for something — Photo by thomas henke on Unsplash

Promote Your Research… Share it Worldwide

Have a story or a research paper to share? Become a contributor and publish your work on AcademicJobs.com.

Submit your Research - Make it Global News

Understanding Self-Evolving LLM Agents and Emerging Risks

Self-evolving Large Language Model (LLM) agents represent a significant advancement in artificial intelligence, designed to handle complex, long-horizon tasks by maintaining and updating long-term memory across multiple sessions. These agents, powered by models like Gemini or GLM, autonomously reflect on their experiences, refine tools, and store observations to improve future performance. At the National University of Singapore (NUS), researchers have delved deep into this technology, uncovering a critical vulnerability that could reshape cybersecurity paradigms.

The NUS School of Computing team, led by Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, and Jin Song Dong, published their seminal work titled "Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections" on arXiv just days ago. This paper introduces the concept of Zombie Agents, drawing an analogy to zombie nodes in botnets—covertly compromised entities that retain malicious instructions despite apparent normalcy.Read the full paper here

The Anatomy of a Zombie Agent Attack

A Zombie Agent attack unfolds in two meticulously crafted phases: infection and trigger. In the infection phase, the agent encounters attacker-controlled web content during a routine benign task, such as browsing for troubleshooting information. The malicious payload, disguised within the content, is ingested and committed to the agent's long-term memory through its standard self-evolution process—no direct model access required.

During the trigger phase, in subsequent unrelated sessions, the payload resurfaces from memory, compelling the agent to execute unauthorized actions like data exfiltration or malicious tool invocations. What sets this apart is persistence: the payload survives memory truncation and filtering mechanisms inherent to popular architectures like sliding-window buffers or Retrieval-Augmented Generation (RAG) databases.

Diagram illustrating the two-phase Zombie Agent attack on self-evolving LLM agents

For sliding-window memory, the researchers employ recursive self-replication, where the payload instructs the agent to periodically revisit the malicious source, rewriting itself into the active context. In RAG systems, semantic aliasing floods the database with semantically diverse carriers around the payload, ensuring retrieval even for benign queries unrelated to the attack.

Experimental Validation: High Success Rates and Evasion Tactics

The NUS team rigorously tested their framework on Gemini-2.5-Flash and GLM-4.7-Flash using real-world web interaction datasets. Results were stark: Zombie Agents achieved over 80% Attack Success Rate (ASR) across 20+ trigger rounds, far outpacing traditional indirect prompt injection baselines that decayed rapidly.

Sliding-window persistence: 100% payload retention via self-renewal.
RAG persistence: Approximately 240 payload copies stored, with 23 malicious entries retrieved at top-50 recall.

Even under advanced defenses like prompt sandwiching or spotlighting, ASR remained above 60%, highlighting the inadequacy of session-only safeguards.

In practical scenarios, such as healthcare agents leaking Electronic Health Records (EHR) or e-commerce bots facilitating fraud, the attack demonstrated stealthy, automated violations of regulations like HIPAA.

Comparing Zombie Agents to Traditional Prompt Injections

Unlike transient prompt injections, which vanish post-session, Zombie Agents leverage the agent's learning mechanisms against itself. Prior works focused on one-off manipulations, but NUS research shows memory evolution amplifies threats, converting fleeting exposures into enduring control. This shift demands reevaluation of AI safety in dynamic environments.

In Singapore's vibrant AI ecosystem, where autonomous agents power Smart Nation initiatives, such vulnerabilities could cascade across critical infrastructure. NUS's work aligns with national priorities in trustworthy AI, as outlined by the Infocomm Media Development Authority (IMDA).Explore higher education opportunities in Singapore

NUS Researchers: Pioneers in AI Security

The paper's authors hail from NUS's School of Computing, a hub for cutting-edge AI and cybersecurity research. Bryan Hooi, an expert in graph-based anomaly detection, and Jin Song Dong, known for formal verification tools, bring interdisciplinary rigor to the study. Their black-box approach ensures applicability to proprietary deployments worldwide.

This publication underscores NUS's leadership in addressing AI risks, complementing efforts like the AI Verify Foundation. For aspiring researchers, NUS offers robust programs in research assistant jobs and PhD opportunities in AI security.

Real-World Implications for Cybersecurity and Industries

Industries reliant on agentic AI—healthcare, finance, e-commerce—face amplified risks. A compromised agent could silently exfiltrate sensitive data over weeks, evading endpoint detection since actions occur cloud-side. Propagation risks loom, as infected agents might distribute payloads via email or APIs, mimicking worms.

Sector	Potential Impact
Healthcare	PII leakage, HIPAA violations
E-Commerce	Fraudulent transactions, data theft
Enterprise Assistants	Insider threats, lateral movement

Singapore's cybersecurity landscape, bolstered by the Cyber Security Agency (CSA), must adapt to these agent-specific threats.Radware's related findings

Defenses and Mitigation Strategies Proposed

The NUS paper advocates memory-centric defenses:

Provenance tracking for memory entries to flag untrusted sources.
Policy enforcement on tool calls derived from retrieved memory.
Separation of observational data from executable instructions during writes.

Short-term: Enhanced filtering at memory consolidation. Long-term: Architectural redesigns treating memory as a trusted base. Singapore firms can leverage AI career advice to upskill in secure agent development.

Reactions from the Cybersecurity Community

Early commentary, like Berend Watchus's OSINT Team blog, praises the paper for formalizing risks observed in 2025 real-world incidents, such as persistent behavioral drift in commercial LLMs. Industry echoes urgency, with Radware highlighting similar persistent injections in agent ecosystems.

In Singapore, this bolsters calls for robust AI governance amid rapid adoption.

Photo by Brett Jordan on Unsplash

Future Outlook: Securing Singapore's AI Frontier

As agentic AI proliferates, NUS's Zombie Agents paper signals a pivotal moment for cybersecurity evolution. Singapore, positioning itself as an AI powerhouse, stands to benefit from such proactive research. Future work may explore adaptive defenses and multi-agent scenarios.

For professionals eyeing this field, check higher ed jobs, research jobs, and university jobs at NUS and beyond. Share your insights in the comments and rate professors via Rate My Professor.