Researchers call AI agents untrusted systems

A group of cybersecurity researchers has argued that artificial intelligence agents should not be trusted at the model level and must instead operate within strict system-level controls. The position appears in a May 2026 paper titled “Agent Security is a Systems Problem,” published on arXiv under identifier 2605.18991.

The authors, who include Mihai Christodorescu, Earlence Fernandes, and Somesh Jha, draw from experience in operating systems, networks, and adversarial machine learning. They state that current approaches focus too heavily on improving model robustness. That focus leaves gaps that attackers can exploit.

“Through this lens, efforts to increase model robustness, the dominant viewpoint in the community, are insufficient on their own. Instead, we must complement existing efforts with techniques from the systems security domain,” the researchers wrote.

Treating AI agents as untrusted components

The paper frames AI agents as untrusted elements inside larger systems. This approach mirrors how modern operating systems handle applications. Programs run with restricted permissions and cannot access sensitive resources without explicit approval.

The researchers argue that AI agents require similar constraints. They propose enforcing security invariants at the system level. These rules must remain outside the control of the AI itself. The system defines what an agent can and cannot do, regardless of its outputs.

They also recommend least-privilege access. An agent should only receive the minimum permissions required for a task. Full wallet access or unrestricted tool usage introduces unnecessary risk.

A third principle focuses on separating instructions from data. This distinction addresses one of the most common attack vectors in AI systems.

Prompt injection remains a core weakness

Prompt injection attacks exploit the inability of AI agents to distinguish between trusted commands and malicious input. Attackers embed hidden instructions within data fields that appear harmless.

The researchers link this weakness to real financial risks. A transaction memo or message can include instructions that redirect funds. If the agent treats that input as a valid command, it can execute the action without verification.

The paper states that stronger separation between instructions and data could prevent a large portion of these attacks. System-level enforcement would block unauthorized actions even if the model misinterprets input.

Real-world incidents highlight the risks

The research draws on eleven real-world attack scenarios. These cases show how failures at the system level allow exploits to succeed.

The Volo Protocol incident in April 2026 resulted in a loss of $500,000 from a crypto wallet. The attack relied on malicious tool calls and excessive agent permissions. The system lacked safeguards that could have blocked the transaction.

The autonomous nature of AI agents increases the impact. A human user might pause before approving a suspicious request. An AI agent executes instructions at machine speed. That speed reduces the window for detection or intervention.

Another recent case involved the AI-powered crypto assistant Bankr. The platform disabled transactions on May 20 after identifying unauthorized access affecting at least 14 wallets. Details of the exploit remain limited, but the response highlights ongoing security concerns around agent-based systems.

Industry responses begin to take shape

Some companies have started to address these risks through infrastructure changes. Ledger outlined a 2026 roadmap that includes hardware-based protections for AI environments. The approach places critical operations in secure hardware rather than relying entirely on software controls.

The researchers emphasize that security cannot depend solely on AI developers. Responsibility extends to system architects, infrastructure providers, and protocol designers. Each layer must enforce constraints that limit the impact of compromised agents.

The paper also points to techniques such as verifiable computation and on-chain attestation. These methods allow external systems to confirm that an agent acted within defined rules.

Crypto sector faces growing exposure

AI agents already operate across crypto applications. They execute trades, manage wallets, and interact with decentralized protocols. Circle CEO Jeremy Allaire has predicted that billions of AI agents could conduct economic activity within the next five years.

This expansion increases the attack surface. Each agent introduces a new point of interaction between software and financial assets. Without proper controls, a single vulnerability can expose multiple wallets or systems.

Aaron Ratcliff of Merkle Science addressed the issue in earlier commentary. He stated that giving an AI agent access to a wallet introduces a trust layer into a system designed to be trustless. He outlined conditions for safer deployment, including real-time checks for scams, slippage limits, and contract audits.

Sean Ren of Sahara AI described model context protocols as a safeguard when implemented correctly. He said these systems act as a gatekeeper between the AI model and user assets. The agent can perform only predefined actions, such as preparing a transaction for user approval.

A shift from model trust to system guarantees

The paper’s central argument shifts the focus of AI security. It moves from trusting model behavior to enforcing external guarantees. This approach draws on decades of computer security research that assumes attackers will find ways to exploit any flexible system.

The researchers call for predictable security guarantees that do not depend on model accuracy. They argue that system-level controls can limit damage even when an agent behaves incorrectly.

Challenges remain. The paper identifies gaps in implementation, including how to define enforceable invariants and how to integrate them across complex systems. These issues will shape future research and development in AI security.

The findings arrive as AI agents take on more financial responsibility. The combination of autonomy and asset control creates a high-risk environment. The paper suggests that without system-level protections, similar incidents will continue to occur.

Disclaimer: All materials on this site are for informational purposes only. None of the material should be interpreted as investment advice. Please note that, despite the nature of much of the material created and hosted on this website, HODL FM operates as a media and informational platform, not a provider of financial advisory services. The opinions of authors and other contributors are their own and should not be taken as financial advice. If you require advice, HODL FM strongly recommends contacting a qualified industry professional.

Researchers Call AI Agents Untrusted Systems

Treating AI agents as untrusted components

Prompt injection remains a core weakness

Real-world incidents highlight the risks

Industry responses begin to take shape

Crypto sector faces growing exposure

A shift from model trust to system guarantees

Sign up for Newsletter

More News

Block Launches Buzz, an Open-Source Slack Alternative

Apple Sues OpenAI Over Hardware Trade Secret Theft

SpaceXAI Releases Grok 4.5 Ahead of OpenAI GPT-5.6 Drop