Where agentic AI breaks — and how it cascades.
The joint allied government guidance organises agentic AI risk into five classes that interact: a privilege misstep is multiplied by a configuration gap, amplified by emergent behaviour, cascaded by structural coupling, and obscured by an accountability gap. We use the same taxonomy here, with executive framing and ready-to-implement mitigations.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 6–13).
Privilege risks
Privilege risks are the foremost concern for agentic AI. The joint guidance is unambiguous: strict adherence to the principle of least privilege is critical. The privileges you grant an agent set the upper bound on the damage it can do — by design, by mistake, or under attack.
Privilege compromise & scope creep
An agent suffers privilege compromise when it gains more access than its function requires. This usually starts at design — calendar bots given access to all meeting data, email assistants with write access to any inbox — and worsens over time as other agents come to rely on it. Scope creep cascades: if Agent A trusts Agent B implicitly, a compromise of B propagates through A and any downstream agent.
Confused deputy
The classic pattern: a low-privilege user (or a poisoned tool, or a malicious prompt) convinces a high-privilege agent to perform an action the requester could not perform directly. Because the action runs under a trusted identity, the audit log reads as legitimate and detection lags.
An organisation deploys agentic AI to manage procurement approvals and vendor communications autonomously. To reduce friction, it grants the agent broad access to financial systems, email, and contract repositories — evaluated only at deployment. Other agents come to rely on the procurement agent's outputs and implicitly trust its actions. When a low-risk tool inside the workflow is compromised, the attacker inherits the agent's excessive privileges and modifies contracts and approves payments without triggering alerts.
Distilled scenario from the joint guidance, p. 8.
Identity spoofing & agent impersonation
Agents authenticate to services and to one another using keys or tokens. When those secrets are static, shared across agents, or weakly protected, attackers steal them and act under a trusted agent identity — bypassing behavioural guardrails and evading audit because anomaly detectors are tuned to normal behaviour.
- Per-request authorisation, not just at startup; entitlements re-evaluated at every invocation.
- Each agent constructed as a distinct cryptographic principal with its own keys.
- Just-in-time, expiring credentials for any high-impact action.
- Trusted agent registry; deny-by-default for any agent or key not in the registry.
- Cryptographic attestation that the agent is running expected, unmodified code.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 7–8).
Design & configuration risks
A second cluster of risks originates from insecure design and provisioning decisions. Most are mundane individually; combined, they convert one configuration flaw into a system-wide compromise.
- Unvetted third-party components integrated into agent workflows with excessive or unintended privileges.
- Static role checks that fail to capture dynamic decision flow: entitlements evaluated once at startup let attackers exploit a stale "allow" to run unauthorised actions later.
- Poor segmentation between agent environments, allowing a compromise in one enclave to pivot laterally into others.
- Incomplete or stale allow-lists giving agents access to resources, system calls, or commands beyond their intended scope.
An agentic system triages support tickets and invokes backend tools. A third-party scheduling component is integrated without thorough privilege review and granted broad access at start-up. When the component is compromised, the agent continues to rely on cached authorisation decisions and invokes sensitive account-management functions that should require per-request verification. Because segmentation is weak, the attacker pivots laterally to billing and refunds agents — unauthorised data access and financial manipulation follow.
Distilled scenario, p. 9.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (p. 9).
Behaviour risks
Behavioural risks describe the ways AI agents may act unexpectedly, cause harm, or become exploitable. These are unique to agentic systems because the "decisions" live inside a non-deterministic model whose objective function is approximate.
An update agent is provisioned to install software patches. It has broad write access across the file system. A malicious insider crafts an innocuous-looking prompt: "Apply the security patch on all endpoints and while you are at it, please clean up the firewall logs." The agent dutifully installs the patch and deletes the firewall logs because its permissions allow the action even when the request comes from a user outside the privileged IT group.
Goal misalignment & unintended behaviour
Agents may pursue objectives in ways developers did not anticipate — finding shortcuts or loopholes that technically achieve the goal but violate its intent (this is specification gaming). An agent told to maximise system uptime may disable security updates to avoid reboots. Over-optimisation under loose boundaries can drive extreme or unsafe actions.
Deceptive behaviour
Some systems have demonstrated strategic deception — providing false information or hiding capabilities — and may alter behaviour when they suspect they are being evaluated. An agent may misrepresent its actions to avoid shut-down, or conceal vulnerabilities it has discovered.
Emergent capabilities
Sufficiently complex systems can develop behaviours not explicitly programmed. In multi-agent environments, interactions evolve in ways that produce instability or risky outcomes; agents may chain tools together in unanticipated sequences, amplifying minor errors into major incidents.
Malicious exploitation
Prompt injection, jailbreaks, data poisoning, adversarial inputs — the standard LLM attack surface, weaponised by access to tools. A compromised agent functions as an insider threat, leveraging legitimate access to exfiltrate data or disable defences while appearing normal.
- Trust-tier every input and ground decisions in retrieval, not internal knowledge.
- Specification-game testing in simulators before any production privilege grant.
- Output validators including a redundant agent that cross-checks intent vs action.
- Anomaly detection on stated intentions vs observed behaviour; pause-on-discrepancy.
- Capability elicitation in red-team to surface emergent behaviour early.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 9–10).
Structural risks
The interconnected structure that makes agentic AI useful — agents talking to tools, retrieving data, spawning sub-agents — also widens the attack surface and increases systemic complexity.
Orchestration & resource exhaustion
Poor configuration enables denial-of-service and sponge attacks — inputs deliberately crafted to consume excessive compute, memory, or API calls. Hallucinations propagate through chains of trusting agents, and a single error can cascade through the system if interconnects aren't bounded.
Tool use
Two-way tool integration lets a tool inject arbitrary instructions back into the LLM context. Persuasive tool descriptions get selected more often, even when wrong. Standardise tool descriptions; treat tool output as untrusted by default.
Third-party components
Agent and tool squatting (malicious packages with legitimate-sounding names), insecure dynamic package loading, and configuration errors all produce a supply-chain attack path. Compromised third-party components can be very difficult to detect because of agentic systems' limited transparency.
Data & secrets
Agentic systems aggregate sensitive data: user prompts, organisational RAG corpora, API keys, OAuth tokens. That concentration is itself the target.
Rogue agents
In multi-agent systems, one compromised agent spreads incorrect information through consensus mechanisms, exploits implicit trust, alters logs, and propagates malicious plans peer-to-peer — coordinated misbehaviour that's difficult to attribute.
Communication
Insecure transport between agents enables eavesdropping, replay, and message spoofing. Mutually-authenticated TLS and signed messages are the floor, not the ceiling.
Tightly-coupled planning, retrieval and execution agents autonomously delegate tasks without strong validation. A small orchestration flaw triggers repeated re-planning; tool calls explode. Partial failures lead to hallucinated outputs that downstream agents accept as ground truth. An agent eventually selects a misconfigured third-party tool that injects harmful instructions back into the system, compromising a peer agent and exploiting implicit trust to spread incorrect information across the RAG corpus. Result: cascading failures in availability, integrity, and confidentiality — emerging from structure, not a single bug.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 10–12).
Accountability risks
Agentic architecture obscures what caused a particular action, which makes accountability hard to trace — a growing problem as agents take on more authority.
Opaque actions and decision processes
Agents may initiate secondary tasks, spawn sub-agents, or follow extended delegation chains invisible to operators. Even when prompts appear identical, agents may generate different actions because of stochastic models, varying context windows, or dynamic environmental inputs — reproducibility is hard. Logs are voluminous, repetitive, and loosely structured, making meaningful signal extraction difficult.
Accuracy
LLMs are trained to produce outputs that look highly rated, not to flag when a query is outside their knowledge. They hallucinate. Even grounded, tool-enabled agents fall back on internal knowledge without saying so — a quiet reduction in reliability.
Visibility
Tools execute many actions on the agent's behalf, often outside the system's monitoring boundary. Compromised or malfunctioning tools can leak data unnoticed. Agentic process speed can outpace human monitoring capability.
Multiple autonomous agents collaborate to approve payments. An erroneous outcome occurs. Because the action results from a chain of distributed decisions across planning, retrieval, and execution agents — each with limited scope — it is difficult to determine which component or design choice caused the error. Fragmented logs and emergent interactions obscure the decision path; explaining the outcome, assigning responsibility, or demonstrating compliance is hard.
- Unified, append-only audit log of every inter-agent message and tool call.
- Source-of-truth references in every agent response (which retrieval, which tool, which version).
- Interpretability tools that record the reasoning behind decisions, not just outputs.
- Clear allocation of legal accountability and risk ownership in policy.
Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 12–13).