Best practices

Engineer security into every phase of the agent lifecycle.

Securing agentic AI requires proactive measures across system autonomy, interconnected components, and evolving capabilities. The joint guidance prescribes practices for four lifecycle phases. Skip a phase and you load the next phase's controls beyond their design point.

Security obligations across the agent lifecycle (joint guidance §3–§6).

Phase 1

Designing secure agents

Security begins at the design stage. Anticipate risks and integrate mitigations into system architecture before development and deployment.

Audience: Developers (vendors / operators when sourcing)

Controlled context

Inserting tool output and memory into an LLM context massively expands the attack surface — particularly for prompt-injection. Treat every context fragment as a trust-tagged input.

Structure the prompt with a clear instruction hierarchy so behaviour aligns with intended priorities and constraints.
Ground generations using retrieval-augmented generation (RAG) and prompt engineering to reduce hallucinations.
Tag every context source with a trust tier; gate decisions on tier (e.g. never let untrusted web content trigger writes).

Audience: Developers

Oversight mechanisms

Agents take action without explicit human approval. Bake monitoring and human-in-the-loop into the design — not as an aftermarket bolt-on.

Mechanisms that prevent low-risk-approved agents from progressing into higher-risk activities autonomously.
Human control points throughout the workflow: live monitoring, interruption during execution, mandatory approval at decision steps, audit and reversibility after execution.
Explicit control flows that bound autonomous planning so agents cannot deviate beyond authorised objectives or actions.

Audience: Developers

Identity management

Each agent should be a distinct, cryptographically anchored principal — not a service account shared between processes.

Embed strong identity using managed identity services, decentralised identifiers, or PKI.
Authenticate every inter-agent and agent-to-service API call with mutual TLS (mTLS) for non-repudiation.
Maintain a trusted registry binding identities to roles; reconcile against the live agent set on a schedule.
Deny access for any agent or key not in the registry.
Apply role-based identity management; minimum scope for approved tasks.
Enforce identity-based boundaries: agents can only invoke actions explicitly authorised for their identity.

Audience: Developers

Defence in depth

No single mechanism should be load-bearing. Failures in one layer must be caught by the next.

Multiple, overlapping layers of security controls; no reliance on a single mechanism.
Controls at every information boundary: user inputs, tool calls, data pre-processing, model inference, outputs.
Separate agents per function; strict boundaries and operational controls on agent-to-agent handoffs.

Layered controls so no single mechanism is load-bearing.

Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 14–16).

Phase 2

Developing secure agents

Standard LLM training is necessary but insufficient. Mitigating agent-specific risks requires specialised techniques to harden behaviour against adversarial conditions.

Audience: Developers and vendors

Comprehensive testing

Expose the model to security abuse during supervised training so it learns to recognise and respond to undesirable behaviours.

Reward modelling and adversarial testing for specification gaming; security constraints alongside performance goals.
Train in simulated, controlled environments to learn action consequences without real harm.
Synthetic adversarial training data reflecting real-world deployment scenarios.
Active learning on high-uncertainty inputs to discover unexpected behaviours efficiently.

Audience: Developers and vendors

Appropriate evaluation

Agentic systems require evaluation that goes beyond LLM benchmarks.

Threat-model-driven evaluation scenarios, including edge cases beyond typical training conditions.
Best-of-N sampling, multistep reasoning prompts, inference-time scaling to draw out the full range of agent behaviour.
Evaluate at varying levels of autonomy and resource access (tools, models, web search, code execution).
Vary contextual conditions: presence of other agents, evaluation timing.
Capability evaluations continuously across the development lifecycle.

Audience: Developers

Input management

Strong input controls partially mitigate many common LLM-app risks; for agents they're table-stakes.

Robust validation and sanitisation of all agent inputs.
Prompt injection filters and semantic analysis to detect malicious instructions.
Context validation: confirm the system correctly interprets intent before execution.

Audience: Developers

Red teaming

Adversarial assessment is the only way to estimate how an agent fails when adversaries help it fail.

Sandbox environments to test agent behaviour before production.
Red-team exercises that target loopholes and unintended behaviours.
Capability elicitation to probe for emergent abilities, especially resource-risk capabilities.
Multi-agent red teaming and chaos testing in agent simulations.

Audience: Developers

Resilience

Plan for graceful degradation. The blast radius of unexpected behaviour must be containable by design.

Fail-safe defaults and containment that limit blast radius of unexpected behaviours.
Data-loss-prevention controls tuned to AI agent behaviours.
Versioning and rollback to safely revert to known-good agent behaviour.

Audience: Developers

Accountability

The system must produce comprehensive artefacts that document actions and decisions.

Comprehensive artefact logging by default.
Unified audit logs of all inter-agent interactions.
Interpretability tools to expose the reasoning behind decisions.
Information referencing in every response: cite which retrieval, tool, version produced each claim.

Audience: Developers

Manage third-party components

Third parties are how flexibility — and supply-chain risk — enters the system.

Verify all external components originate from trusted sources and are up to date.
Maintain a trusted registry of third-party components.
Reference CISA's A Shared Vision of SBOM for Cybersecurity and 2025 Minimum Elements for SBOM when procuring agentic systems.
Restrict tool use to an approved allow-list of tools and versions, regularly verified.
Verify that agent tool-usage behaviour aligns with documented security policies.
Log tool usage in human-readable format.
Trigger-action protocols that automatically restrict agent permissions on unexpected behaviour.
Codify separation of duties: roles like Orchestrator, Reader, Actuator with clear boundaries, consensus mechanisms, and delegation expiry.
Consensus controls: multi-agent approval for moderate-stakes actions; HITL plus multi-agent consensus for high-stakes.
Prohibit agents from modifying their own privileges or initiating unapproved delegation without explicit expiry timers and recorded grant chains.
Standardise tool descriptions in a consistent format that avoids persuasive language.

Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 16–18).

Phase 3

Deploying agents securely

Adding an agent to an existing system materially changes its threat picture. High-impact deployment controls reduce vulnerabilities before they become incidents.

Audience: Vendors and operators

Threat modelling

Up-to-date risk taxonomies make threat modelling actionable — stale ones make it theatre.

Realistic threat modelling using OWASP GenAI Security Project and MITRE ATLAS™.
Controls that address emerging and evolving agent capabilities.
Harmonise with Zero Trust principles and NIST SP 800-207 Zero Trust Architecture.
Develop and test incident response procedures for agent compromise.
Regular third-party reviews of privileged architectures; share actionable intelligence with trusted partners.

Audience: Vendors and operators

Governance

Autonomous action requires governance that authorises every action, not just the agent.

Maintain governance policies for autonomous agents.
Define legal accountability and risk ownership for agentic systems.
Build organisational AI literacy.
Reference CISA's Principles for the Secure Integration of AI in Operational Technology for OT environments.

Audience: Vendors and operators

Progressive deployment

Start narrow. Earn autonomy by demonstrating safety. Roll back when evaluation slips.

Phased deployment with progressively increasing access and autonomy: restricted APIs, sandboxing.
Graduated autonomy: incrementally increase agent independence while maintaining human oversight.
Continuous evaluation determines when to expand scope or roll back autonomy.

Audience: Vendors and operators

Secure by default

Defaults are the most consequential decision in any system. Pick them so degradation is graceful.

Fail-safe by default: agents stop and escalate on uncertainty.
Error handling and failover to reduce impact of failures.
Graceful degradation: maintain partial functionality when components fail.

Audience: Vendors and operators

Guardrails and constraints

Constraints reduce exposure to common AI security risks at the cost of an acceptable amount of generality.

Specify clear, constrained objectives with explicit do-not-do rules.
Hard constraints: deny lists, API-level safety policies.
Declarative safety contracts agents cannot override.
Layered guardrails: anomaly detection, rule-based filtering, ML-based prohibited-behaviour detection.
Prioritise human review of high-risk incidents (guardrail triggers, denied actions).
Secondary agent that validates new tasks against policy before execution.

Audience: Vendors and operators

Isolation

Where you can't prevent a failure, you contain it.

Isolation and segmentation to limit blast radius.
Separate high-risk agents into distinct domains.
Isolate agents into enclaves with no write access to logs.

Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 18–20).

Phase 4

Operating agents securely

Operations is where most failures actually happen. Continuous monitoring, output validation, human checkpoints, and tight credential discipline are the difference between a useful agent and a quiet incident.

Audience: Vendors and operators

Monitoring and auditing

Monitor internal processes — not just inputs and outputs. Cross-validate with multiple independent monitors.

Tools that enhance human oversight of agent operations.
Monitor all agent operations including internal reasoning steps.
Monitor and log identity and privilege changes; audit for drift, impersonation, misconfiguration.
Monitor outputs and behaviour for bias, data drift, and other anomalies — including prompts, tool calls, memory interactions, internal reasoning, decisions, actions.
Real-time monitoring with comprehensive logs of live behaviour and decision-making.
Runtime anomaly detection using rules or behavioural baselines; alert and pause on anomaly.
Anomaly detectors that flag discrepancies between stated intentions and observed behaviour.
Multiple independent monitoring systems that cross-validate agent reports and system logs.
Goal-drift monitoring: compare active objectives against approved baselines before execution.
Source checks integrated with logs: which tools were used, which information retrieved.
Auditing combining human review with automated log analysis.
Adaptive defences using monitoring data to drive rapid responses (patching from log signals).
Storage-efficient logging that preserves critical information.
Regular security assessments including pen-testing and red-team exercises targeting agentic behaviours.

Audience: Vendors and operators

Validate outputs

Outputs are one of the few concrete signals available — make them carry real verification weight.

Validate agent outputs against multiple sources for critical aspects.
Run redundant validator agents that cross-check each other's outputs.
Validate tool responses to prevent malicious or unsafe instructions; standardise tool descriptions to avoid persuasive language.

Audience: Vendors and operators

Human-in-the-loop

HITL is not bureaucracy — it's the brake on irreversible mistakes. The agent should not decide when to use it.

Human-approval requirements set by designers/operators, never delegated to the agent.
Block agents from autonomously executing high-impact actions without prior human approval.
HITL checkpoints for high-cost-of-error actions: system resets, network egress, deletion of critical records.
Quarantine requests to delete logs or audit records until reviewed and approved.
Clearly assign responsibility and accountability for adverse outcomes.
Risk-classify agent actions by impact, likelihood, reversibility — apply safeguards accordingly.

Audience: Vendors and operators

Performance monitoring

Degraded or unusual performance can indicate compromise. Treat performance signals as security signals.

Assess agents' ability to evade security measures, especially in sensitive systems.
Regular assessments of an agent's ability to bypass safeguards (communication barriers, guardrails, monitors, HITL, input filters).
Use evaluation results to validate existing controls and design stronger measures.
Limit resource usage with rate-limits to interrupt long-running tasks and disrupt malicious workflows.

Audience: Vendors and operators

Privileges and authentication

Strict ongoing privilege management is what keeps a buggy agent from becoming a catastrophic agent.

Limit privileges to the minimum required for the task.
Restrict scope of privileges to the narrowest possible level for fine-grained control.
Implement agent reputation and trust scoring; reduce trust on anomalous behaviour.
Just-in-time credentials for high-impact or privileged actions.
Verify API caller identity against user/agent groups.
Authenticate agents with fresh cryptographic proofs before every privileged call.
Cryptographic signing for authorised commands and instructions.
Cryptographic integrity checks for task definitions and constraints.
Cryptographic attestation: agents must prove they are running expected, unmodified code.
Continuously verify identity and authorisation at runtime via a centralised PDP for each request.

Operating model in two sentences

Treat every agent as an unaudited contractor with a stolen badge. Until your monitoring proves otherwise, narrow its scope, expire its credentials, validate its outputs, and require a human signature on anything you can't reverse.

Source: Careful adoption of agentic AI services, co-authored by ASD's ACSC, CISA, NSA, Canadian Cyber Centre, NCSC-NZ, NCSC-UK (pp. 21–23).

Want this turned into your roadmap?

See reference architectures →Book a consult