AWS Frontier Agents in Practice: Kiro, DevOps Agent, and Security Agent
Frontier Agents are the first AWS-native developer tools that don’t just recommend actions; they can carry them out continuously, at “real operational speed”. This can feel like magic if you've never built an agent before, but if you look under the hood you'll see it’s “just” identity, tooling, and auditability patterns. Even so, the outcomes you can get from these agents are pretty impressive.
I know, these are not the first software agents ever. I mean, we're all using Claude Code at this point, right? Still, I wanted to focus on how agents in general (again, not just AWS's new agents) change how we think and build. The question is no longer “Is this code change correct?”, and instead it becomes “What is the agent allowed to touch, how do we constrain it, and how do we prove what happened afterward?”
This article breaks down AWS’s Frontier Agents (Kiro, AWS DevOps Agent, and AWS Security Agent) as operational primitives: where they run, how they act, what new threat surfaces they create, and which guardrails actually hold up under incident pressure.
And it's my first article of many about re:Invent 2025, because we had a lot of great announcements!
Why “Frontier Agents” matter, and why you should be skeptical
AWS introduced “Frontier Agents” as a category at re:Invent 2025, and launched Kiro Autonomous Agent, AWS DevOps Agent, and AWS Security Agent as early examples. They're positioning them not as “AI in the console” like we saw at re:Invent 2024 with Q, AI in CloudWatch Logs queries, etc, but instead as “delegated execution”:
Agents turn intent into action, like writing code, changing infrastructure, running operational workflows, rather than producing a human-readable checklist.
Agents are stateful over time, especially in always-on operational roles, which changes both reliability expectations and failure blast radius.
Agents widen the boundary of “software supply chain” to include prompts, tool I/O, agent journals/transcripts, and any protocol used to connect tools.
The main problem with these agents is not just that AI hallucinates; that's a well-known issue. The bigger problem is that they don't just redefine the good parts, they also redefine the risks:
The failure modes are qualitatively different. A wrong suggestion wastes time, a wrong action changes state, possibly in a destructive way.
The audit and accountability model is often under-specified at launch, and operational controls and evidence trails are typically the last pieces that get to a mature state.
Most orgs are not staffed to review agent actions at the speed agents can operate, which usually leads to people not dedicating enough time to reviews. Your guardrails need to be preventative (policy), not purely detective (review after the fact).
My advice: adopt Frontier Agents when you can constrain them like production automation, and wait when you’d be forced to treat them like a superuser with a chat window.
Understanding AWS Frontier Agents
The fastest way to reason about agent risk is to classify each agent by three properties:
Execution venue: IDE sandbox, SaaS control plane, your AWS accounts, your CI runners, etc.
Tool plane: what APIs/commands it can invoke (native tools vs an extensibility protocol like MCP).
Accountability surface: what logs, diffs, and traces exist when something goes wrong.
Let's take a look at each of AWS's Frontier Agents considering these properties.
Kiro Autonomous Agent
Kiro is two things with the same brand name: Kiro IDE is the VS Code-based desktop client you install and use day-to-day, and Kiro Autonomous Agent is the remote agent runtime that does the long-running work. You access the autonomous agent through the IDE’s chat/task UI (and, in some workflows, via repo events like an issue comment/label that triggers work). The IDE is where you give a goal, inspect diffs, approve/stop actions, and see progress. The autonomous agent is where planning/execution happens, in an isolated sandbox that can clone repos, run builds/tests, and produce branches and PRs.
Operationally, you hand Kiro a “task-shaped” problem such as “implement feature X”, “upgrade dependency Y across these services”, or “fix these failing tests”, and the autonomous agent decomposes it into subtasks, pulls the relevant repo context, executes commands and tests in its sandbox, and returns its work as normal engineering artifacts (commits, PRs, explanations, test outputs). Then you review what it produced the same way you’d review a teammate’s PR.
Kiro Autonomous Agent explicitly frames autonomy as happening in an isolated agent sandbox (per task) and discusses sandbox controls like limiting internet access by domain allowlists. That’s good news, but you still need to treat “sandbox” as a security boundary you verify, not a talisman you trust blindly.
Kiro Autonomous Agent IMO seems to fit best for accelerating code changes with a PR-first workflow where you already have review gates, tests, and policy checks (because those become your safety net).
As a side note, I personally think the Kiro IDE is very good in general, especially because it forces you into spec-driven development, where you first design the solution based on specs, and only then let AI build it. I believe this is how everyone should be using AI coding assistants, and I love that Kiro is designed from the ground up to support this workflow.
AWS DevOps Agent
The AWS DevOps Agent is a managed service, configured from the AWS Management Console around a construct called an Agent Space, which is something like the slice of your estate and toolchain that the agent is allowed to reason over. In that Agent Space you wire up the data sources and collaboration surfaces needed for incident response: CloudWatch metrics and logs, other AWS resource context via an IAM role you define, optional third-party observability tools like Datadog, Dynatrace, New Relic, Splunk, etc, and operations tools like PagerDuty, ServiceNow, Slack, Teams (hopefully not Teams!! I hate it), plus CI/CD and source control signals so it can correlate issues with deploys.
You then access the DevOps Agent in two places: the AWS console for setup and governance, and a DevOps Agent “operator” web application for day-to-day incident interactions, where on-call engineers watch investigations, ask questions, and review the agent’s evidence trail.
The idea is that the DevOps Agent gets wired into the same event triggers and comms paths your on-call process already uses (alarms/incidents in, investigation updates out). An alert can trigger an investigation automatically, or you can start one manually in the operator UI, and the agent then pulls metrics, logs, traces, and deploy context, builds a working hypothesis, and posts status and findings back into your incident workflow. For example, it can post Slack updates, ticket notes, a timeline, and a supporting “journal” of what it looked at. The key detail is that you don’t run a CLI to “ask it about production” in the abstract. You give it an incident anchor (alarm, incident, timestamp, scope), it does the correlation work across the integrated systems, and you consume the output in its operator web UI and in your comms or ticketing tools.
A concrete example of how DevOps Agent goes beyond a simple chatbot: It can associate cloud resources with deployments and track deployment details for artifacts like CloudFormation templates, AWS CDK apps, Amazon ECR images, and Terraform configurations. There are also integration points for tracking deployment artifacts in GitHub and GitLab.
DevOps Agent then becomes a consumer of your operational state (deployment metadata, system state), and potentially a driver of operational actions. Meaning it'll be constantly reviewing your infrastructure, and potentially (well, ideally!) making changes to it (which ideally are always correct).
My advice: Treat it like production automation with a continuous control loop.
Note that an AI agent controlling your AWS infrastructure is not something completely new. You've been able to do this since AWS introduced its MCP server with the capability of sending AWS API commands. In fact, my boss uses this to periodically review the company's AWS spend.
You can do this from the web chats like ChatGPT or Claude.ai, or from your terminal using Claude Code, Codex CLI, etc. And if you can do it from your terminal, you can write a script around it and put it in a loop. What's actually new from the AWS DevOps Agent is that it's explicitly designed to be long running, and to not rely on luck and the length of its context window to remember what your cloudy things are and what it's supposed to do with them.
AWS Security Agent
AWS Security Agent is also a managed service you enable and configure from the AWS Management Console, and you use through a dedicated Security Agent web application UI (plus your existing developer workflow tools). Conceptually, you create one or more Agent Spaces that represent an application or security scope, then you attach the agent to the inputs it needs: your security requirements (AWS-managed plus your custom rules), your design artifacts (uploaded docs or referenced materials, depending on what you provide), and your code repositories (commonly via a GitHub integration for PR-aware reviews).
Once set up, Security Agent operates in three primary modes. Design reviews let you start a review from a web UI, and the agent produces a structured set of findings mapped to requirements. Code reviews work as comments on pull requests, with issues and remediations based on diffs and context. Penetration tests run an attacker-style workflow against a verified target scope and produce validated findings and a report.
AWS positions AWS Security Agent as a Frontier Agent oriented around building and operating applications “secure from the start,” including across AWS and broader environments. It's supposed to influence design, implementation, and remediation decisions, making your applications secure-by-design (because it's a security “person” participating in the design phase).
Honestly, it sounds pretty great, especially if you're a developer who only really does security as a best practice and knows enough to know that you know very little (that'd be me, by the way). However, security-focused agents introduce a subtle risk: they can create false confidence when outputs look authoritative. And I'm not talking about confident hallucinations, which can be proven false, though there's that as well. I'm talking about the agent missing things, just like an automated security scan or a human security person might miss things.
Truth is, the only secure software is the one that doesn't exist.
I believe your adoption criteria should include: “Can we validate its findings and actions with independent controls?” For example, policy checks, scanners, reproducible queries, and human review where it matters. And overall, treat it as yet another tool that will make your applications more-secure-but-never-100%-secure-by-design.
What “agentic” changes in your AWS threat model
Most teams already model risk around humans and CI/CD. Agents add new principals, new data, and new execution paths simultaneously. And that means new ways in which things can fail.
Agents as Principals
If an agent can run tools, it effectively acts as a principal. Even when a tool call is mediated (PR-based flows, approval prompts), the reality is you are delegating action authority.
Key identity questions you need answered before letting this anywhere near production:
What IAM roles are used (or assumed) for actions, and how tightly can they be scoped?
Are credentials time-bounded and context-bounded (environment, repo, change ticket, incident ID)?
Is there a clean break-glass workflow, and can we prove it was used appropriately?
If you cannot express an agent’s permissions as “least privilege + explicit escalation,” you are not adopting an agent, you are adopting a superuser. I don't think I need to tell you folks this, but I'll say it anyways: superusers are super risky.
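To make the time-bounded, context-bounded question concrete, here's a minimal boto3 sketch of how an agent runtime could pick up short-lived, attributable credentials. The role name, tag keys, and incident ID are placeholders I made up, not anything AWS prescribes:

import boto3

# Hypothetical agent role; scope it to read-only diagnostics.
AGENT_ROLE_ARN = "arn:aws:iam::111122223333:role/agent-devops-readonly"

sts = boto3.client("sts")

# Short-lived, context-bound credentials. The session name and tags show up
# in CloudTrail, so every call is attributable to a specific incident and run.
# (The role's trust policy must allow sts:TagSession for the tags to work.)
creds = sts.assume_role(
    RoleArn=AGENT_ROLE_ARN,
    RoleSessionName="devops-agent-INC-1234",  # incident-scoped session name
    DurationSeconds=900,                      # 15 minutes, not hours
    Tags=[
        {"Key": "incident_id", "Value": "INC-1234"},
        {"Key": "environment", "Value": "prod"},
        {"Key": "initiator", "Value": "devops-agent"},
    ],
)["Credentials"]

# Hand only these temporary credentials to the tool layer.
cloudwatch = boto3.client(
    "cloudwatch",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)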
New Data Surfaces
Agents don’t just “see code.” They see prompts, intermediate plans, tool outputs, logs, and sometimes incident artifacts. Kiro’s documentation explicitly discusses data protection and privacy considerations (including where data is stored and how it’s protected), which should be your starting point for a formal data classification review.
Examples of sensitive agent-adjacent data you should treat as controlled:
Incident timelines and hypotheses (often more sensitive than the raw alerts).
Stack traces and logs that embed credentials or customer data.
Tool outputs that include resource ARNs, account topology, or network details.
You need to decide whether agent transcripts are retained, where they’re stored, who can access them, and how long they live. Basically, now your chat history becomes an audit record.
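If you decide transcripts are audit records, the storage side is ordinary S3 governance. Here's a minimal sketch, with a made-up bucket name and retention period, of the kind of controls I mean:

import boto3

s3 = boto3.client("s3")
BUCKET = "agent-transcripts-example"  # hypothetical bucket

# Block any public access to transcripts.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Retain transcripts like change tickets: keep for a year, then expire.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-transcripts",
                "Status": "Enabled",
                "Filter": {"Prefix": "transcripts/"},
                "Expiration": {"Days": 365},
            }
        ]
    },
)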
New Execution Surfaces
Agents are most dangerous when connected to mutation-capable tools: IaC apply, IAM writes, KMS key policy changes, SG/NACL edits, data deletion, or anything that can modify anyone's access. If you integrate agents through a tool protocol like MCP, you’ve created an execution backplane that must be secured like an internal platform API.
The key takeaway is that the most critical risks introduced by agents are not really “bad reasoning” or hallucinations, no matter how much people on LinkedIn complain about that. The new critical risks are about good reasoning + too much authority.
Guardrails That Actually Work
Guardrails should be designed assuming agents will occasionally be wrong, overly confident, or operating with incomplete context. Let's look at a few ways to safeguard yourself and limit what Frontier Agents can do.
Identity Boundaries: Permission boundaries, SCPs, and “break-glass” workflows
Start with a hard rule: agent roles should not be able to grant themselves more power. That means explicit denies around IAM writes, Organizations policy edits, and access-key style credential creation.
A practical pattern is an IAM permission boundary applied to any role the agent can assume (or any role used by automation it triggers). Here's an example of a boundary policy that blocks privilege escalation, limits regions, and requires tagging discipline.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyPrivilegeEscalationPaths",
      "Effect": "Deny",
      "Action": [
        "iam:*",
        "organizations:*",
        "account:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyKMSAndLoggingDestruction",
      "Effect": "Deny",
      "Action": [
        "kms:ScheduleKeyDeletion",
        "kms:DisableKey",
        "kms:PutKeyPolicy",
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "logs:DeleteLogGroup",
        "logs:DeleteLogStream"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-west-2"
          ]
        }
      }
    },
    {
      "Sid": "AllowOnlyTaggedResourcesForMutation",
      "Effect": "Deny",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupEgress",
        "ssm:SendCommand",
        "eks:UpdateClusterConfig",
        "rds:ModifyDBInstance"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceTag/AgentManaged": "true"
        }
      }
    }
  ]
}
The idea is to apply this (or something similar) as a permission boundary to any role an agent can use. You can also use Service Control Policies (SCPs) to limit what actions can be taken in an account.
If you require an escalation, it should be done via a ticket or a manual approval step rather than letting the agent self-escalate. Plus, you can create a separate “break-glass” role that is human-only, MFA-required, and heavily alarmed.
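To make the SCP idea concrete, here's a sketch of an illustrative SCP that denies a few destructive actions to everyone except a human-only break-glass role. The policy content, role naming convention, and OU ID are my assumptions, not an AWS-provided baseline:

import json
import boto3

# Illustrative SCP: deny destructive actions account-wide unless the caller
# is the human-only break-glass role.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDestructiveActionsExceptBreakGlass",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "iam:CreateAccessKey",
                "organizations:LeaveOrganization",
            ],
            "Resource": "*",
            "Condition": {
                "ArnNotLike": {
                    "aws:PrincipalArn": "arn:aws:iam::*:role/break-glass-*"
                }
            },
        }
    ],
}

orgs = boto3.client("organizations")
policy = orgs.create_policy(
    Name="deny-destructive-except-break-glass",
    Description="Guardrail for agent-enabled accounts",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)
# Attach to the OU that contains agent-enabled accounts (placeholder OU ID).
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-agentaccts",
)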
Network Boundaries: Sandbox egress control + VPC endpoints
Network control is underrated for agents. If an agent can call arbitrary internet endpoints, it can exfiltrate data or pull untrusted code and execute it.
Kiro’s agent sandbox documentation explicitly discusses controlling internet access by restricting which domains the agent can reach. That’s exactly the right kind of control you need, because it reduces the risk without relying on the model to behave.
For tools on your AWS account, the analogous pattern is:
Prefer private connectivity to AWS APIs using VPC endpoints.
Force outbound through egress controls where you can log and restrict destinations.
Treat “tool servers” (including MCP servers) as Tier 0 assets and isolate them accordingly.
Note: Tier 0 assets are the things that, if breached, can lead to catastrophic consequences such as total system takeover.
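Here's a sketch of the private-connectivity pattern: an interface endpoint for CloudWatch Logs with an endpoint policy that only lets the agent's role make read-style calls. The VPC, subnet, security group, and role names are all placeholders:

import json
import boto3

ec2 = boto3.client("ec2")

# Only the agent's role may use this endpoint, and only for read-style calls.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/agent-devops-readonly"},
            "Action": [
                "logs:GetLogEvents",
                "logs:FilterLogEvents",
                "logs:StartQuery",
                "logs:GetQueryResults",
            ],
            "Resource": "*",
        }
    ],
}

ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",               # placeholder
    ServiceName="com.amazonaws.us-east-1.logs",   # CloudWatch Logs interface endpoint
    VpcEndpointType="Interface",
    SubnetIds=["subnet-0123456789abcdef0"],       # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],    # placeholder
    PrivateDnsEnabled=True,
    PolicyDocument=json.dumps(endpoint_policy),
)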
Change Boundaries: PR-based workflows and policy-as-code gates
The most reliable human-in-the-loop mechanism is still the one your org already understands: pull requests, code owners, and automated checks.
Kiro’s GitHub flow is naturally PR-oriented (branches, commits, PRs). That means you can apply the same guardrails you apply to humans: required reviews, required checks, and policy-as-code.
This doesn’t “solve agents” or anything like that, but it does prevent unsafe mutations (unguarded terraform apply, etc.). Agents make unsafe mutations easier to trigger (both because of hallucinations and corruption), so these guardrails become even more important.
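To make the policy-as-code part concrete, here's a sketch of a CI gate that parses Terraform's plan JSON and blocks agent-authored changes to identity and network primitives unless a human approved them. The list of sensitive resource types and the approval mechanism are my assumptions, yours will differ:

import json
import sys

# Resource types that always need a human approval in this (hypothetical) org.
SENSITIVE_TYPES = {
    "aws_iam_role",
    "aws_iam_policy",
    "aws_security_group",
    "aws_security_group_rule",
}

def check_plan(plan_path: str, human_approved: bool) -> int:
    """Return non-zero if the plan mutates sensitive resources without approval."""
    with open(plan_path) as f:
        plan = json.load(f)  # output of: terraform show -json plan.out > plan.json

    violations = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if rc["type"] in SENSITIVE_TYPES and actions & {"create", "update", "delete"}:
            violations.append(f'{rc["type"]}.{rc["name"]}: {sorted(actions)}')

    if violations and not human_approved:
        print("Blocked: sensitive changes require human approval:")
        print("\n".join(f"  - {v}" for v in violations))
        return 1
    return 0

if __name__ == "__main__":
    # e.g. python check_plan.py plan.json approved
    approved = len(sys.argv) > 2 and sys.argv[2] == "approved"
    sys.exit(check_plan(sys.argv[1], human_approved=approved))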
Observability Boundaries: Treat agents like production services
Your detection posture should assume:
Agents will run many small actions.
Some actions will fail due to permissions, throttling, missing context, or partial state.
You will need to reconstruct “who did what and why” under time pressure.
If you already use CloudTrail Lake, you can query API activity by role/session to build a defensible audit trail. If your agent platform provides an internal journal/transcript, treat that as an auditable artifact and retain it under the same governance as CI logs and change tickets.
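For the CloudTrail Lake part, here's a minimal sketch of querying everything a specific agent role did during an incident window. The event data store ID and role name are placeholders:

import boto3

cloudtrail = boto3.client("cloudtrail")

# Placeholder CloudTrail Lake event data store ID and agent role name.
EVENT_DATA_STORE = "EXAMPLE-event-data-store-id"

query = f"""
SELECT eventTime, eventSource, eventName, errorCode, requestParameters
FROM {EVENT_DATA_STORE}
WHERE userIdentity.arn LIKE '%assumed-role/agent-devops-readonly/%'
  AND eventTime > '2025-12-01 00:00:00'
ORDER BY eventTime
"""

query_id = cloudtrail.start_query(QueryStatement=query)["QueryId"]
# Poll cloudtrail.get_query_results(QueryId=query_id) until the query finishes,
# then archive the results alongside the incident record.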
Basically, treat an agent's actions as a principal's actions: least privilege so out-of-scope things can't break, reviews when something may break, and an audit trail for when something does break.
MCP as the Tool Plane
Tool protocols are where agents become real systems. MCP (Model Context Protocol) is one such protocol: a standard way for a model/agent to connect to external tools and data sources through a client-server architecture.
MCP is going to act as the infrastructure to grant agents access to both AWS resources and the broader ecosystem, and that means you should evaluate it like any other integration surface.
How MCP Works
MCP defines how agents and LLMs discover tools, invoke them, and receive structured responses (rather than scraping text). That means if we give an agent access to a powerful tool via MCP, the agent gains power. Which is of course very desirable: we definitely want the agent to have the power to take the actions it needs to achieve the outcomes we expect of it. But it's also risky.
Here's a simplified mental model for MCP, leaving out transport details:
+--------------------+        MCP         +-------------------+
| Agent / LLM Client | <----------------> |    MCP Server     |
|  (Kiro / custom)   |                    |  (tool adapter)   |
+--------------------+                    +-------------------+
          |                                        |
          | "call tool: deploy_service"            | executes
          v                                        v
  tool request/response                 AWS APIs / CI / DB / etc
Tool Design Patterns: Idempotency, dry-runs, and reversible changes
When you expose tools to agents, design them the way you design safe automation:
Idempotency: repeated calls should not create repeated side effects.
Dry-run first: expose a “plan” operation that returns what would change, then require a separate “apply”.
Reversible changes: build “undo” primitives where possible, or at least generate a rollback plan.
Narrow contracts: prefer “rotate credentials for service X” over “run arbitrary CLI”.
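Here's a minimal, framework-agnostic sketch of the plan/apply split and the idempotency token pattern from the list above. The tool names and the credential-rotation example are hypothetical, it's the shape of the contract that matters:

import uuid
from dataclasses import dataclass, field

# Completed idempotency tokens; in production this would be durable storage.
_completed: set[str] = set()

@dataclass
class RotationPlan:
    service: str
    secrets_to_rotate: list[str]
    idempotency_token: str = field(default_factory=lambda: str(uuid.uuid4()))

def plan_credential_rotation(service: str) -> RotationPlan:
    """Dry run: describe what would change, mutate nothing."""
    # A real tool would inspect the service's current secrets here.
    return RotationPlan(service=service, secrets_to_rotate=[f"{service}/db-password"])

def apply_credential_rotation(plan: RotationPlan) -> dict:
    """Apply a previously reviewed plan, exactly once per token."""
    if plan.idempotency_token in _completed:
        return {"status": "already_applied", "token": plan.idempotency_token}
    # ... perform the rotation here, then record a rollback handle ...
    _completed.add(plan.idempotency_token)
    return {
        "status": "applied",
        "token": plan.idempotency_token,
        "rollback": f"restore previous secret versions for {plan.service}",
    }

# The agent only ever sees these two narrow tools ("plan" and "apply"),
# never a generic "run arbitrary CLI" tool.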
Securing MCP Servers
MCP servers become part of your trusted computing base. The minimum control set should include:
Strong authentication: who is the client?
Authorization: which tool calls are allowed for this client/context?
Input/output validation: reject weird or oversized payloads, and enforce schemas
Rate limiting and timeouts: so the agent can’t accidentally DoS your own control plane
Audit logging: requests, responses, correlation IDs, caller identity
MCP’s ecosystem includes reference implementations and server examples (including open-source MCP servers) that are useful for learning, but you still need to harden them before production use.
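To make that minimum control set concrete, here's a framework-agnostic sketch of the checks an MCP-style tool server should run before it dispatches any tool call. The client allowlist, payload limit, and rate limit are placeholders for whatever your platform actually uses:

import json
import logging
import time
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("tool-audit")

# Hypothetical allowlists and limits; replace with your real authz source.
ALLOWED_TOOLS_BY_CLIENT = {"kiro-sandbox-1": {"plan_credential_rotation"}}
MAX_PAYLOAD_BYTES = 64 * 1024
MAX_CALLS_PER_MINUTE = 30
_calls: dict[str, list[float]] = defaultdict(list)

def authorize_and_log(client_id: str, tool: str, payload: str, correlation_id: str) -> dict:
    """Gate a tool call: authn/authz, payload validation, rate limit, audit log."""
    # 1. Authorization: is this client allowed to call this tool at all?
    if tool not in ALLOWED_TOOLS_BY_CLIENT.get(client_id, set()):
        raise PermissionError(f"{client_id} may not call {tool}")

    # 2. Input validation: reject oversized or malformed payloads.
    if len(payload.encode()) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")
    args = json.loads(payload)  # raises on malformed JSON

    # 3. Rate limiting: protect your own control plane from an over-eager agent.
    now = time.time()
    recent = [t for t in _calls[client_id] if now - t < 60]
    if len(recent) >= MAX_CALLS_PER_MINUTE:
        raise RuntimeError("rate limit exceeded")
    _calls[client_id] = recent + [now]

    # 4. Audit logging: caller, tool, correlation ID (responses get logged too).
    audit.info("tool_call client=%s tool=%s correlation_id=%s", client_id, tool, correlation_id)
    return args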
Conclusion
I believe Frontier Agents sound amazing, so long as you can realistically put in place the necessary controls for them. Don't just grant them broad permissions and hope nothing goes wrong, because, as I've said in the past, hope is not a strategy. If your org is mature enough that you either already have or can put in place the necessary guardrails, Frontier Agents have a very decent chance of helping you reduce toil and making your humans even more amazing. Otherwise, I'd focus on org maturity first.
Good fits:
PR-first engineering cultures where code review and checks already happen for every change.
Well-instrumented services where diagnosis can be automated because signals are reliable.
Typed, repeatable runbooks where the tool plane can be made narrow and safe.
Bad fits:
High blast radius environments with weak change management: agents will amplify the weakness, and sooner rather than later, everything will blow up 🧨.
Orgs without clear ownership boundaries: agents will route around ambiguity in surprising (and very damaging) ways.
Tooling that relies on manual, tacit knowledge: agents will appear helpful while missing the real constraints, and you'll be left figuring out why there's so much output with so few outcomes.
Of course, you're free to try these yourself. I fully encourage you to! Just be very mindful of how you define success. Measure outcomes, not output or vibes. Some example metrics you might want to keep an eye on:
Time-to-diagnosis / time-to-mitigation deltas for incidents the agent touches
Change failure rate and rollback frequency for agent-assisted changes vs baseline
Escalation rate: how often the agent needed human approval to proceed
Audit completeness: can you reconstruct a full timeline from logs + diffs without asking people?
If the agent makes you faster but makes your audit story worse or ends up overcomplicating your system and/or its operation, you’re borrowing time at a high interest rate.
My suggestion is that you start with read-only diagnostics and PR generation (Kiro-style workflows), so you can see how your existing guardrails are enforced. Then you can introduce narrow, reversible tools via MCP, but only after schemas, auth, and logging are in place. From there you can expand to selective “execute” actions where you can prove that permissions are least-privilege, approvals are enforced, rollback is viable, and audit trails are complete.
And as always, turn your curiosity into ownership.