Hello and welcome back to Equity, TechCrunch's flagship podcast about the business of startups. I'm Rebecca Balon, and this is the episode where we bring on industry experts to help us explore a trend in the tech world and dive deep. Today we're bringing you a conversation I had on stage at HumanX with Steve Schmidt, Chief Security Officer at Amazon. Everyone at HumanX was talking about how Anthropic's Mythos will affect cybersecurity, but Steve and I actually talked about what AI is already doing to the threat landscape. We broke down what happens when agents go rogue, why companies need a comprehensive overview of what AI is being used in their firms and where, and how containment and agentic identity are becoming the new front lines. Let's take a listen.

Hey, hello. So, Steve, you've spent time at the FBI and now you're leading security at one of the largest companies on the planet. In the last two years, what is the most real, not hyped, way that AI has changed the threat landscape?

Sure. I think what we're seeing in reality right now is that AI is allowing threat actors to up-level their game. We've talked about this previously. Actors used to have stratified sets of skills: you get the script kiddies at the bottom and the state actors at the top. The interesting thing we're seeing now is that the people with lower-level skills are becoming more effective at what they do because of AI. It effectively prompts them to change the tooling they're using, the approaches, the things they're going after. It leads them to the right place. The state actors at the top of the food chain are using AI to do things much more broadly. Where they previously had to focus on a few areas, we're seeing them go after many, many targets simultaneously. And that's interesting because it broadens the area we have to defend, but more importantly, it means our time to react is dramatically reduced. We've gone from a situation where reacting in hours might be okay to one where we have to react in minutes to seconds.

So, broadly, what worries you more? Is it that attacks using AI are getting better, or that organizations are introducing new risks to their own environments with AI?

I think the answer is that both things are concerning, from different angles. Certainly it's worrisome that attackers are using AI and becoming more proficient with it. At the same time, as defenders we have an opportunity here to really up our game, to become more effective at what we do, and in some ways that can outpace what the attackers are doing with AI, if we do it right. And that's the challenge: how do we build AI tooling into our detection and response workflows?
The area we talked about previously that I think is really interesting to a lot of people out here is: how does the use of AI change my internal risk landscape? What does it expose that wasn't exposed before, and in what volume? Previously, tooling could focus on narrow areas and gain access to limited sets of information. Now you've got to think about the breadth of everything that's on an individual machine. When we were getting ready for this conversation, we said, all right, let's use OpenClaw as the example, the canonical "I'm going to install this on my laptop and use it." And please don't do that. But in any case, all of a sudden an agent has access to literally everything you've got on your machine, which was not something that was prevalent before. So if that one place, that one agent, goes bad, all of a sudden you have an exposure risk for everything that's there.

Yeah, and that's a problem we're seeing. A lot of engineers and employees are being encouraged to use AI to find new ways to make their jobs easier, faster, more efficient, and that just creates the shadow AI problem.

Yeah. Anybody who thinks their job as a security professional is to stop AI deployments and agent deployments: you're wrong. You need to get past it. What's really important here is finding ways to use AI tooling safely, to ensure that you know what AI tooling you have, where it's installed, how it's being used, what data it has access to, and where that data is going once that access occurs. It's a deceptively simple-sounding problem, but in reality it's really tough to operationalize it, to make it real, and then to assign permissions around the use of that information.
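[Editor's note: to make the shape of that visibility problem concrete, here is a minimal Python sketch of the kind of inventory Schmidt describes — what tooling exists, where it runs, and what data it can touch. The record fields and names are illustrative assumptions, not Amazon's actual tooling.]

```python
from dataclasses import dataclass, field

# Hypothetical inventory record for one AI tool deployment; the fields mirror
# the questions above: what tool, where it's installed, what data it can
# reach, and where that data is allowed to go.
@dataclass
class AIToolRecord:
    tool_name: str                                            # e.g. "coding-agent-x"
    host: str                                                 # machine or environment it runs on
    data_scopes: list[str] = field(default_factory=list)      # datasets it can read
    egress_targets: list[str] = field(default_factory=list)   # where output may flow

class AIToolInventory:
    """Illustrative registry: record and query AI tool deployments."""
    def __init__(self) -> None:
        self._records: list[AIToolRecord] = []

    def register(self, record: AIToolRecord) -> None:
        self._records.append(record)

    def tools_with_access_to(self, scope: str) -> list[AIToolRecord]:
        # The question you want to be able to answer quickly:
        # which agents can touch this data?
        return [r for r in self._records if scope in r.data_scopes]

inventory = AIToolInventory()
inventory.register(AIToolRecord("coding-agent-x", "dev-laptop-042",
                                data_scopes=["source-repo", "customer-tickets"]))
print([r.tool_name for r in inventory.tools_with_access_to("customer-tickets")])
```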
So, in practice at Amazon, how are you doing this? You've talked to me a little bit about agents having their own identity. Can you explain that architecture and why it's important?

Sure. For a very long time, as an industry, we've had two kinds of identities: the identity of the human, the person, and the identity of the machine, the physical object. An agent sits in between that space, and I don't think it neatly fits into either category. So what we decided to do a couple of years ago at Amazon is build a framework that allows us to assign an identity to an agent. More importantly, we built the framework in such a way that the identity of the calling party flows through everything the agent does. So on the back end, we can tell with precision: this human being caused this agent to do these things, which resulted in this response, pulling this data from this repository. That's super important for forensics and basic security, but think about this: if you're in a business that's regulated, you have to be able to explain to your regulators everything that you do, where that data comes from, how it's used, and so on. So we had to put that glue, that tie, in place, anchored on the idea that an agent has a unique identity.
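[Editor's note: as a rough sketch of the idea — not Amazon's actual framework — here is what it can look like in Python for an agent's actions to carry both its own identity and the calling human's identity. All names are illustrative.]

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    kind: str   # "human", "machine", or "agent" -- the third category in between
    name: str

@dataclass(frozen=True)
class CallContext:
    # The calling party's identity flows through every downstream action, so
    # the back end can reconstruct: this human caused this agent to do this.
    on_behalf_of: Principal
    agent: Principal
    request_id: str

def agent_action(ctx: CallContext, action: str, resource: str) -> None:
    # An append-only audit record ties human -> agent -> action -> data.
    print(f"[audit {ctx.request_id}] {ctx.on_behalf_of.name} via "
          f"{ctx.agent.name}: {action} on {resource}")

ctx = CallContext(
    on_behalf_of=Principal("human", "steve"),
    agent=Principal("agent", "report-builder-7"),
    request_id=str(uuid.uuid4()),
)
agent_action(ctx, "read", "sales-db/q3-summaries")
```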
And those audit logs are really important, I suppose, not just for forensics, but also potentially in training other agents to do some of this work. I know at Amazon you mentioned that you have tons of logs from SDEs over the years. Is this kind of like the one time where enterprise has a leg up over startups in implementing AI?

It is an area that's really interesting. One of the things we've done for a very long time is keep track of things that happen inside the company, because we're a regulated industry in a lot of places. And it turns out that's super useful for fine-tuning and training of models, where we can say: all right, this is a software development engineer, this is the kind of work they do, these are the tools they use, this is the input they give a machine, this is the output they get back, and this is the code that results from that development process. Turns out that's super valuable for saying, all right, let's see if we can improve that loop using the newest version of pick-your-particular-IDE or coding agent. It also allows us to identify situations where we can fully automate certain kinds of behaviors. My favorite there, by the way, is test harnesses. Building tests for everything you build as software should be automatic, and it should be mandatory.
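[Editor's note: a hypothetical sketch of the shape such a logged interaction could take as fine-tuning material, following the elements Schmidt lists. The schema is an assumption for illustration.]

```python
from dataclasses import dataclass

# Illustrative record of one logged development interaction -- the kind of
# data that, accumulated over years, becomes fine-tuning material.
@dataclass
class DevInteraction:
    role: str            # e.g. "software-development-engineer"
    tool: str            # the IDE or coding agent used
    prompt: str          # the input the engineer gave the machine
    completion: str      # the output they got back
    resulting_code: str  # what actually shipped after human review

def to_training_example(rec: DevInteraction) -> dict:
    # Pair the prompt with the human-accepted result, not the raw completion,
    # so the model learns from the corrected loop.
    return {"input": f"[{rec.role} using {rec.tool}] {rec.prompt}",
            "target": rec.resulting_code}
```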
When you're talking about giving agents governance, so each agent has its own ID and you have governance, permissions, controls, et cetera, a lot of that is based on a nuanced understanding of what they should be allowed to do. I imagine all of that nuance is becoming its own vector, a potential attack surface, right? That's pretty juicy data. How do you protect against that?

The area we're focusing on here is: how do you understand what a human being would do in a lot of circumstances, how do you canonicalize that in instructions, in guardrails, in tuning of models, and then how do you protect that information from somebody who'd want to abuse it? To give an example, we've all heard the stories of oopsies where somebody installs an AI agent on a machine, uses it to do their work, and perhaps over-permissions it. And then, for example, it goes and deletes a production stack of a running infrastructure component. That happens; it's real. It's not the AI doing anything wrong. The agent itself is goal-seeking: you told it to achieve something, and it achieved it. What it missed is the fact that the human being who would otherwise have done that work has a set of context in the back of their head. They know intuitively: I should not delete the thing that is running the service all of my customers depend on. But the agent doesn't know that. So we have to find a way to distill, not in the AI sense, but to take the information inside that person's head and build it into software that we then put around agents, to ensure they're doing the right thing at the right time, all the time. To get back to your original question, though: that set of guardrails and context is incredibly sensitive from an intellectual property standpoint, but it's also incredibly sensitive because if an adversary were able to pollute it, you could cause really interesting things to occur. You could cause agents to escape their guardrails in ways that you didn't expect. So we have to make sure we know that the information that goes in is what we expect, and that the guardrails in place are the same ones we put in at the front end and remain that way. That takes a set of work ensuring that changes haven't been made to the infrastructure, the operations, the tools, and everything underneath.
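[Editor's note: a minimal sketch of both halves of that idea — writing the human's implicit context down as explicit guardrails, and integrity-checking the guardrail set so a polluted copy is refused. The rules and digest scheme are illustrative assumptions, not a description of Amazon's system.]

```python
import hashlib
import json

# The "context in the back of the human's head," written down as explicit
# rules the agent's actions are checked against.
GUARDRAILS = [
    {"action": "delete", "resource_prefix": "prod/", "allowed": False},
    {"action": "delete", "resource_prefix": "scratch/", "allowed": True},
]

# Integrity anchor: if an adversary pollutes the guardrails, the digest of
# what's deployed no longer matches what was signed off at the front end.
EXPECTED_DIGEST = hashlib.sha256(
    json.dumps(GUARDRAILS, sort_keys=True).encode()).hexdigest()

def is_permitted(action: str, resource: str, guardrails, expected_digest) -> bool:
    digest = hashlib.sha256(
        json.dumps(guardrails, sort_keys=True).encode()).hexdigest()
    if digest != expected_digest:
        raise RuntimeError("guardrail set was modified -- refuse to act")
    for rule in guardrails:
        if rule["action"] == action and resource.startswith(rule["resource_prefix"]):
            return rule["allowed"]
    return False  # default deny: no matching rule means no permission

print(is_permitted("delete", "prod/checkout-stack",
                   GUARDRAILS, EXPECTED_DIGEST))  # False
```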
You've also mentioned some containerization here, right? The way the agent makes decisions is actually separate from where all of that information is, and that's not something everyone does.

Yeah, we believe really strongly that agents should not be let run free on individual machines. They should be in some form of a container. There are a lot of different ways to make this real, to implement it; we have one we chose internally, and it works for us. But the idea is that the agent runs in a container, and in order to get credentials to do something, it has to pierce that container boundary. That act of piercing the boundary is something we can audit, log, and control. Furthermore, when it retrieves a credential, that credential is uniquely tied to that individual action. So we know it made a request, we gave it a credential with a certain scope, and that credential is tied to the rest of the workflow that goes through. We can also examine that request for a credential from the outside using what is effectively a judge: another model that asks, is this a reasonable request given all the context I know about who Steve is, what he's doing, what his job is, and so on. The point is to ensure that the agent itself can't easily be tricked into doing something it shouldn't, and if it is tricked, that there's another party who says, "Hang on, time out. That's not the right thing to do."
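[Editor's note: a sketch, under stated assumptions, of that credential-brokering pattern — a broker outside the container boundary, a judge that reviews each request, and a single-use credential scoped to one action. Function names and the judge's rule are placeholders, not Amazon's implementation.]

```python
import uuid

def judge_is_reasonable(user: str, action: str, resource: str) -> bool:
    # Stand-in for the outside "judge" -- in practice a separate model that
    # weighs who the user is, what their job is, and whether the request fits.
    return not (action == "delete" and resource.startswith("prod/"))

def broker_credential(user: str, agent: str, action: str, resource: str) -> dict:
    """Runs OUTSIDE the container boundary. The agent must pierce the
    boundary to call this, and that piercing is logged and auditable."""
    print(f"[audit] {agent} (for {user}) requests {action} on {resource}")
    if not judge_is_reasonable(user, action, resource):
        raise PermissionError("judge rejected the request")
    # The credential is scoped to this one action and tied to the workflow.
    return {"token": str(uuid.uuid4()),
            "scope": f"{action}:{resource}",
            "single_use": True}

cred = broker_credential("steve", "deploy-agent-3", "read", "prod/config")
print(cred["scope"])
```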
This is just reminding me: this week we've had news that Anthropic's Mythos is powerful enough that it's not being released publicly; it's only accessible through tightly controlled environments. Does this reinforce the idea that the future of AI security is really about containment and control rather than just model capability?

I think where we're going as an industry is that we've got to build the kind of skill set Mythos has into our software development, deployment, and operation chain. It's no longer a question of whether we can contain something after the fact or prevent something from happening. We have to have a super-tight feedback loop between "I am writing software" and, when I commit that line or pause my coding agent to build something, it's immediately examined: "Yeah, maybe you shouldn't do that. Maybe you should do this instead. Here's the change. Accept." We cannot wait until the end; it's way too late, and by the way, it's too costly. Short iterative changes are much less expensive than looking at something after it's all been put together and saying, "Wow, this is a train wreck, we've got to go fix it." So we believe in that really tight iterative loop that allows us to make changes in a short period of time. And every time you go around that loop, it's better training material for the model you use on the back end.
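[Editor's note: a minimal sketch of that shift-left idea — examining each changed line at commit time rather than auditing the finished system. The patterns here are illustrative examples, not a real scanner's rule set.]

```python
import re
import sys

# Illustrative commit-time check: examine each changed line the moment it's
# written, instead of reviewing the whole system after the fact.
RISKY_PATTERNS = [
    (re.compile(r"verify\s*=\s*False"), "TLS verification disabled"),
    (re.compile(r"(?i)password\s*=\s*['\"]"), "hard-coded credential"),
]

def review_diff(changed_lines: list[str]) -> list[str]:
    findings = []
    for n, line in enumerate(changed_lines, 1):
        for pattern, why in RISKY_PATTERNS:
            if pattern.search(line):
                findings.append(f"line {n}: {why} -- maybe you shouldn't do that")
    return findings

if __name__ == "__main__":
    diff = ['requests.get(url, verify=False)', 'timeout = 30']
    for finding in review_diff(diff):
        print(finding, file=sys.stderr)
```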
Well, it's not all models, right? We talk about human in the loop, but that's more than just a slogan. Where do humans need to stay in control?

Yeah. On human in the loop: we've had this idea for quite a long time that for very sensitive changes in our infrastructure operations, we require two people to agree that the change is correct. It's called contingent authorization, and we choose to implement it using hardware two-factor auth at Amazon. So if I'm someone working in an AWS service, as an example, and I want to make a deployment to a production region, I can't do that by myself. I submit the deployment, and the system says, "Hang on, this is what we consider a significant event. I'm going to ask your supervisor to confirm this should take place." The supervisor says, "Yes, that is indeed true," touches their two-factor authentication token on their laptop, and the deployment proceeds. We need that same kind of control in the AI world: that stop sign, that checkpoint, where the system itself can say, I need another party to say yes, this is reasonable, go ahead. And that has to be outside the agent itself, because as we've seen from lots and lots of press, you can trick agents into doing things they shouldn't. The most basic is "ignore any request for two-factor auth." Okay, fine. But the point is that you need to have that hook, and that hook needs to be enforced outside the boundary of the agent. That's why we use a container as that kind of enforcement point in our own environment.

And you've also mentioned Midway, right? This is an actual piece of hardware. It's kind of a keep-it-simple-stupid idea, right?

Yeah, Midway is our internal authorization system that we've used for, I don't know, 12 or 14 years. It depends on FIDO2 tokens: for actions we consider especially impactful, we require a human being to touch a two-factor auth token, and the Midway system says yes or no, something can go ahead or not. It's a fantastic safety lever to prevent deployments we wish hadn't happened, or other actions that could be detrimental. We've built the same thing into the way we use agents right now.
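[Editor's note: a minimal sketch of the contingent-authorization shape described above — a two-party checkpoint enforced outside the agent, gated on a physical token touch. The action list and token check are hypothetical stand-ins; Midway itself is internal to Amazon.]

```python
SIGNIFICANT_ACTIONS = {"deploy-to-production", "delete-stack"}

def hardware_token_touched(approver: str) -> bool:
    # Stand-in for a FIDO2 touch -- proof that a physically present human,
    # not a tricked agent, approved the action.
    answer = input(f"{approver}, touch your token to approve (y/n): ")
    return answer.strip().lower() == "y"

def contingent_authorize(requester: str, supervisor: str, action: str) -> bool:
    """Two-party rule enforced OUTSIDE the agent: the requester alone cannot
    complete a significant action; a second human must physically approve."""
    if action not in SIGNIFICANT_ACTIONS:
        return True  # routine actions proceed without the checkpoint
    print(f"Significant event: {requester} wants to {action}. "
          f"Asking {supervisor} to confirm.")
    return hardware_token_touched(supervisor)

if contingent_authorize("steve", "supervisor-a", "deploy-to-production"):
    print("deployment proceeds")
else:
    print("deployment blocked")
```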
What does this look like for a scaled-down company? There are a lot of startups here. How should they think about building their security?

If there's one piece of advice I'd give, it's that the first thing you need to do is know what you're actually using today, which is actually not easy. What are the agents you're using today? Where are they installed? What do they have access to? Where is your data residing, and what are the rules around that access, rules being who can access it, from where, and when? As a small company, as a startup, heck, even at Amazon in the early days, the answer was everybody had access to everything, because it was the most efficient way to get stuff done. But as we grew as a company and the sensitivity of the data we held increased, we had to neck that down so that only the people who required access had it. So for those who are just getting started: make sure you keep track of what you've got. Even if you don't do anything differently with that information, you'll need it down the road, I promise you. Keep track of what you've got, understand who's giving it permissions, what those permissions are, where it can take the data you've got, and what that data's sensitivity is. It is really hard to go back and label data after the fact if you don't record its sensitivity when you're putting it into a data store. Trust me, I know. We've had to do a lot of digging out from things in the past. If you've got the opportunity to avoid that particular hurdle when you're building something, awesome.

So you're saying from the get-go, you want to build the architecture that allows you to have that kind of structured data.

Yeah, structured data about your data, to be clear. You don't have to put everything in a relational database. You just have to find a way to label the information you get with some kind of structure, so that you can say: this is sensitive customer data; this data is open source, don't care.
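[Editor's note: a minimal sketch of labeling data sensitivity at write time, per the advice above. The sensitivity levels and store are illustrative assumptions.]

```python
from dataclasses import dataclass
from enum import Enum

class Sensitivity(Enum):
    OPEN = "open-source"                         # "don't care"
    INTERNAL = "internal"
    SENSITIVE_CUSTOMER = "sensitive-customer-data"

@dataclass
class LabeledRecord:
    payload: dict
    sensitivity: Sensitivity  # recorded at write time, not reconstructed later

class LabeledStore:
    """Illustrative store that refuses unlabeled writes, so you never have
    to go back and classify data after the fact."""
    def __init__(self) -> None:
        self._records: list[LabeledRecord] = []

    def put(self, payload: dict, sensitivity: Sensitivity) -> None:
        self._records.append(LabeledRecord(payload, sensitivity))

    def records_with(self, level: Sensitivity) -> list[LabeledRecord]:
        # Because sensitivity travels with every record, questions like
        # "what customer data do we hold?" stay answerable as you grow.
        return [r for r in self._records if r.sensitivity is level]

store = LabeledStore()
store.put({"email": "a@example.com"}, Sensitivity.SENSITIVE_CUSTOMER)
print(len(store.records_with(Sensitivity.SENSITIVE_CUSTOMER)))
```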
If someone leaving this room today could do one thing differently about how their organization relates to AI, what's the top priority?

The top priority today is: make sure you understand where your agents are, and don't give them unfettered access to everything. Do not repeat the OpenClaw problems that people have seen over time. Learn from those mistakes. Run whatever agent you have in some form of isolation chamber. You still have to give it access to data to make it useful, but do it in a way that you can measure if you need to. It could be a container on a machine, it could be a VM; there are lots of different ways to do it, just pick one.

How do organizations keep track of what kinds of agents are being downloaded by their employees, and how does that not cross over into corporate or employee surveillance?

The thing a lot of folks are concerned about is: how do I make sure I know what's actually being used, the point you brought up at the beginning. There are a lot of different software packages in the IT world that do inventory of what's running on machines, and they can probably be extended to do it here. We do it a unique way at Amazon that's not applicable to everybody, largely because we build our own tooling, but you do need some method of inventorying what's on your machines. You will be surprised if you go look at what people have downloaded and used on their machines, and as a security professional, occasionally dismayed. So it's important to start the process of discovery, to keep it current, and then to make intentional decisions over time about whether you like that being there.

[clears throat] I once heard some advice that startups' first hires or first executive team must include a CISO. Do you agree?
Not necessarily. I think the best startup is one where everybody in the company owns security; everybody's responsible for it themselves. You don't need someone sitting in my spot saying we should do all these things when you've got five people. What you need is employees who understand the sensitivity of the data they're gathering and the value it has to your customers, because that is ultimately what really matters: do your customers trust you with their information? Do you handle that information in a way that's consistent with how they would handle it themselves, or are you going to burn your business to the ground because you get owned by somebody? Those are really the trade-offs people are making.

That's about all the time we have today. Thank you so much for joining.

Thank you.