[00:00:07] Speaker A: Welcome to the Cloud Pod, where the forecast is always cloudy. We talk weekly about all things AWS, GCP, and Azure.
[00:00:14] Speaker B: We are your hosts, Justin, Jonathan, Ryan and Matthew.
[00:00:18] Speaker A: Episode 344: Amazon's Coding Bot Bites the Hand That Runs It. Hey Matt, quiet in here today.
[00:00:25] Speaker B: Hey Jonathan, how are you?
[00:00:28] Speaker A: I'm good, good. Long week of project managing and various other things. How about you?
[00:00:35] Speaker B: Long week just in general, life, work, personal, family, you know, everything all at once this week. So doing some work, travel, doing some personal travel, all merged together makes a very long week. But you got two of four of us, so here's our podcast.
[00:00:52] Speaker A: Yep, not quite a quorum, but quorum enough.
[00:00:57] Speaker B: It's better than one of us.
[00:00:59] Speaker A: It is.
All right, so first we got general news. Cloudflare's Code Mode MCP server reduces token consumption by 99.9%, if you have a bunch of tools, compared to a traditional MCP implementation, by exposing the entire Cloudflare API, which would be two and a half thousand endpoints, through only two tools: search and execute. I think this is kind of like what Anthropic did with Claude, where instead of putting all the tools into a single context, they create skills. And so you have a search-for-a-skill tool, and then the skill drops just the important pieces into the chat. So it's good. I mean, I'm not sure I could imagine two and a half thousand MCP tool definitions in a context window and still actually use it for anything.
The architecture works by having the AI agent write JavaScript code against the typed OpenAPI spec rather than loading all the definitions into context, and the code executes inside a sandboxed Worker that restricts file system access, environment variables, and external fetches by default.
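The two-tool pattern they're describing (thousands of endpoints hidden behind just a search tool and an execute tool) can be sketched roughly like this. Everything here is an illustrative stand-in, not Cloudflare's actual API: a tiny fake endpoint catalog, and a restricted `exec` namespace standing in for the sandboxed Worker.

```python
# Hypothetical sketch of the "code mode" pattern: the agent sees only two
# tools, `search` and `execute`, instead of 2,500 tool definitions.

API_CATALOG = {  # illustrative stand-in for a typed API spec
    "dns.records.list": "GET /zones/{zone_id}/dns_records - list DNS records",
    "dns.records.create": "POST /zones/{zone_id}/dns_records - create a record",
    "workers.scripts.upload": "PUT /accounts/{id}/workers/scripts/{name}",
}

def search(query: str) -> dict[str, str]:
    """Tool 1: return only the endpoint docs matching the query, so just a
    handful of definitions ever enter the context window."""
    q = query.lower()
    return {name: doc for name, doc in API_CATALOG.items()
            if q in name or q in doc.lower()}

def execute(code: str):
    """Tool 2: run agent-written code in a restricted namespace, a crude
    stand-in for the sandboxed Worker with no files, env vars, or network."""
    sandbox = {"__builtins__": {"len": len, "sorted": sorted}, "search": search}
    exec(code, sandbox)
    return sandbox.get("result")

# The agent first narrows the API surface, then writes code against it:
hits = search("dns")                                   # only 2 of 3 endpoints
result = execute("result = sorted(search('dns'))")
```

The point of the split is that token cost scales with what `search` returns, not with the size of the full catalog.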
[00:02:04] Speaker B: I mean, they would have had to do something, because like you said, a context window with 2,500 endpoints just isn't logical.
But the ability to do it in a thousand tokens and get that type of data in there is actually really impressive. Being able to achieve this hopefully starts to set a better model than having, like, 15 MCP servers all for one company, for one product. If they can get it into one, it then becomes: here's your API endpoint, here's your MCP server. You get back to a better state of being able to work with something than having to know how to set up all these things in order to get everything working for you.
[00:02:45] Speaker A: Yeah, I kind of question why they would put it all into a single MCP service in the first place. I mean, they have such a diverse range of products and services, you would think they would just break it up into something more simple.
[00:02:59] Speaker B: But then it's like, which MCP server are you hitting? Are you hitting their.
Great, now I'm going to try to make something up on the fly. Do you hit their worker nodes, you know, their edge compute, or are you hitting their general set-me-up-Cloudflare one? You start to have too many MCPs at that point and you don't know where to go with anything. So having it in one place, it's more centralized. But, you know, you could also break more things.
[00:03:25] Speaker A: Yeah, I mean, it seems like a good candidate for infrastructure as code. I guess this is kind of what they're getting at: they're writing JavaScript against the spec, which then runs. It's like having infrastructure as code, except it's fed in through an agent instead.
Well, unlike Cloudflare, OpenClaw is not known for its security.
But OpenClaw creator Peter Steinberger has joined OpenAI. He's the creator of the viral AI assistant OpenClaw, formerly Clawdbot and then Moltbot, and has joined OpenAI to lead development of next-generation personal agents.
[00:04:02] Speaker B: This is kind of where I see Anthropic's Cowork slowly going too, being your personal assistant and whatnot, and having this be your ability to kind of manage real-world tasks is great.
And if they can build that into OpenAI then it becomes a lot more of a personal assistant than just a general tool that you're using for what we all use, you know, Claude and you know, OpenAI for.
So I get where they're going with it. They're trying to kind of build out that different product segments like Anthropic is going.
Curious to see if this is a good way to go on this.
[00:04:45] Speaker A: It's. I find it strange, because OpenAI could never have released a tool like that.
They could not have released such an immature, insecure product out to the market. But yet here they are, hiring the guy who did it. I mean, it's great press for them. Although it was named after Claude in the first place; I think he was shunned a little bit by the fact that Anthropic told him he had to rename it.
Yeah, but, but they could be building
[00:05:12] Speaker B: out like. What I like that Anthropic is doing is having that true R&D wing, where they're playing with, like, the Chrome plugin and the Cowork and all these other things, and the security preview we'll talk about later, which, if I had pre-planned this, I would nicely segue into. But we'll talk about those in 30 seconds. So they're kind of working on the different.
How do you leverage the generalness of this into different product segments?
And that's not something really that OpenAI I feel is doing yet.
[00:05:47] Speaker A: I mean everybody wants Jarvis at home.
Yeah, everybody wants, you know, Demolition Man, where the guy, was it early 90s? He walks into the room and says "illuminate" and the lights come on in the house. Everyone wants that kind of home automation, but everyone also wants the privacy that goes with it. So I don't think we're going to see that, at least with these hosted AI tools. I think they need to be local things, at least for me to be comfortable with that kind of thing.
But I don't know, Cowork, it's not the same kind of interface. I mean, I like the flexibility. I like the idea that I can chat with it in Slack or Discord or.
I think there's a lot of, there's a lot of fake news, a lot of stuff that was made up around openclaw. People wanted to sort of.
[00:06:30] Speaker B: It was a hot thing.
[00:06:32] Speaker A: Yeah, yeah, it was. I guess it happens on the Internet, but it's definitely pushing people's perspectives on what AI could actually do for us as a civilization, or on a personal basis, rather than just being able to organize my Excel files, you know.
[00:06:51] Speaker B: But, like, I was talking with a friend and he was saying, yeah, they have OpenClaw at work. And he was like, somebody literally pinged it to add somebody to a channel. And it's like, it takes the same time to tell the AI assistant to add them to a Slack channel as to just do it yourself.
Like that's not a hard task to do.
[00:07:17] Speaker A: So yeah, I think it's gonna make people lazy. You know, I see people typing "edit this," you know, "comment this out in this firewall." That's just a mouse click and a key press and it's done yourself. But now you have this. I think people like the idea that they can delegate work to somebody else, even if it's a machine.
[00:07:36] Speaker B: I oversee all the machines, do what I say. I don't know.
No, I use AI a lot for thinking for me. I'm like, go do an analysis between these six things and tell me, build me charts and give me a cross-comparison of it. Or, you know, how do I handle migrations from A to B inside of Azure when it's supposed to be supported but it's not, you know, because it's Azure.
[00:08:03] Speaker A: Yeah, I think they still lack sort of the human context. I know they're trained on, you know, almost the entirety of human knowledge that's documented and scanned, but they lack some really common sense sometimes. You can give it a goal and it will work to achieve the goal, but the way it may achieve the goal, it's like a genie, you know: you wish for something and you get it, but not quite what you intended.
So I think I want a little more alignment to the way people do things in these tools, rather than just "achieve this goal for me."
[00:08:43] Speaker B: Anyway, so in ways that Anthropic is playing with new features: Anthropic has released Claude Code Security in preview for enterprises and teams.
The tool is a multi-stage verification process where Claude reexamines its own findings, filters out false positives, and assigns severity ratings and confidence scores.
Additionally, they've also released Claude Code on desktop, which is a fully automated loop for live PRs, inline code reviews, and previews of changes.
So in this case you can really see Anthropic is trying to attack kind of that developer life cycle. You know, here, go write me something, go do a PR; before I do a PR, have a security reviewer, a senior dev. This is kind of like the workflow I have a lot too, which is, hey, go do a thing and then pass it through to two or three agents, depending on the severity of what I'm doing. Like, pass it through to a senior developer, then to a security reviewer, then to whatever else, and kind of review it. And they're kind of taking that whole loop and building it out for me. I think this is a great tool, because I can shorten my AGENTS.md or CLAUDE.md files down, which will be nice.
[00:10:04] Speaker A: Yeah, I watch. I don't know how many people watch Claude Code do its work, but I tend to sit and watch its thought process as much as I can.
I just find it interesting. There's definitely been a shift in the way that it works, or maybe it's just a shift in what's being presented on the screen, I'm not entirely sure. But there's definitely been times working on complex projects where it's gone down a path and it's sort of had this realization that actually, no, it made a mistake, and it goes back and redoes it. Or it considers something, or it brings context in from somewhere else, and it reconsiders itself or double-checks itself. It's kind of strange to see a machine saying, hang on, let me check that. Then it's like, oh no, it's okay, I'll move on now.
It's a very human way of going back and self-reflecting on the work you've just done. But the security tool has been quite devastating for security industry stock prices in general, and I think it's just going to keep happening. Software is down in general, security is down now. I think every corner of the market where they build some tooling is going to have more and more of an impact on the economy.
[00:11:19] Speaker B: What was it? IBM?
I don't know if this is before or after, because we're recording a day late this week, but there was an article saying that IBM stock went down after, was it Claude or OpenAI, announced, like, massive improvements in COBOL. Which then, I'm confused why IBM has so much market share associated with COBOL, but maybe there's a piece on that, like mainframes, I'm not really sure.
[00:11:46] Speaker A: Yeah, a lot of legacy software is still running as COBOL on mainframes. IBM is really the only supplier of that hardware anymore, and they hold the expertise in it. And it's COBOL; it's not a complex language, it's just not well known anymore. So now you can hire somebody to write COBOL, or fix COBOL, or come up with a migration plan. I think there's a lot of businesses, even companies I've worked at in the past 15 years, who still have COBOL running on these huge monoliths.
And the migration path away from that is just horrendous. And so using AI now to figure out migration paths or re-implement exactly the same functionality, which I think when it comes to financial services and banking is the big concern: how can you be sure that this new thing you write will do exactly the same thing as the COBOL, including, you know, all the bugs that turned into features and things like that? So yeah, it's good. I mean, who wants to use COBOL anymore? For a start, it's not the best. And who wants to spend money on mainframes when there's far better compute now? It's the people who are trapped in those situations with impossible migration plans.
I wouldn't be surprised if the US government still has stuff running in COBOL, huge batch jobs that do things, taxes.
[00:13:14] Speaker B: What was it, when Covid first started? I think it was, like, Utah or Nevada? They had a massive outage in their unemployment system, and the root cause was that their COBOL-based backend system wasn't able to handle the scale of it.
You know, this stuff's running everywhere. It's just a matter of, can you migrate? Talking with people about upgrading Python versions becomes a big deal, but now you're switching languages, and the level of effort and the testing and the regression testing and the validation, especially in financial services, government, or anything else, it's just such a high barrier to entry and level of effort that people aren't willing to take it on. But if we can leverage a system like this to do it, then all you have to do is the testing afterwards, and maybe you automate your testing too. Before you move, go write a plugin for the COBOL side that checks and watches all the API calls between the different parts of the system, and then monitor on the other side here and make sure it works the same.
You could build something that works effectively and have a high level of confidence it's going to work the same way.
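The side-by-side validation idea being described here can be sketched as a simple shadow-comparison harness: feed identical inputs to the legacy routine and the rewrite, and log any divergence before cutting over. Both routines below are hypothetical stand-ins for a COBOL batch calculation and its replacement.

```python
# Sketch of shadow-testing a migration: run old and new side by side on the
# same inputs and collect mismatches. Functions are illustrative stand-ins.

def legacy_interest(principal_cents: int, rate_bp: int) -> int:
    """Stand-in for the COBOL routine: integer math in cents / basis points."""
    return principal_cents * rate_bp // 10_000

def rewritten_interest(principal_cents: int, rate_bp: int) -> int:
    """The new implementation under test (must reproduce legacy exactly,
    including any bugs-turned-features)."""
    return (principal_cents * rate_bp) // 10_000

def shadow_compare(inputs):
    """Feed identical inputs to both systems; return every mismatch found."""
    mismatches = []
    for args in inputs:
        old, new = legacy_interest(*args), rewritten_interest(*args)
        if old != new:
            mismatches.append((args, old, new))
    return mismatches

# Confidence grows as the mismatch list stays empty over real traffic:
print(shadow_compare([(100_000, 525), (1, 1), (999_999, 1)]))  # -> []
```

In practice the inputs would be mirrored production traffic over weeks or months, which is exactly the "flip a switch with no outage" cutover discussed next.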
[00:14:34] Speaker A: Yeah, yeah. I think there's a lot of tech debt which isn't difficult tech debt, it's just hard to prioritize, because features drive sales, and other things like performance drive sales. And so I think a lot of tech debt just gets left to rot even longer than it probably should. And I think maybe this is the best use for AI: it can come along and build something offline, it can build the test harness, it can build ways of running software alongside a running system to model what goes in and out, and sort of keep refining itself over a period of weeks or months until you're confident it can take over. And then you flip a switch and all of a sudden you've done the migration, and it didn't require an outage, it didn't require downtime, and it's sort of proven itself over a length of time. I think it's something that people do themselves manually, but having an AI that could just orchestrate the entire thing would be great. Have I used Claude Code on desktop? I have used Claude on desktop; they added Claude Code in there. I prefer the web interface for it personally, or just the CLI. I didn't find the desktop to be great, because it doesn't seem to persist the sessions. So you can have it work on stuff, but then it doesn't seem to have a history of what you did, at least on Windows anyway.
[00:15:55] Speaker B: I use Claude desktop for just general things, you know, like just talking with it, editing documents, things along those lines. Sorry, just straight Claude. It does have a Claude Code tab on a Mac, but I've never used it, because at that point, if I'm going to go code, I'm in VS Code. I haven't used the web. I did see something, which we'll probably talk about next week, where they have the mobile app that can now link to Claude Code, which also just sounds like a security nightmare. But we'll talk about that next week.
[00:16:27] Speaker A: Yeah, I'm excited to test that because it's like it's a work life balance nightmare. But it's such a great feature.
[00:16:35] Speaker B: Yeah.
[00:16:36] Speaker A: Because I want to start something going.
You do a plan. You install the plugin so that it presses yes, allow this action.
But you walk away and you're never quite sure if it's going to stop and ask you for something new, kind of pointless, yeah, like a show stopper. Like, hey, are you sure I can CD into this random directory? I'm like, yeah, go ahead and do it. I think that agent permissions thing needs a bit of love.
But having, you know, a remote control over your active Claude Code session from your phone, wherever you are, it's great. I mean, you could be out with the family, get distracted by the phone, answer some questions. Not good work-life balance, but a great feature for sure.
[00:17:16] Speaker B: All right, on to Databricks. Databricks announced the general availability of Zerobus Ingest, part of Lakeflow Connect, a serverless streaming service that pushes data directly into Delta tables without intermediate message buses like Kafka. It supports thousands of concurrent connections, achieving over 10 gigabytes per second aggregate throughput with data landing in under 5 seconds. The core architectural difference is a single-sink design versus Kafka's multi-sink approach, reducing the traditional five-system streaming stack down to two components and eliminating the dedicated compute and storage needs.
Hmm.
[00:18:03] Speaker A: I'm kind of curious. I should have read this and figured out how it worked. It sounds amazing, but.
Okay, so I guess, what kind of use cases?
[00:18:14] Speaker B: I guess I think it's just supposed to take, I mean, I think it's Kafka but without the overhead and everything. They've built their own internal system that can handle the same data flow without having you set it all up. Because what you used to traditionally have to do, at least the way I've done it, is you would dump your data into Kafka, onto the stream, and then it would process into Databricks at that point and into your data lake. But you had to manage Kafka at that point. Here it's just direct, so it's taking a piece of toil out of the middle of it.
[00:18:46] Speaker A: Okay, so it kind of eliminated the array of brokers that you have to have in the middle, which I guess is what they're referring to by the multi-sink design. You have to send it to a specific broker after you find out which one's responsible for the stream.
Okay.
[00:19:01] Speaker B: So they just, they got rid of a good chunk of toil in the process.
[00:19:07] Speaker A: Not a fan of Kafka in general, but I am a fan of doing things at massive scale. So this is kind of cool.
[00:19:14] Speaker B: I feel like all of us are not fans of overly complex tools; we like simplicity, but we like to be able to scale stuff a lot. We all kind of have a love-hate relationship with Kafka, Kubernetes, you know, all these really complicated tools, because we just don't think most people need them. I'll speak for all of us right now: we all still know that we need them, and we use them when we have to. We might begrudge using them, but we're still using them.
[00:19:45] Speaker A: Yeah, that's a very fair assessment. I think the problem is the tools grow features to support many, many customers' use cases.
And perhaps they should be three different tools, or three different flavors of the same tool, each a simple tool, rather than one massive complex thing which is very hard to get a grasp on.
All right, well, OpenAI appears to be preparing a ChatGPT Pro Lite tier at only $100 a month, which sits between the existing Plus plan, which is 20, and the full Pro plan, which is 200. Anthropic have had their Max plan at $100 for a while.
[00:20:23] Speaker B: Yeah, I'm on the max. I'm on the $100 plan.
[00:20:25] Speaker A: Yeah, so Anthropic had that for a while. I think there was a call for ChatGPT, or OpenAI at least, to do the same thing, because there's a huge gap between $20 and $200, and managing multiple accounts is a bit of a pain.
So that's, that's kind of nice.
[00:20:44] Speaker B: I just think they needed a different naming convention. They had Plus, they have Pro.
Why is it Pro Lite? I understand this isn't the point of the podcast, but, like, you could have named it something different. I don't know what to name it, I'm not a naming person; I name things X or Y. But you've got Plus, you've got Pro.
Why Pro Lite? Is it, like, to make you feel like you're a professional, but you're not all the way there yet?
[00:21:14] Speaker A: I mean, the other option was Pro Cheapskate, but they just went with Lite instead.
I don't know.
[00:21:21] Speaker B: I'd take Pro Exec.
[00:21:23] Speaker A: Yeah.
I look at the usage I get out of my Max plan with Anthropic and it's just phenomenal. I'm just constantly churning through coding tasks and various other things. And I think even at $200, and, you know, I'm slightly privileged to be able to afford something like that, the value I get out of it is enormous.
Probably in the thousands of dollars' worth of labor savings, if you like, if I were to delegate the work out or spend time on it myself.
[00:21:53] Speaker B: On to cloud tools. Packer adds SBOM vulnerability scanning.
And as somebody that manages an application, SBOMs are really big right now and very important. If you don't know what an SBOM is, it's a software bill of materials. Packer now supports SBOM vulnerability scanning in public beta, allowing platform teams to scan their SBOMs against the MITRE CVE databases, classifying findings by severity directly within the artifact registry. So basically, as you build your artifact, it will then scan it and immediately flag in the registry that you have a critical, and you really should probably fix that before you sell it or deploy it out, or nicely ignore it.
The reason why this is so important is it addresses the supply chain security gap, surfacing vulnerability data at the image level, covering AMIs, Docker containers, and virtual machines before they reach production environments.
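What an SBOM scan actually does can be shown with a small sketch: match the components listed in a CycloneDX-style bill of materials against a vulnerability feed. The SBOM and the CVE entries below are fabricated for illustration; a real scanner pulls the MITRE/NVD feeds.

```python
import json

# Illustrative SBOM scan: component list vs. a (fake) vulnerability feed.
SBOM = json.loads("""{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "openssl", "version": "1.1.1"},
    {"name": "zlib", "version": "1.3.1"}
  ]
}""")

# Fabricated feed entry for the example; not a real CVE.
KNOWN_VULNS = {("openssl", "1.1.1"): ["CVE-EXAMPLE-0001 (critical)"]}

def scan(sbom: dict) -> dict:
    """Return component -> findings for anything present in the vuln feed."""
    findings = {}
    for comp in sbom["components"]:
        key = (comp["name"], comp["version"])
        if key in KNOWN_VULNS:
            findings[f"{comp['name']}@{comp['version']}"] = KNOWN_VULNS[key]
    return findings

print(scan(SBOM))  # flags the pinned openssl build; zlib comes back clean
```

The key property, and the limitation discussed next, is that findings are only as fresh as the feed at the moment you run the scan against the built image.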
[00:22:56] Speaker A: I guess it's only as current as the time you built it, though.
[00:23:00] Speaker B: That's always the problem.
[00:23:02] Speaker A: Yeah, it is.
Yeah. Makes me think, actually, a little agent that runs inside.
Oh, I mean, there are tools out there. Yeah, there's tons of tools out there that do that, yeah.
[00:23:16] Speaker B: Palo Alto, Carbon Black, CrowdStrike will scan your images. Palo Alto also has multiple tools per container, you know, with either a sidecar or loaded into the container. In theory, you would then do the rescan, because even AWS ECR has this, where it scans your containers every day and tells you if there's vulnerabilities in them.
So while you are correct that this is only as good as when you build it, because Packer is a build tool, if you put this into the next system, then that can give you more real-time analysis also. But it's better to catch it before it gets into that system, which I think is the point of this.
[00:23:59] Speaker A: Yeah, I mean, it's shift left, I guess. Even better if it didn't have to build it and it just told you ahead of time. Although I suppose if you don't pin versions of packages and things, it pulls the latest, or it pulls whatever's available in Artifactory or wherever you're staging copies of packages. Yeah.
All right. Well, Kubernetes 1.35 brings two new autoscaling milestones. We have in-place pod resize, which has graduated to generally available, and the Vertical Pod Autoscaler's InPlaceOrRecreate update mode, which allows vertical pod autoscaling to adjust CPU and memory on running pods without restarting them.
That sounds kind of nice.
The practical benefit for stateful workloads is fairly substantial if you're one of those crazy people who likes to run databases or SQL Server on Kubernetes, because previously those pods would be evicted and new resources requested, which would obviously cause disruption, stale caches, and other issues.
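In-place resize works by patching a pod's container resources through the pod's `resize` subresource instead of recreating the pod. Here's a small sketch that builds such a patch body; the pod and container names are hypothetical, and the note about applying it assumes a recent kubectl with `--subresource resize` support.

```python
import json

# Sketch: build the patch body for an in-place pod resize. Names are
# hypothetical; this only constructs the request, it doesn't call a cluster.

def resize_patch(container: str, cpu: str, memory: str) -> str:
    """JSON patch body that bumps one container's CPU/memory in place."""
    return json.dumps({
        "spec": {"containers": [{
            "name": container,
            "resources": {
                "requests": {"cpu": cpu, "memory": memory},
                "limits": {"cpu": cpu, "memory": memory},
            },
        }]}
    })

patch = resize_patch("nginx", "2", "512Mi")
# Applied (on a recent kubectl) roughly as:
#   kubectl patch pod web-0 --subresource resize --patch '<patch body>'
print(patch)
```

The point of the dedicated subresource is that the kubelet adjusts the running container's cgroup limits rather than evicting the pod, which is exactly the no-restart behavior being discussed.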
[00:25:05] Speaker B: I both really like this feature and really hate this feature, I'm not gonna lie. Like, I like the idea, I like the flexibility. You know, hey, versus going and redeploying all my nginx containers, just add 200 megabytes of memory, add another CPU, so I don't have to redeploy it all and, you know, if I have websockets or something like that, watch the connections kill off and restart. But at the same point, it's giving you the ability to not use containers properly, making a container be a VM.
And I just dislike that ability.
[00:25:43] Speaker A: It's a very blurry line, I appreciate that. I don't know. I'm surprised this feature didn't exist before, because on a host where there's free CPU and memory, do you really want to put that limit on?
Is it really important to enforce that you requested, you know, two whole cores and 128 meg of memory, or however much you're assigning, if the host has got spare capacity?
[00:26:12] Speaker B: I guess I would say CPU less than memory, because with memory, if you overcommit, you can crash the host, whereas CPU just makes it slow down.
[00:26:24] Speaker A: Yeah, I mean, memory is. You can't share memory with somebody; if you're using the memory, you can't share it. At least CPU you can time-slice, and everything just gets a little slower.
But at the same time, I mean the Linux hosts that run Kubernetes, they have a huge amount of disk cache.
[00:26:43] Speaker B: And so that's if you're leveraging like a swap or something like that or are you thinking like physical hard drive caches?
[00:26:50] Speaker A: Yeah, just the file system will use as much memory as it can in Linux. It doesn't just sit there empty; it uses it while it can, then frees it later.
So I don't know it was always.
I'm just surprised the feature didn't exist. It seems like an obvious one. So I assume there was some good reason why it didn't exist.
I guess if you've got the spec for a pod, and the pod says this is what it should be, all of a sudden you've now got to say, well, I've scaled this one, so this one's okay at the bigger size, but still bring up the new ones at the original size. I don't know, it's all a little strange.
[00:27:27] Speaker B: I'm now trying to picture, like, my Argo CD deployment, where I've told it to scale stuff up, but one pod didn't scale up, so it's going to stay there. Like, who's watching? Rather, this thing always goes in and changes it, and Argo CD then goes and deploys it, and I'm like, but what's it deploying? Do you have to scale it back down? Like, it has scale-up features, but then does it have scale-down problems? Are you going to end up with, like, orphaned containers? Like, I get why it took a while for them to build this feature.
[00:28:03] Speaker A: Yeah, I mean, you could always scale CPU down, that's easy to change, but you can't scale memory down once it's
[00:28:08] Speaker B: been assigned. Memory and hard drive space: two things you can't scale down easily.
[00:28:13] Speaker A: Yeah, you know, I guess I do agree with you, though, that it does kind of start blurring the lines. Maybe this should be a VM with some memory contention or CPU contention on a large hypervisor at this point, rather than a pod in Kubernetes. But it's like Kubernetes came along and solved the orchestration-for-containers problem, and now it's sort of forgotten why it was created in the first place, and it's kind of become the de facto standard: this is how you deploy things.
And then we realized that actually, no, that model doesn't work for a lot of workloads. And so now we're sort of retrofitting these things to make it much more VM like than it was ever really made for. I don't know. I guess when I see a pod being live migrated to another host, I'll agree with you a little more there.
[00:29:00] Speaker B: Please don't. But we're on 1.35 now, so maybe that'll be like a 2.0 feature.
Like that'll be like a require a massive architectural change.
[00:29:12] Speaker A: That'd be kind of a fun project to try and figure out, actually. Yeah, if you can do live kernel patching, you can live-migrate a pod to another host.
Okay, well, on to AWS.
An Amazon service was taken down by an AI coding bot, apparently. Although Amazon are saying it wasn't the bot itself, it was a person. Just like guns don't kill people, people kill people. Amazon's Kiro AI coding bot caused a 13-hour outage in the Cost Explorer service in December after engineers granted it too-broad permissions and it autonomously decided to delete and recreate the environment rather than patch it.
Who knows, maybe we don't know what it was doing. Maybe it thought that patching it was a higher risk.
There may be some logic behind that. It's hard to tell from the outside.
A second outage involved Amazon Q Developer, though Amazon says neither event impacted core customer-facing AWS services.
Amazon's position is that both incidents were user error stemming from improper access controls, not failures of the tools themselves.
I kind of question that.
What's the point in having these tools if you can't trust them to do stuff?
[00:30:25] Speaker B: I mean, you could say that about most things, though. Why have a car if you don't trust the car? You trust the driver of the car.
[00:30:35] Speaker A: Maybe. Not entirely convinced.
[00:30:38] Speaker B: I mean, I think a lot of it comes down to, you know, the permissions and the users watching it. You know, a lot of the developers that I know, we were talking about it before.
I do it on more of my personal projects: yeah, yeah, just do whatever you want, I don't really care. You know, I had it build me a PowerPoint, and I was playing with the Claude PowerPoint feature, and there's a mode that's just: do what you want. It's essentially the equivalent of launching Claude with, like, dash-dash-dangerously-skip-permissions. But if you're letting the AI tool start to do things inside of production environments, that's where you need to watch it, and you need to probably have it be a little more specific. The human needs to be watching what's going on and peer-reviewing it, not just saying, yeah, yeah. It's not like Matt building a PowerPoint, where I'm like, here's my template, here's the content I want, go do what you want with it.
[00:31:39] Speaker A: Maybe. I'm drawing a distinction, though, between the model itself and the agent. The model is just the model; it's just the trained set of weights. The agent is something they've created to perform a specific task.
And if they didn't give it the right information, if they didn't give it the situational awareness so that it understood where it was working or what the consequences would be for performing certain actions, then sure, it's the person who wrote the agent prompts. But I'd kind of lump that in: it was a problem with the agent. The agent should have been more aware of the situation in the environment, what was permitted and what was not, whether or not it had the actual permissions to go and do something.
[00:32:24] Speaker B: I mean, the piece of this puzzle that I also think is interesting is that in response to the incident, AWS added safeguards including mandatory peer review for production access, which meant there wasn't peer review. You could just go into production and then I have questions of SOC and ISO and all the other standards. Like I guess certain people just had access, which is fine, not really, but we'll go with that.
Or anyone could just grant anyone access to production. So, you know, Jonathan could just put a request in and I could self-approve it.
So I guess that's where I'm trying to understand their compliance controls. But that's my compliance and security hat I'm wearing a little bit there.
[00:33:10] Speaker A: Yeah, I'd like to think that they have different types of production environments, you know, environments where there's customer data and environments which are just internal services that process, like this service, cost information, for example. But yeah, very, very poor show.
So the Financial Times report blamed the AI coding tool for the outage, and Amazon issued a rebuttal to the Financial Times. They confirmed that the disruption only affected Cost Explorer in a single region in China for roughly 13 hours, and that the issue was user error. So I don't know.
Poor show.
[00:33:49] Speaker B: So, onto ways to actually limit your permissions and the ability to do things: AWS IAM Policy Autopilot is now available as a Kiro power, and all I can think of is "our power." Oh, that's what we should have done. We should have done a Captain Planet power theme. Okay, it could have been.
[00:34:11] Speaker A: It could have been He-Man, I guess. It could have been "by the power of Kiro."
[00:34:15] Speaker B: Yeah.
[00:34:16] Speaker A: Anyway, we always think of the best titles halfway through.
[00:34:20] Speaker B: Anyway, I feel like we need to have the show title at the beginning and then, at the end, revote on the show title and see how often it changes. Anyway, AWS IAM Policy Autopilot, an open source static code analysis tool launched at re:Invent 2025, is now a Kiro power, allowing developers to generate baseline IAM policies within their Kiro IDE without manually writing policies. Kiro use cases include rapid prototyping for AWS applications, baseline policy creation for new projects, and keeping developers in their coding environments rather than switching to the IAM console or documentation.
So the base thing that was announced at re:Invent, the AWS IAM Policy Autopilot, is something that I've seen 50 different ways to do over the years.
I played with this one at one point, and it's actually pretty decent. Adding it as a Kiro power, where it automatically generates your IAM policy as you're developing, is pretty cool too; it's that real-time loop. If you put this into your workflow where it automatically generates the policy on every commit, you can really start to see, hey, this feature I did requires this ability, which I think is actually a pretty cool way to handle stuff.
[00:35:44] Speaker A: I'm really on the fence about this, because on one hand I know the pain, especially with things like deployment policies. If you want a specific policy for a deployment worker in Terraform, let's say, just trying to figure out every permission that needs to be added so that Terraform (which is very opaque in itself) can just do the deployment becomes very complicated. At the same time, if you have a machine that looks at your code and says, this is the policy you need for it, then I don't feel like that's any security at all, unless there's another check at the end. There should be a sanity check: you have the spec for the tool or the app you're writing, and it should be intelligent about the types of permissions that should be required to run or deploy it, rather than just blindly going ahead and saying, oh, I see you're calling this endpoint, let me give you permission to do that. Because that is no security at all.
[00:36:47] Speaker B: Well, in theory, one, you should be reviewing what it's producing.
Two, you should have SCPs or something else along those lines associated with these things, depending on how you want to manage your permissions: deny policies, whatever. So maybe you don't want Route 53 delete on a registered domain. You could put that at a higher level, and even if you do try to delete it, it won't work.
So there are many checks and balances you can do. You just have to build that into the human process, not necessarily the coding, development, AI process.
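The higher-level deny Matt describes can be sketched as a service control policy. This is an illustrative sketch, not a policy from the episode: the two Route 53 action names are real AWS actions, but the statement and the helper function are hypothetical.

```python
import json

# A hypothetical SCP sketch: even if an auto-generated IAM policy grants
# broad Route 53 access, an SCP attached above the account denies the
# destructive calls outright.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRoute53Deletes",
            "Effect": "Deny",
            "Action": [
                "route53:DeleteHostedZone",
                "route53domains:DeleteDomain",
            ],
            "Resource": "*",
        }
    ],
}

def is_denied(action: str, policy: dict) -> bool:
    """Return True if any Deny statement in the policy matches the action."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Deny" and action in stmt["Action"]:
            return True
    return False

print(json.dumps(scp, indent=2))
print(is_denied("route53:DeleteHostedZone", scp))  # True
```

Because SCPs are evaluated alongside identity policies and an explicit deny always wins, a generated policy that accidentally grants these actions still can't use them.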
[00:37:26] Speaker A: Yeah, I mean, I get it, I do see the value in it. SCPs are a real pain though, because all of a sudden you get these very opaque messages about, oh, this isn't working, but it's in my policy; it's been blocked by something else, an org policy or something.
There's no easy solution to security and there shouldn't be an easy solution to security.
I think perhaps the best next iteration of it will be, you know, we already have user behavioral analysis, watching people typing on their laptops to make sure they're not doing their laundry or, you know, having fun during the day when they should be working at home. I think we absolutely need machine behavioral analysis.
And actually, yes, we still have to provide some kind of policies, but maybe we can be a little looser with them if we have tools that are constantly watching to see what's going on, to see what unexpected actions are being taken.
It's going to be a combination of things. I just know people will use this and not check the policies that it generates.
And thinking about things like supply chain attacks of NPM modules or Python modules, it would be super easy to slip some code in and then have this generated policy which gives an attacker privileges in your cloud environment without realizing it.
[00:38:40] Speaker B: Yep.
So Jonathan and I decided we were going to do something a little bit different today. There are all these other stories that we want to talk about, but they're more just things that are happening than conversation pieces. So we made an honorable mention section. Do you want to kick us off, Jonathan, with honorable mention number one?
[00:38:57] Speaker A: Kick us off. So Amazon Redshift Serverless has introduced three serverless reservations, which doesn't sound very serverless, but there you go.
Amazon also says it will spend 12 billion on a Louisiana data center. It's the only place they can get enough electricity, I believe at this point.
[00:39:14] Speaker B: Consuming power everywhere. And announcing AWS Elemental interface, a fully managed AI service that generates vertically cropped images and videos for things like TikTok, Instagram Reels, YouTube Shorts and other similar platforms, without dedicated production staff.
[00:39:34] Speaker A: What could be worse than TikTok videos than AI-generated TikTok videos? In their mind, they already are, mostly.
[00:39:40] Speaker B: I was gonna say.
[00:39:42] Speaker A: Yeah, I'll just leave that one. Okay, let's move on to GCP.
Okay. Google Cloud expanded its managed MCP service support to cover AlloyDB, Spanner, Cloud SQL, Bigtable and Firestore, allowing AI agents to interact with these databases through natural language without requiring infrastructure deployment or complex configuration. And I think this is absolutely the way to go for everything.
I think, you know, an MCP server will feature as the API to every service that every cloud provider has within the next 12 months. If they don't, they're behind the times.
So the security model relies entirely on IAM for authentication rather than shared keys, and all agent actions are logged in Cloud Audit Logs (you would hope so). A new Developer Knowledge MCP server connects IDEs directly to Google's official documentation.
Okay, good luck with that. It lets the agents reference best practices in real time during tasks like database migrations. Can you imagine being halfway through a database migration and, like Clippy, it pops up: hey, you should have done something different. Anyway, because the servers follow open MCP standards, they work with third-party clients like Anthropic's Claude in addition to Gemini, which broadens the practical appeal beyond teams already committed to Google's AI tooling.
[00:41:01] Speaker B: Anything that makes databases easier, I am all for.
I just dislike databases. They're always the problem. There's always a problem in them. They always make my life a living hell.
But I understand they're a necessary evil, because we have to store data in a relational or non-relational database scheme, and this is what you end up with:
17 tools that help you understand your database.
[00:41:31] Speaker A: Yeah, it's got me interested. Imagine testing a service like this that uses natural language to talk to a database and trying to debug why it didn't give you the answer you expected. It's like, oh, you didn't say please.
Yeah, you got three queries, and if you don't say please, you don't get a fourth one. Anyway, Gemini 3.1 Pro is now available in preview for developers via Google AI Studio, Gemini CLI, Vertex AI and Android Studio, with enterprise access through Vertex AI and Gemini Enterprise.
Pricing details aren't publicly announced yet. The model scores very highly, 77.1% on the ARC-AGI-2 benchmark, which tests reasoning on novel logic patterns, representing more than double the score of the previous Gemini 3 Pro model. That sounds like a huge architectural change they've made, and the fact that it's a point release is crazy; probably indicating they had already built this into Gemini 3 but hadn't enabled the feature. Practical use cases highlighted include generating animated SVGs from text prompts, building live data dashboards by connecting to public APIs, and prototyping interactive 3D interfaces with hand tracking and generative audio.
The example suggests the model is particularly suited for developers working on data visualization and creative coding projects.
Sell your Unity stock now.
[00:42:57] Speaker B: I mean it's just amazing to me how fast these models are improving.
This one's scoring 77%, where models a year ago were in the 40s and 50s.
So just seeing how fast everything is moving is still insane. I mean, I love that they're rolling it out everywhere. You know, I play with NotebookLM for various little projects I have, so it's nice they're rolling it out not just to Android and all these other places, but across the whole suite, which is beneficial too.
[00:43:32] Speaker A: Yeah, I'm wondering when we're going to have this sort of price adjustment of AI, though. It's very, very cheap right now. Everyone complains about the price, and yes, it's expensive, but for what you get it's incredibly cheap. So I think there will come a point where the price adjusts up significantly and people actually have to pay for the value they're getting, instead of sort of funding the building of data centers, which is pretty much what you're paying for right now.
Pay somebody's electric bill.
[00:44:01] Speaker B: Google's Firefly clock synchronization protocol is a software-based protocol that achieves sub-10-nanosecond NIC-to-NIC synchronization across data center hardware without requiring specialized or expensive dedicated timing equipment. The protocol uses a distributed consensus algorithm built on random graphs rather than a traditional hierarchical time server model, which improves convergence speed, stability, scalability and resilience to network path asymmetries. Firefly decouples internal synchronization from UTC time synchronization, meaning external time server jitter does not degrade the precision of clock alignment within the data center fabric itself. This is highly important for finance, ML workloads and other systems that require that level of precision.
Yeah, I think it's pretty cool.
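As a rough illustration of the mesh approach, here is a toy gossip-averaging simulation. To be clear, this is our sketch, not Google's actual Firefly algorithm; the topology, round count and offset ranges are made up purely to show how peers on a random graph converge without a hierarchical time server.

```python
import random

# Toy sketch: nodes on a random graph repeatedly average their clock
# offsets with their neighbors. The mesh converges toward a common
# internal time with no stratum hierarchy involved.
random.seed(42)

N = 20
offsets = [random.uniform(-1000.0, 1000.0) for _ in range(N)]  # offsets in ns

# Random graph: each node links to 4 random peers (undirected edges).
neighbors = [set() for _ in range(N)]
for i in range(N):
    for j in random.sample([k for k in range(N) if k != i], 4):
        neighbors[i].add(j)
        neighbors[j].add(i)

for _ in range(200):  # gossip rounds
    new = []
    for i in range(N):
        peers = [offsets[j] for j in neighbors[i]] + [offsets[i]]
        new.append(sum(peers) / len(peers))
    offsets = new

spread = max(offsets) - min(offsets)
print(f"offset spread after convergence: {spread:.6f} ns")
```

On a connected graph this repeated local averaging converges geometrically to a consensus value, which is the intuition behind why a mesh can hold tight internal alignment even when no single node holds the "true" time.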
[00:45:01] Speaker A: It is cool. And thinking about what they did with the global replication of Spanner, that was very much built around very accurate time synchronization.
But the fact that you need to guarantee sub 100 microsecond synchronization for financial systems is crazy.
[00:45:20] Speaker B: And they're at 10 nanoseconds.
They're so far beyond that, my brain can't even compute realistically what a nanosecond is.
But doing it from server to server, I mean, it's a complete paradigm shift. You know, your stratum zeros, stratum ones, knowing that as you go down further you get further and further away from the accurate time, versus having a mesh network manage itself, and the complexity of that, of deciding what's accurate. I guess they built the algorithm with enough intelligence that it's able to keep that in line. I just wonder, if it ever strays too far, is it going to be able to come back? Say it's always expecting all of these to be within, let's say, a hundred nanoseconds, and all of a sudden one is four seconds off reality.
How do you pull that back in? Do you have to do it slowly? And when we have leap seconds, is that going to be a problem for these systems?
Not sure.
[00:46:26] Speaker A: Yeah, that's interesting. I kind of wonder if they have their own timer which is counting up, onto which they map external time. That would seem to make sense.
[00:46:37] Speaker B: Like the Unix epoch.
[00:46:38] Speaker A: Yeah.
Okay, so we have a few honorable mentions for Google as well.
Google's investing 15 billion in AI infrastructure (surprise, surprise) in India and launching America India Connect, a multi-continent subsea cable initiative that establishes new fiber-optic routes connecting the United States, India, Singapore, South Africa and Australia. Now, Justin loves the subsea cable stories. He's going to be very disappointed to miss this one, so we kept it in just for you, Justin.
[00:47:06] Speaker B: I like these stories. I always. I don't know why I find it so fascinating, but I do.
[00:47:10] Speaker A: I have another story, which we're not going to cover here: an unrelated random thing that popped up on my Google news feed, because it knows I read nerd stories. The very first transatlantic fiber-optic cable is being pulled up from the sea floor.
And I couldn't believe how recently, relatively speaking, it was placed. It was only in the 80s that the thing was unrolled and laid on the seabed. I guess they're pulling it up to clear routes for newer, bigger, faster cables. Also, it contains an enormous amount of copper, which is probably the main reason they're reclaiming it.
[00:47:49] Speaker B: Clearing space doesn't feel like a good reason, in the vastness of the ocean. The copper, on the other hand, feels like a really good reason, because it's not going to be cheap to lift that thing up. God knows, if it went down in the 80s, what's lying on top of the cable as they try to pull it up at this point? Rocks and sediment and everything else?
[00:48:08] Speaker A: Yeah, who knows? That's a lot of copper. 2,000 miles of shielded fiber optic cable.
[00:48:16] Speaker B: Somebody had to do the ROI on that to determine that. Makes sense too.
[00:48:21] Speaker A: Google's also building a new data center in Willwage County, Texas, expanding its existing infrastructure footprint in the state. This is primarily an infrastructure capacity announcement rather than a new GCP service or feature.
I wonder what they're going to put in that data center.
[00:48:37] Speaker B: Well, what's interesting here is they're using air cooling though, versus water cooling. So it'll be interesting to figure out how they cool things in Texas with the air.
Just saying.
[00:48:51] Speaker A: Yeah.
[00:48:52] Speaker B: And our final honorable mention is Lyria 3, to create music tracks in the Gemini app. You can now generate 30-second music tracks with lyrics, custom cover art and style controls from text prompts, uploaded pictures and videos. More things that can be used on TikTok, Facebook, Instagram and other social media; hence the 30-second limit.
[00:49:18] Speaker A: All right, let's move on to Azure. "A milestone achievement in our journey to carbon negative," from the official Microsoft blog. Microsoft has achieved its 2025 goal (it's 2026 now, guys, sorry) of matching 100% of global electricity consumption with renewable energy,
contracting 40 gigawatts of new renewable capacity across 26 countries since 2020.
It represents enough energy to power approximately 10 million US homes with 19 gigawatts currently online and the remainder coming online over the next five years.
[00:49:53] Speaker B: I always try to understand carbon neutral and carbon negative, because you're buying renewable energy and you're talking about the Scope 2s, but you're still using the energy.
So are you really negative? Somebody still had to build that power plant and the things that made it, which I understand is also why there's the Scope 3, whatever the third level is, where it talks about where you get your power, where you get your materials, everything. It feels like there are just different stories you spin based on what you want to tell people at the time. But I mean, it's a great thing that they're doing, trying to buy renewable energy and trying to make sure the world doesn't die from all this data center usage.
[00:50:45] Speaker A: Yeah, it's a fine line. I think really it's fossil fuels that are the issue. If you go plant a forest full of trees, let them grow for 10 years, harvest them, dry them, burn them for energy and then plant some more trees, that's still carbon neutral, because you're not releasing any more CO2 into the atmosphere than the trees already absorbed in their growth cycle. It doesn't sound very environmentally friendly, burning a bunch of trees, but it's still carbon neutral. It's only not carbon neutral if you're releasing carbon faster than the earth is sequestering it, which is a bit shocking for some people. You know, burning trash, burning paper, that's all carbon neutral, because we planted those trees.
[00:51:31] Speaker B: Well, yeah, because you're essentially saying I plant trees and therefore I can do this.
[00:51:35] Speaker A: Exactly.
[00:51:36] Speaker B: So, now generally available: quota and deployment troubleshooting tools for Azure flexible consumption plans.
You now have the ability, directly in the platform, to see that you're hitting your quota limits and constraints without digging through documentation or opening support tickets and praying that somebody on the other end of the support ticket knows what you're saying when you point them exactly at the issue.
The quota troubleshooting experience surfaces flexible-consumption-specific limits in context, which is useful when you hit a limit and don't know why your application all of a sudden started breaking.
This is a great quality-of-life improvement, because you can see why things are breaking when you're using flexible consumption, with its per-execution billing model and fast scaling abilities, reducing potential failures.
[00:52:28] Speaker A: Yeah, I kind of wonder if it really makes sense. Again, it's this serverless thing we build, and yes, you can use it, and it's per second or per gigabyte-second or however they bill it in Azure. But if you're operating at the kind of scale where this becomes a serious concern, or you put limits in place because of cost concerns, then maybe you shouldn't be using this in the first place.
[00:52:51] Speaker B: Well, yes and no.
I've seen it where you hit limits on Lambda when you do nightly processing or things like that, and you scale out a lot more horizontally than you normally do. So you hit this limit and random things fail.
So I've seen cases where in theory you could launch something, but you want it horizontally scalable because you're running it all day at lower scales and potentially need to run it much more horizontally at certain times. Or you're a web page and you all of a sudden become the fad of the day, and your website essentially gets DDoSed by real people. Seeing those things, I think, is important. Otherwise, as a developer, a DevOps engineer, an SRE, whatever your role is, when you're the one debugging this, you can now actually see that you're being throttled and see the failures occurring, where before you would be yelling at Microsoft support.
[00:53:52] Speaker A: Yeah, I guess it's the observability piece into what's causing it. I assume it's not just telling you that you've hit a limit; it's tracing back the call through the stack to tell you exactly why you've reached or exceeded the limits.
[00:54:09] Speaker B: I think it would tell you you've exceeded the limits, you know, or you don't have enough quota, not enough ability to launch this in the tenant... subscription, sorry.
[00:54:19] Speaker A: Yeah, I guess the other thing is, if it's built into the platform, then it's one less tool you have to buy. You know, you don't need the New Relics and the Datadogs to monitor this stuff for you.
[00:54:29] Speaker B: Yep.
[00:54:30] Speaker A: If it's built into the platform, it's kind of cool. And the next thing will be auto-adjusting quotas based on...
[00:54:37] Speaker B: ...the new metrics, based on the reality of the metrics. And that's where I would love to see quotas be a lot more dynamic. It's like: you've launched two servers, and by default you have a four-server limit.
Okay, maybe at four you really should get an eight-server quota limit, so if you launch two more, because you're launching a second scale set or auto scaling group, you can get two more servers and not be at your limit without knowing it. If these limits were more dynamic it would be nice, but I understand they're there for multiple reasons.
[00:55:10] Speaker A: That makes a lot of sense, actually. You reminded me of the burstable compute on Amazon. You know, their T... was it the T2 instances?
[00:55:20] Speaker B: T3?
[00:55:21] Speaker A: Yeah.
[00:55:21] Speaker B: Come on Jonathan. It hasn't been that long since you've been on aws.
[00:55:24] Speaker A: It has. It's been fuzzy. It's been far too long.
It kind of reminds me of that, because the way you set your limits on that, you still build up some spare capacity while you're not using full capacity, and then you can burn it in bursts when you have the traffic, and then you go back down to baseline. I'd like to see more quotas be like that, really, and have some kind of time awareness. So instead of saying I want this fixed limit of 2,000 invocations per hour for my serverless function, you'd say my target is around 2,000 per hour.
But if you've been at 1,000 per hour for four or five hours and then you have a peak up to 4,000, that should be okay, so that on average over the time period you're still below your limit.
It's not costing you more than you expected, because your usage was lower earlier.
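The time-aware quota Jonathan is describing is essentially a token bucket. Here's a hypothetical sketch using his numbers (a roughly 2,000-per-hour target with banked credits); none of this is an actual cloud provider API.

```python
from dataclasses import dataclass

# Hypothetical time-aware quota: a token bucket refills at the target
# rate, banks unused capacity while traffic is low, and allows bursts
# above the target until the banked credits run out.
@dataclass
class TokenBucket:
    rate_per_hour: float   # steady-state target rate
    burst_capacity: float  # maximum credits that can be banked
    tokens: float = 0.0

    def refill(self, hours: float) -> None:
        # Bank credits at the target rate, capped at the burst ceiling.
        self.tokens = min(self.burst_capacity,
                          self.tokens + self.rate_per_hour * hours)

    def try_invoke(self) -> bool:
        # Spend one credit if available; otherwise throttle.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_hour=2000, burst_capacity=6000)

# Four quiet hours at ~1000 invocations/hour bank spare credits.
for _ in range(4):
    bucket.refill(hours=1.0)
    for _ in range(1000):
        bucket.try_invoke()

# A later spike to 4000/hour is absorbed instead of throttled,
# because the average over the window is still under the target.
bucket.refill(hours=1.0)
granted = sum(bucket.try_invoke() for _ in range(4000))
print(f"burst hour: {granted} of 4000 invocations allowed")  # 4000 of 4000
```

A fixed 2,000-per-hour cap would have rejected half of the burst; the bucket only starts throttling once the banked credits are exhausted.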
So I think you're right: a little more flexibility in the way quotas are implemented, a bit more time awareness, would be kind of cool. All right, some honorable mentions; I think these are the last couple of stories. Microsoft Sentinel's CCF push feature is now in public preview, which allows security data providers to send logs directly to a Sentinel workspace without the traditional setup overhead of manually configuring data collection endpoints. And Microsoft...
[00:56:40] Speaker B: ...Sovereignty Cloud adds governance, productivity and large language models securely, even when running completely disconnected. For Azure Local (their local zones, their equivalent of Outposts and things like that), disconnected operations are now generally available, allowing organizations to run mission-critical infrastructure with full Azure governance and policy enforcement even when completely isolated from the cloud, AKA they don't have Internet. This targets government, defense and other regulated industries that require that independence.
[00:57:14] Speaker A: This is actually a bigger story than I realized, I think. So the Foundry Local stuff is on-prem hardware; they're actually supporting Nvidia GPUs in the on-prem hardware, so you can perform isolated inference at fairly large scale. That's kind of cool.
[00:57:30] Speaker B: I think they've had that for a little bit, but this is really about it being disconnected. So the Outposts, the Azure Local (I don't remember what Google's is called), if they are not connected to the cloud, they're still able to continue to operate.
[00:57:46] Speaker A: Yeah, I know that was a big problem with Outposts. And I'm sure there are some large organizations and militaries who want to put these things in containers and drop them in various places around the world. Makes sense that you'd want local AI in that case, to run whatever untoward things, or toward things, as the case may be.
[00:58:08] Speaker B: And finally, in our emerging cloud section: introducing Command Center, the unified operations platform for AI. The unified command-and-control platform consolidates GPU cluster monitoring, orchestration and support into a single interface.
It supports managed Kubernetes and managed Slurm, allowing long-running, multi-week training jobs to operate continuously across large GPU clusters. Autoclusters is a key component that automatically detects GPU degradation, evicts compromised nodes and replaces them with healthy ones from a reserve pool. On the observability side, it supports multiple access methods, including a UI, Grafana, Prometheus endpoints and APIs.
The watch agent, paired with the telemetry relay, adds visibility into custom application metrics, allowing teams to correlate workload performance with underlying GPU health for more precise troubleshooting.
The whole stack here is what I find nice. You know, the smaller clouds are really trying, at least from what I can see, to attack that whole vertical a lot more, giving you depth all the way down. If you're training your own model with them, you get the CPU, you get the GPU, you can see the whole stack of what's going on and really start to fine-tune and say, oh, we slowed down our training because we lost two GPUs. Seeing that is not something you'd get at the larger providers.
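The autoclusters behavior described above (detect degraded GPUs, evict, backfill from a reserve pool) can be sketched as a simple reconcile loop. All names, health scores and the threshold here are hypothetical illustrations, not the vendor's actual API.

```python
# Hypothetical cluster state: node name -> GPU health score in [0, 1].
def reconcile(cluster: dict, reserve: list, threshold: float = 0.9) -> list:
    """Evict nodes whose health drops below threshold, backfilling from reserve."""
    evicted = []
    for node, health in list(cluster.items()):
        if health < threshold and reserve:
            evicted.append(node)
            del cluster[node]
            spare = reserve.pop()
            cluster[spare] = 1.0  # assume a fresh spare starts healthy
    return evicted

cluster = {"gpu-0": 0.99, "gpu-1": 0.42, "gpu-2": 0.95}
reserve = ["spare-0", "spare-1"]
print(reconcile(cluster, reserve))  # ['gpu-1']
print(sorted(cluster))              # ['gpu-0', 'gpu-2', 'spare-1']
```

The point of keeping a warm reserve pool is that a multi-week training job keeps its node count while the unhealthy hardware is swapped out underneath it.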
[00:59:47] Speaker A: Yeah, and unlike most hardware, GPUs don't last very long in a production data center, especially if they're heavily utilized. A couple of years at most, I think, and the hardware is pretty much scrap.
Neat. All right, DigitalOcean is also making some changes. They're adding AMD Instinct MI350X GPUs to their Droplets lineup, built on the CDNA 4 architecture and optimized for inference workloads, including prefill-phase compute, low-latency token generation and larger context windows. The platform demonstrated measurable results with existing customers, including a two-times increase in production request throughput and a 50% reduction in inference costs for Carrier AI, giving potential adopters concrete performance benchmarks to...
[01:00:31] Speaker B: ...evaluate. New GPUs are always good. So if you're on DigitalOcean, here's your new GPU. Go burn some money.
[01:00:42] Speaker A: Cool.
And that is it.
[01:00:44] Speaker B: Well Jonathan, it's always great just chatting with just us, but hopefully next week we'll have a few more of us here.
[01:00:50] Speaker A: Yep. I don't think we quite reached our goal of a 30-minute show today, but it's been good fun.
[01:00:56] Speaker B: Always fun to chat. See ya.
[01:00:59] Speaker A: See you later, Matt.
And that's all for this week in Cloud. Head over to our website, thecloudpod.net, where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.