329: Azure Front Door: Please Use the Side Entrance

The Cloud Pod

Episode 329 | November 12, 2025 | 01:28:56

Hosted By

Jonathan Baker, Justin Brodley, Matthew Kohn, and Ryan Lucas

Show Notes

Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy! Matt, Jonathan, and special guest Elise are in the studio to bring you all the latest in AI and cloud news, including – you guessed it – more outages, and more OpenAI team-ups. We’ve also got GPUs, K8s news, and Cursor updates. Let’s get started!

Titles we almost went with this week

Follow Up 

04:46 Massive Azure outage is over, but problems linger – here’s what happened | ZDNET 

07:06 Matt – “The fact that you’re plus one week and still can’t actually make changes or even do simple things like purge a cache makes me think this is a lot bigger on the backend than they let on at the beginning.”

AI Is Going Great – Or How ML Makes Money

08:30 AWS and OpenAI announce multi-year strategic partnership | OpenAI

09:53 Elise – “It sort of feels like OpenAI has a strategic partnership with everyone right now, so I’m sure this will help them, just like everything else that they have done will help them. We’re banking a lot on OpenAI being very successful.” 

17:11 Google removes Gemma models from AI Studio after GOP senator’s complaint – Ars Technica

23:00 Matt – “That’s everything on the internet, though. When Wikipedia first came out and you started using it, we were told you can’t reference Wikipedia, because who knows what was put on there…you can’t blindly trust.”  

Cloud Tools  

26:53 Introducing Agent HQ: Any agent, any way you work – The GitHub Blog

27:20 Jonathan – “I kind of like the different interfaces; they all bring something a little different.”

31:55 Cursor introduces its coding model alongside multi-agent interface – Ars Technica

33:03 Elise – “Cursor had an agent built in, and I thought it was OK, but it was wrong a lot. The 2.0 agent seems fabulous, comparatively, and a lot faster.”
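
Elise also talks in the episode about running several Cursor agents in parallel on top of git worktrees, so each agent gets its own checked-out branch to build and test in. Purely as a sketch of that workflow (the repository path and branch names here are made up, not from the episode):

# Sketch: one git worktree per agent task, so parallel agents can each
# run and test their own branch. Paths and branch names are placeholders.
import subprocess
from pathlib import Path

repo = Path("~/src/my-app").expanduser()
tasks = ["fix-login-bug", "refactor-billing", "add-dark-mode"]

for task in tasks:
    worktree_dir = repo.parent / f"my-app-{task}"
    # `git worktree add -b <branch> <path>` checks out a new branch
    # in its own working directory alongside the main checkout.
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", f"agent/{task}", str(worktree_dir)],
        check=True,
    )
    print(f"Agent for '{task}' can work in {worktree_dir}")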

AWS 

43:25 The Model Context Protocol (MCP) Proxy for AWS is now generally available

44:10 Matt – “This is a nice little tool to help with development…an easier stepping stone than having to build all this stuff yourself.”
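
The proxy’s whole job is to sit between a local MCP client and a remote AWS-hosted MCP server and handle the AWS SigV4 signing for you. As a rough illustration of the signing step it takes off your plate, here is a minimal botocore sketch; the endpoint URL and the “execute-api” signing service name are assumptions for illustration, not values from the announcement.

# Sketch: SigV4-sign a JSON-RPC style request the way the MCP proxy would on your behalf.
# The URL and signing service name are placeholders, not the real MCP endpoint.
import json
import requests
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = "https://example-mcp-endpoint.us-east-1.amazonaws.com/mcp"  # placeholder
body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})

creds = botocore.session.Session().get_credentials()
request = AWSRequest(method="POST", url=url, data=body,
                     headers={"Content-Type": "application/json"})
SigV4Auth(creds, "execute-api", "us-east-1").add_auth(request)  # assumed service name

response = requests.post(url, data=body, headers=dict(request.headers))
print(response.status_code)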

47:07 Amazon ECS now supports built-in Linear and Canary deployments

48:45 Jonathan – “I always wonder why they haven’t built these things previously, and I guess it was possible through CodeDeploy, but if it was possible through CodeDeploy, then why add it to ECS now? I feel like we kind of get this weird sprawl.” 
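
For a rough mental model of the two strategies (this is conceptual, not the ECS API): linear moves traffic in equal increments with a bake period between steps, while canary sends a small slice first and only shifts the rest once it looks healthy.

# Conceptual sketch of the two traffic-shifting schedules; not the ECS API.
import time

def linear_shift(step_percent=20, bake_seconds=60):
    """Shift traffic to the new task set in equal steps, baking between each."""
    shifted = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        print(f"linear: {shifted}% of traffic on the new task set")
        time.sleep(bake_seconds)  # CloudWatch alarms here would trigger rollback

def canary_shift(canary_percent=10, bake_seconds=300):
    """Send a small canary slice first, bake, then shift everything."""
    print(f"canary: {canary_percent}% of traffic on the new task set")
    time.sleep(bake_seconds)      # roll back automatically if alarms fire
    print("canary: 100% of traffic on the new task set")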

50:35 Amazon Route 53 Resolver now supports AWS PrivateLink

51:04 Jonathan – “Good for anyone who wanted this!” 
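
In practice this means creating an interface VPC endpoint for the Route 53 Resolver API so control-plane calls never leave the private network. A minimal boto3 sketch, assuming the service name follows the usual com.amazonaws.<region>.<service> pattern (all IDs below are placeholders):

# Sketch: interface endpoint for the Route 53 Resolver control plane.
# Service name is assumed from the usual naming pattern; IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-east-1.route53resolver",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])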

54:05 Mountpoint for Amazon S3 and Mountpoint for Amazon S3 CSI driver add monitoring capability

GCP

58:31 New Log Analytics query builder simplifies writing SQL code | Google Cloud Blog

1:00:01 Jonathan – “I think this is where everything is going. Why spend half an hour crafting a perfect SQL query…when you can have it figure it all out for you?”
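
For a sense of what the builder is writing for you, digging a field out of a nested JSON log payload usually means SQL along these lines. The view name and JSON paths below are illustrative guesses, not taken from the article.

# Illustrative only: the flavor of SQL the query builder generates.
# The view name and JSON paths are placeholders.
from google.cloud import bigquery

sql = """
SELECT
  timestamp,
  severity,
  JSON_VALUE(json_payload.request.path) AS request_path
FROM `my-project.my_region._Default._AllLogs`
WHERE severity = 'ERROR'
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
LIMIT 100
"""

client = bigquery.Client()
for row in client.query(sql).result():
    print(row.timestamp, row.request_path)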

1:01:12 GKE and Gemini CLI work better together | Google Cloud Blog  

1:02:10 Matt – “Anything to make Kubernetes easier to manage, I’m on board for it.” 

1:05:06 Master multi-tasking with the Jules extension for Gemini CLI | Google Cloud Blog

1:06:16 Jonathan – “Google obviously listens to their customers because it was only half an hour ago when I said something like this would be pretty useful.”

1:11:36 Announcing GA of Cost Anomaly Detection | Google Cloud Blog

1:14:01 Elise – “I just wonder, there’s so many third-party companies that specialize in this kind of thing. So I wonder if they realized that they could just do a little bit better.”

Azure

1:16:37 Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC | Microsoft Azure Blog

1:21:53 Jonathan – “That’s a really good salesy number to quote, though, 1.2 million tokens a second – that’s great, but that’s not an individual user. One individual user will not get 1.2 million tokens a second out of any model. That is, at full capacity with as many users running inference as possible on that cluster. The total generation output might be 1.2 million tokens a second, which is still phenomenal, but as far as the actual user experience, you know, if you were a business that wanted really fast inference, you’re not going to get 1.2 million tokens a second.”
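
Jonathan’s point is easy to see with back-of-the-envelope math: 1.2 million tokens per second is aggregate cluster throughput, so what one user sees depends entirely on how many inference streams share it. The concurrency figures below are made-up examples, not numbers from the announcement.

# Back-of-the-envelope: aggregate throughput vs. per-user throughput.
# Concurrency values are illustrative assumptions.
aggregate_tokens_per_sec = 1_200_000

for concurrent_streams in (1_000, 10_000, 100_000):
    per_user = aggregate_tokens_per_sec / concurrent_streams
    print(f"{concurrent_streams:>7} concurrent streams -> ~{per_user:,.0f} tokens/sec each")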

1:23:26 Public Preview: Azure Functions zero-downtime deployments with rolling Updates in Flex Consumption 

1:24:42 Matt – “It’s a nice quality of life feature that they’re adding to everything. It’s in preview, though, so don’t deploy production workloads leveraging this.” 

1:25:06 The Azure PAYG API Shift: What’s Actually Changing (and Why It Matters) 

1:27:12 Matt – “A year or two ago we did an analysis at my day job, trying to figure out the savings plan amount: if we buy X amount, how much do we need to buy, everything along those lines. We definitely ran into throttling issues; it was bombing out on us at a few points, and there were a lot of weird loops we had to do because the format just didn’t make sense. I would suggest you move, not because they’re trying to get rid of it, but because it will make your life better.”
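
The throttling Matt describes is usually handled with retries and exponential backoff around paginated calls. A generic sketch of that pattern (the URL, token, and value/nextLink response shape are placeholders, not the actual Azure endpoint):

# Generic retry/backoff sketch for a paginated, throttled REST API.
# URL, token, and the value/nextLink response shape are placeholders.
import time
import requests

def get_with_backoff(url, headers, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code == 429:  # throttled: honor Retry-After or back off exponentially
            time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"still throttled after {max_retries} retries: {url}")

def fetch_all_pages(url, token):
    headers = {"Authorization": f"Bearer {token}"}
    results = []
    while url:
        payload = get_with_backoff(url, headers).json()
        results.extend(payload.get("value", []))
        url = payload.get("nextLink")  # follow pagination until exhausted
    return results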

1:28:05 Generally Available: Azure WAF CAPTCHA Challenge for Azure Front Door

1:31:04 Public Preview: Instant Access Snapshots for Azure Premium SSD v2 and Ultra Disk Storage

Closing

And that is the week in the cloud! Visit our website, theCloudPod.net, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions – or tweet at us with the hashtag #theCloudPod.


Episode Transcript

[00:00:00] Speaker A: Foreign. [00:00:08] Speaker B: Where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker A: We are your hosts Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker B: Episode 329 recorded for November 4, 2025 Azure Front Door Please use the side. [00:00:25] Speaker A: Entrance Too soon, but we'll talk about that in a second. How are you guys today? [00:00:32] Speaker B: I'm doing well and Elise is here after 300 episodes. [00:00:36] Speaker A: Episode 25 we looked it up. We had to make sure it's been. [00:00:41] Speaker C: Over 300 episodes guys. Why? Why haven't you invited me back? Whoa, whoa, whoa. [00:00:47] Speaker A: I invite you? [00:00:51] Speaker C: Okay, okay, fair point, fair point. I did have a small child in between and I'm good, thanks for asking. I am tired. I have a toddler. [00:01:00] Speaker A: Yep, I raised you a six month old. [00:01:04] Speaker C: You win, you win. [00:01:07] Speaker A: So we're going to start with some follow up this week, which technically isn't follow up because it was happening in real time during the show last week. Azure's massive outage is over. It ish so the Azure Hatter experienced a global outage with front door on October 29th affecting all regions simultaneously. Unlike the AWS one that just affected US East 1, the incident started at about noon east coast time until about 8pm it affected all major services that run on Azure that leverage Front Door. At the time Front Door was completely down, returning errors in HTTP server errors and elevated packet loss to the network. Recovery included rolling back to last known good configuration and gradually rebalancing traffic across nodes to prevent overloading conditions. Some customers can continue to experience lingering issues after the official recovery. Microsoft temporary block changes which is still going on as of today October 4th, but in the latest news they are looking forward to hitting their goal of being able to make changes or purging the cash or anything on November 5th. This affected all services in there. Anything that leveraged Front Door in any way, shape or form. There was a pretty good preliminary post mortem and their full post mortem as normal should be loaded up about 14 days later. I had fun. [00:02:29] Speaker B: I think every time they announce that they're going to migrate something important to the cloud like GitHub to Azure Cloud, they have a huge failure like this and then they back off again for another six months. [00:02:40] Speaker A: This was a big one. I mean the best part about it, which maybe is it I felt bad for a lot of the people involved was while Front Door was down, the alerts they were sending out had links that were broken to Front Door and then once you finally got into those documentation links about how to build a proper HA app Guess what application it told you guys to use as your front door? [00:03:03] Speaker C: No, definitely not front door. I mean, this couldn't have impacted too many people, right? It's just front door. Only a few people use that one. [00:03:13] Speaker A: I actually noticed that less things were down than when US East 1 was down, which either shows what leverages AWS versus Azure or the cut over here was point it directly at your load balancer or anything else. 
So there's maybe a little bit better options here, but the fact that you're plus one week and still can't actually make changes or even do simple things like purge a cache makes me think this is a lot bigger on the back end than they let on at the beginning. [00:03:43] Speaker C: I think that's kind of typical with Azure, isn't it? [00:03:48] Speaker A: No comment. [00:03:51] Speaker C: Sorry, Microsoft. [00:03:52] Speaker A: Pretty sure there's some NDAs involved there. So, you know, at least what's working. [00:03:57] Speaker B: Now is continuing to work even if you can't make changes. So it's not, that's not that bad. I was just disappointed that, you know, the entire Azure ad service wasn't down because I really done a day off last week. All right, so on to general news. AWS and OpenAI have announced a multi year strategic partnership formalizing a $38 billion multi year agreement providing OpenAI immediate access to hundreds of thousands of Nvidia GPUs clustered via Amazon EC2 Ultra servers with capacity deployment targeted by end of 2026. I presume this is in their new data center in Northern Indiana that Andy Jassy announced on LinkedIn. Very, very poorly. The infrastructure will support both ChatGPT inference and next generation model training with the ability to scale to tens of millions of GPUs for agentic workloads. [00:04:50] Speaker A: I mean, there's a lot of GPUs that they're throwing in that data center. Let's start with that. And they're, you know, hitting a hodgepodge, you know, they're talking hundreds of thousands, you know, it's a massive, you know, multi year partnership and I think it really shows, you know, after we talked about last week, where OpenAI really kind of is shifting into the multi partnership strategy than relying completely on Azure. It's really focusing on letting them get the right capacity where they need it and not be limited based on a single cloud provider's capacity. [00:05:22] Speaker C: It sort of feels like OpenAI has a strategic partnership with everyone right now. So I'm sure this will help them just like everything else that they have done will help them. We're banking a lot on OpenAI being very successful. [00:05:38] Speaker B: Yeah, I can't remember what the last valuation was. It was like $500 billion or something like that. It's a crazy amount of money for a firm that's not losing money on every query that anybody makes right now. I really don't understand the power. I mean I appreciate the tools. The tools are fantastic. I, you know, chat JBT was, was great. I'm more of an anthropic gourd fan personally now, but I kind of like seven years, a commitment of seven years is, is crazy to me because we don't even know what six months out is going to look like at this point. We've got the, the Fed who are sort of modeling what might happen if, if AI destroys the economy or not. And that, that, that's potentially 18 months to two years away. So what does a seven year agreement even mean in this kind of environment? [00:06:28] Speaker A: It's like a sports player. There's 5 million outs, I'm sure in there for a $30 billion partnership. One it's like at best case it's worth 38 billion, I feel like. And in reality there's probably only a couple billion in commitments. [00:06:44] Speaker C: I guess it depends on what they're using all these new shiny GPUs for. But if they're building bigger and better models, because their big thing is AGI, right. 
Like they want to be there in the next two years and you know, get, get all their awards and accolades. But it's really hard to imagine more and more data feeding into their models without kind of the dumbing down of data and information because you're going to get models trading off data that was set up by models in. It's just kind of this weird cyclic thing that we're about to do when, when you ask people for information on stuff, they're like, oh, here's the AI result. I don't know if it was right. [00:07:27] Speaker B: Yeah, I mean there's only, there's only a finite amount of information on the Internet that's available to them and I'm pretty sure they've already got it. So. [00:07:35] Speaker C: Yeah. [00:07:35] Speaker B: So yeah, I could see architectural changes needing to be retrained from the ground up and that, that makes a lot of sense. But I guess just, just economically, you know, I'd hate to be the person who works with OpenAI whose job it is to figure out how to replace human roles with, with the tool they're building. What, what kind of, what kind of. I don't know. That would be a horrible Horrible role to be in. But at the same time, I think that's the reality. I mean, you're already seeing call centers being decimated by things like this. And, you know, I don't think the coding tools are quite good enough yet. But things are moving so quickly. Six months, a year away. I don't think people will need to be learning, you know, C University for their careers anymore. So I don't know, like, who's, who's, who's going to. [00:08:29] Speaker C: Maybe, Maybe. There's so much software. There is so much software in the world that is running all these back office things that we all rely on all the time and to rewrite that stuff as accurately as the years and years and thousands of developers working on it. I think that's going to take not just time to, like, build the code, but a ridiculous amount of testing. And it's, it's just not. I mean, it's easy to say it's not good enough yet. I don't think it'd be good enough in two years, maybe six years. Probably not. Maybe. [00:09:03] Speaker A: Maybe. [00:09:03] Speaker B: Well, we put them, put a note in the calendar. We need to get you back on again. All right, in six months and we'll review the situation. No, I, I don't know. [00:09:11] Speaker C: I'll be on in 300 episodes. [00:09:12] Speaker A: Six years. Yeah, we won't be here anymore. [00:09:15] Speaker B: We will have automated the show completely by then. We'll just be reaping the profits. [00:09:21] Speaker A: I will say, with Ryan and Justin, I was like, okay, can I use an AI voice in a British tone to make up enough what Jonathan would say? So if I have to do this by myself, I can at least have talking to somebody. That was my back, my trade thought process. [00:09:37] Speaker C: You know, Matt, if you had, if you had mentioned this to me a few days ago, I would have worked on that for you. You could definitely do that today. [00:09:44] Speaker A: That would mean that I thought more than a day ahead. And by that I mean I'm pretty sure I messaged you like three and a half hours ago. It's like, you want to do the podcast. [00:09:55] Speaker C: Oh, don't lie. Weeks ago you planned this particular podcast. [00:10:01] Speaker A: Yeah. [00:10:01] Speaker C: Before the news even came out. 
[00:10:04] Speaker B: Yeah, I, you know, I guess I just wonder who, who they expect the customers to be in in a year or two years or five years that will have the money to, to recoup these investments. [00:10:15] Speaker C: They're going on the Uber model. It'll be a while. [00:10:19] Speaker B: Of course, the other crazy thing is that they just may entirely get destroyed, go out of business. Another firm may come along. I know people are already working on alternates to Nvidia more efficient chips, more suited to the type of workloads. I mean I know there are some. Here's the guy from the Big short. He's basically got puts against Nvidia and somebody else right now because he thinks he's just a magnet bubble about to burst. And so it would be unfortunate for OpenAI to be tied into this contract for 7 years when. When there's this much, much better hardware that's available. And I know they've looked into building their own stuff, but maybe they decided it just wasn't going to happen in that timescale. [00:11:03] Speaker A: I mean in reality they're not tied to the hardware, they're tied them. Shortages in Amazon commit and Amazon already announced that the latest was it the Nova or one of their models is solely trained on the Trainium. Is that the chip, the T2 chips, whatever they're called, the trainium chips, the V2 or V3. So I mean at one point you're going to get more kind of like ARM did you know where you have more mobile, more specific hardware tied to things, tied to different use cases. At one point someone's going to develop their own chip and it's going to be the the solution or Microsoft's skin or magically announced quantum computing again has been solved. One of the two might happen. [00:11:49] Speaker B: Agreed. [00:11:50] Speaker C: I think, I think I would bet on the quantum computing. [00:11:53] Speaker B: I'm just, I'm joking. Come on, go ahead. [00:12:01] Speaker C: I, I just say, I'm just saying I bet on the quantum computing. I think that's going to be sooner than we expect. [00:12:07] Speaker B: Yeah, I hope so. [00:12:09] Speaker C: And we'll change all, you know, make all these defunct other strategic partnership Google. [00:12:17] Speaker A: Removes Gemini models from AI Studio after GOP senator complains Google removed its OpenAI Gemini AI models from AI Studio following a complaint from Senator Marshall Blackburn who reported the model hallucinating false sexual misconduct allegations when her prompts against her when prompted with leading questions, the model allegedly fabricated details, false claims and generated face fake news and article links. The removal affected non developer access through AI Studio user interfaces where model behavior tweaking tools could increase hallucination likelihood. Developer can still access it via the API or download locally. The insight highlights the ongoing challenges of AI hallucinations in production systems where no AI firm has successfully eliminated despite all the mitigation efforts. [00:13:12] Speaker C: So I think this was the Gemma models, not the Gemini ones. So the open source versions which feels a little bit, feels a little bit better than the Gemini ones because that's, you know, we're paying for those as we're proprietary. 
But this is kind of interesting because if politics get involved when they don't like the output of a model and they decide that Google or whatever tech giant has to pull the models down, now you're getting into some really weird behavior in, in the world if, if that's how we're going to do things. So if we're going to start saying, oh, this is a political response to the output of a model. So I saw something weird. It had to do with some output by the Google AI response right after the Charlie Kirk assassination. And someone in one of my social media circles was basically saying that Google was biased because it didn't know about what happened with Charlie Kirk. It was like, as far as I know, he's alive and running around and no big deal. And everyone said, oh, it's biased, this is horrible. And you know, who would do such a thing? And it got into this really political, weird set of conversations around this. And I tried to explain, of course, well, that's not really how these models work. That's, you know, it's not trying to be biased, but I think that this is going to be real and I bet we're going to start seeing more and more of this when the model doesn't give the output that someone is expecting and that person is in a position of authority. So be interesting to see if this kind of thing keeps happening. [00:14:56] Speaker B: At what point do you hold Google or the model, not that you can hold the model accountable, but at what point do you start holding them accountable versus holding the user accountable? Because this was an extreme case and I think it was prompted with. Was Senator Marsha Blackburn accused of rape is what generated the hallucination? And I think they turned the temperature way up and it was lots more randomness and it was clearly being manipulated to, to generate something which was not, not normal, but then the user did something with it. And I think that that person should be held accountable rather than, rather than Google. I mean, it's, it's, it's, it's tough because hallucinations, when it, when the way we see hallucinations is, is very interesting because if it's, if it's a fact and we already know the answer to the fact, well, why, why did you ask in the first place? And yes, they can be misleading if you don't know the answer and you're looking for a fact. But on the other hand, hallucination is very useful because they help form connections between different concepts and it's sort of verging on imagination and creativity rather than hallucination. We just call it hallucination when it's undesirable. [00:16:15] Speaker C: Right. [00:16:15] Speaker B: But it's not something we want to eliminate completely and it's not something we should try and eliminate completely. So I think the accountability needs to be sort of like further up the chain to be on the user. [00:16:29] Speaker C: Well, I mean, you can get any kind of output. It really just depends on what the training data is. We all know data from like Reddit is in all the models, training data. And so people can create just. I can't curse on here, right? People can create just stupid posts that have just completely incorrect data and they can create a bunch of them enough to like change what reality is. You know, OpenAI says, oh, GPT6 uses this new data. They can't fact check to any of it. In fact, none of it's fact checked. They just pull in large swaths of data and get what they get and the output is like, it's almost like magic. They don't, they didn't. 
OpenAI didn't know that GPT3 was going to be amazing. They guessed that it would be. It was like a theory. Right. And we're like, oh, wow, the output was really good. Let's keep, let's just throw more data in. And that was like the whole plan early on. And yeah, it's really good. But you're always going to get wrong, incorrect, inaccurate stuff because of what it's trained on, even if you didn't train it on there and you set like, you know, the, the randomness to be higher so it can get weirder things. Sure, I mean, you could definitely blame the developers or whoever was using the models to build their software, but this is always going to be a problem. I just don't see a reasonable way around it with what we know today. [00:17:55] Speaker A: And that's everything on the Internet though. I mean, I'm going to date myself a little bit. When Wikipedia first came out and you started using it, you can't, you know, we were told you can't reference Wikipedia, you know, because who knows what, what was put on there. And then every, you know, high school and college student just looked at the sources then, and then pulled those links, you know, and that's, you know, so like, you gotta look at the links or look at what the data is based on it and then from there figure it out, which goes into what you're saying is you can't blindly trust it. That's why it's trust but verify. [00:18:28] Speaker C: Wikipedia is our trusted source now. Yeah, like that's what the AI responses are. You know, ChatGPT is like the, the Wikipedia from 20 years ago. 20 years ago, 10 years ago. [00:18:41] Speaker A: Yeah. [00:18:42] Speaker B: You know, I can't imagine it would be particularly difficult when it's hallucinating links to things. How hard would it really be to, to be, to be Google running this as a service and they, when they see a URL and an output from, from a, from a query, check to make sure it exists before you pass it to user. [00:19:02] Speaker A: What if the website's offline? What if front door goes down? Well, while you're live querying it too soon. [00:19:09] Speaker C: Harsh. [00:19:10] Speaker B: I mean it would not be hard, would it, to validate that URLs actually exist and put a little note in there. Perhaps this is a hallucination, you know, or do some kind of fact checking in the background before you respond to user. I don't know. It's a powerful tool. I think we need a whole lot more orchestration around how it's actually used and deployed in customer facing roles. [00:19:32] Speaker A: But the piece of what you said there is the most important thing. It's a tool, it's. You have to use it as a tool, not as God and assume that it's 100% correct like it's a tool. Figure out how to do it. When I use it for software development, I don't assume that it's correct. I read the code or I write my, I have it, do unit tests, I validate it. Don't just assume that the first thing it does is correct. Because the first time I used it to development, it started making a PowerShell cmdlets and I was like this seems right. And then I looked at it and I tried to write it, I was like, this is a completely made up PowerShell commandlet that doesn't exist. So you have to trust but verify somewhere. [00:20:11] Speaker C: Well, maybe with all those new GPUs that OpenAI has, maybe they can do some more fact checking on the data coming into their models. 
[00:20:19] Speaker B: Yeah, I mean it's a very difficult problem if it's an actual fact with a, with a measurable, known, correct answer, you know, like how much does a kilogram of feathers weigh? You know, weighs, weighs a kilogram. Like that's, that's one. [00:20:36] Speaker C: And yet it's not that great at math, is it? It's much better now. It's much better now. [00:20:40] Speaker B: But the flip side of that is, is around things like news though, because the same story can be reported entirely differently and almost, almost with completely contradictory sort of outcomes depending on where you read it. So it's, it's, it's tough. Like I, I kind of see why the concern is here around, well, is it, is it too left leaning, is it too right leaning? But at the same time it's, it's only one model and people generally pick a position and that's, that's the way they see the world. I think the model has to have a position, but nobody's going to be happy because it's either going to be too center, too left or too right. [00:21:19] Speaker C: So maybe you just tell it your own political beliefs and it can tell you what you want to hear. [00:21:26] Speaker B: Yep. [00:21:26] Speaker C: All the time. [00:21:27] Speaker B: Yep. [00:21:28] Speaker C: Truth be damned. [00:21:28] Speaker B: They complain about that too though. You're absolutely right. Anywho, all right, introducing Agent HQ. Any agent anywhere you work the GitHub blog so GitHub's launched Agent HQ is a unified platform to orchestrate multiple AI coding agents, Stramantropic, OpenAI, Google, Cognition, who I never heard of, and XAI directly within GitHub and VS Code, all included in paid Copilot subscriptions. This eliminates the fragmented experience of judging different AI tools across separate interfaces and subscriptions. I mean, I kind of like the different interfaces. They all bring something a little different. I'm not, not a fan of least common denominator anyway. Mission Control provides a single Command center across GitHub, VS code, mobile and CLI to assign work to different agents in parallel, track their progress, and manage identities and permissions. Just like human team members, the system maintains familiar git primitives like pull requests and issues while adding granular controls over when CI runs on agent generated code. [00:22:34] Speaker C: So this one is not out yet. Right. So it'll be interesting to see how it ends up. I From the videos that I watched of how this will work, it's a little hard to imagine how successful it will be. In the short term, it seemed like it might be challenging. So if I'm running multiple agents, does that mean I'm running them in parallel or I just get to pick, oh, this one's going to use Claude and this one's going to use, you know what, you know, OpenAI's models and then I can just say, okay, go build this GitHub project ticket or you know, I don't have a good sense of like how integrated this is going to be until we actually see it. [00:23:22] Speaker B: Yeah, every time I see a system like this that can use multiple agents, I always Feel like it's missing the agent that manages the agents. It's missing the thing that says okay, please continue or okay, now pass this to here. I think that there's like a loop that needs to be closed on that whole, on the whole thing. It's definitely, definitely worth using different models for different things. I mean, at home I've got a fairly robust ML setup right now, so I'm doing a lot of co generation locally. 
But you know, things like Deep SEQ and glm, they're good, but not quite good enough. They're good to small things. I let my son use Open Web UI and he builds like Flappy Bird in the browser and stuff like this and it works fine for that. It's been trained on that. So it's, it's, it's, it's not, it's not like Claude or Chat GPT though, which, which can kind of take on a user perspective or a customer perspective and then figure out what needs to be done to kind of reach the end point. That's interesting. I think I like having the diversity available. I think it's useful, but then sort of having the diversity available, but now you've got to go to this one place to use. It kind of takes away a little bit from that. So I like the open ecosystem, but I want it to be open everywhere, not just. [00:24:46] Speaker C: They do have a, they advertised a mobile view so you can even see it on your phone while you're driving at a stoplight. [00:24:53] Speaker B: Great. [00:24:54] Speaker C: Lucky you. [00:24:55] Speaker B: Yeah. What we do when you run a pedestrian over. I was just checking to see how my automatic code build was going. That's it. [00:25:02] Speaker C: Exactly. Who wouldn't want that? [00:25:05] Speaker A: I feel like from like a compliance and security standpoint, it's what they're targeting here because especially they're targeting like the Pro plans and stuff like that where they, you can now control all these things because RA in GitHub you can say what models are available to your organization. Wicked Hub Copilot. So I feel like that's where they're kind of targeting is that more enterprise customer that lets you kind of turn on and off these things. I mean, it's, it's a nice thing to, to do, but you know, they feel like there's 12 competitors now, so are they going to be at this point so far behind the line that they're going to be able to catch up from cursor or root code or any of these other ones that kind of have the same thing out there? Or if you really want to call it for Shortskiro. My guess, it does have the multi agent, but everything's going to feel like going to have that as you will slowly see throughout the show. Speaking of other models that other tools that have multi agent, Cursor introduces its coding model alongside multi agent interfaces Cursor launches version 2.0 of its IDE with Composer, its first competitive in house coding model built using reinforced learning, a mixture of expert architects. The company claims it's 4.4x faster and simply just more intelligent than all of its competitors. The new multi agent Interface with cursor 2.0 allows developer to run multiple AI agents in parallel for coding tasks. Expanding beyond a single agent workflow that has been a standard AI assisted developer environment, Curse's internal benchmarks show Composer prior prioritizes speed over raw intelligence, outperforming competitors significantly significantly in token per second, while slightly underperforming in the best frontier models and intelligent metrics. The IDE maintains in VS Code foundation while deepening LLM integrations for what Cursor calls Vibe coding. If you haven't heard of that yet, you haven't listened to us before. While AI assistance is more directly embedded in in the developer workflow, previously Cursor relied on entirely third party models, making it their first attempt for full vertical integration in the AI space. 
[00:27:15] Speaker C: So I could talk about this one quite a bit because I spent a lot of time this week playing around with cursor 2.0 and composer. It's very exciting. So Cursor had it wasn't like the Composer version that they have now, but they had like an agent built in and I thought it was okay, but it was wrong a lot. That agent and the 2.0 agent seems fabulous relatively and very fast. You can, you can tell how fast it is. But I think the thing that I'm I almost like the most with the new Cursor version is they work with git work trees now and I haven't really seen a lot of people talk about this, but if we're going to do a whole bunch of stuff with a bunch of agents at once, you really have to. You kind of have to babysit them, the agents. So if you're using cloud code, IDE or using Cursor or whatever, it comes back with like here's a bunch of stuff is this okay? Or hey, I need to run it so that I can see if the output is good. Now I'm not just going to let some agent run stuff on my computer and sure I can be working remotely, but let's say I'm working locally. You have to like interact with the interact all the time or say, no, this isn't it. I want you to do this other thing. And it's sort of great to use different work trees with Git in order to have a whole bunch of branches, essentially branches checked out at the same time. So you can kind of keep an eye on all of them running at the same time. And I feel like the new version of Cursor helps support this, but, but the way they thought about it was they support Composer with these agents and you can have a whole bunch of them running, but you can say I want to run three that do the exact same thing, but I want to go just run it three different times with whatever prompt I have and pick whichever output I like the most, if that makes sense. You kind of have to see how it works to work with it, but it lets you really do work in parallel. And I don't think a lot of the other tools that we've talked about do that very well with the ability to actually like run and test things. But the way they did it with the work trees, you can actually like run and test the different versions. Although it's not a hundred percent what I want, it's like much closer than anything else I've seen for an actual like development workflow that's. [00:29:40] Speaker B: So the Composer is the thing that composes the actions for the multiple agents. ICM is kind of what the actual composer part of it does. Or am I misunderstanding? [00:29:51] Speaker C: Composer is just the name of their, I think their model that they're using. It was from a company that they acquired, I don't know how many months ago that had their own model. So it's like their own in house. [00:30:02] Speaker A: Claude. [00:30:03] Speaker B: That sounds kind of cool. [00:30:04] Speaker C: It's shockingly good. Like I, I would compare it to Claude. [00:30:08] Speaker B: Wow, that's, that's neat. I, I've been trying to separate my, my, like I, I, I enjoy programming. I don't want that taken away completely and I'm, I'm not not letting it. I've, I've been focusing on, you know, how to write better, not even just prompts, because it's like way beyond just a prompt at this point. So I've, I've been working more on kind of like the building, building a tool that uses AI to generate this documentation which we can pass to the other tools which will they do the code build. 
And I'm sort of trying to work towards a place where I just come with the like the product narrative in a way, and then build a tool which figures out everything that needs to be done, all the best practices. What type of compliance environment are you in? Is it a hobby project? You know, is it to run, is it to run your PC at home? Is it for the cloud, something else? So it's going to get all this information and then do its massive plan and then pass it on to the model. So I'd like to get to a place where I don't need an idea, I need a place to monitor the progress of things, but not necessarily directly interact with the code while it's being written. [00:31:23] Speaker C: I feel like I have to interact with the code. So I see a lot of bad stuff being done by every model that I have ever tried of anything that is even mildly non standard. If it's super standard, go make me a login web page that uses OAuth, whatever, whatever. Or make me a form that does this. Like that's, that's fine. It's kind of hard to mess up some of that. But if I am building like a complex application, I feel like if it doesn't get it right and perfect the first time in terms of structure and you say, well that's not right. I need you to change this one thing or this other thing or heaven forbid you open up a new agent to work with it which knows nothing about the first agent's context. It starts just creating a pile of garbage. And so like I feel like I need to keep an eye on the code and make sure that it's going in a good direction. I building something on the side right now that is a non standard shape. It's not a rectangular looking UI on this application. And it's like blowing the minds of Claude. Code and Composer actually is doing pretty well with it. But Claude just like it would tell me, oh, you're absolutely right, you're so right. This thing is completely wrong and I'm going to go fix it. You're a genius. And it would try to do something and it would do just something really terrible every time. And we went back and forth and I was like, I feel like a crazy person. I need to go just edit the code or tell it what to do in the code. And once I looked at the code and I'm like, I need you to refactor this function so it does A, B and C. Then it was okay, but it couldn't handle it. [00:33:04] Speaker A: But that's where I feel like, you know, you can't just go build true production quality Apps yet using code, like, using all these tools. Like, I still think that if you're building something that's production grade, you need somebody to go look at it. And it's not just saying, write me this thing that does this. You know, you have to look at it. I've seen. When I've done it, it goes completely sideways. And you're like, I told you to do this. And it's like, yes, you did. Thank you for telling me that. This again. And it just gets stuck in these loops that you're like, you're still not listening to me. Okay, let me go, like, tweak some of these lines and like highlight this section. Say, this is wrong. [00:33:42] Speaker C: Exactly. [00:33:43] Speaker A: Stop trying to use this command. This does not exist. And let's iterate this way on it. [00:33:49] Speaker B: I think the problem you've got, though, is the problem everybody has, not you specifically. I think the problem we have is we all have our jobs to do right now, some of which includes writing internal services or customer facing services. 
And we're also trying to figure out how to use this new technology and use it effectively and use it well to kind of make our lives easier. Because we like the idea of spending half an hour in the morning prompting an AI and then spending the rest of the day on the beach or something to that effect. But I think what needs to be happening when Elise is coding something and she can't get the shape right, or you're coding something and it's turning out like a pile of junk is you need to do retrospective on what it built versus what you asked for. And you need to go back and figure out, how could I have prompted this differently to get this outcome? And people aren't doing that. [00:34:42] Speaker A: All I just heard is it's my fault. [00:34:43] Speaker B: It absolutely. I mean, my microphone wasn't even on right now when you heard that. So I have a therapist. [00:34:55] Speaker C: Yeah. You know, there's one thing that I found that worked really, really well. If I get into a loop of I need something fixed, it didn't fix it. All right, let me try prompting it differently. Still didn't fix it. And did something similarly silly or something that didn't make sense. And I go ask it, well, why did you do this? Why did you do these two things? And then it's like, you're right. I should. I should rethink everything. It's. It's like I offended it, and it needs to go rework the entire thing. And that has worked really well. [00:35:25] Speaker B: Can do. I think one of the problems people are Coming across is that the context window, at least for Claude, is like 200,000 tokens and you can use a million token version. I do not recommend using the million token version for coding because the way attention works in the transformer is like the original transform models would have full attention across the entire context window. It's very expensive to do that because every token you add adds like it's O squared complexity. And so newer models are using things like sparse attention or hierarchical attention to kind of summarize different parts of the context window so that you're not paying attention to everything. But with sparse attention, specifically encoding, it sucks because, yes, when you say to Claude, this didn't go right, Claude can see right now in what you're saying and can see what it's just generated. But what it doesn't see, it doesn't see as accurately the original context that led to that. And so it will placate you and it will say, oh, yes, you know, very sorry, let's try again. But it doesn't know what it did wrong at that point. And so I think I've had better success with like not continuing a conversation where something's going sideways. I'll. I'll clear the agent and I'll start again. [00:36:51] Speaker C: Interesting. [00:36:52] Speaker B: After having done the retro, like a retrospective on it, like, well, how could I have prompted this better to get what I wanted and then start again and try again and kind of iterate on it that way instead of within an individual session? And that, that tends to work out pretty well. [00:37:03] Speaker C: Very interesting. [00:37:05] Speaker B: All right, what's next? We have, oh, more AI stuff. I mean. [00:37:12] Speaker A: Onto the world of aws. [00:37:13] Speaker B: All right, The Model Context Protocol MCP proxy for AWS is now generally available. AWS has released the MCP proxy, a client side proxy that enables MCP clients to connect to remote AWS hosted MCP servers. Have I been saying MPC the whole time? [00:37:32] Speaker A: I don't know. 
[00:37:32] Speaker B: Using AWS sigbe for authentication. This is, this is kind of cool. Now we can deploy MCP agents anywhere on Amazon and authenticate to them with their own identities, which is neat. So you can enable access to resources like S3 buckets, RDS tables through MCP servers while maintaining AWS security standards. Cool. [00:37:56] Speaker A: Yeah, I mean, this is a nice little tool to help with development. I feel like, please don't probably run this in production because you know someone's going to, but it bridges that gap and hopefully makes the developer life when you're trying to interface with these other things, you know, other tools like A easier stepping stone than having to, you know, build all these things yourself. So it's nice. They've also attempted to build in some of the safeguards like the read only mode I feel like is key. So that way people don't try to write all these things, you know and start writing through to your S3 blob storage and all of a sudden people try to trigger a delete command because the AI has decided that when you said change it wanted to delete the whole storage account. So you know, it's nice they not just set it up but put some basic thought into it of let's put a few guardrails into a day one. [00:38:48] Speaker C: I appreciate this seems useful. [00:38:51] Speaker B: Yeah, I go the opposite way man. I think like yes MCP servers are useful for during development sometimes but I think most the value is going to be at runtime. Instead I think of them as the eyes and the ears and the hands and legs of a model that otherwise doesn't have any kind of autonomy in the world. And so now you can build your own apps and you could connect to maybe even like a pay as you go type MCP service in the future. What's the weather going to be like next week in Hawaii? And there'll be an MCP service for weather which you could subscribe to for one center query or something and have it reach out to us. I see there being a whole network of these kind of ways that agents can act in the world. [00:39:41] Speaker A: I mean I would say that this is probably if we want to take out our reinvent bingo card. Maybe we don't do how many times AI is said this year out there. Maybe we do MCP for the number of times it said during the keynote. [00:39:54] Speaker C: Oh definitely. [00:39:56] Speaker A: But I think that this would be to me probably something they started building and would probably turn into a full blown service at one point. So that might be my re invent announcement. Maybe it's too soon since they just they just released it. [00:40:15] Speaker B: I haven't even talked about rain bands. We're already beginning of November. Where'd the year go? [00:40:23] Speaker A: Don't worry, next week you gotta have your wow brain is tired right now. Your ignite guesses which will be pure speculation from all of us, I'm pretty sure. So. [00:40:39] Speaker B: There are a lot of cloud cost management tools out there but only Achero provides cloud commitment insurance. It sounds fancy but it's really simple. Achera gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to. They will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? 
Achero will click the link in the Show Notes to check them out on the AWS Marketplace. [00:41:18] Speaker A: Amazon ECS now supports built in Linear and Canary deployments ECS now natively supports Linear and Canary deployment strategies alongside with the existing Blue Green. This feature I swear has been not there for as long as ECS has been there. You've always had to use a different tool like CodeDeploy Deploy to handle gradual shifting linear deployments shift traffic equally in pre built steps while Canary does a small percentage. The feature integrates with CloudWatch alarms for automatic rollback detection as well as supporting lifecycle hooks for custom validation. Both strategies include post deployment bacon time to keep the old version running for an X period of time. After all, traffic is there in case you have to roll back. It's generally available today and is supported in all the major tools AKA the console, the SDK, the cli, CDK and Terraform without any additional cost. I mean this was an exam question that I definitely probably should not say that out loud where the only way that you could actually do deployments built into ECs for years was the built in Blue green. If you wanted anything else you had to integrate in with CodeDeploy, which wasn't a hard shift, it just it was an extra step you had to do versus this doesn't feel like a major lift unless if they always just wanted to use the CodeDeploy service. Which made sense also to me that the ECS team could focus on the ECS service at the time. [00:42:50] Speaker B: I always wonder why they haven't built these things previously. And I guess, I guess it was possible through CodeDeploy. But if it was possible through CodeDeploy then why add it to ECS now? Felt like we kind of get this weird sprawl used with Amazon services who were really well self contained, really isolated and they thought very much about the interface between it and the customer and the other services. I find it a little strange. I also found it strange that it wasn't included in ECS when it launched because Canary deployments especially are just so valuable. Maybe. Maybe the type of people who were consuming ECs rather than EKs or running their own kubernetes classes weren't the kind of people who were thinking about this kind of orchestration Now. And so perhaps this is kind of a sign that ECS is kind of matured along with the customers that use it. [00:43:44] Speaker A: There was always ways to do it, but, like, I'm trying to pull up my own. My old terraform for that I wrote for a service I did. It was there, but it wasn't great because, like, especially with CodeDeploy, you have to generate the like environment, then you have to have the deployment and like, there's, there's a bunch of moving pieces in there that you kind of have to link together in order to make it work. So it's just one more link that you have to do. It's not a. In my opinion, it's not a hard link, but it's just something else you had kind of had to set up. So this way you just don't have to. Obviously I can't find the code in real time now. [00:44:27] Speaker B: All right, Amazon Route 53 Resolver now supports AWS Private Link. Finally, Route 53 Resolver will allow customers to manage DNS resolution features entirely over Amazon's private network without traversing the public Internet. So this isn't queries over Private Link. This is access to the control plane of Route 53 over private link, which I know Google already has. 
And it can be a horrible feature, but good for anyone who wanted this. The integration addresses security and compliance requirements for organizations that need to keep all API calls within private networks, such as deleting, you know, creating deleting records, editing resolver configurations and things like that. It's available immediately in all regions where Route 53 Resolver operates. And yeah, I'm sure there's a whole bunch of people have been waiting for this. [00:45:19] Speaker A: Yeah, it solves a lot of compliance needs and especially if you're using Route 53 resolvers, you're already trying to link back to your corporate or anything else networks. And these are going to be highly regulated environments. So I'm sure this was a checkbox compared to day one. It's released in govcloud that some fedramp was ils. You know, any of these, you know, high security, you know, customers or controls just solves that problem. [00:45:49] Speaker C: Sure, someone's cheering really loudly. [00:45:52] Speaker B: Doesn't it bother you, though, that it's still in the cloud, it's still not your private network. It's still, it's still literally a virtualized network running in, you know, one of tens of data centers globally. Like, you know, it's such an arbitrary line to draw or. Oh, it's going over the Internet. Yeah. Well, we have post quantum safe tls and all these other things. Tell me why that's more at a risk than this. I don't know. [00:46:20] Speaker A: I mean, in theory it's staying within the AWS infrastructure. So somebody performing a man in the middle attack or anything else like that is lower or BGP routing all traffic through in another country, which definitely has never happened before, you know, captures all that traffic and does whatever they want with it. But you know, it's where some security person that gets paid a lot of money, Ryan Lucas, has decided the line in the sand is and thus that's where where it is now. [00:46:51] Speaker B: Yeah, I put this in the Checkbox compliance category of features. [00:46:59] Speaker A: I have a lot of opinions about Checkbox security and checkbox compliance that we can talk about. [00:47:04] Speaker B: After I registered that domain name, by the way, I got Checkbox, I think I'd see the checkbox compliance with checkboxsecurity.com I was going to make a website which gathered stories from security companies and basically made a mockery of them and pretended that we were the business that was advertising all this nonsense. I may still have to do that. Then maybe when I get some free credits from Anthropi again, I'll I'll put together a tool that just builds the website full of it's like the Onion for inposec. [00:47:38] Speaker A: Link it up to git commits from last night too and you can have a lot of fun be like this is all of our source code that we have. Mount points for S3 and mount points for S3 CIS sorry CSI drivers add monitoring capabilities mounts for S3 now emit near real time metrics using OpenTelemetry protocol, allowing customers to monitor operations through CloudWatch, Prometheus, Grafana and other tools that leverage the standard. This addresses the significant operational gaps for teams running data intensive workload that mount S3 buckets as a file system. Please don't do this on ECS, EC2 instances or Kubernetes clusters. Again, don't do this monitoring these metrics provide granular metrics including request count, latency error types. 
At the EC2 I'm stuck on ECS after the prior article instance levels, enabling proactive troubleshooting of issues like permission errors and performance bottlenecks. Again, if you're doing a kubernetes, please don't do this. Just use the object storage for what it's meant for integrations work through CloudWatch agents and OpenTelemetry collectors, making capabilities with existing monitoring infrastructures that many management many organizations already have deployed. This Update is particularly relevant for workloads leveraging ML data analysis and containerized applications that treat S3 as a file system and need visibility into the storage layer. Performance. Please don't use this feature. Please don't use this tool. [00:49:14] Speaker B: Oh yeah, adding monitoring to it because people use it and they didn't work right. So at least they can look at the monitoring now and say, yep, it's working just as we told you it would. [00:49:22] Speaker A: Poorly. I mean, it's. It was a long time ago, I was talking to a customer, they were like, so what we're thinking of doing is leveraging the S3FS file system and running SQL on top of it for. And I was like, please don't ever do this. Please just stop what you're talking about right now and do not do this. And that's pretty much what AWS has enabled. Leveraging the whole S3 mount, you know, is like, I get what they're trying to solve that, you know, when they're trying to get customers to move to the cloud and to use Blob storage, but like use one of the other more native things like EFS or refactor your application. I understand it's easier said than done, but like moving to EFS I feel like would be faster. Maybe you end up with NFS locking and issues like that. But using S3 as file storage just makes me hurt. Probably because I've been bitten by it before. [00:50:21] Speaker B: Yeah, I think this use case I really is targeting VLA running. What else do you think? AI workloads on Kubernetes though, and they need access to blobs that don't change. You know, I don't think this is really as a service targeting people who are trying to use S3 as an extension of a local file system. I think this is, this is really reference material for inference or training material for inference for the most part. [00:50:48] Speaker A: Why not then throw it on a EBS volume, cost, cost, performance. [00:50:56] Speaker B: I don't, I don't know, access, better. [00:50:59] Speaker C: Performance, say, would that be cheaper and faster? [00:51:04] Speaker A: No, you're not going to necessarily be faster. I mean, I guess if you're in single zone and use, you know, some of the, the new features that they released in S3. So if you're leveraging it all that. [00:51:14] Speaker C: Way, but if it's for training data, you're not going to like copy that data a bunch of places. You don't need the same kind of redundancy. [00:51:22] Speaker A: Yeah, I don't know, it just still feels like the monitoring is there for you know, it was built so support could tell you you're doing it wrong in a much easier way. Look at the number. Like I said before, you're doing it wrong. In case you can't tell, I don't like this. [00:51:39] Speaker B: I think this is very much a. I'm renting this machine by the hour and it's costing me $26,000 a day. I want to make sure that everything is performing as optimally as it possibly can because I simply can't afford for this, for these, these reads to take longer. [00:51:56] Speaker A: Yeah, yeah. 
[00:51:58] Speaker B: Maybe they should ask OpenAI where they got their trillion dollars from, which they seem to have spent like three or four times over now. It's like, well, yeah, anyway. All right. New Log Analytics Query Builder simplifies writing SQL code. [00:52:12] Speaker A: Quote. [00:52:13] Speaker B: That's a new word, code. This came from the Google Cloud blog. Google Cloud released the Log Analytics Query Builder, which provides a UI-based interface that generates SQL queries automatically for users who need to analyze logs without deep SQL expertise. Presumably you still need to know what you're looking for. It doesn't do that for you yet. So the tool addresses the common challenge of extracting insights from nested JSON payloads in log data, which typically requires complex SQL functions that many DevOps engineers and SREs find time consuming to write. This is true. Very true. [00:52:48] Speaker A: I hate SQL. [00:52:52] Speaker C: I just think it's impressive that we still write big, long, complex SQL statements and that's still what we think is the best way to query data. I just think it's shocking. [00:53:04] Speaker A: How old is SQL, and how much has it really changed over the years? You know, still: select, from, where. [00:53:09] Speaker C: I mean, they keep adding stuff. [00:53:11] Speaker A: I stopped at the beginning. Select, from, where. That's my SQL knowledge. Anything after that, an AI tool writes it for me. [00:53:19] Speaker B: Google plans to expand the feature with cross-project log scopes, trace data integration for joining logs and traces, query saving and history, and natural language input. The Query Builder works with existing Log Analytics pricing, which is based on the amount of data scanned during queries. That's a disincentive to write efficient queries. I mean, I think this is where everything is going. I don't know. I was exaggerating slightly when I said people wouldn't need to learn C anymore. But why, like you say, Elise, why spend half an hour crafting a perfect SQL query when you can literally just say, find me all the whatever from this data where these criteria are met, and have it figure out all that for you? [00:54:02] Speaker C: Absolutely. And it should fill in the gaps. Oh, it looks like you have a foreign key here. Do you want this data too? Yeah, that sounds great. Happy to help. And scan more data for you too. [00:54:12] Speaker B: Yeah. I noticed you forgot to include this metric; it seems to correlate with whatever. Yeah. I mean. [00:54:18] Speaker C: Yeah, well, that'd be an extra $4. [00:54:23] Speaker B: Yeah. [00:54:24] Speaker A: Add a zero. It's the. [00:54:25] Speaker B: Do you want fries with that? Phase of AI. [00:54:31] Speaker A: That should have been a show title somewhere. [00:54:34] Speaker C: That's the new title. [00:54:35] Speaker A: Yeah. [00:54:36] Speaker B: Yeah. We'll have to start naming the shows at the end, I think. All right. Google has open sourced a GKE extension for Gemini CLI that integrates Kubernetes Engine operations directly into the command line AI agent. The extension works as both a Gemini CLI extension and an MCP server compatible with any MCP client. That would be the point, Google allowing developers to manage GKE clusters with natural language commands. Boy, I hope they filter out those natural language commands. The integration provides three main capabilities:
GKE-specific context resources, pre-built slash commands for complex workflows, and direct access to GKE tools, including cloud observability integration. [00:55:27] Speaker A: I mean, anything to make Kubernetes easier to manage, I'm all on board for it, but I still think it's going to need a few more iterations. You know, all you're doing at this point is just a natural language query and it's going to give you a Kubernetes output. Like, great, I have to do, you know, less documentation lookup. I get the value of it, but at the same point, it's a good starting point. Yeah, I think also stop using Kubernetes. [00:55:56] Speaker B: I think some of this is, like, missing the real value. This is like using a really nice tool to do the same old job. Whereas, you know, you presumably still need to know the terminology, you still need to know how the cluster works. I don't expect you could come to this and ask it a truly natural language question, like, why does my pod keep restarting? Or how do I build this, or how do I do this? It's going to be just a different way of doing the same thing rather than a better way of doing the same thing. [00:56:28] Speaker A: Right. I mean, unless it understands your actual environment. So if you had your prompt configured, and this is obviously not what this is able to do, but if you had your prompt configured with, here's my environment, I have these types of pods here, here and here, and workers and web and load balancers and, you know, certbots, et cetera, et cetera, kind of set up, so here's the basis of it, and then, I'm having issues at this layer, please work from there. But this is just, hey, I need to get logs, what's the, excuse me, kubectl logs -f command for this? Like, it hasn't gone to that next level, but maybe this is just a stepping stone to that. [00:57:14] Speaker B: Yeah, it's. [00:57:15] Speaker A: Or what you can do is just scale. [00:57:17] Speaker C: Well, if it gives you bad results, what's the worst that could happen? [00:57:22] Speaker A: I mean, the same thing as the Microsoft one when I played with the auto SRE thing at one point. Its solution to everything was scale up. [00:57:32] Speaker C: Yeah, I think they can't do too much at a time, because they'll introduce so much risk that you'll get another, this brought production down, whoopsies. [00:57:42] Speaker B: Yeah, I'm surprised we haven't got an AI orchestrator for Kubernetes yet, given that a large proportion of new code written by Google, and probably AWS, probably everybody at this point, is written by AI, and I would not be surprised if AI starts improving Kubernetes for us. And maybe the first AI suicide will be the first AI DevOps engineer looking after a Kubernetes cluster somewhere. [00:58:13] Speaker A: All right, onto our theme of the show: Master multi-tasking with the Jules extension for Gemini CLI. Google has launched the Jules extension for Gemini CLI, which acts as an autonomous coding assistant that handles background tasks like bug fixing, security patches, and dependency upgrades, while letting developers focus on primary work. Jules runs asynchronously using the jules command, working in isolated environments to address multiple issues in parallel and creating branches for review.
The extension integrates with other Gemini CLI extensions to create automated workflows, including security extensions for vulnerability analysis and remediation, observability extensions for crash investigation, and automated tests. Jules addresses common developer productivity drains by handling routine maintenance tasks that typically interrupt deep working sessions. The tool is available on GitHub now. See prior comments, maybe, about multitasking. [00:59:16] Speaker B: Yeah, I was just thinking, like, Google obviously listens to their customers, because it was only half an hour ago when I said something like this would be pretty useful. [00:59:25] Speaker A: And they vibe coded it in real time, got it into our show notes, and GA'd it. It was unbelievable. [00:59:31] Speaker B: Unbelievable. Yeah, I'll definitely check that out. [00:59:36] Speaker A: But this is a lot of the stuff that, to me, AI is useful for. Like, I've definitely used it for larger things, but: hey, run the scan of my environment, you know, and tell me what security issues are in there. Run my dependency analysis. All the annoying stuff that no one cares to do, just go do that for me. And like Snyk and other tools have, where it's like, hey, you're using version 1.0.12 and there's a vulnerability in it, update to 1.12.13, but it's not the larger scale upgrade, where this in theory could start to kind of build out a lot of that stuff. So, you know, they call it out here, but this is just a general vibe coding comment. This is where at least I find a lot of value, especially on my little side projects: go run the security analysis and tell me what you see. And it's just producing better code, you know, that has fewer vulnerabilities than, you know, my stupid buffer overflow or my SQL injection that I accidentally wrote into my code base. [01:00:40] Speaker B: Yeah, I'll check it out. I like the idea in principle. I'd kind of like to see what it's actually doing under the covers, really, and whether we could get it to do other stuff. But I think, like, every time you. [01:00:51] Speaker A: Say could, you start hacking AI. Every time. [01:00:53] Speaker B: Every time you say it could be doing this, could be doing this, it's coming, it is going to be coming, it's coming. You know, one of the first use cases, if you remember, a few years ago, was Amazon using AI to rewrite code to be compatible with later Java versions, presumably their own version of Java that they built. So yeah, I think the types of automations that you're looking for, like updating code to work with new versions of libraries or modules, or to, you know, get away from vulnerabilities and things like that, it's coming, it's coming very soon. It's probably been tested already. [01:01:30] Speaker A: I've used, I mean, I've done some of it. I haven't done major things like OpenSSL, you know, 1.x to 3.x is the first thing I can think of, with that a major jump. But, you know, I've done it for smaller, more incremental changes. [01:01:46] Speaker C: Well, this would be cool. I haven't tried this out yet. Jules in general, I know, does not have the same reputation as some of the other tools, but if it can integrate, that seems kind of handy. I will say, if you haven't visited the jules.google website, I think it's one of my favorite websites. It's beautiful. Check it out: jules.google. [01:02:10] Speaker B: Oh, cool. [01:02:11] Speaker C: It's just. I can't stop looking at it.
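As a side note on the "SQL injection I accidentally wrote" comment above, this is the shape of finding that an automated security pass, whether from Jules, Snyk, or any static analyzer, tends to flag. The snippet below is a small illustrative Python/sqlite3 sketch, not anything taken from the Jules article itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

def find_user_unsafe(name: str):
    # The classic mistake a scan flags: user input concatenated
    # straight into the SQL string.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver handles quoting and escaping.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()

# A malicious "name" turns the unsafe version into a dump of every row.
payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # returns all users
print(find_user_safe(payload))    # returns []
```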
[01:02:14] Speaker B: Does it have a table? [01:02:14] Speaker A: I just learned that Google is a TLD on its own. [01:02:18] Speaker C: Yeah. Very fun, colorful. [01:02:22] Speaker B: All right. [01:02:24] Speaker C: Eight bit. [01:02:26] Speaker B: Ah, yes, very. And I love that even Claude Code is very kind of retro looking on the terminal, with the ANSI art and stuff. It's got a nice, nice throwback. [01:02:37] Speaker C: You know, when I first used Claude Code, the first thing I had it do for me was create some ANSI art, ASCII art. It just gave me the idea that I needed to do it as soon as I opened it, and so I did. [01:02:52] Speaker B: Yep. That's funny. I installed cowsay on my son's computer. I hadn't heard of that in years. He's been. [01:03:00] Speaker C: Having a good throwback. [01:03:01] Speaker B: Yeah. Yeah. Well, we're talking about ASCII. [01:03:03] Speaker A: Yeah. [01:03:07] Speaker B: It's got all kinds of animals now. You can animate them out. You can do all kinds of stuff. Like. [01:03:12] Speaker A: Did you do a pull request? Is cowsay still active, like, on GitHub? [01:03:16] Speaker B: Oh, yeah, totally. [01:03:16] Speaker C: Yeah, I'm looking it up. It absolutely is. [01:03:18] Speaker A: Yeah, I'm doing the same. It's on GitHub. Yeah. [01:03:23] Speaker C: Originally written in Perl. My favorite. [01:03:29] Speaker B: Maybe we won't be having Elise back on the show again. I know. [01:03:34] Speaker A: So we got Elise for Perl. We have Peter for Ruby. All right, so we're collecting our people. We need maybe a COBOL person at one point. [01:03:42] Speaker C: Oh, no, I'll just say I can write a regex for days. [01:03:47] Speaker B: Excellent. [01:03:48] Speaker C: It's my specialty. Thanks to Perl. [01:03:50] Speaker B: Now we know who to come to. I'd rather write an entire piece of code that parses whatever I want out instead of trying to write a regex expression. [01:04:01] Speaker C: It'd be a good one-liner. They used to have those competitions. Anyway. [01:04:06] Speaker B: I've seen some really cool stuff, like, you know, generating lists of prime numbers with regex, and, like, crazy mathematical sort of solution finding just using the regex engine. It's insane. [01:04:18] Speaker C: Yeah, you can do a lot. [01:04:20] Speaker B: All right. Announcing GA of Cost Anomaly Detection. This is another article from the Google Cloud blog. Their anomaly detection service reaches general availability with AI-powered alerts now enabled by default for all GCP customers across all projects, including new ones. The service automatically monitors spending patterns and sends alerts to billing administrators when unusual cost spikes are detected, with no configuration required. The GA release introduces AI-generated anomaly thresholds that adapt to each customer's historical spending patterns, reducing noise by only flagging significant and unexpected deviations. That's pretty cool. [01:04:58] Speaker A: It's just an improvement though, correct? I feel like there's been anomaly detection in AWS and Azure for a while, so I assume it was already there, it just has the new AI-powered engine. You know, maybe I'm wrong, but. [01:05:14] Speaker B: Well, I don't think it's an improvement, because my inbox on my work laptop is full of Google warnings from, you know, one or two hundred projects saying the spend increased. I'm like, it's $25, it's one machine for a week.
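For anyone curious about the prime-numbers-with-regex trick mentioned above, here's a quick sketch of the classic unary-string version in Python; it's the well-known party trick, not an efficient primality test.

```python
import re

def is_prime(n: int) -> bool:
    # Write n as a string of n ones. (11+?)\1+ matches exactly when that
    # string splits into two or more equal chunks of length >= 2,
    # i.e. when n is composite.
    if n < 2:
        return False
    return re.fullmatch(r"(11+?)\1+", "1" * n) is None

print([n for n in range(2, 60) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
```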
Yes, I know there was nothing in this project for six months, but we finally started using it, and they're like, yes, but it's an 8,000% increase. I'm like, it's okay, it's $25, I don't need to know about it. So I think, you know, having some. [01:05:44] Speaker A: We had that with Azure for the same thing. We would be like, yeah, we scaled up last week; yes, we scaled up during the weekday; yes, the last three days over the weekend were lower than Monday's spend. Correct. Thank you for this amazing revelation. [01:06:01] Speaker C: But aren't you still glad that you got that information? [01:06:05] Speaker A: No. [01:06:05] Speaker C: Doesn't it feel just a little bit okay? [01:06:08] Speaker A: No, because we'd get spammed if we did do that. Every subscription. Every single subscription, multiple times a week. Oh, you scaled down. Yeah, your traffic was lower. My favorite anomaly detection I ever got was we got a thousand anomaly detection alerts on Christmas Day last year, and the prior year, because all of our server loads were so much lower because nobody was using them. [01:06:34] Speaker C: But they can't expand their detection to work across holidays that are annual? [01:06:42] Speaker A: We got so many alerts that day. [01:06:45] Speaker C: I just wonder. There are so many third-party companies that specialize in this kind of thing, so I wonder if they realize that they could just do a little bit better. [01:06:55] Speaker A: I mean, the third parties all take, like, you know, a percent of your spend. I still feel like for a lot of these tools there's not a good cost model, because it's either a percent of your spend, but if all your spend is on, like, blob or SQL, you don't really care, you know those and you've dove into a lot of those, so the percent of spend model I feel like is always hard, or they're like, do you do a fixed cost? How much data are you ingesting? I feel like, with the way cloud and pay-as-you-go works, a lot of these third-party plugins that are supposed to help you with costs are like, here's a good one-time hit. And maybe you do that every three years or every couple years with a new tool, but I don't know that I'd ever buy one for the long term. [01:07:38] Speaker B: Yeah, it's difficult. People don't want to spend money on cost avoidance. They want to have lower costs, but they don't want to spend money to do it. And it's difficult. I think if I were going to do it as a service, let's say, and charge people for it, I would simply charge people for running the report. You know, let's say I go and visit your company and I get credentials. I log in, I look at all the bills, I look at what you've got deployed, and figure out what could be optimized. I would charge you probably a fixed cost in terms of hours for collecting the data and generating the report. And then I'd hand it to you and say, I think you could save $14 million a year in cloud spend; you know, call me back next year if you want me to do this again. So I'd hope that what I suggested would have value, and the customer is responsible for actually implementing the suggestions and lowering their cost themselves. I would like to think that by making it a fixed cost, I'd sort of build a reputation for delivering accurate data and actionable insights rather than trying to gouge a percentage of spend. [01:08:53] Speaker C: Yeah. What are you going to do the next year?
We'll save you another 14 million. [01:08:56] Speaker B: Who knows, maybe I charged them enough the first time. [01:09:02] Speaker A: You didn't say how much. [01:09:03] Speaker C: It was a really big one-time fee. Yep. [01:09:07] Speaker A: All right, last but not least, on to Azure news. Nvidia and Microsoft announced AI advancements. Microsoft and Nvidia are expanding their AI partnership with several infrastructure and model updates. Azure Local now supports Nvidia RTX Pro 6000 Blackwell Server Edition GPUs, enabling organizations to run AI workloads at the edge with cloud-like management through Azure Arc, targeting healthcare, retail, manufacturing, and other sectors that require data resiliency and low latency processing. Microsoft deployed their first production-scale clusters of the Nvidia GB300 NVL72 systems, with over 4,600 Blackwell Ultra GPUs in the ND GB300 v6 VM series. Each rack delivers 130 terabytes per second of NVLink bandwidth and up to 136 kilowatts of compute power, and they're designed for training and deploying frontier models, with further integration, liquid cooling, and Azure Boost for accelerated I/O. I'm just going to pause right there; I think I got every one of those letters and numbers correct and not dyslexic, so I'm taking that as a win and just going from there. [01:10:24] Speaker B: Yeah, yeah, I'm impressed. But 136 kilowatts. You know, it kind of annoys me when things use so much electricity, because all I hear is turn the light off, turn this thing off, you know, unplug your phone charger at night because that will save, you know, 7 watts for 12 hours. I get these messages from the electric company: you know, your idle load is 136 watts, you could be doing better, other houses like yours are much lower than that. Like, seriously. And then you drive down the street to, like, the used car lot or the Ford lot or something, and they've got these massive, like, 2 kilowatt floodlights lighting up their entire lot 24 hours a day. I'm like, really? You want me to unplug my thing that costs half a cent, but not do anything about that other thing? And now 136 kilowatts in a cluster. Yeah. Going off on a slight tangent, I appreciate that, you know, power is very expensive, especially out here in California, and I'm seriously hoping there is some kind of legislation or some kind of action taken that somewhat shields residential users of electricity from price increases.
But they're suffering the same issue right now, and they're turning people away and discouraging businesses from continuing to build there, because they just don't have the power, they don't have the infrastructure, they don't have a place to build new power supplies, new power stations and things. That's crazy. Anyway, sorry for hijacking your thread to complain about how expensive electricity is. [01:13:01] Speaker A: No, I get it. In other news related to Nvidia's Microsoft partnership, Nvidia Run:ai is now available in the Azure Marketplace, providing GPU orchestration and workload management across Azure NC and ND series instances. Someone with a calculator can figure out what all those instances are. The platform is integrated with AKS, Azure Machine Learning, and Azure AI Foundry to help enterprises dynamically allocate GPU resources, reducing costs and improving utilization. And finally, Azure Kubernetes Service now supports the Nvidia Dynamo framework on the ND GB200 v6 VM series, demonstrating 1.2 million tokens per second with the gpt-oss 120 billion parameter model. So we're reaching late into the alphabet here. Microsoft reports up to 15x throughput improvement over Hopper generations for reasoning models leveraging these deployments. [01:14:10] Speaker B: That's a really good salesy number to quote. 1.2 million tokens a second. That's great, but that's not an individual user. One individual user will not get 1.2 million tokens a second out of any model. [01:14:22] Speaker A: Challenge accepted. [01:14:24] Speaker B: That is at full capacity; with as many users running inference as possible on that cluster, the total generation output might be 1.2 million tokens a second, which is still phenomenal. But as far as the actual user experience, you know, if you were a business who wanted really fast inference, you're not going to get 1.2 million tokens a second. [01:14:46] Speaker A: I mean, we've said it before on the show, and I still say I want a reason to play with these things and a company that will foot the bill. I just don't have a reason. It just is anti-cloudy in my head still. Like, I want to run on the smallest boxes that I can scale horizontally and everything else. That's where my brain processes stuff. [01:15:06] Speaker C: Well, the local LLMs, the small ones, the SLMs, are pretty good these days. So, you know, you just use one of those. [01:15:14] Speaker A: I used it on a... There you go. [01:15:19] Speaker B: Okay. In preview: Azure Functions zero-downtime deployments with rolling updates in Flex Consumption. Azure Functions in the Flex Consumption plan now support rolling updates for zero-downtime deployments through a simple configuration change. We can't make that change just yet, can we? Maybe we can, I don't know. This eliminates the need for forceful instance restarts during code or configuration updates, allowing the platform to gracefully transition workloads across instances. Rolling updates work by gradually replacing old instances with new ones while maintaining active request handling, similar to deployment strategies used in container orchestration systems. This brings enterprise-grade deployment capabilities to serverless functions without requiring additional infrastructure management.
Yeah, it's always fun to tell your customers, oh, we had to update this thing, so we had to, you know, delete and recreate it. But, you know, it kind of just feels like, guys, being able to cleanly deploy should have been there day one. But it definitely still feels like they're still kind of ramping up on the whole Flex Consumption plan, even though they've already kind of told you to move off. I think it was the old Y series, or the Y series is the new one for App Services. So it's a nice quality of life feature that they're adding to everything. Again, it's in preview, so don't deploy production workloads leveraging this, but it's just an ARM template or Terraform change, which is always convenient versus a delete and recreate. The Azure pay-as-you-go API shift: what's actually changing and why it matters. Microsoft is deprecating the legacy Consumption API for Azure pay-as-you-go cost data retrieval and replacing it with two modern APIs: the Cost Details API for Enterprise and Microsoft Customer Agreement subscriptions, and the Exports API for pay-as-you-go and Visual Studio subscribers. The shift is from a pull model, where teams constantly query the API, to a subscription model where Azure delivers the data directly to your storage account. This change addresses a lot of scalability and API limits that you've run into if you've ever tried to use this old API in the past. A lot of retry loops, a lot of old code I'm looking forward to getting rid of. The new API also supports the FOCUS-compliant schema, which is great; they are following all the FinOps stuff that they've agreed to, and it includes reservations and savings plans in a single export. It will also better integrate with Azure BI, sorry, with Power BI and Azure Data Factory. FinOps teams can now rejoice over getting to delete all their old legacy scripts and start to leverage new ones. The sassiness gets higher as the episode goes along. [01:18:11] Speaker B: That's good. I don't think FOCUS required that they push the contents to a storage bucket. But I know Google and Amazon have done the same thing as well. [01:18:21] Speaker A: Yeah, it's more that the old one was their own model and didn't support FOCUS, and they're slowly retrofitting all the billing aspects to support FOCUS, which is great. They focused on the billing APIs in the past, and now you can kind of see they're moving on to the consumption-based APIs, because the consumption APIs were pretty terrible. You know, I think a year or two ago we did an analysis at my day job where we were trying to figure out savings plan amounts, if we buy X amount, how much do we need to buy, everything along those lines, and we definitely ran into throttling issues, and it just bombed out on us at a few points, and there were a lot of weird loops we had to do because the format just didn't make sense with modern stuff. So I would suggest you move, not because they're trying to get rid of it, but because it will make your life better, because you can point to a file and then ingest that file into whatever you want versus having to deal with an API. And if you have a large subscription and a large tenant, you will save yourself a lot of headache. Just personal pain. [01:19:25] Speaker B: Unloading. Right. Azure WAF now includes CAPTCHA challenge capabilities for Front Door deployments, allowing organizations to distinguish between legitimate users and automated bot traffic.
[01:19:36] Speaker A: Fun fact, it might have been released on the day that Front Door went down, and you still couldn't use it then. Just calling out that detail. [01:19:46] Speaker B: I have seen Claude for web click on an "I am human" button and pass just fine. So we'll see how quickly this feature gets updated. The CAPTCHA feature integrates directly into Azure Front Door's WAF policy engine, enabling administrators to trigger challenges based on custom rules, rate limits, or anomaly detection patterns. You can configure CAPTCHA thresholds and exemptions without requiring changes to the backend application code. That's cool. I kind of wonder what we're going to replace reCAPTCHA with, honestly, when AIs can click the button just like a person can. [01:20:21] Speaker A: You know, there was some website I was on, and I honestly don't remember where it was, and it took me like five tries to prove I was human, and I'm on a call with someone. It was like, orient this image in the same way. And I was like, I am. They look the same. But it turns out at one point I think I had it backwards, like 180. But it's like these two blurry images, and you're like, take this fragment of a dog and this other dog and make them look the same way. I'm like, what am I doing right now with my life? [01:20:50] Speaker C: Yeah, they're so bad now. I got stuck on a website the other day because I couldn't pick out all the buses. And I was like, I know what a bus looks like. Why am I wrong? I don't know. Does that edge of the bus count as a bus? Like, it's a pixel. Two pixels. [01:21:04] Speaker B: I don't know. Yeah, I've seen more and more complex things, like dragging a piece of a jigsaw into a circled spot where it belongs. [01:21:12] Speaker A: And it's time-based too. Like, it tells you you've completed this in 1.2 seconds. I'm like, is that good? What's my grade? Is that an A? Is it pass/fail? [01:21:22] Speaker C: All I can think is you can write a program that does all of these, and you could before; you can just do it really fast now. To me there's no point in this anymore. It's just a waste of our time. Better to find a way of IDing humans, tying real people in the real world to the people using the computer somehow, without it being shady. [01:21:45] Speaker B: Yeah. I think mostly the way they work is not whether you drag the right piece or whether you find the right buses. It's the pattern in which you. [01:21:53] Speaker C: Yeah. [01:21:54] Speaker B: Click the buttons, or, you know, how you move the mouse; it gathers the HID information from the mouse movements and stuff like that. [01:22:00] Speaker C: Yeah. [01:22:01] Speaker B: You know, if they can. [01:22:03] Speaker C: You can fake it. [01:22:03] Speaker B: Oh yeah. If they can train an AI to recognize it, then you can train an AI to fake it just as well. [01:22:09] Speaker C: Absolutely. [01:22:10] Speaker A: And in the final article of today, in preview: instant access snapshots for Azure Premium SSD v2 and Ultra Disk Storage. Another great way to burn money. Azure now offers instant access to snapshots, in public preview, for Premium SSD v2 and Ultra Disk, eliminating the traditional waiting time for a snapshot to restore. Previously, customers had to wait for snapshots to hydrate before restoring disks. This feature allows for immediate access when high performance is needed right after the snapshot is created. The capability addresses a critical operational need for enterprises running on these storage tiers.
This is normally related to mission-critical databases, which could be SAP HANA and other latency-sensitive applications, where downtime during recovery operations directly impacts business operations. You can compare this with the most recent update to AWS EBS snapshot retrieval that we talked about last week; it positions them very closely, and it means that you can have a much better time for your point-in-time recovery or, sorry, RTO, recovery time objective. [01:23:23] Speaker B: Yeah, I mean, I've seen very slow database restores natively, like hours. Like, right, the backup takes 30 minutes, but to restore it takes nine hours. So hopefully by doing disk-based snapshots, you've seriously reduced the amount of time it takes to recover from an issue like that. I assume they must have built the smarts into the storage layer somehow. It kind of fakes the fact that the data isn't yet on the SSD and in fact is somewhere else, and it's kind of injected in almost in real time, so the performance isn't quite as fast as it will become when it has completely rehydrated. But I guess you can use it in the meantime. [01:24:05] Speaker A: Yeah, one of the other use cases here, which isn't a terrible one, is if you need to create a read-only replica of your SQL database. So you want to snapshot it and get it up quickly, so then you can link it over and you don't have a bunch of data to catch up on. You know, I think it's an interesting one that I didn't think of at the beginning. The other two notes, once I dove a little bit more into the article, were around charging. You're going to be charged for storage, billed only for the additional storage consumed by the instant access snapshot, based on data changes on the source disk after the snapshot is created, and a restore charge, which is a one-time fee applied to each disk restored from the Azure instant access snapshot, calculated based on disk provisioned space at the time of the restore. I mean, there are definitely use cases for this type of stuff. I know I played with it on AWS back in the day when they did the fast restore feature, for Postgres upgrades. We were doing a bunch of Postgres upgrades from like 9 to 14 or something crazy like that, and we wanted to have that backup there in case we needed it. So we quickly cloned it, had it up, and then did all the upgrades on the clone versus the primary, so we had a quick failback, and there were business reasons why we had to do it that way versus on the primary. I get why these features exist; it's just, when you're doing it on these large volumes, which you always are, you've got to look at your bill quickly, which I guess now you have a new consumption API to look at. [01:25:37] Speaker C: Sounds so expensive. [01:25:39] Speaker B: Yeah, I guess it'd be really good for, like, a SecOps kind of thing. You've got a questionable machine and you can take an immediate snapshot of it, reconstruct it somewhere else, and start doing forensics, even though the machine's still potentially up and serving customers. [01:25:56] Speaker A: But do you care at that point? Well, I guess for security stuff maybe you do care, because you want to be able to access it quickly, you know. But is the little bit of a latency hit of doing the forensic analysis on the box worth the time? And if you ask a security person: yeah, we don't care about spending money, we just do it. [01:26:19] Speaker C: Yeah. [01:26:19] Speaker B: Yeah.
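On the AWS-side comparison Matt mentions (the fast restore feature he used for the Postgres upgrade failback), a minimal boto3 sketch of that flow looks roughly like the one below. This is the EBS Fast Snapshot Restore API rather than the new Azure feature, and the snapshot ID and availability zone are placeholders.

```python
import time
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder values -- substitute your own snapshot and AZ.
snapshot_id = "snap-0123456789abcdef0"
az = "us-east-1a"

# Enable Fast Snapshot Restore so volumes created from this snapshot come
# up fully initialized, with no lazy-loading latency on first reads.
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=[az],
    SourceSnapshotIds=[snapshot_id],
)

# Wait until the snapshot reports "enabled" in that AZ.
while True:
    states = ec2.describe_fast_snapshot_restores(
        Filters=[{"Name": "snapshot-id", "Values": [snapshot_id]}]
    )["FastSnapshotRestores"]
    if states and all(s["State"] == "enabled" for s in states):
        break
    time.sleep(15)

# Create the clone volume to run the upgrade against, leaving the primary alone.
volume = ec2.create_volume(
    AvailabilityZone=az,
    SnapshotId=snapshot_id,
    VolumeType="gp3",
)
print("clone volume:", volume["VolumeId"])
```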
[01:26:20] Speaker C: But you'd want to shut that box down if you actually think there's a problem. You would just shut it down the second you make that snapshot. [01:26:28] Speaker B: Yeah. [01:26:29] Speaker A: Well, you would leave it running in some cases, because you want the memory, you want access to the memory, but you would kill its ability to send network traffic. [01:26:36] Speaker C: Yes. Yeah, exactly. [01:26:38] Speaker B: See, that's one of my only feature requests to Amazon that never got built, and that was for snapshots to include the contents of running RAM so that we could do forensic analysis on images taken of running machines. They never did that. [01:26:53] Speaker A: They have it so you can power it off. [01:26:56] Speaker B: Yep. [01:26:57] Speaker A: And keep memory there, but not to take an AMI of it in that way. So maybe that should be your re:Invent announcement. [01:27:07] Speaker B: I highly doubt it, because really what I wanted was a copy of. What? Yeah, I wanted a copy of the. [01:27:13] Speaker A: You want a full running copy of exactly everything. [01:27:15] Speaker B: Yeah, I want a copy of that. I want a dump with the current state of RAM. But the security. I mean, yes, it could be a great security feature, but at the same time, you mess up that kind of configuration and an attacker's got access to all your secrets. [01:27:30] Speaker A: Yeah. Everything that's in memory. You're having a bad day. [01:27:33] Speaker B: Yeah. In fact, they built the opposite, really, which are these secure enclaves which, you know, use encrypted RAM on the hypervisor, so there's no chance you're going to get that out, because the key doesn't exist anywhere but on the chip itself. [01:27:46] Speaker A: Yeah. [01:27:47] Speaker B: All right. We have reached the end. [01:27:50] Speaker A: Thank you, Elise, for joining us. [01:27:52] Speaker C: Thanks for having me. [01:27:53] Speaker B: You should come back more often. [01:27:56] Speaker C: What is it, 304 episodes? 305 episodes? I'll be back in a few years. [01:28:02] Speaker A: At about 50 episodes a year, see you in six years. [01:28:05] Speaker C: Oh, yes. [01:28:07] Speaker B: Yeah. [01:28:11] Speaker C: Maybe before then. [01:28:14] Speaker B: That would be great. Thanks, Elise. Thanks, Matt. See you later. [01:28:17] Speaker A: Thank you, everyone. Have a great night. [01:28:19] Speaker C: Bye. [01:28:19] Speaker A: Bye. [01:28:22] Speaker B: And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our website at thecloudpod.net, where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.
