302: It’s So Hot, Even Windows is Hotpatching

Episode 302 | May 08, 2025 | 01:32:41
tcp.fm
Hosted By

Jonathan Baker, Justin Brodley, Matthew Kohn, Ryan Lucas

Show Notes

Welcome to episode 302 of The Cloud Pod – where the forecast is always cloudy! This week Justin and Ryan are on hand to bring you all the latest in Cloud (and AI) news. We’ve got hotpatching, Project Greenland, and a rollback of GPT-4o, which sort of makes us sad – and our egos are definitely less stroked. Plus SaaS, containers, and Outposts – all of this and more. Thanks for joining us in the cloud!

Titles we almost went with this week:

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info. 

General News 

01:37 Sharing new DORA research for gen AI in software development

04:06 Ryan – “Those are really good approaches, but really difficult to implement in practice. You know, in my day job, watching the company struggle to get a handle on AI from all the different angles you need to, from data protection, legal liability – just operationally – it’s very hard. So I think having a mature program where you’re rolling that out with intent and being very specific with your AI tasks will go a long way with a lot of companies.”

AI Is Going Great – Or How ML Makes Its Money 

08:55 Introducing our latest image generation model in the API

09:47 Ryan – “It’s still tricky pricing these things out…forecasting these things in a way that you can coordinate as a business is really challenging.”

12:03 OpenAI rolls back update that made ChatGPT a sycophantic mess

Cloud Tools 

14:30 Targeted by 20.5 million DDoS attacks, up 358% year-over-year: Cloudflare’s 2025 Q1 DDoS Threat Report 

15:57 Justin – “I was thinking about this earlier, actually. Typically DDoS attacks are compromised computers that are then used in these massive attacks, and they’re all controlled by botnets and this has been going on for over a decade now – and it just keeps getting worse… I mean, I’m a computer guy, so all my shit’s locked down and secure, and I have firewalls, but do normal people just go raw dogging on the internet and their computers get hacked and compromised all the time?” 

AWS

20:09 In the works – New Availability Zone in Maryland for US East (Northern Virginia) Region

25:40 Enhance real-time applications with AWS AppSync Events data source integrations  

26:45 Ryan – “I kind of like this thing because it’s a little bit of putting a Band-Aid around your managed application, but it sure is powerful when you can use it.”

29:49 Amazon EKS introduces node monitoring and auto repair capabilities

32:29 Ryan – “I do like that it’s built in to the existing agent, you know, in terms of those health checks. And hopefully the thresholds and the tuning of this are, you know, tunable, where you can set it. Or it’s just completely hands-off running and it works like magic. That would also be acceptable.”

33:42 Prompt Optimization in Amazon Bedrock now generally available

34:22 Justin – “This is one of those things you create the prompts, you optimize them once for each of the models, and they don’t really change all that often. That’s the guidelines that change.”

36:20 AWS announces upgrades to Amazon Q Business integrations for M365 Word and Outlook

38:42 Announcing Serverless Reservations, a new discounted pricing option for Amazon Redshift Serverless

39:06 Justin – “Save all the monies!” 

39:37 AWS Transfer Family introduces Terraform module for deploying SFTP server endpoints

39:57 Justin – “If you’re using FTP you should stop immediately.” 

42:10 Introducing a guided visual pipeline builder for Amazon OpenSearch Ingestion      

43:02 Justin – “All of Ryan’s grey hair in his goatee – and the reason why I have no color in my goatee – is because of Elasticsearch.”

44:35 Announcing second-generation AWS Outposts racks with breakthrough performance and scalability on-premises

45:40 Justin – “You know what this announcement doesn’t say a thousand times? No AI. Not a single mention of it. They did mention inference for a variety of ML models, and they do specifically call out CPU-based ML models, and that’s because none of these instances support GPUs yet…but they do promise that they are coming soon – both the latest generation EC2 and GPU-enabled instances.”

48:16 Reduce your operational overhead today with Amazon CloudFront SaaS Manager

50:05 Justin – “So now you have a very complicated set of CloudFront configurations, because every one of them has to have its own CloudFront configuration – because you did custom vanity URLs. But now you can use this to help you make that less toil, which is appreciated, but it’s also a *terrible* model. And I don’t recommend it for a SaaS application if you can help it.”

52:22 Amazon Route 53 Profiles now supports VPC endpoints

GCP

53:56 Introducing SaaS Runtime

55:33 Justin – “This is for a SaaS company that literally deploys an instance for each customer. It’s an expensive pattern, number one, but sometimes customers like this, because it makes it very easy to say, well, these are your direct costs, and so you should pay for them. This is the model that Jira uses. This is the model that ServiceNow uses – where you’re getting a dedicated app server in addition to a dedicated database server. And so yeah – this is to manage all of that at scale… but this really isn’t how you should do it.”

1:03:49 Google Cloud Database and LangChain integrations support Go, Java, and JavaScript

Azure

1:04:20 Unveiling GPT-image-1: Rising to new heights with image generation in Azure AI Foundry

1:06:16 Tired of all the restarts? Get hotpatching for Windows Server

1:07:57 Ryan – “I hope that there’s a technical reason, because it feels like a cash grab. On one hand, I get it – they’re solving operational problems they have by managing their workloads on Azure, and this is an enhancement that comes directly out of managing servers with that scale, which is fantastic. The fact that they put it as a subscription on Arc makes me feel a little dirty about it.”

1:13:53 Announcing preview for the next generation of Azure Intel® TDX Confidential VMs

1:17:09 Announcing Public Preview of Larger Container Sizes on Azure Container Instances

1:18:09 Ryan – “I’m just surprised they got away with it for as long as they did. Because I went on the same journey you did, which was to point and laugh – they only have four? ’Cause I’ve never seen a workload need more than four CPUs, but everyone asks for more than four.”

Other Clouds

1:19:47 Introducing DigitalOcean Managed Caching for Valkey, The New Evolution of Managed Caching

1:20:11 Ryan – “I like to hear DigitalOcean coming up with these managed services. And so if you have a workload on DigitalOcean you don’t have to manage your own service offering on compute. You can take advantage of these things. It’s great. I’d like to see more competition in this marketplace.”

Cloud Journey

1:20:50 ‘Project Greenland’: How Amazon Overcame a GPU Crunch

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net – or tweet at us with the hashtag #theCloudPod.


Episode Transcript

[00:00:08] Speaker B: Where the forecast is always cloudy. We talk weekly about all things AWS, GCP and Azure. [00:00:14] Speaker C: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker A: Episode 302, recorded for April 28, 2025: "It's So Hot, Even Windows is Hotpatching." Good evening, Ryan. How are you doing? [00:00:27] Speaker C: Doing well, doing well. [00:00:29] Speaker A: It's just the two of us tonight. Matt's got good news he'll share with us in a future episode, and Jonathan's still out, unfortunately, for an undisclosed amount of time; we're hoping to get him back at some point, and we'll see him when he's back. But it's always good to have you, Ryan. We missed you last week, when we reminisced a little bit about our 300 episodes. I tried to use AI to help us reminisce, and it failed me terribly, because the context window is just not big enough for the show note files that we have. I'd probably need like 10 billion tokens to have it properly process our show notes and actually do anything I would want to do with them. So I have outsourced a blog post about our 300 episodes to someone else, to write their perspective on us. We'll see how that turns out. [00:01:17] Speaker C: Oh, that's kind of cool. [00:01:18] Speaker A: Yeah. We'll either enjoy it or we'll be cringing. [00:01:23] Speaker C: Or it'll be a short run of the podcast. [00:01:25] Speaker A: Yeah, or maybe we should just throw in the towel; might be the right answer. So take a look. That'll hopefully be on the website in the next week or so, when we post it in our blog section. But we have news, as always, here at The Cloud Pod, because the clouds never stop. They never sleep. I don't know if you know about clouds: they just circle the globe releasing features and, like Oracle, planting new data center sprouts. That's just how they work. Well, first up, DORA has some new research on gen AI in software development. It's a very detailed report. I did skim through it, and I need to spend some more quality time with the heavy details, but we'll talk about the executive TL;DR from the blog post announcing the report, and if you're interested, you should definitely check out the full report. The report is based on data and developer interviews, and it aims to move beyond hype to offer a true perspective on AI's impact on individuals, teams and organizations. AI is real, they found. I mean, I would hope so at this point. Oh, and the show notes AI experiment did tell me that we primarily talked about AWS and GCP until 2023, and then we definitely started adding a lot more AI. It did pick up on that, and I was like, well, that makes sense. [00:02:38] Speaker C: Yes. [00:02:39] Speaker A: Yeah. So even for the podcast, AI became very real in 2023; we had to start talking about it, including adding a whole section called "ML is AI's way of making money," which has been with us now for quite a while. Or, sorry, "AI is ML's way of making money." I read the show notes and said it backwards, because dyslexia. Anyways. AI is real: a staggering 89% of organizations are prioritizing the integration of AI into their applications, and 76% of technologists are already using AI in some part of their daily work. Productivity gains are being confirmed.
Developers using gen AI report significant increases in flow, productivity and job satisfaction. For instance, a 25% increase in AI adoption is associated with a 2.1% increase in individual productivity, which sounds like a small percentage, but when you look at the cost of labor, it's a very big number. Organizational benefits are tangible too: beyond individual gains, DORA found strong correlations between AI adoption and improvements in crucial organizational metrics. A 25% increase in AI adoption is associated with increases in documentation quality, code quality, code review speed, and approval speed. And if you're looking to use AI in your development organization, they give you five practical approaches for both leaders and practitioners to think about as you move AI into your organization. First, have transparent communication about the use of AI: its purpose, how you might use it, what solutions it can provide. Empower developers with learning and experimentation options. Establish clear policies and guidelines for the use of AI, especially in production code. Rethink performance metrics: maybe don't count how many lines of code you wrote, but how many quality lines of code you wrote, and how often you used AI to help you do that faster. And their last recommendation: embrace fast feedback loops. [00:04:19] Speaker C: In general, those are really good approaches, but really difficult to implement in practice. In my day job, watching the company struggle to get a handle on AI from all the different angles you need to, from data protection, from legal liability, just operationally, it's very hard. So I think having a mature program where you're rolling that out with intent, and being very, very specific with your AI tasks, will go a long way for a lot of companies, because there is a huge improvement to be had. People are going to use the tool whether you let them or not; they'll find a way. So it's not practical to say, oh no, our AI policy is that we only use this one sanctioned AI tool. If that one sanctioned tool isn't good enough, it's easy enough to find alternatives, and it's even easier to dump a whole bunch of data into one of them for context to try to get a good answer out of it. And where does that data go? So it's one of those things where I really think companies need to do a better job, because I do think that AI is real. It's here, and the impact it's had on just my daily productivity alone is significant. [00:05:37] Speaker A: I mean, I don't even code in the day job very much. I do code reviews occasionally, I look at systems architectures, and I do more managerial things. But at home I do a lot of coding for my own hacky projects, and it's made me do more hacky projects. I used to laugh at one of our friends who's making his own sprinkler control system using a Raspberry Pi. I was like, oh my God, the amount of time, when you can just buy that off the shelf from a vendor, which I did, versus trying to hack it together. But now I'd be like, maybe it would be fun to hack that together, because I don't have to do all of the work of it. I can do the work that I enjoy, the coding, and I don't have to do the part I don't enjoy, which is making it super perfect and well documented and things.
And so yeah, even at home in my own side projects, I use it a lot, which is fun. [00:06:30] Speaker C: Yeah. I mean, it helps me finish projects, that last mile: being able to write documentation and tests and configure all the stuff that's the least fun, the not-desired part of the project. Once I've solved the problem, I lose interest really quickly, so you have to capitalize. [00:06:49] Speaker A: I can't tell you how many READMEs I've now created using Claude. It's like, "I'm going to generate a README for the thing you asked me to do," and I'm like, oh, thank you, I would not have thought to do that. And funny enough, there was something I did a month ago that wasn't quite working the way I thought it would, and I was looking at the code thinking, I don't really remember what I did here. And there was a README. I clicked on it, read it, and was like, oh yeah, this is what I asked it to do, and that doesn't make sense; now I know why it's broken, and what I was trying to accomplish, because I forgot what I was doing a month ago. That was true of my code all the time: I look at code I wrote 10 years ago and I'm like, oh my God, one, I was a terrible coder, and two, I don't know what I was trying to do with this code. [00:07:25] Speaker C: Yeah. Six-months-ago Ryan is like, what the hell? It's so much easier now. [00:07:31] Speaker A: Yeah. I sort of laugh when our current president complains about trade deals that he did in his first administration, and I'm kind of like, I mean, it could happen to me. Someone could blame me: why did you write this code? And I'd be like, I don't recognize... [00:07:43] Speaker C: ...the code at all. [00:07:45] Speaker A: I wrote that? According to the git commit, I did, which surprises me. Clearly I was either high, or: that's brilliant code, I didn't expect that from me. One of the two. Though it typically only goes one way: I'm a terrible coder, and that's maybe why I moved into management, because I'm really good at architecture and system design. But you want me to code that up? The last time I did a big coding project was at our last company together, Ryan, and Ryan laughed at my code every time he saw it. He was like, your code is terrible. [00:08:17] Speaker C: Which is mean, just because I've seen my own code, and I don't really have a whole lot of legs to stand on. [00:08:23] Speaker A: I'm not an elegant coder, but it was pretty weak Terraform at the time, because I didn't know modules, so it was a lot of repeated blocks of text. You were like, this is bad. I'm like, it works, though. And you were like, yes, it does. Ship it to prod. [00:08:39] Speaker C: It's come full circle. I'm pushing code today in my day job that I'm pretty sure is subpar compared to the rest of my team, who've been much more in depth with Terraform. [00:08:50] Speaker A: I mean, that's the reality. If you're a team leader, you end up hiring good people who write good code, and you embrace them and help support them. A good manager, a good leader, builds a team that's better than they are, and that's hard to find. All right, let's move on to "AI Is Going Great – Or How ML Makes Its Money." Introducing our latest image generation model in the API. This is from OpenAI.
They have terrible blog post names; they're making a lot of assumptions about SEO. So OpenAI released GPT Image 1 in the API, which enables developers and businesses to easily integrate high-quality, professional-grade image generation directly into their tools and platforms. This is the thing that powers features like Adobe's magic tools that make people disappear out of your photos. The GPT Image 1 API is priced per token, with separate pricing for text and image tokens: text input is $5 per 1 million tokens, image input tokens are $10 per 1 million tokens, and image output (generated images) is $40 per 1 million tokens. Which sounds like a lot of tokens, but based on what I've seen with text, I'm going to say it gets pricey pretty quickly. So do keep that in mind. [00:09:57] Speaker C: Yeah, it's still tricky pricing these things out. [00:10:00] Speaker A: It is, especially with images. I sort of have a feel for text now; not a good feel, but a guesstimate. With images, I don't even understand how it processes them. [00:10:14] Speaker C: I imagine it's the same: if you're doing it daily, you do get sort of a feel for what each type of task costs. But forecasting these things in a way that you can coordinate as a business is really challenging. I do like the pricing model being rolled out, though. Once you get a hold of it, I think you'll understand it loosely and at least be able to put guardrails on it, so you can't spend a million dollars in 15 minutes. [00:10:43] Speaker A: Yeah. I was trying to make some troll videos to troll one of my friends, and I used Google Veo 2. I wrote my prompt, said this is what I want, and it created four six-second videos. I didn't like what it came up with, so I made some tweaks to my prompt and did it again, four more six-second videos, and I was like, that's good enough. And then I got a notice that the budget had been exceeded on my Google account. I went and looked: those eight six-second videos cost me $45. [00:11:20] Speaker C: Yeah. [00:11:22] Speaker A: I was like, oh, this is not a tool for me. Check. [00:11:25] Speaker C: Yeah, yeah. [00:11:27] Speaker A: This is a tool for filmmakers who do real movies and real projects with proper budgets. This is not an experimentation toy for me. Take that as a bit of a warning, because we talked about how cool Veo 2 was, but then I learned how expensive it was, and I'm less and less intrigued. I mean, it's still cool. [00:11:48] Speaker C: Yeah, it's very cool. I did the same thing: I generated one video, and it was like $13 or something like that. I was like, that's too much. [00:11:58] Speaker A: Yeah. Image generation to create memes or funny pictures? Totally worth 32 cents to me. If it were $30, I'd be like, that's not worth it; I'll find something on the Internet someone else generated.
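To make the image API pricing just quoted a little more concrete, here is a minimal back-of-the-envelope sketch. The per-request token counts below are made-up placeholders for illustration, not OpenAI's published numbers; only the three per-million-token prices come from the episode.

```python
# Rough cost sketch for the GPT Image 1 API prices quoted above:
#   text input $5 / 1M tokens, image input $10 / 1M tokens,
#   image output $40 / 1M tokens.
# The example token counts are hypothetical placeholders.

PRICE_PER_TOKEN = {"text_in": 5 / 1e6, "image_in": 10 / 1e6, "image_out": 40 / 1e6}

def estimate_cost(text_in: int, image_in: int, image_out: int) -> float:
    """Return the estimated USD cost for one generation request."""
    return (text_in * PRICE_PER_TOKEN["text_in"]
            + image_in * PRICE_PER_TOKEN["image_in"]
            + image_out * PRICE_PER_TOKEN["image_out"])

# e.g. a short prompt plus one hypothetical ~4,000-token generated image:
print(f"${estimate_cost(text_in=100, image_in=0, image_out=4000):.4f}")  # ~$0.16
```

Multiply that per-image figure by a few retries per prompt and the hosts' "it gets pricey pretty quickly" point follows directly.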
One more from OpenAI: apparently they're rolling back an update to GPT-4o, some changes made recently that turned it into a sycophantic mess, which I had to look up; it means being super, super overly complimentary to you. So apparently it was sucking up to you, basically. So if you thought your coding was really good, which you did just allude to a little while ago, I'm sure that's because ChatGPT told you so. [00:12:37] Speaker C: You know, maybe so. Maybe. Yeah. [00:12:40] Speaker A: Basically, as you interact with the chatbot, OpenAI gathers data on the responses people like more, via the little thumbs-up, thumbs-down thing you get on responses, and the engineers use that data to revise the production model using a technique called reinforcement learning from human feedback. But if you're only plussing the stuff that's really complimentary, that potentially puts a bias into your model, and that's where it went off the rails. It basically turned ChatGPT into the world's biggest suck-up. Users could present ChatGPT with completely terrible ideas or misguided claims, and it might respond, "wow, you're a genius," or "this is on a whole different level." Designing the model's tone is important to making it something you want to chat with, so it's important to have some human feedback; but if you get into sycophancy, it becomes a toxic feedback loop, which results in things like terrible tariffs based on bad information, potentially. Just going to put it out there. [00:13:30] Speaker C: Yeah, it happened. I can't argue. I haven't used GPT-4o, but even the models I have used all still have a little bit of that tone, especially when you call them out on a mistake: "Oh, you're absolutely right. Good find." And I hate it. Now I distrust the second thing you're giving me in response to that so much more, because it feels like you just lied to me. So let's start adding context to all the prompts for session state: you don't need to blow a little smoke up my dress, I'm good, we can just have a real conversation. [00:14:12] Speaker A: Yeah, I mean, I like my ego stroked; I'm not going to deny it. I just know I'm not a genius, so let's not get carried away there, ChatGPT. I'm glad they're fixing it, but I'm kind of sad I don't use ChatGPT more, since I wasn't getting complimented; Claude doesn't have the same opinions about me that ChatGPT does. He's honest and realistic: your code is bad, and here's how to make it better. That's fine. All right. Well, Cloudflare has their Q1 DDoS threat report out. Basically, it said the quarter was relatively flat until April, and April was so bad that they decided to pull a mention of it into the Q1 report. They said that in April they blocked an intense packet-rate attack peaking at 4.8 billion packets per second, 52% higher than the previous benchmark, and also defended against a 6.5 terabit-per-second flood, matching the highest-bandwidth attack ever reported. Which is crazy; that just happened in April, weeks ago. In more benign news, for the first quarter they blocked 20.5 million DDoS attacks, only a 358% year-over-year increase and a 198% quarter-over-quarter increase, which they called relatively tame. One third of the attacks, 6.6 million, targeted the Cloudflare network directly as part of an 18-day multi-vector attack campaign.
And furthermore, in the first quarter of 2025, Cloudflare blocked approximately 700 hypervolumetric DDoS attacks, ones exceeding 1 terabit per second or 1 Bpps, which at the time I didn't know what it stood for. This is the thing about when I take a note and put it in here: I should probably write out the full term, because I said at the time, I'll remember that. And I did not. I found it, though: that's 1 billion packets per second. That works out to about eight attacks per day, they said in the article. [00:15:55] Speaker C: I can't even imagine. How is this sustainable? Every year they seem to say, oh, it's a 300% year-over-year increase, a 500% year-over-year increase. [00:16:07] Speaker A: I was thinking about this earlier, actually; it's funny you bring it up, because apparently we're on the same wavelength. Typically DDoS attacks are compromised computers that are then used in these massive attacks, all controlled by botnets, and this has been going on for over a decade now, and it just keeps getting worse. Are this many systems getting compromised on a regular basis, that these networks keep getting this much bigger and can push this much more traffic through? Is that what's actually happening out in the wild, that that many systems are getting attacked? I mean, I'm a computer guy, so all my shit's locked down and secure and I have firewalls, but do normal people just go raw-dogging on the Internet and their computers get hacked and compromised all the time? What's happening? [00:16:49] Speaker C: I think it's both of those things, really. IoT devices are very frequently just put on the network, and most consumer home networks default to having sort of a NAT out, that kind of thing, but those devices can still reach the Internet and can still be compromised through vulnerabilities. And there are a billion more of them than there used to be. My fridge is now on my Wi-Fi. That's not required, you know, but it's there, because I'm a dork and I wanted to see what it would do. [00:17:24] Speaker A: Yeah. Back a long time ago, I worked at a company where, if you put a computer on the network unpatched, it would be compromised with malware within 25 minutes, which I thought was terrible. And at the time, no one in that company's security department seemed to think it was as terrible as I did, which is why I wasn't sure I wanted to work there, because this was our internal network; this should not be happening. That was back when security was not a thing, apparently, or not as important. But again, comparing what they said about that April attack: they reported 700 attacks of 1 terabit per second, or 1 billion packets per second, or greater. And then this one attack in April was 4.8 billion packets per second, which is 3.8 billion packets more than 1 billion, and 6.5 terabits, which is basically 5.5 terabits more. So that is a pretty massive attack. [00:18:16] Speaker C: That's huge. Yeah. [00:18:17] Speaker A: Can't wait to see what the rest of Q2 looks like for Cloudflare. [00:18:21] Speaker C: Yeah. I mean, the amount of marshaling of resources for these campaigns must be just crazy.
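As a quick sanity check on the numbers being thrown around here, this is pure arithmetic using only the figures quoted from the report; nothing else is assumed.

```python
# Figures quoted above from Cloudflare's Q1 2025 DDoS report (April callout):
april_peak_pps = 4.8e9    # packets per second
april_peak_bps = 6.5e12   # bits per second (6.5 Tbps)

# "Hypervolumetric" threshold used in the report: 1 Tbps or 1 Bpps.
hyper_pps = 1e9
hyper_bps = 1e12

print(f"{april_peak_pps / hyper_pps:.1f}x the 1 Bpps threshold")  # 4.8x
print(f"{april_peak_bps / hyper_bps:.1f}x the 1 Tbps threshold")  # 6.5x

# ~700 hypervolumetric attacks in a ~90-day quarter:
print(f"{700 / 90:.1f} attacks per day")  # ~7.8, i.e. the "about eight" quoted
```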
[00:18:27] Speaker A: And the thing is, we use Cloudflare for The Cloud Pod. I set it up one day, and at 30 bucks a month, it's a steal of a deal, because I was tired of dealing with CloudFront and the Amazon WAF, which wasn't very good at the time, and I needed something better. For 30 bucks a month, I'm getting all this protection. It is a good value. They should sponsor us. [00:18:48] Speaker C: Yeah. [00:18:50] Speaker A: I'm definitely impressed with what they're able to do. And I know we use it at the day job; I think we spend more than $30 a month there, I'm pretty sure we do, but it's very affordable for what this is. I remember back when you were trying to buy DDoS protection 10 years ago: you were talking millions of dollars and dedicated circuits to spread all your traffic around. [00:19:09] Speaker C: Oh, God, the early days of Akamai. [00:19:11] Speaker A: Oh, yeah. It's amazing what Cloudflare is doing, so kudos to them. And for $30 a month, I'm really happy with what they provide for The Cloud Pod, which is peace of mind, so I can sleep at night without worrying that our website is being hacked. [00:19:24] Speaker C: Yeah. [00:19:24] Speaker A: Because it does run WordPress, which is its own canary in the coal mine. That thing is patched regularly. [00:19:33] Speaker C: Yeah. And with any kind of managed service like that, this is where you want the value: you want the managed rule sets and the protections to be built in, not something you have to manually apply, having to meter the trade-off between the rules applied and the performance loss you get as you apply more rules, and all those things. Overall, you just plug it in. It's nice. [00:19:58] Speaker A: 100%. So if you're looking for a lightweight DDoS solution, I recommend Cloudflare. Please sponsor us. [00:20:04] Speaker C: Yes. Not a sponsor yet. [00:20:07] Speaker A: Someday, hopefully. [00:20:08] Speaker C: Yeah. [00:20:09] Speaker A: Moving on to AWS. So the mystery has been solved, Ryan. We talked a few episodes ago about the fact that they were adding location to the response of an API call, and we were sort of like, that's sort of weird when it's all Virginia, all Ohio, all Oregon. And this week they announced the sixth, or, sorry, seventh Availability Zone in the US East region, which will not be built in Virginia; it'll be built in Maryland. So this will be the seventh zone, connected by high-bandwidth, low-latency network connections, within 60 miles of the US East Northern Virginia data centers. And this AZ joins an ever-growing list of new regions they're building out, including New Zealand, KSA (the Kingdom of Saudi Arabia), Taiwan, and the AWS European Sovereign Cloud, just proving that AWS is investing very heavily in data center capacity, despite rumors that they're reducing their leasing of data center colos, which they do all the time. There was a news story this week that growth was slowing and they were walking back some data center leases. I'm like, yeah, they do this all the time: they make adjustments, new technology comes out, they consolidate, they don't need as much colo in this place anymore versus over there.
They adjust these contracts all the time. [00:21:19] Speaker C: Yeah, I'm going to offset that with the news that they're trying to buy their own nuclear reactor. [00:21:24] Speaker A: Exactly. [00:21:27] Speaker C: They're probably still reaching for space. Yeah, it is interesting. [00:21:31] Speaker A: It does make sense now why the API had to get updated to include additional details of where the data center actually is, because "Northern Virginia" is no longer good enough. You just name it US East, drop the Northern Virginia part, and your workload is either in Virginia or in Maryland. If Maryland or Virginia has laws that are impactful to your business, say you're an online gambling site, this could become a very important fact. I am sort of curious, though: they typically randomize your US East 1 availability zones, so are you just going to get randomized into a Maryland availability zone when you sign up for a new account? That might have some impact on you, I imagine. [00:22:05] Speaker C: Right. Because you have control over regions, but I don't know if you have control over which AZs. [00:22:14] Speaker A: Well, you can definitely choose which AZs you want to use, but they randomize the labeling, because what was happening was everyone was choosing us-east-1a, since it's the first one in the list. So now they randomize that. You can still choose, and there's an identifier, I don't remember what it's called, that can tell you which one it actually is if you need to know. Because if you have multiple Amazon accounts, as many people do, it can become a problem if you think your us-east-1a is their us-east-1a and it's not. They fixed that a long time ago, and I assume this is something you'll want to be aware of as you spin up in the new zone. [00:22:48] Speaker C: Yeah, and if that's a concern, you... [00:22:50] Speaker A: ...shouldn't put it in US East 1, because that's a tire fire region. But if you need to put it there, then yes, you should know this. [00:22:57] Speaker C: The fact that they're building a new AZ, and they've run out of space in Virginia so they're moving to a whole different state, just makes me think people are still using the US East tire fire. [00:23:06] Speaker A: Yeah, I'm going to say they are, for sure. You have to wonder how long the first AZ has been in that region now, and how old some of the equipment in it is. I'd be curious whether, for a company that has been on Amazon that long, Amazon is reaching out and saying: hey, you're on an instance type you really need to get off of, because you've been running it for nine years and it's going to die any moment. [00:23:33] Speaker C: Yeah. I've done large data center footprints where I've had to manage that infrastructure, but not cloud-hypervisor scale where I have external customers directly; it was more of an internal thing. And it can be hard even with that direct level of communication. [00:23:50] Speaker A: I mean, they're guaranteeing you a CPU architecture, right? So it's not like they can just vMotion you around: oh, this rack's going to get decommissioned in favor of a new one, we're going to vMotion you over.
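An aside on the AZ identifier Justin was reaching for a moment ago: it's the AZ ID (for example, use1-az4), which stays stable across accounts even though zone names like us-east-1a are shuffled per account. A minimal boto3 sketch of looking up the name-to-ID mapping:

```python
import boto3

# Zone *names* (us-east-1a) are randomized per account; zone *IDs* (use1-az4)
# are stable, so use IDs when coordinating placement across multiple accounts.
ec2 = boto3.client("ec2", region_name="us-east-1")

for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(f'{az["ZoneName"]} -> {az["ZoneId"]} ({az["State"]})')
```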
No, because you said you have a C3 instance type with this CPU; that's the contract you're paying for. You can't just change it under the hood. [00:24:10] Speaker C: Yeah, no, it's crazy. I'm sure the account teams must reach out proactively and work with those customers, because it's bound to have already happened a few times. [00:24:20] Speaker A: It's bound to have happened. I'm sure someone could write to us: oh, it happened to my company, and this is what we had to do. And at some point you're going to have to say us-east-1a has to be remodeled, or changed around, or rebuilt to a different specification for the newer stuff. Does it have the power density you need to run the latest and greatest GPUs today? Probably not. There are all these factors that must be involved. [00:24:45] Speaker C: And some of that you can abstract away, because there are different sections within a data center that you can partition out; there are several rooms or floors, and maybe you can upgrade them one at a time and manage capacity that way. [00:24:58] Speaker A: Yeah, but if you're in a colo and you had single-phase power and now you need three-phase power, that's not just a data center section problem. That's an "oh, we need to redo the entire bus system into the data center, and all of the power grid, and redo all of the equipment, all the AC units, and all the things" problem, just to be able to offer three-phase power. There are gotchas. [00:25:19] Speaker C: It's incredibly challenging, and it's a really hard job. [00:25:24] Speaker A: It is. I have lots of respect for people who plan data centers; imagine, their life has not gotten easier in the hyperscale world, especially with the way GPUs have been demanded. [00:25:33] Speaker C: Oh yeah. As power hungry and dense as they are. [00:25:35] Speaker A: Yeah. And the amount of heat they generate. I'm sure it's not been an easy few years. All right, well, you can enhance your real-time applications with AWS AppSync Events data source integrations. AWS AppSync Events now supports data source integrations for channel namespaces, enabling developers to create more sophisticated real-time apps. With the new capabilities, you can associate AWS Lambda functions, Amazon DynamoDB tables, Amazon Aurora databases, and other data sources with channel namespace handlers. Leveraging AppSync Events, you can build rich real-time applications with features like data validation, event transformation, and persistent storage of events. You integrate these event workflows by transforming and filtering events using Lambda functions, or by saving batches of events to DynamoDB using the new AppSync JS batch utilities. [00:26:17] Speaker C: This is kind of neat. AppSync is one of those services that's not built for me, so I have a hard time being fair about it; I just see its limitations. But trying to think back to where some of those limitations were, this could probably answer a lot of the things that used to stop you, in Beanstalk or some of the others. So I kind of like this thing, because it's a little bit of putting a Band-Aid around your managed application, but it sure is powerful when you can use it.
[00:26:51] Speaker A: Yeah, I always like it when you tell me these things like, oh, this isn't something I would ever use, and then I'm reading through the documentation and there are a lot of mentions of GraphQL, which I know you've told me how excited you are about as an idea. So I wonder how this gets to Ryan at some point. The reality is, the more eventing we get into, the more of these things maybe do make sense. It just doesn't fit your use case today, but someday a product manager will come to you and say, I need this thing, and you're going to go: aha, this is my use case for AppSync. [00:27:21] Speaker C: I mean, now I'm worried I have AppSync wrong, but I thought it was more of an application-manager-type service. Maybe I'm wrong. I think I am wrong. [00:27:34] Speaker A: Well, let me go to the website, because I want... [00:27:40] Speaker C: ...which I'm doing right now too... [00:27:42] Speaker A: Yeah. So: AppSync Events is an easy way to publish and subscribe to real-time data updates and events, like live sports scores and stats, group chat messages, price and inventory level changes, or location and schedule updates, without having to deploy and manage WebSocket infrastructure. It simplifies pub/sub: developers simply name their Event API, define its default authorization mode and channel namespaces, and that's it; they can then immediately begin publishing events to channels that they define at runtime. It also has event handlers, which are optional but can be used by developers to transform events as they're published, and to perform authorization logic on publish and subscribe connection requests. So this is a higher-level service on top of something like Cognito or, sorry, not Cognito, Kinesis, that's what I'm thinking of, or others like it. And then they have GraphQL, which basically gives you a GraphQL language on top of the things you can query: client apps use it to fetch, change and subscribe to data from the servers generating the data into the event stream. In that particular use case, it's a higher-level, value-added version of Kinesis and/or other products. [00:28:47] Speaker C: Yeah. It's primarily geared toward front end, which is why I think I'm less familiar with it. If you're building a web app where you have some very frequently refreshing data, stock prices or sports scores or something along those lines, it seems like it makes a lot of sense. It's kind of neat. I wonder what service I was thinking of, and I wonder how much of this is that AWS has too many services now; I can't keep them all on track. [00:29:19] Speaker A: Yeah, I know. And some of them are named very, very close to others, so it's definitely a challenge to keep on top of it. But I was having a conversation with somebody, and we were talking about an S3 feature, and I was like, oh, what about this thing? And he's like, yeah, we did try that, and it worked this way. And I was like, okay, I've still got it.
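For flavor, here is roughly what publishing to an AppSync Events channel looks like over HTTP. This is a hedged sketch of the general shape from the docs, assuming API-key authorization; the endpoint URL, API key, namespace, and channel name are all placeholders.

```python
import json
import requests  # assumes `pip install requests`

# Placeholders: your Event API's HTTP endpoint and API key.
ENDPOINT = "https://example1234.appsync-api.us-east-1.amazonaws.com/event"
HEADERS = {"x-api-key": "da2-EXAMPLEKEY"}

# Events go to a channel under a namespace ("default" here); each event
# in the batch is a JSON *string*, not a raw object.
payload = {
    "channel": "default/scores",
    "events": [json.dumps({"game": 42, "home": 3, "away": 1})],
}

resp = requests.post(ENDPOINT, headers=HEADERS, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())
```

The new data source integrations discussed above would then let a Lambda or DynamoDB-backed handler validate, transform, or persist each event as it flows through the channel namespace.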
Archera gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? Achero will click the link in the Show Notes to check them out on the AWS Marketplace. [00:30:23] Speaker A: Amazon EKS is introducing node monitoring and auto repair capabilities this new feature enables automatic detection and remediation of node level issues in EKS clusters, improving their availability and reliability of your Kubernetes app There are two components responsible for detecting node failures. First is a node monitoring agent that detects a wide range of issues. It's bundled into the container image that runs as a daemon set in all worker nodes. The agent communicates any issue it finds by updating the Status of the k8 node object in the cluster and by emitting Kubernetes events. It detects things like GPU failures related to hardware issues, driver issues, memory problems or unexpected performance drops, Cubelet health container D issues, networking CNI problems including missing route tables and packet drop issues, disk space and IO error, CPU throttling, memory pressure and overall system load, as well as kernel panics. And then the node repair system is a background component that collects health information and repairs worker nodes when they fail those checks I just mentioned, systems are either replaced or rebooted in response to the conditions within at most 30 minutes. If a GPU failure is active, it will replace or reboot that node within at most 10 minutes, which I think is interesting. They mentioned it. At most repair actions are allowed and can be audited and repair system respects user specific disruption controls such as pod disruption budgets. If zonal shift is activated your EKS cluster, then node auto repair actions are halted and not taken. So I thought this was what EKS managed nodes were, but this sounds like something different. So I was a little confused about this announcement. [00:31:44] Speaker C: I mean, I think you're still right. I think what they're doing is looking at specific things at a much more granular level than a basic EC2 health check would be for instance. And so I think that they're doing basically live in lance probes, which I think is probably why the GPU call out is the way it is. Like they're trying to make sure that it's not, you know, not crazy in terms of thresholds and polling. Not sure though. Or it's just because everyone wants a GPU right now. [00:32:13] Speaker A: I mean it is like the repair capability available for most EKS computer is a feature available for both EKS management groups and Carpenter nodes and EKS auto mode nodes. So like it does exist in those products as well. So this is just maybe if you're rolling your own, you want to use this tape capability. It is interesting though. I'm not fully sure I. [00:32:31] Speaker C: Because I thought it. Yeah, I mean I. It's funny I. I read it like it was very EKS specific. [00:32:36] Speaker A: I mean it is part of eks but you can use ek. You can manage any cluster using EKS even when you self run. 
So maybe you're running a Kubernetes cluster managed by EKS, but you want node health for some reason and you don't want to use an MNG, Karpenter, or EKS Auto Mode. I don't know. [00:32:56] Speaker C: Yeah, I'm not sure. Since I haven't used those tools directly, it's hard to say. But I do like that it's built in to the existing agent, in terms of those health checks. And hopefully the thresholds and the tuning of this are tunable, where you can set them; or it's just completely hands-off and it works like magic, which would also be acceptable. [00:33:23] Speaker A: Magic that works within 30 minutes. I do enjoy that, because I think that's always been a question when they've done these auto repair things: people are always asking, well, it's a state machine, so how long is it going to take to take action? And now they're saying, we're hoping near real time, but it could take up to 30 minutes. So they're at least caveating it a little better than they have in the past, which is nice. But I assume in practice you'll probably experience better than 30 minutes, and better than 10 minutes, is my guess. [00:33:49] Speaker C: Yeah, there usually seems to be a cushion. And it does seem like it's checking a little bit more; one of the things I saw is that it was checking an S3 bucket for how fresh the logs were, for instance, that kind of thing. That's kind of neat. Things we used to do by hand, usually in response to some terrible outage, these tools are solving right off the bat, which is nice. [00:34:11] Speaker A: Yeah. Prompt Optimization in Amazon Bedrock is now generally available. For those of you who remember us talking about this before, or don't: prompt engineering is the process of designing prompts to guide foundation models to generate relevant responses. These prompts must be customized for each foundation model according to its best practices and guidelines for best performance, which is a time-consuming process that delays application development. So prompt optimization can now automatically rewrite prompts for better performance and more concise responses on Anthropic, Llama, Nova, DeepSeek, Mistral and Titan models. You can compare optimized prompts against the original versions without deployment, and save them in Amazon Bedrock Prompt Management for prompt lifecycle management. Prompt optimization will take 3 cents from you for every thousand tokens optimized. But this is one of those things where you create the prompts, you optimize them once for each of the models, and they don't really change all that often; it's the guidelines that change.
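A hedged sketch of driving this from boto3, based on the OptimizePrompt API in the Bedrock agent runtime. The exact field names are my assumption of the API's general shape rather than verified signatures, and the target model ID is a placeholder; the 3 cents per thousand tokens pricing quoted above would apply to each call.

```python
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Ask Bedrock to rewrite a prompt for a specific target model.
# The model ID below is a placeholder example.
response = client.optimize_prompt(
    input={"textPrompt": {"text": "Summarize the attached incident report."}},
    targetModelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
)

# Results stream back as events; pull out the optimized-prompt event(s).
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:
        print(event["optimizedPromptEvent"])
```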
But I mean, because like the problem is English is hard. So like does the prompt fully understand the intent that I was trying to say? Like, I mean, you assume so. I don't know. Like, I mean, I guess you could do that use case. But I think this is more for like testing foundational models for like grounding or for, you know, you're setting up more like global hints for your application and those you want to be customized to each of the things because again like no one talks to me about like writing a really well prompted prompt to make the best output of my model. Like that isn't a conversation you have with Chat GPT or it doesn't say like, well, you know, you, if you'd asked that question differently, I could have helped you out better. So like that's not. It's got to be something more in the foundational model level where you're like doing something tuning for larger. That's my feeling at least. But you know, again, I haven't used this so I can't actually comment. So yeah, speaking of turn it was. [00:36:25] Speaker C: I mean I remember when we talked about it before I was thinking more of that use case like of like being able to test and part of the building. But I was just thinking about the pricing of that made me think about like maybe they've got a different angle on it maybe and something has to fix my terrible input. So. [00:36:46] Speaker A: Some things can't be fixed. Right? [00:36:47] Speaker C: Yeah. True. [00:36:50] Speaker A: AWS is announcing upgrades to its Amazon Q business integrations from Microsoft 365 Word and Outlook to enhance their utility when performing document and Email center tasks. The upgrade includes Company Knowledge Access, Image File Attachment support and expanded Prompt Context Windows. With Company Knowledge support, users can now ask questions about their company's index data directly through their Word and Outlook integrations, allowing them to instantly find relevant information when drafting their documents and emails without needing to switch your context. Which I mostly want to talk about this because I thought the only thing you could get embedded into Word and Outlook was going to be Microsoft Copilot. And the fact that this is open for Q to do it and for Agent Spaces, which is Google's product, is amazing. So you aren't locked in necessarily directly to Microsoft's AI solutions. If you want to use something like Q for Business or you want to use Agent Space, you have that capability, which is actually really nice. [00:37:39] Speaker C: I'm shocked and a little bit distrusting because like I haven't seen Copilot be able to actually do this office365 so it, I get that it's different. Like it's talking about more from the the Office apps into a data corpus, you know. But that's interesting. [00:37:59] Speaker A: Like, yeah, I mean ideally you're in Outlook, you're writing an email to Ryan and say, hey Ryan, you know, I'm concerned about these vulnerabilities on the server. And then query like Send the top 10 vulnerabilities by server to Ryan, you know, to the TRAP audit. And it goes pull that data and puts it in the email to you and then sends it. That's the use case you're trying to look at here. And I like Agent Spaces does it sort of differently. Like you do it typically in the prompt, it'll actually generate your Outlook email or your Gmail email, and so you can send it directly from Agent spaces through Viya Outlook. But they also are talking about putting plugins in as well. So it's, you know, both ways are gonna be possible. 
[00:38:31] Speaker C: Yeah, I mean in the day job I'm pocing stuff that is closer to this versus the alternative. So it's kind of neat like to see like, like this is what I've been waiting for is the agentic AI. It's what I thought generative AI was but I was wrong. [00:38:48] Speaker A: Well, that's many things we learned. [00:38:50] Speaker C: Yeah. You know, it's like now it's doing stuff, you know, and so this is really cool and I'm excited to see them like add these things. I'm really surprised that yeah, they can integrate with Q. That is good on Microsoft on that. [00:39:05] Speaker A: All right. Amazon is announcing Serverless reservations, a new discounted pricing option for redshift Serverless lets you save up to 24% and gain greater cost predictability for your analytics workloads as well as pals out there with serverless reservations you can commit to a specific number of redshift processing units or RPUs for a one year term and choose between two payment options. A no upfront option that provides a 20% discount for on demand rates or an all upfront option that provides 24 discount save all the monies. Which is funny for a serverless product that you have to get reservations because this is where Amazon screwed up and overloaded the serverless term to be things that are not serverless. [00:39:42] Speaker C: Yeah, I mean this is just really like, you know, a spiky analytical workload and you're just trying to instead of paying all the money like all at once and trying to get a discounted deal spread across a larger bucket. So yeah, yeah, it's good. [00:39:57] Speaker A: AWS Transfer family introduces a terraform module for deploying managed file transfer MFT server endpoints backed by Amazon S3. This enables you to leverage infrastructure as code to automate and streamline centralized provisioning of managed file transfer servers for SFTP as two FTPs, FTP and web browser based interfaces directly into and out of AWS storage services. If you're using FTP, you should stop immediately. My caveat, every time we talk about this, I don't know why they supported it. They are continuing a terrible practice of the world. [00:40:27] Speaker C: Oh, I know why they support it. [00:40:29] Speaker A: I know why they do. [00:40:29] Speaker C: But every job we've ever had, we need it. That's why just so annoying and we want to kill it and we can't. Is it? This is. But this is one of those features I'm like, this is new. [00:40:43] Speaker A: Like, I mean, I was shocked that terraform was. There wasn't a terraform module for this already. So, yeah, I was a little shocked about that. I think, I think you might have been able to write just normal Terraform through it, at least through the AWS ccc, because it was in the API for a while. But this is a pre built module that kind of helps you make it easy. So that's, that's the difference. But still like, yeah, okay, this one makes more sense. This one probably should have come a. [00:41:05] Speaker C: While ago, so you don't have to compile the entire environment. [00:41:08] Speaker A: Correct. [00:41:08] Speaker C: Yeah. [00:41:08] Speaker A: Although with AI, who cares? Now error modules are worthless that just make AI. [00:41:15] Speaker C: Yeah, I mean, that is cool. It is one of those things where if you have sftp, one of the ways that you mitigate risk is to segregate it out by whoever your clients are on the other end of it. So it is kind of neat, but it does turn into a scaling challenge really quickly when you manage it that way. 
So the more automation and the easier they make that until we get off of mainframes and FTP and the other dinosaur products that are in our everyday ecosystems. [00:41:46] Speaker A: Yeah. I remember supporting AS2 back in the day. You think SFTP is bad as 2 was. I mean, back in the original AS2 implementation, now it's all over IP, which is nice, but back then you have a private connectivity thing through a thing they called a webvan, which was like this proprietary delivery mechanism that AS2 ran on top of. And it was, it was, it was special. [00:42:09] Speaker C: Yikes, that just sounds bad. [00:42:11] Speaker A: Oh, it was, it was like I had, I remember having to implement it. I was like, why, why did. Did you not hear of sftp? Like, oh, this is better. It was a banking standard. [00:42:20] Speaker C: Of course it was a banking standard. [00:42:25] Speaker A: All right. I don't know how to feel about this one, Ryan. Amazon is releasing a new visual user interface for creating and editing Amazon OpenSearch ingestion pipelines on the AWS console. This new capability gives you a guided visual workflow, automatic permission creations and enhanced real time validations to streamline the pipeline development process. The new workflow simplifies pipeline development, reducing setup time and minimizing errors, making it easier to ingest, transform and route Data to Amazon OpenSearch service. Everyone knows I hate Elasticsearch with a passion. And I don't hate OpenSearch because at least they're trying to be a good community. [00:42:59] Speaker C: But if you want, you would hate this if you used it if you. [00:43:03] Speaker A: Want to use OpenSearch for logging, which is why I assume you want this ingestion pipeline. Can't. I can't support you in this because having tried to manage a very large 20 terabyte per day Elasticsearch cluster with log data, I can tell you that it's a really bad idea and you really shouldn't do it. And you know, part of all of Ryan's gray hairs and his goatee and the reason why I have no color in my goatee is because of elasticsearch. So. [00:43:28] Speaker C: And you know what we didn't have in that giant elastic search? Any kind of pipelines because they were too expensive and they broke everything immediately. [00:43:37] Speaker A: Yes. I mean, this is, this is many years ago now. So, yes, they did not exist at the time and then when they did exist, they were too expensive. [00:43:46] Speaker C: Well, so ingestion pipelines have existed, right. This is just more of the configuration of them. And so like, this is, you know, like this is the thing you put in place because you're not, your data isn't in the right format and so you're doing transformation in line. That's what, what these things are for. And so, like, if you put all that, if you centralize all that into your open search clusters, you're going to have a bad time. That's just as simple as that. And so like the, if you're going to make it really easy for a bunch of transformation steps to happen by creating a visual interface, which, you know, maybe I'm being a little judgmental, but the people that are going to use the visual interface for ETL is, you know, they're likely not going to have the full hatred of ETL in line. So this is, I don't know, I don't like this. I do like things being easier for people, but sweet Jesus, this is going to blow up on people. [00:44:45] Speaker A: Don't come and tell us we didn't warn you. Yeah. All right. 
Amazon is announcing the second generation of AWS Outposts racks, which marks the latest innovation from AWS for edge computing. I actually didn't know anyone used these products, but they actually mentioned customers in this post, so that's great. So now I need to go find people who work at those companies to ask them about their Outposts experience, because I have so many questions. But apparently the new generation includes support for the latest x86-powered EC2 instances, simplified network scaling and configuration, and accelerated networking instances designed specifically for ultra-low-latency and high-throughput workloads. These enhancements deliver greater performance for a broad range of on-premises workloads, such as core trading systems for financial services and telecom 5G core networks. A second-generation Outposts rack can provide low-latency local data processing for data residency needs, such as game servers for multiplayer online games, customer transaction data, medical records, industrial manufacturing systems, telecom, and edge inference for a variety of ML models. You can get the seventh generation of x86 processors on Outposts racks, the C7i, M7i and R7i instances, and they note that support for more later-generation EC2 instances is coming very soon. But you know what this announcement doesn't say a thousand times, Ryan? [00:46:03] Speaker C: It's been like a breath of fresh air during this one announcement. It's kind of nice. [00:46:07] Speaker A: Yeah, yeah, no AI. Not a single mention of it. They did mention inference for a variety of ML models, and they do specifically call out CPU-based ML models. That is because none of these instances support GPUs yet, but they do promise that both the latest generation EC2 and GPU-enabled instances are coming soon. But yes, it was nice not to have to talk about AI for the entire conversation. [00:46:32] Speaker C: I wonder how many requests they get for people trying to do an end run around GPUs. [00:46:36] Speaker A: Oh, I'm sure. [00:46:38] Speaker C: Like, do you have any Outposts that have GPUs? [00:46:40] Speaker A: Yeah, I mean, well, that's what GCP just put in their edge devices at Next; you can get GPUs in there. So clearly there's a demand for that. And again, the customers they mentioned here, Athenahealth, FanDuel, Riot Games, just to name a few who have adopted Outposts on premises, I'm sure they definitely have needs for AI. [00:47:02] Speaker C: Yeah, large data workloads make sense, or... [00:47:05] Speaker A: Like low latency. Athenahealth using it for image work, you know, scanning X-rays for known bad things to help radiographers. There's lots of AI use cases for healthcare that make a lot of sense where GPUs will be needed. So yeah, happy to see Outposts Gen 2. I did not know that anybody was using it, so happy to see it; it's just an overall net add. Great. Glad to see you get the latest and greatest. I was also surprised there are not a lot of Graviton instances; I figured those would be for sure in the Outposts as well. [00:47:36] Speaker C: I wonder if that's more of the support model, because I still think that supporting these things in other people's data centers has to be incredibly difficult, like from a hardware RMA standpoint. I wonder if the Gravitons require more of a specialized sort of environment to run in.
[00:47:53] Speaker A: Maybe. Yeah, just double checking: not even in the Gen 1 do they have Graviton. So, okay, again, trying to support more newer, modern models in the future. We'll see; we'll find out when they announce them. All right. Ryan and I work at SaaS companies, so whenever there's a SaaS feature, I'm always kind of excited about it. But then I read this one and I'm like, oh yeah, this is not the one I wanted. I know why they needed it, and I've even had this exact use case, but I don't like it. Let me tell you about it, Ryan, and see what you think. Okay. Amazon is announcing the general availability of Amazon CloudFront SaaS Manager, a new feature that helps SaaS providers, web development platform providers, and companies with multiple brands and websites efficiently manage delivery across multiple domains. So far, so good. Okay, you know. CloudFront SaaS Manager addresses critical challenges organizations face, including managing tenant websites at scale, each requiring TLS certificates (oh, we're going off the rails), distributed denial of service protection and performance monitoring. With CloudFront SaaS Manager, web development platform providers and enterprise SaaS providers who manage a large number of domains will use simple APIs and reusable configurations that use CloudFront edge locations worldwide, AWS WAF and AWS Certificate Manager. Multi-tenant SaaS deployment is a strategy where a single CloudFront distribution serves content for multiple distinct tenants, either users or organizations, and CloudFront SaaS Manager uses a new template-based distribution model called a multi-tenant distribution to serve content across multiple domains while sharing configuration and infrastructure. However, if supporting a single website or application, a standard distribution would be better, and is recommended by AWS. A template distribution defines the base configuration that will be used across domains, such as origin configurations, cache behaviors and security settings, and each template distribution has distribution tenants to represent domain-specific origin paths or origin domain names, including web access control list overrides and custom TLS certificates. Because in your SaaS application, you decided that you were going to let customers use their entire domain name in the name of their application. [00:49:50] Speaker C: Yeah, triggered my PTSD immediately as well. [00:49:55] Speaker A: Yeah. So what you typically see is companies, if they're going to do vanity URLs of any kind, it's typically like vanityurl.saascompanyname.com, and that allows you to do wildcarding and all kinds of things that make sense for this. But this model is where you said, look, Joe Blow Company, you can have saas.joeblowcompany.com, and you do that for all of your clients. So now you have a very complicated set of CloudFront configurations, because every one of them has to have its own CloudFront configuration, because you did custom vanity URLs. Now you can use this to make that less toil, which is appreciated. But it's also a terrible model, and I don't recommend it for a SaaS application if you can help it. [00:50:37] Speaker C: Yeah, no, I mean, I think back to, you know, I remember your advice. It's like, couldn't we partner with someone who does this? I guess in the early days of a prior job. And, you know, I didn't know how bad it was going to be when you said that. And now I do. And this is all I could think of when reading through this. [00:50:56] Speaker A: Like, this is hard to do. [00:50:58] Speaker C: And I get everything about this, from both the customer side and the SaaS provider side. There's companies that don't want to maintain their own web infrastructure, their own website. They want to just sort of WYSIWYG it up and rebrand a little bit, but have a centralized config that's run by someone else. But then the little gotchas on the edge of all these things are where it gets tricky. You know, how do you centrally manage WAF rules for something like that? [00:51:30] Speaker A: How do you do certificates? Like, you know, it made a lot of sense maybe at one point; it doesn't make a lot of sense now. And as you get into all these managed services that require domain-level configurations, it becomes very problematic very quickly. I'm wondering if this is going to actually come to the load balancer too, because that's the other place that would need this. Or they just say, look, if you really want to do that model, which we don't recommend (they flat out call out: we don't recommend this if you have a single website or application), I wonder if they're just going to be like, no, if you want to do that through a load balancer, you have to front it with CloudFront, I imagine. [00:52:02] Speaker C: Well, I was about to say yes, and now I'm not so sure, because I can see how they could, just in the way that you can attach things directly to the load balancer, like security policies for WAF and stuff. I can see maybe, but I don't think it should. [00:52:18] Speaker A: No, I mean, most of the WAF rules that I've written, you typically don't specify the... [00:52:23] Speaker C: The domain. [00:52:24] Speaker A: The domain. You only specify the URI pathing. [00:52:27] Speaker C: So yeah, as long as that's super consistent across, you'd be all right. [00:52:31] Speaker A: Yeah. Yeah, agreed. All right.
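Conceptually, the new model turns each vanity domain into a lightweight tenant of one shared template distribution. The boto3 call below is a best-guess sketch of the new SaaS Manager surface, not verified against the SDK docs, so treat the operation name and parameters as assumptions; the IDs and domains are placeholders:

```python
import boto3

cloudfront = boto3.client("cloudfront")

# ASSUMPTION: the operation name and shape approximate the SaaS Manager
# launch APIs (CreateDistributionTenant). Verify against current boto3
# documentation before relying on this.
tenant = cloudfront.create_distribution_tenant(
    DistributionId="E2EXAMPLE123",  # the shared multi-tenant "template"
    Name="joe-blow-co",
    Domains=[{"Domain": "saas.joeblowcompany.com"}],
)
```

The point being: origins, cache behaviors, WAF and cert automation live once in the template, and each customer domain becomes a few lines of per-tenant config instead of a whole distribution.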
And then our final Amazon story this week. AWS announces support for VPC endpoints in Route 53 Profiles, allowing you to create, manage and share private hosted zones for interface VPC endpoints across multiple VPCs and AWS accounts within your organization. This enhancement for Amazon Route 53 Profiles simplifies management of VPC endpoints by streamlining the process of creating and associating interface VPC endpoint managed private hosted zones (PHZs) with VPCs and AWS accounts, without requiring manual association or the stupid Route 53 proxy they created. [00:53:02] Speaker C: No, no, no. You have to have a private hosted zone attached to every distinct VPC in your org, but it has to be the same everywhere or it doesn't work, because it's DNS. [00:53:12] Speaker A: Yeah, so I saw this and I just chuckled. I was like, oh, you finally fixed that problem, did you? [00:53:16] Speaker C: Yeah. [00:53:16] Speaker A: Thank you. [00:53:17] Speaker C: Yeah. [00:53:18] Speaker A: Ten years too late. Yeah. [00:53:19] Speaker C: This is way better than the orchestration I think I've built twice to go update six different zone files in six different places. [00:53:26] Speaker A: Yep. Well, they sort of fixed it before, because they created the Route 53 internal proxy thing where you can proxy to different accounts, so they fixed part of it. They didn't fix it completely. And so this is kind of the last piece needed to really actually make that work. [00:53:40] Speaker C: I think it's also just meeting everyone where they are in those implementations, because over time the offerings changed, and depending on where you started, your internal management of these things is going to be completely different. So yeah, this is nice. This does seem like it wraps it up, makes it really easy to centralize onto a core ops team. That team still has to manage all of it, for sure.
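The fix is pleasantly mechanical: put the endpoint zones in a profile once, then associate the profile, not each zone, with every VPC. A rough boto3 sketch follows; the IDs and ARNs are placeholders, and the call shapes follow the Route 53 Profiles API as launched, so double-check them against current docs:

```python
import boto3

r53p = boto3.client("route53profiles")

profile = r53p.create_profile(Name="org-endpoint-zones", ClientToken="demo-1")
profile_id = profile["Profile"]["Id"]

# Attach the private hosted zone for an interface endpoint to the
# profile once, instead of once per VPC.
r53p.create_profile_resource_association(
    Name="ssm-endpoint-zone",
    ProfileId=profile_id,
    ResourceArn="arn:aws:route53:::hostedzone/Z0EXAMPLE",  # placeholder
)

# Then each VPC (shareable across accounts via AWS RAM) gets a single
# association to the whole profile, replacing N manual zone-VPC links.
r53p.associate_profile(
    Name="prod-vpc",
    ProfileId=profile_id,
    ResourceId="vpc-0abc1234",  # placeholder
)
```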
[00:54:06] Speaker A: All right, let's move on to GCP, who also decided this week to announce a SaaS feature that I didn't like. So I read this blog post, and then they had a link to the video where they announced it at Google Next; I missed this session and didn't even see it when I was scanning through all the recordings. So I started watching the video, and then I just got madder as I was watching it. So yeah, this is going to go well, Ryan. They announced the preview of SaaS Runtime at Google Next. It's a fully managed Google Cloud service management platform designed to simplify and automate the complexities of infrastructure operations, enabling SaaS providers to focus on their core business. Okay, you started out strong. Based on their internal platform for serving millions of users across multiple tenants, SaaS Runtime leverages their extensive experience managing services at Google scale. SaaS Runtime helps you model your SaaS environment, accelerate deployments, and streamline operations with a rich set of tools managed at scale, with automation at its core. The SaaS Runtime vision includes these three components. Launch quickly, customize and iterate: SaaS Runtime empowers you with pre-built, customizable blueprints, allowing for rapid iteration and deployment, and you can easily integrate AI architecture blueprints into existing systems through simple data model abstractions. Don't hate this part; this part's okay. Automate operations, observe and scale the tenants: as a fully managed service, SaaS Runtime allows automation at scale, starting from your current continuous integration/continuous delivery pipeline, onboarding it to SaaS Runtime, and then scaling it to simplify service management, tenant observability and operations across both cloud and edge environments. And I said, that's a little weird; what exactly are you trying to do here? And that's when I saw the diagram and realized this is for a SaaS company that literally deploys an instance for each customer. This is a bad pattern. It's an expensive pattern, number one. But sometimes customers like this, because it makes it very easy to say, well, these are your direct costs and so you should pay for them. This is the model that Jira uses; this is the model that ServiceNow uses, where you're getting a dedicated app server in addition to a dedicated database server. And so yeah, this is to manage all of that at scale, which I appreciate, but it's also not really how you should do it. And the final thing this offers: integrate, optimize and expand rapidly. As SaaS Runtime is integrated into Google Cloud, developers are able to design applications with the new Application Design Center, hand them off to Google Cloud Marketplace, and, once deployed across tenants, monitor their performance with Cloud Observability and App Hub.
[00:56:31] Speaker C: So that's what sealed the deal for me: the integration with Google Cloud Marketplace. Because if you've ever put an app on Google Cloud Marketplace, you realize how the system is set up for this sort of tenant cookie-cutter deployment model. And if you've worked in cloud, you're like, I don't know how we'd ever pay for such a thing. Because if you're deploying, you know, a web serving tier, a data tier, a load balancing tier, and then whatever security WAF protections you need on top of that, that scales uncontrollably. And now they're also adding in the ability for you to scale that up per tenant, so this seems really dangerous. [00:57:19] Speaker A: So there's a model that makes sense if you want to be a really good Marketplace sponsor, and that is: if you can sell something in Marketplace that you can deploy into the customer's environment to drive their spend, that is a really good model for your partnership. It's not a great operational model. It has some complexities, like how do you troubleshoot the thing that's in the other person's account? Lots of things. If this product were to manage that, I wouldn't hate it as much, because I think that is a good model for certain things, especially with ML and AI. Like, hey, I have this tool that runs AI workloads, and I'm going to give you a control plane, but then you're going to run the AI workloads with your own GPUs and your own contract, or your own discount, in your account; we're just going to manage that for you with this. I don't hate that model. I just don't see in this announcement how this helps me with that problem, which is actually the bigger problem. Because if I'm going to sell that through Marketplace, and that's going to provision stuff in their account, then how do I connect it back to my mothership to then manage and take care of it and deploy new things to it? And if you could automate that toil, okay, I'm down. But what this feels like is another bad pattern that you should not... I mean, again, there are tremendously successful SaaS companies that do this, and if you have a very simple architecture, then this might make sense for your SaaS application. But if you have a very large, complex, microservices-based SaaS application, the economies of scale are not going to be there for you in this. This is not the way to go. [00:58:41] Speaker C: Yeah, I mean, and that's what I kept thinking on this. This almost calls back to the old days of, like, PHP shared hosting. [00:58:50] Speaker A: Yeah. Again, very simple: Jira. You run Jira on a server with a database. You can scale that for a lot of companies at Atlassian, you know, but their new stuff is not built that way, so you wouldn't use it for the new stuff there. I'm not trying to pick on them, because I don't work there and have never been in business with them, but I know their model. I know that's how ServiceNow does it. It does make sense, and it is a workable model for a lot of things. But it's an expensive model, and if you're building a brand-new SaaS application today, which is what this is targeted at, I don't know if this is the model you should go with. [00:59:22] Speaker C: Right. And it would be impossible to put a good pattern in place using this, right? Because it would be one of those things where you combine and you get economies of scale.
Your data sets are multi-tenant, they're not single-tenant. The scaling design is at the logic level and not necessarily through every element of the tenant. And so, yeah, that's the thing I don't like about this the most: I feel like it encourages bad design. [00:59:52] Speaker A: Yeah, I mean, there are definitely challenges with multi-tenant too. Like, I am not opposed to the single customer database per tenant for isolation reasons. It solves a lot of problems with a lot of customers who don't want their data commingled. There are a lot of reasons why that's not a bad... I mean, it's more expensive, but it's not a bad model. Multi-tenant is hard, because you get into noisy neighbor problems, and how do you manage the tenant? How do you extract them out of this database and put them into a different database, if you're doing multi-database to support your sharding or to support scale-out? There are a lot of challenges with multi-tenant too, but this doesn't help with that problem. Hopefully this grows into a lot more stuff that then makes it available for other options, like managing in their account; that's cool. Managing actual other SaaS patterns, other than this pattern, which is not my favorite one, for lots of reasons, because it's expensive, you know, fine, okay. But I'm hoping, if this is something that Google wants to get serious about, which is providing a real control plane for SaaS runtimes, and they do more with this, I'm sort of intrigued where they're going to go with it. But I'd love to talk to a product manager on this. Like, what are you thinking? Why this one first? There must be a customer who wanted this first; that's the only reason why I think you built this one first. And they do mention AI workloads, and maybe that's the part that I'm sort of missing a little bit in this, the AI concept of it. Because in an AI application, again, you have so many GPUs and you have to do more billing directly back, and the isolation is really key in a SaaS AI application; you would want one that does not have data commingling. Again, lots of questions. I'm sure the use case makes more sense if I knew what they were trying to accomplish; I just can't pick it up out of the article. [01:01:30] Speaker C: And I also, like, I read this when it was announced at Google Next, and I didn't understand what they were talking about then. I understood it a little bit more reading through it now, but only because the AWS announcement had it fresh in my head; otherwise I don't think I would have understood it at all. So it's not really well written in terms of what this does, in terms of value, right? Like, you're a SaaS app, you should use this. That's all I got out of it, you know? [01:01:54] Speaker A: Yeah, well, even like, you know, oh, this is the IMS blueprint: it's a GKE cluster with cloud load balancing, Cloud Armor, and those are the users, and they're going to depend on this base blueprint. And I'm like, okay, so I get that. But that has a GKE cluster and that has Cloud Spanner, and that's all expensive stuff. [01:02:10] Speaker C: Yeah. And that's single-tenant now, really? Because that's the whole reason we moved away from instances and VMs to managing these things in Kubernetes, so that we could get the economy of scale. Now you're doing a cluster per tenant? That makes no sense. [01:02:23] Speaker A: Yeah, yeah.
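For contrast, the pooled design Ryan is describing looks like this in miniature: one shared schema, with tenancy enforced at the logic level by a tenant_id, rather than a full stack (or database) per customer. Entirely illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (tenant_id TEXT, sku TEXT, qty INTEGER)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", "widget", 3), ("globex", "widget", 7)],
)

def orders_for(tenant_id: str):
    # Every query is scoped by tenant. Forget this clause once and you
    # have a data-commingling incident -- which is exactly why some
    # shops still pay for database-per-tenant isolation instead.
    return db.execute(
        "SELECT sku, qty FROM orders WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()

print(orders_for("acme"))  # [('widget', 3)]
```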
And then also, is this supporting the idea that different customers can be on different code lines? Because that's also not good. [01:02:33] Speaker C: I mean, that's what it seemed like, though, talking about the CI/CD integration. [01:02:37] Speaker A: Yeah, that's what it sounds like. Oh yeah, you're going to do a customized app per SaaS client? No, don't do that. Please don't do that. I can show you what that looks like at scale. It's not fun. [01:02:48] Speaker C: Here's this scar. Here's this scar. Yeah. [01:02:51] Speaker A: So, yeah, like I said, I'd love to talk to a product manager. I might actually reach out to our Google rep, like, hey, I want to talk to whoever came up with this service, because I have to understand more. This is my bread and butter of what I do in technology, and so I have questions, right? Like, what were you thinking? What was your problem statement? I need to know. Because if it's literally the problem statement this reads as to me, then I hate you and I don't ever want to talk to you about this again. But if you have something I haven't thought about for AI or something, I'm sort of intrigued about how that's going to scale long term, because this model for web apps was bad. Yeah, I can't imagine it's gonna be great for AI either. [01:03:32] Speaker C: Yeah, yeah. I don't know enough. But from what little I know, it just seems all quagmires and pitfalls. [01:03:42] Speaker A: Yeah. [01:03:42] Speaker C: Yikes. [01:03:44] Speaker A: All right, now my blood pressure's up. Google Cloud database and LangChain integrations now support Go, Java and JavaScript. Each package supports vector stores for semantic search in databases; chat message history, to enable chains to recall previous conversations (again, Westworld); and a document loader, for loading documents from your enterprise data. If you need this, have at it in Go, Java and JavaScript. If you're doing JavaScript, I'm sorry for you. If you're doing Java and Go, okay. [01:04:11] Speaker C: Yeah. [01:04:11] Speaker A: All right, let's move on to Azure, because I don't think I'll save the pipe. [01:04:14] Speaker C: Yeah. And I think that one slipped past the pre-reading. [01:04:18] Speaker A: It was cool. I got a Westworld reference in. [01:04:20] Speaker C: It's true. [01:04:21] Speaker A: Good. All right. Azure: unveiling GPT-image-1, rising to new heights of image generation in Azure AI Foundry. Okay. Over at Microsoft. [01:04:30] Speaker C: Well, yeah, we get it. You're big on being excited about things. [01:04:35] Speaker A: And then they use "thrilled" to announce it. I'm like, oh God. The launch of GPT-image-1, the latest and most advanced image generation model (sure it is), is available now to all gated customers via the API, with a limited-access model application and playground support coming early next week. This groundbreaking model sets a new standard in generating high-quality images, solving complex prompts and offering zero-shot capabilities in various scenarios. So what I can tell you is this is just what we talked about earlier with OpenAI, but now at Microsoft, because they're a partner. [01:04:58] Speaker C: Oh, totally. [01:04:59] Speaker A: Yeah. This is the API we talked about earlier, just now here, so we don't have to rehash the whole thing. They're thrilled, though. So, you know, when you're thrilled, you got to talk about it, right? [01:05:09] Speaker C: Yeah. They got to justify all the money they spent.
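Calling it from Azure looks like the OpenAI flavor with an Azure endpoint; a minimal sketch using the openai Python SDK, where the endpoint, key and API version are placeholders (the model is gated, so this also assumes your subscription has been approved for access):

```python
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://example-resource.openai.azure.com",  # placeholder
    api_key="...",                                               # placeholder
    api_version="2025-04-01-preview",  # assumption -- check current docs
)

result = client.images.generate(
    model="gpt-image-1",  # your deployment name for the model
    prompt="a podcast logo of a cloud wearing headphones",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded images rather than URLs.
print(result.data[0].b64_json[:60], "...")
```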
[01:05:13] Speaker A: Kind of. I'm kind of hoping for, like, the messiest divorce ever between OpenAI and Microsoft, because it'd be just endlessly entertaining for us. [01:05:19] Speaker C: And it kind of feels like it's coming. [01:05:20] Speaker A: It feels like it's coming. So I'm kind of waiting for the day, like, oh, this is going to be messy. Because really, Microsoft controls a lot of cards. They can prevent them from becoming a for-profit company if they want to. So this is a very delicate... [01:05:37] Speaker C: If it gets petty. [01:05:38] Speaker A: Yeah, if it gets petty. I don't really know if Satya is a petty guy, though. That's the thing; he seems like a really nice guy on the surface, and his public persona is very kind. I wonder if internally he's not that way. [01:05:51] Speaker C: I don't know. [01:05:52] Speaker A: Like, I met TK in person; he's kind of what you see on stage, kind of what you get. He's very knowledgeable, super smart, remembers everything. But I don't really get the sense that he... I mean, I could see him getting angry; I definitely see that. But when I met him in person, he was very much what I said. And again, it's a public face, so maybe that's why. But I didn't get the impression that it was a public face; when I was talking to him, it seemed very genuine, like he cared. It was very nice. Are you tired of restarting Windows servers for patching? [01:06:19] Speaker C: Oh, you know it. [01:06:20] Speaker A: What if I told you I could make it so you don't have to restart your Windows servers anymore, because you can now hotpatch them? [01:06:26] Speaker C: Shut the front door. [01:06:28] Speaker A: But what if I told you that to do this, I had to connect your Windows servers on GCP to Azure through Azure Arc to make this happen? How would you feel then? [01:06:37] Speaker C: I would feel like, yeah, okay, this is starting to make sense. This is terrible, like I was expecting. [01:06:40] Speaker A: Yeah, okay, then we're on the same page. Perfect. Let's get through this. Hotpatching for Windows Server 2025, made available in preview in 2024, will become generally available as a subscription service on July 1, 2025, because you're not already paying enough for Microsoft licensing. One of the key updates in the latest release of Windows Server 2025 is the addition of hybrid and multicloud capabilities aligned with Azure's adaptive cloud approach; hotpatching takes what was previously an Azure-only capability and makes it available to Windows Server machines outside of Azure through Azure Arc. Hotpatching is a new way to install Windows Server 2025 updates that does not require a reboot after installation, by patching the in-memory code of running processes without the need to restart the process, which they always said was black magic, but apparently they finally figured it out. Some of the benefits of hotpatching would be higher availability with fewer reboots, which we would all love; faster deployment of updates, as the packages are smaller, install faster, and are easier to orchestrate with Azure Update Manager; and hotpatch packages install without the need to schedule a reboot, so they can happen sooner. This can decrease the window of vulnerability that can result when an administrator might normally delay an update and restart after a Windows security update is released.
Hotpatching is available today at no charge because it's in preview, but starting in July with the subscription launch, hotpatching for Windows Server 2025 will be offered at a subscription of $1.50 per CPU core per month. Of course, to make this work, the server must be connected to Azure Arc for management. [01:07:57] Speaker C: I hope that there's a technical reason, because it feels like a cash grab. [01:08:02] Speaker A: Oh, it totally feels like a cash grab. [01:08:04] Speaker C: Yeah. Like, on one hand, I get it. They're solving operational problems they have by managing their workloads on Azure, and this is an enhancement that comes directly out of managing servers at that scale, which is fantastic. The fact that they put it as a subscription on Arc makes me feel a little dirty about it. Like they're just going to be like, oh no, you know, you could have a bad day, or you give us $1.50 per CPU core per month. It feels a little like extortion, and I don't like that. But I'm hoping that there are technical reasons, like the patch manager, and maybe there's an investment in the management ecosystem and stuff that, if I knew the details, would make it feel better. [01:08:51] Speaker A: So, I mean, if you're connecting a server to Azure Arc, I think you have to pay a fee already. I'm just looking here at what that costs. You get the Azure Arc control plane and inventory for free. You get management, which allows you to administer your servers using SSH, Run Command and the custom script extension, for free. And you get VM self-service, which allows you to perform lifecycle management (create, resize, update and delete) and power cycle operations (start, stop and restart) on VMware vCenter and System Center virtual machines, for free. And then if you pay for it, you get access to Defender, at $5 per server per month, or Microsoft Defender for Servers Plan 2 for $15 per server per month, which I think is the one that includes the UEBA and the more fancy EDR, endpoint, ER, whatever, whatever Carsec does; basically that's their version of that. Azure Update Manager for $5 per server per month. And then Azure Policy guest configuration, which includes change tracking and inventory, for $6 per server per month. And then you get ingestion charges for services that are billed per gigabyte, including Azure Monitor Log Analytics, Azure Monitor SCOM Managed Instance and Microsoft Sentinel, which runs from about $2.76 per gigabyte up to around $5.22 with Sentinel associated. So to do this, you can get inventory, management and VM self-service for free, and then you pay just the $1.50 per CPU core. So, I mean, is it worth $1.50 per core to me to have hotpatching? I mean, kind of. From a security vulnerability perspective it's extortion, but it's definitely something that sort of makes some sense. But yeah, it definitely feels like a cash grab. I did ask Claude if it knew of a technical reason why hotpatching requires Azure Arc; it gave me nothing that felt like a reliable answer. So I don't actually know. [01:10:41] Speaker C: I don't think it would be publicly available information if it was. But yeah, Azure Arc in general looks like it charges a premium for security tooling, which is never good, and patch management falls right in line with that. So, yeah, okay, pay a premium then.
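Back-of-the-envelope on what that subscription means for a fleet (the fleet numbers here are invented purely for illustration; only the $1.50 rate comes from the announcement):

```python
servers = 200
cores_per_server = 8
rate = 1.50  # USD per CPU core per month, per the announcement

monthly = servers * cores_per_server * rate
print(f"${monthly:,.2f}/month, ${monthly * 12:,.2f}/year")
# $2,400.00/month, $28,800.00/year -- the price of skipping reboots
```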
[01:10:58] Speaker A: I mean, I guess virtualization-based security is one of the requirements that enables hotpatching. The machine needs to satisfy the requirements for virtualization-based security, or VBS, also known as virtual secure mode (VSM), and Arc provides a framework to verify and manage these security prerequisites. And then of course you get subscription and license management to make sure that you're licensed; I get that. I wonder if there's a concern that, if they enabled hotpatching without this Azure security tier, someone could inject a virus or something bad through the hotpatching process, so they've locked it down in a way to make it secure. That seems plausible to me. Maybe. That's all I got. I'm trying really hard to see how this works and why I would maybe need it, but I don't know for sure. But if you want to pay $1.50 a month per server to get hotpatching... per core, actually. Which is dumb. [01:11:51] Speaker C: Yeah. [01:11:51] Speaker A: Because there's one Windows operating system that you're patching; it's not patching per core. [01:11:55] Speaker C: Yeah. [01:11:55] Speaker A: And all the other pricing is per month; it doesn't mention per core. Let me go back to that, actually. [01:12:00] Speaker C: No, I bet you it is, in the fine print, because it's Microsoft. That's what they do. [01:12:05] Speaker A: The monthly rate, it says $5 per server per month. So it's definitely not. Okay, it doesn't say... If you want Windows Server pay-as-you-go enabled by Azure Arc, then you do pay per core. But you wouldn't do that in this case, because we own licenses. And for customers with an active SQL Server license, they can charge you for that through Arc too. Extended security updates enabled by Arc: oh, you get extended security updates, always a win. So you can pay for your Windows Server 2008 or 2012 server to still get security updates through Azure Arc. But that's a cash grab, 100%. I don't see any reason for it other than making their revenue better. I'd pay it, though; hotpatching is a thing. [01:12:46] Speaker C: I mean, patching in Windows is so hard that... [01:12:49] Speaker A: So hard. [01:12:49] Speaker C: Yeah. [01:12:50] Speaker A: It's one of the reasons why I like Linux. Most things I can patch without having to reboot. Some things I do, but most things I don't. And I assume the same thing with hotpatching; not everything's gonna be eligible for a hotpatch, I assume. [01:13:01] Speaker C: Yeah. I imagine there's always gonna be exceptions at the lower level. [01:13:03] Speaker A: But if you could reduce it to once a quarter that I have to reboot for patching, versus every month, that wouldn't be bad. [01:13:09] Speaker C: Well, and we only do it once a month now because of Patch Tuesday reasons. But Patch Tuesday is only a thing because it's too painful to do it any sooner than that, right? It's ridiculous. You should be patching all the time. [01:13:24] Speaker A: I'm gonna put deep research on this problem: why does it require Arc? Let it go search for a while. [01:13:31] Speaker C: You're so good at remembering to do things like this. I can never remember. [01:13:34] Speaker A: It's awesome. We'll report back, maybe next week as a follow-up, if it comes back with something more interesting. But deep research takes about 30 minutes, and we don't have time to wait for that. We have a show to get back to. All right, back to the show. Azure is announcing the preview of their next generation of confidential VMs, powered by the 5th gen Intel Xeon processor, Emerald Rapids, a
chip with Intel Trust Domain Extensions, or TDX. This enables organizations to bring confidential workloads to the cloud without code changes to their applications, and the supported SKUs (because Azure loves their SKUs) include the general-purpose DCesv6 series and the memory-optimized ECesv6 series. Rolls off the tongue. Confidential VMs are of course designed for tenants with high security and confidentiality requirements, providing a strong, attestable, hardware-enforced boundary. They ensure that your data and applications stay private and encrypted even while in use, keeping your sensitive code and other data encrypted in memory during processing. I did not know this was a capability that does not require application changes, enabled by TDX. Now that's nice, but do I trust Intel? [01:14:31] Speaker C: Well, I mean, there's application changes and there's application changes, right? So yeah, the application itself, code-wise, doesn't need anything, but almost no one I know can manage their application in production running in a confidential VM. [01:14:46] Speaker A: There's that problem. Yes. [01:14:48] Speaker C: So you want to debug a customer issue? Well, good luck, you know. [01:14:52] Speaker A: Yeah, you're not getting debugger information on that. That's. Yeah. It's interesting. Again, Intel's had so many security issues. I also don't trust TDX, because most other confidential computing is based on AMD technology, is it not? Most of the other ones? [01:15:08] Speaker C: I think so, but I never really thought about that. [01:15:10] Speaker A: Security expert looks at the ceiling. He doesn't know. Just the cloud... [01:15:15] Speaker C: Guy turned security expert, who's winging it at the best of times. Yeah, no. I think everyone I know of is AMD-based, but yeah, I think... [01:15:23] Speaker A: All the ones I know of are AMD. This is one of the first ones I've seen on Intel. I mean, maybe it's always been Intel on... [01:15:29] Speaker C: On Azure, I knew that Intel had the capability, and so I just sort of glossed over it. [01:15:33] Speaker A: But I do know that for AMD, it does require application changes to enable; I don't think it's seamless. Yeah. [01:15:43] Speaker C: So I think that's a big differentiator with what Intel's offering, is that it shouldn't. But I. Yeah. [01:15:50] Speaker A: So Google Cloud's is based on the AMD EPYC processor; theirs is also AMD. Well, actually, Amazon can do some of it in Nitro too, but not all of it. [01:16:01] Speaker C: But I thought all the Nitro was all AMD as well, because it's, you know, little computers managing other little computers. [01:16:09] Speaker A: I mean, the Nitros are all Graviton or some variant. [01:16:12] Speaker C: Graviton. Sorry. Yeah. I guess. [01:16:18] Speaker A: Yeah, well, and they also offer it on... So Amazon does have Intel-based processors with Total Memory Encryption, or TME, and they have the AMD Milan processors with secure memory encryption. This is an old article, 2021. I was like, wait, Milan's real? Yeah. Anyways, EPYC is definitely the Google solution; I knew that one for sure. And finally, Azure is announcing the preview of larger container sizes for Azure Container Instances. And my first response to this was, of course they did, because they're Windows containers and they need ridiculous amounts of memory.
And then I actually read the article and I was sadly schooled. [01:16:56] Speaker C: Yeah. [01:16:57] Speaker A: Customers can now deploy workloads with higher vCPU and memory for standard containers, confidential containers, containers on virtual networks, and containers utilizing virtual nodes to connect to AKS. ACI now supports vCPU counts greater than 4 and memory capacity greater than 16 GB. Now, first of all, 4 and 16 is small for some containers, especially in the GPU-enabled LLM world. The new maximums are 32 vCPUs and 250 GB for standard containers, and 32 vCPUs and 192 GB for confidential containers. That is a pretty decent increase: 32 over 4 is 8x more CPUs, and then a pretty significant increase in memory (250 over 16 is more than 15x). So I wanted to mock them mercilessly for this, but this seems like a reasonable increase that they're just late on. [01:17:45] Speaker C: I'm just surprised they got away with it for as long as they did, because I went on the same journey you did, which is, like, point and laugh, and then: oh, they only allowed four? Because I've never seen a workload need more than four CPUs, but everyone asks for more. [01:17:58] Speaker A: Oh yeah, no one needs it; everyone demands it, because they don't understand how computing works. But honestly, with a Windows host, I do want more CPUs than that. I definitely want at least 16 for a high-performing web application. But yeah, anyways, so yeah, no, that's pretty crazy. [01:18:17] Speaker C: So yeah, no, that's kind of neat, and, you know, I'm sure there are a lot more containerized workloads that will go onto ACI now, or if you're already in AKS... [01:18:25] Speaker A: Well, I think you could still run AKS nodes that would support larger containers, because this is again Azure Container Instances, which is their managed... [01:18:32] Speaker C: The managed offering. [01:18:33] Speaker A: Okay. So I think you could have gotten bigger through AKS natively, but this was the managed offering. But then again, the managed offering being limited to 4 and 16 is just ridiculously small; I imagine that prevented adoption of ACI pretty heavily. [01:18:44] Speaker C: Yeah, I mean, I think I remember Amazon's, which I'm forgetting the name of now, but their managed offering going through the same sort of flow, where it started off pretty small and slow, to control scale, and then they opened it up over time as they got the capacity. [01:19:04] Speaker A: All right, Fargate. That was... [01:19:08] Speaker C: A long time for that to dredge up. [01:19:10] Speaker A: Yeah, it's been a long week. [01:19:12] Speaker C: It has.
You can take advantage of these things. It's great. I'd like to see more competition in this marketplace and I don't know why I root for DigitalOcean every time. I don't know why. [01:20:02] Speaker A: Yeah, I mean I root for them too. I really should just move the podcast website there because Amazon, I feel like I keep paying them more and more for not a lot more service. So yeah, Anyways, all right, well let's move on to a cloud journey this week. So this one scratch my itch for things that I thought was a cool concept and idea and so I thought you might enjoy this one Ryan. I try to tailor them to our hosts. This one felt right up your alley. So basically this was an article in the Business Insider, I believe. Yes. That basically talked about how Amazon overcame their GPU crunch, which was interesting. So basically if you have lived under a rock, you may not know that GPUs have been very difficult to get hold of for many years. And really in the last six months there's been kind of a relief to that that GPU has gotten a little bit more available. Not 100% better, but you know, definitely better. But basically this became a big problem for Amazon retail because it couldn't get enough GPUs to power its crucial inference and training workloads and it was delaying projects and things they wanted to launch for the Amazon store or for their internal processes that would leverage AI. And so they set out to revamp internal processes and technology to solve the problem. Because of course a good doom process always solves all problems. The solution was a project Greenland which was a centralized GPU capacity pool to better manage allocates limited GPU supply. The document that they wrote that somehow business center got hold of. I don't know how they did it, I don't Want to know. GPUs are too valuable to given out on a first come first serve basis. Instead, distribution should be determined based on roi, layered with common sense considerations and provided for long term growth of the company's free cash flow per the internal guidelines. Two years since the shortage began, GPUs remain scarce, but Amazon's efforts to tackle the problem may be paying off, with internal forecasts suggesting the crunch would ease this year's cheap availability expected to further improve. That's from document which is already seeing some relief there, although the tariffs are about to screw it all up. So probably still going to use this for a while. When queried, Amazon did say Amazon has ample GPU capacity to continue innovating for our retail business and other customers across the company. ADIS recognized early on that generative AI innovations are fueling rapid adoption of cloud computing services for all customers including Amazon. And we quickly evaluate our customers growing GPU needs and took steps to deliver the capacity they need to drive innovation. Thank you for that non answer Amazon. I appreciate it. Anyways, so basically the guidelines say Amazon demands hard data and return on investment and Proof for all internal GPU requests. Initiatives have to be prioritized and ranked for GPU allocation based on several factors including the completeness of data provided and the financial benefit per gpu. Projects must be shovel ready or approved for development and prove they are competitive in the race to market. They also must provide a timeline for when benefits are expected to be realized. Oh my God. A business case. 
All right, well, let's move on to a cloud journey this week. This one scratched my itch for things that I thought were a cool concept and idea, and so I thought you might enjoy this one, Ryan; I try to tailor them to our hosts, and this one felt right up your alley. So basically this was an article in Business Insider, I believe. Yes. It basically talked about how Amazon overcame their GPU crunch, which was interesting. If you have lived under a rock, you may not know that GPUs have been very difficult to get hold of for many years, and really in the last six months there's been some relief, in that GPUs have gotten a little more available. Not 100% better, but definitely better. Basically this became a big problem for Amazon retail, because it couldn't get enough GPUs to power its crucial inference and training workloads, and it was delaying projects and things they wanted to launch for the Amazon store, or for their internal processes that would leverage AI. And so they set out to revamp internal processes and technology to solve the problem, because of course a good doc process always solves all problems. The solution was Project Greenland, a centralized GPU capacity pool to better manage allocation of the limited GPU supply. Per the document that they wrote, which Business Insider somehow got hold of (I don't know how they did it; I don't want to know): GPUs are too valuable to be given out on a first-come, first-served basis. Instead, distribution should be determined based on ROI, layered with common-sense considerations, and provide for long-term growth of the company's free cash flow, per the internal guidelines. Two years since the shortage began, GPUs remain scarce, but Amazon's efforts to tackle the problem may be paying off, with internal forecasts suggesting the crunch would ease this year as chip availability is expected to further improve. So they're already seeing some relief there, although the tariffs are about to screw it all up, so they're probably still going to use this for a while. When queried, Amazon did say: "Amazon has ample GPU capacity to continue innovating for our retail business and other customers across the company. AWS recognized early on that generative AI innovations are fueling rapid adoption of cloud computing services for all customers, including Amazon, and we quickly evaluated our customers' growing GPU needs and took steps to deliver the capacity they need to drive innovation." Thank you for that non-answer, Amazon. I appreciate it. Anyways, basically the guidelines say Amazon demands hard data, return on investment and proof for all internal GPU requests. Initiatives have to be prioritized and ranked for GPU allocation based on several factors, including the completeness of the data provided and the financial benefit per GPU. Projects must be shovel-ready or approved for development, and prove they are competitive in the race to market. They also must provide a timeline for when benefits are expected to be realized. Oh my God. A business case. If your system doesn't provide the return on investment by the time you said it was supposed to, the GPUs can be redistributed to the next project or program that needs them. This whole thing is codified into a process they call the official tenets, internal guidelines that individual teams or projects create for faster decision making. The tenets emphasize a strong return on investment, selective approvals and a push for speed and accuracy, and there are eight of them that Business Insider was able to get hold of. So, first one: ROI and high-judgment thinking is required for GPU usage prioritization. GPUs are too valuable to be given out on a first-come, first-served basis; instead, distribution should be determined based on ROI, layered with common-sense considerations, providing for long-term growth of the company's free cash flow. Distribution can happen on bespoke infrastructure or via a sharing/pooling tool. That seems reasonable; we talked about that a little bit ago. Ryan, any thoughts? [01:23:28] Speaker C: I mean, you know, it's funny to me, because I read through these things and I have two thoughts at the same time. What a pain in the ass it must be to work at Amazon; and then also, oh my God, they put these processes in place that just make so much sense to me. Like, you have to have data, you have to have numbers and a justification, and if that doesn't bear fruit over time, the business will change direction, versus having shelfware, or just keep pouring... [01:23:54] Speaker A: Money into a hole that doesn't go anywhere. Yeah, it's so awesomely novel, right? All right, I'll give you a couple more here that you can comment on. Number two: continuously learn, assess and improve. We solicit new ideas based on continuous review and are willing to improve our approach as we learn more. Makes sense. [01:24:11] Speaker C: Yeah. [01:24:12] Speaker A: Number three: avoid silo decisions. Avoid making decisions in isolation; instead, centralize the tracking of GPUs and GPU-related initiatives in one place, so we have an overview of it. So those two seem to make some sense. What do you think of those? [01:24:24] Speaker C: I mean, it's great, because they're hinting at having transparency in the business process, and so this is visibility of your inventory that's available to more than just the people that are actually racking and stacking or making these GPUs available. You have to have those channels of communication, that visualization, to make it work. So absolutely, yeah. [01:24:46] Speaker A: All right, the next one. Time is critical: scalable tooling is key to moving fast when making distribution decisions, which in turn allows more time for innovation and learning from our experience. That just means they're... [01:24:56] Speaker C: Going to automatically take them away from you. [01:24:57] Speaker A: Yeah. If you're not using your GPUs and you're not making quick decisions, they're going to go away. That's what they're telling you. Next: efficiency feeds innovation. Efficiency paves the way for innovation by encouraging optimal resource utilization, fostering collaboration and resource sharing. [01:25:14] Speaker C: You have to make it visible so that you can, like, haggle with your peers. [01:25:18] Speaker A: Well, you're only getting this ROI, and I think I can get this ROI. [01:25:21] Speaker C: Or, hey, can I borrow your GPUs? [01:25:25] Speaker A: Next one: embrace risk in the pursuit of innovation.
An acceptable level of risk tolerance will allow us to embrace the idea of failing fast and maintain an environment conducive to research and development. So: your idea failed, get over it, give the GPUs back. [01:25:36] Speaker C: Yeah, yeah. We've taken it away. Stop complaining. [01:25:39] Speaker A: Yeah. Transparency and confidentiality: we encourage transparency around the GPU allocation methodology through education and updates on the wiki, while applying confidentiality around sensitive information on R&D ROI, shareable with only limited stakeholders. We celebrate wins and share lessons learned broadly so others can benefit. [01:25:55] Speaker C: That one's going to sting, because that just means they're going to tell some people why we took them away, but not everyone. [01:26:00] Speaker A: Yeah. Maybe the ROI data they'll share, but maybe what they use it for they're not going to share, which makes sense. Not everything should be known by everybody, but, you know. Yeah, it sums up; makes sense. [01:26:12] Speaker C: No, yeah, exactly. I mean, this just goes to show you how hard these things are in today's modern business. You have to provide information transparency in order to be efficient, but you also can't put everything at risk of insider trading and all the other risks that you have by communicating about dollar figures and stuff. [01:26:30] Speaker A: Yep, exactly. And then finally, my favorite one: GPUs previously given to fleets may be recalled if other initiatives show more value. Having a GPU doesn't mean you'll get to keep it. [01:26:40] Speaker C: Yeah. Even if you're meeting the numbers you advertised: there's a better idea, sorry buddy. [01:26:46] Speaker A: Yeah. So basically, to codify all of this, they built a system they describe as a centralized GPU orchestration platform, to share GPU capacity across teams and maximize utilization. It can track GPU usage per initiative, share idle servers, and implement clawbacks to reallocate chips to more urgent projects, as we just talked about. And the system also simplifies networking setup and security updates, while alerting employees and leaders to projects with low GPU usage. And it's interesting, I didn't put it in my notes here, but you had to write your app to allow the GPU to get taken away. So, just like designing for failure: expect the GPU won't be there when you need it, and fail gracefully. [01:27:24] Speaker C: Yeah, I mean, I imagine this is tailored towards workloads where that makes a lot more sense, right? Where you're going out and training across a specific data set, so it's less like a runtime removal of resources. But with more and more GPUs being used for inference, maybe I'm wrong on that; I don't know. This is the type of solution I love to put in place with businesses, right? Because it's a combination of data and visualization plus automation that allows you to manage an infrastructure-level service and product in a way that actually achieves what the business needs, versus how so many processes we have across businesses today are just least effort, or just what we could pull out in the time we were given. So I love this.
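None of Greenland's internals are public beyond the tenets above, but the clawback idea reduces to something like this illustrative sketch (project names and numbers invented): rank holders by benefit per GPU, grant from a central pool, and reclaim from the bottom when a higher-ROI request can't fit:

```python
from dataclasses import dataclass

@dataclass
class Request:
    project: str
    gpus: int
    benefit_per_gpu: float  # the "hard data" each team must supply

pool = 1000
holders: list[Request] = []

def allocate(req: Request) -> bool:
    global pool
    if req.gpus <= pool:
        pool -= req.gpus
        holders.append(req)
        return True
    # Clawback: reclaim from lower-ROI projects until the request fits.
    for h in sorted(holders, key=lambda h: h.benefit_per_gpu):
        if h.benefit_per_gpu >= req.benefit_per_gpu:
            break  # nothing left that's worth less than the new request
        pool += h.gpus
        holders.remove(h)
        if req.gpus <= pool:
            pool -= req.gpus
            holders.append(req)
            return True
    return False

allocate(Request("catalog-ml", 600, benefit_per_gpu=1.2))
print(allocate(Request("ads-ranking", 800, benefit_per_gpu=5.0)))
# True -- catalog-ml's 600 GPUs get clawed back for the higher-ROI project
```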
[01:28:12] Speaker A: Yeah, me too. I mean, as I was reading through this, I was like, I wonder if I can make a company out of this. And I was like, no, because Amazon just released this. But I was thinking about it, actually. [01:28:21] Speaker C: But then you have to have all the GPUs. [01:28:23] Speaker A: Yeah, well, I'm just thinking you could build the orchestration layer that would do this kind of management, and then you'd have to get the allocation of that. But the reality is, I think Amazon's actually released some of this to us in some of their tools: being able to access GPUs across any region that has availability, being able to use GPU fleets, things like that. So I think some of this has kind of leaked into Amazon services in a couple of areas. There are also some things here they don't have today. So keep that in mind for your re:Invent predictions, because I think this is something that companies more broadly could use, and I think it's a great approach to a resource constraint problem. And you have this in your business in all kinds of different ways, not just GPUs, but dollars spent on cloud. You have limited dollars, and if your thing doesn't work, you should be able to yank it as appropriate, especially in R&D environments. [01:29:15] Speaker C: Exactly. I mean, how many compute nodes have you found just idling there doing nothing? Because there was an idea, and there was a lot of excitement about that idea for a short period of time, but it never bore fruit, and now it's just sitting there because no one cleaned it up, because no one's excited about that thing anymore. Having a way to automate that, report on it? Fantastic. [01:29:35] Speaker A: Well, and anytime a company's innovating, and you're like, oh, well, we released this feature, we think it's going to be great and it's going to create all this revenue; and it's like, okay, well, you've been selling it for X number of years or months and you're not hitting your targets: you cut bait on these things and just move on, like healthy companies do. They're like, hey, that idea didn't work, we're gonna exit this area, because it's wasting money and wasting time and resources. So it's a sign of a healthy business. To do it at this level, at the R&D research project level, is super important, because with R&D projects, one out of 100 is going to be successful. That's the idea. You're experimenting very quickly on a bunch of different AI projects, because you don't know which ones are going to stick or which ones are actually of value. You have a hypothesis, prove it out. If it works out, cool, we'll build it, and it'll have its return on investment and show value. And if it doesn't, then we shouldn't do this thing. So it's a very healthy practice. And yeah, Amazon has a lot of issues as a company, but some of these things make me really excited. So I was glad we were able to talk about this one. [01:30:35] Speaker C: Yeah, I mean, I've never worked at Amazon. I do hear horror stories, but I also hear some nice things that I think cater to at least oddballs like us that really want things to make sense. And so I like projects and initiatives where there's a ton of business justification and knowledge and time spent on why we're doing this, and a full understanding across everyone working on it of why this is going on.
Because a lot of these things happen, but they happen in different silos of the business, and it's not visible to everyone. And this really just programmatically puts it into every level, right? If you're deploying something, or if you're trying to kick off something, you're going to have a dashboard, and you put in a number, and there will be a graph that says you're on track for making it or not. And I like putting those things in there, because I think a lot of businesses don't do a great job managing all that. There's a lot of waste. [01:31:33] Speaker A: There is a lot of waste, for sure. All right, well, that is it for our show this week, Ryan. I think we've driven everything to ground. [01:31:42] Speaker C: It's a packed week in cloud. [01:31:43] Speaker A: It's a very busy week. [01:31:44] Speaker C: Every time we think it's gonna be a short show, it's not. [01:31:46] Speaker A: It never is. Yeah, yeah. But, you know, we kind of have a nice average going, right around an hour, but this one actually went a little long, which is funny, because normally when we have three of us, we go even longer. So what would the show have been if Matt was here? I don't know. Yeah. All right, well, we'll let people get to their busy lives. We'll see you next week here in the cloud, Ryan. [01:32:04] Speaker C: All right, bye, everybody. [01:32:07] Speaker B: And that's all for this week in cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our website, tcp.fm, where you can subscribe to our newsletter, join our Slack community, send us your feedback and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.
