[00:00:00] Speaker A: Foreign.
[00:00:08] Speaker B: Forecast is always cloudy. We talk weekly about all things aws, GCP and Azure.
[00:00:14] Speaker A: We are your hosts, Justin, Jonathan, Ryan and Matthew.
[00:00:18] Speaker C: Episode 328 recorded for October 28, 2025.
[00:00:23] Speaker D: Shh.
[00:00:24] Speaker C: It's a secret region.
Good evening, Ryan and Matt. How are you guys doing?
[00:00:28] Speaker D: Been better, how about you?
[00:00:30] Speaker C: You know, Amazon didn't crap the bed today, but I guess Azure technically did, since we are recording on Wednesday.
[00:00:36] Speaker D: I'm just saying it's your guys's turn next week. So, you know, good luck Google.
[00:00:41] Speaker C: You know, Corey Quinn reappeared back on Twitter recently, and so I've been sort of enjoying his comments again. And he's like, to all those people saying multi-cloud: now you get to do these major outages across every cloud.
So last week was AWS's turn and this week is Azure's turn. And so if you're doing multi-cloud, you just had two Sev 1 outages in two weeks. Good luck to you.
I was like, yeah, that is true. I hadn't really thought about it from that perspective. Yeah, I mean, hopefully you don't have that many Sev 1 outages from your providers this close together, but it does unfortunately happen sometimes.
[00:01:16] Speaker D: I still like your comment of: look at which two companies are about to have their big conferences and are definitely releasing features on the back end, and guess which two just had outages. Coincidence?
[00:01:26] Speaker A: Really?
[00:01:26] Speaker D: Dude, I don't know. We'll find out.
[00:01:28] Speaker C: Related? Never. Never.
We have our GCP fun in March heading into April. Yeah, this is when you guys have all your fun. So, I mean, the problem with Microsoft, though, is they do have two conferences, because they have Build and they have Ignite, and both of them are feature drops. So Azure gets twice as lucky. Or...
[00:01:47] Speaker D: I still feel like Build is a little bit more developer tools. Oh, it is for like GitHub and stuff like that.
[00:01:53] Speaker C: So yeah, I mean, technically Google has Google I/O, I think it is, which is more on the mobile and developer side as well.
But, you know, they do occasionally drop some features and capabilities for those things, so.
All right, well, let's get into what's happening here in the cloud. First up, Fastly is now dropping monthly DDoS reports, just like Cloudflare and AWS do. Their report reveals a notable 15.5 million requests per second attack that lasted over an hour, demonstrating that modern application-layer attacks can sustain extreme throughput with real HTTP requests rather than simple pings or amplification techniques. Attack volumes in September did drop to 61% of August levels, with the data suggesting a correlation between school schedules and attack frequency.
Media and entertainment companies faced the highest median attack sizes, followed by the education and high technology sectors, with 71% of September's peak attack day attributed to a single enterprise media company.
The 15 million RPS attack originated from a single cloud provider ASN, using sophisticated daemons that mimic browser characteristics, making detection more challenging than typical DDoS patterns. Organizations should evaluate whether their incident response runbooks can handle hour-long attacks at 15-plus million RPS. And the answer is no. No, it cannot.
[00:03:05] Speaker D: No.
[00:03:05] Speaker C: As these sustained high-throughput attacks require automated mitigation rather than manual intervention. If someone wants to send 15 million requests to The Cloud Pod, I'll just turn off the website for the day.
[00:03:13] Speaker D: So just let it crash, it's fine.
[00:03:16] Speaker C: Yeah, I mean technically the. The podcast episodes themselves are hosted elsewhere, so if you take the website down, I'm not really going to give a crap because people can still get the episodes. You can't silence us that easily.
[00:03:26] Speaker A: Yeah, well, and it should be behind.
What is it? Amazon Shield, whatever. Like because.
[00:03:34] Speaker C: Which is automatically CloudFront and WAF protection and... yeah, whatever. The.
[00:03:39] Speaker A: No, there's a DDoS thing that's automatically attached to that load balancer.
[00:03:41] Speaker C: Shield. Shield. And I don't pay for Shield Advanced, though, because it's expensive.
[00:03:44] Speaker A: Yeah, no, you don't need Shield Advanced, but the basic stuff should capture, you know, basic DDoS. Although, what am I saying? I don't want to invite this on us.
[00:03:52] Speaker C: Yeah, this is not.
[00:03:53] Speaker A: Take it all down.
[00:03:54] Speaker D: I do because it would mean somebody's listening to us and we're just not babbling into the void. So you know, I mean I have.
[00:04:00] Speaker C: Enough people who've sought me out at conferences know that we do get listened to. It's a little weird when that happens, but it's fun. It's like, oh, nice. And you thank them for listening and move on with your world. FinOps X was the last time that happened. But someone invited me to re:Invent and I was like, why would I go to re:Invent? They're like, for all the cloud stuff. And I'm like, yeah, I'm not sure. They're like, there's drinking. And I'm like, okay, you had me at drinking.
[00:04:23] Speaker D: But yeah, but then you got Vegas on the other side for four days.
[00:04:27] Speaker C: Yeah. Yeah. I mean, I could, maybe. If I was just going drinking in Vegas during re:Invent, I would just go for like Tuesday and come back Wednesday. I wouldn't even make it more than a full 24 hours, I don't think.
[00:04:39] Speaker D: Just don't even book a hotel room. It's fine.
[00:04:41] Speaker C: Exactly. Who needs a hotel for that? So let's move on to AI: how ML makes money. This week, first up, Google AI Studio is introducing vibe coding. I mean, you can't really introduce vibe coding at this point. It already exists.
They're adding to the noise of vibe coding, is maybe how I should have written that. It's a new AI-powered development experience that generates working multimodal apps from natural language prompts without requiring API key management or manual service integration. The platform now automatically connects appropriate models and APIs based on app descriptions, supporting capabilities like Veo for video generation, Nano Banana for image editing, and Google Search for source verification. The new annotation mode enables visual app modifications by highlighting UI elements and describing changes in plain language rather than editing code directly. The updated app gallery provides visual examples of Gemini-powered applications with instant preview, starter code access, and remix capabilities for rapid prototyping. Users can add personal API keys to continue development when free tier quotas are exhausted, with automatic switching back to the free tier upon quota renewal.
So there are still API keys. They made it sound like there wasn't, but there is. You just don't have to manage them until you've consumed your free tier.
[00:05:50] Speaker A: Yeah, it's so funny, because I've noticed that I can't take an article seriously when they're using "vibe coding" unironically. It's just something about it. And even though my daily, you know, sort of use case with AI is moving more and more to that vibe coding model. Although I feel like what I do is a little bit more tactical and pointed. But maybe that's just me trying to be smug. I just can't stand vibe coding, like, the way that it's talked about.
[00:06:22] Speaker D: I mean, I feel like it's talked about like, you know, someone that has no idea how to code is doing it. Versus I feel like the way the three of us and Jonathan use it is more like, okay, go do this stuff, and then we can at least get in the weeds and kind of direct it and help it. So, you know, when it keeps going left, we can at least notice and steer it to the right, because we at least have general software development practices. But I feel like vibe coding in general is just straight: write me a
[00:06:50] Speaker A: Web app that does this.
[00:06:52] Speaker C: It reminds me of the worst part of Silicon Valley tech culture: the bro coders. Like, vibe coding very closely aligns in my mind with bro coders, and that's, I think, probably where Ryan's aversion to the word comes from. But yeah, I do think when we talk about vibe coding here, typically we're talking more about, you know, being thoughtful about your prompts and breaking down your work into small pieces, and then the AI does magic around that. But, you know, we're not just saying, like, create me a mobile app that's going to take over the world, and then, you know, turning around three days later and there's a vibe-coded thing that doesn't actually do anything. Yeah.
[00:07:28] Speaker A: And you know, in my head I remember when vibe coding was sort of announced as, like, a practice, and there were some really terrible tenets to it. Like, you don't do feature additions, you just start over.
[00:07:39] Speaker D: Right.
[00:07:40] Speaker C: You know, that kind of thing. I remember that. That was brutal.
[00:07:43] Speaker A: Like, there's a couple things like that which, I don't know. You know, like, I don't know how real vibe coding is, first because I don't care, and, you know, whether there is more prescribed to it.
[00:07:53] Speaker C: Yeah, you do it almost every day, don't you? I.
[00:07:55] Speaker A: Well, that's. So that's. No, I don't. And I will adamantly deny that I do vibe coding. It's just that my day-to-day work life has become a lot closer to that description than I'd like.
[00:08:08] Speaker D: Don't look over here. Yeah, I'm not here.
[00:08:10] Speaker A: Yeah, the amount of code I'm actually writing, like, typing by hand, versus just reviewing and giving prompts, has gone down dramatically. Like, I still do. And I review line by line, for sure.
[00:08:24] Speaker C: But it's.
[00:08:25] Speaker A: Yeah. Like I find myself getting lazier.
Like, go edit this config file.
[00:08:32] Speaker D: Yeah, here's a config. It does not work. Please tell me how to fix this. There's no variable. It should be this.
[00:08:38] Speaker A: Yeah, "go make sure it's not everywhere." I do it under the guise of, no, it's refactoring. No, it's not. I know exactly where it is, it's in one place, and I'm just too lazy to go find it and fix it.
[00:08:46] Speaker D: I was playing with a new Azure service I'd never used before, and I was like, I need to do this. And then I was like, generate me Terraform. But then I was like, I don't even know if this Terraform is correct, because I don't really understand how to use this service yet.
And like, that's where I feel like a lot of people are like, do this thing. And I'm like, okay. But like, you don't know if what it's doing is correct or not.
Like, I actually had to learn the service so that I could help guide it more. It was still like 65% correct, but obviously it launched the super premium tier or whatever they called it, and so, you know, whatever.
[00:09:20] Speaker C: All right. OpenAI is launching Company Knowledge for the ChatGPT Business, Enterprise and Education plans, enabling direct integration with corporate data sources including Slack, SharePoint, Google Drive, Teams and Outlook, notably excluding OneDrive, which could impact Microsoft-heavy organizations. So weird. I think I've heard of this. It's called Microsoft 365.
This feature requires manual activation for each conversation and lacks capabilities like web search, image generation or graph creation when enabled, unlike Microsoft 365 Copilot's deeper integration across Microsoft apps. ChatGPT Business pricing at $25 per user per month undercuts Microsoft 365 Copilot's $30 per month. Ooh, $5, please. Potentially offering a more cost-effective enterprise AI assistant option with stronger brand recognition. The security implementation includes individual authentication per connector, encryption of all data, no training on corporate data, and an enterprise compliance API for conversation log review and regulatory reporting. Data residency and processing locations vary by connector, with no clear documentation from OpenAI, requiring organizations to verify compliance requirements before deployment. So, I mean, everyone is trying to do the same thing. Everyone wants to create the AI of knowledge data inside of enterprises. And so ChatGPT doing this makes sense, because they're also sort of going through a divorce with Microsoft at the moment, and so in some future world state there may not be the partnership between OpenAI and Microsoft. But, you know, I'm not surprised that they're doing this, because Gemini is doing it, everyone's doing this, et cetera.
[00:10:49] Speaker A: And it's a huge problem. It has been a huge problem that people have been trying to solve for a while, you know, whether it was an enterprise-only search function.
[00:10:57] Speaker C: Kendra or Amazon.
[00:10:58] Speaker A: Kendra, you know, like it is the.
[00:11:00] Speaker D: The Google corporate search. The server that you had back in the day, too, that would scan all your internal stuff.
[00:11:06] Speaker C: The Search Appliance. Yeah, I remember the Search Appliance.
[00:11:09] Speaker D: Yeah, that's what I feel like everyone's trying to solve again: the same problem we've been solving for 15 years.
[00:11:16] Speaker A: Yeah.
[00:11:17] Speaker C: And you know, I do think that.
[00:11:18] Speaker A: You know, this wave of AI is the closest we've come to it, but it still suffers the same problems, you know, which is, like, there are caveats to integrating with all these services. There's a bunch of issues with authentication and managing that. You know, the fact that this has to authenticate for every conversation? Like, that's a chore.
[00:11:37] Speaker C: Well, and that's the big problem: you need your model to be aware of the data that is in these documents.
That way, when you're having the chat conversation, it can find the sources for it. But if you're not supposed to have access to those documents, then things get tricky. And so it's like, well, the model now has data that it shouldn't give to these employees. Like, you know, I've heard a horror story about someone who accidentally had an Excel file of all of their salaries in a share, and this model picked it up and trained on it. And then you could literally ask what Bob in accounting made, and it would tell you.
And so like, you know, these are problems that you have to solve. And it's a really complicated problem when you think about authorization.
And so a lot of the solutions to this are that each person uses their own personal credentials to be able to access the thing. But then that brings up more costs.
[00:12:28] Speaker A: Which is complicated, and gets not-as-good results.
[00:12:32] Speaker C: Right?
[00:12:32] Speaker A: Because then everything that's contextual is just a local RAG that's pretty ephemeral, and it's not as good as, like, a model that's trained on the specific data. And there's all kinds of issues too, right? Because largely, our authentication and authorization schemas are group-based.
And so when you're approaching it from, sort of, accessing data from a chat thing, it's not really something that you know in advance, where you can say, oh, I have access to this. And so our applications that manage identity don't have the attributes needed to make these logical decisions, or they're not configured to.
So it's, like, passing a lot of custom claims if you do SSO, and configuring all of that, which isn't in place. It's pretty difficult.
[00:13:23] Speaker D: Also making sure all your data is actually set up properly and properly classified and everything along those lines. Because all those fun things that you have to do as part of ISO, you have to make sure they're set up correctly. And if they are, then you're good. But if they're not, you know, then you're going to leak data.
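The per-user authorization problem the hosts describe above is often handled by filtering retrieved documents against the caller's identity before the model ever sees them. A minimal sketch of that pattern; the data shapes and names here are invented for illustration, not any vendor's actual API:

```python
# Illustrative sketch: filter RAG search hits against the requesting user's
# group memberships, so the model's context never includes documents the
# caller isn't authorized to read.
from dataclasses import dataclass, field

@dataclass
class Doc:
    title: str
    text: str
    allowed_groups: set = field(default_factory=set)  # ACL on the source doc

def retrieve_for_user(query_hits: list, user_groups: set) -> list:
    """Keep only hits whose ACL intersects the user's groups."""
    return [d for d in query_hits if d.allowed_groups & user_groups]

hits = [
    Doc("Q3 roadmap", "...", {"engineering", "product"}),
    Doc("salaries.xlsx", "...", {"hr-admins"}),  # the horror-story spreadsheet
]

# An engineer's query never sees the salary sheet, even if it matched the search.
context = retrieve_for_user(hits, user_groups={"engineering"})
print([d.title for d in context])  # ['Q3 roadmap']
```

This is exactly why many of these products fall back to per-user connector credentials: the ACL check happens at retrieval time with the caller's identity, at the cost of an extra authentication hop (and, as Ryan notes, weaker results than a model trained on the full corpus).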
[00:13:42] Speaker A: Just notice that it integrates with SharePoint but not OneDrive.
That's an interesting decision.
[00:13:49] Speaker C: It is interesting. I don't really understand. I mean, other than the Access model on OneDrive is a little weird, so.
[00:13:55] Speaker A: But so is SharePoint. It's awful.
[00:13:58] Speaker C: Anyway.
All right, well, we were just talking about Microsoft and OpenAI and their future, and they apparently have settled their differences and have now agreed to alimony payments. Microsoft and OpenAI have restructured their partnership, with Microsoft now holding an approximately 27% stake in OpenAI's new public benefit corporation, valued at $135 billion, while maintaining exclusive Azure API access and IP rights until AGI is achieved.
The agreement introduces an independent expert panel to verify AGI declarations and extends Microsoft's IP rights for models and products through 2032, including post-AGI models with safety guardrails, though the research IP expires by 2030 or upon AGI verification. OpenAI gains significant operational flexibility, including the ability to develop non-API products with third parties on any cloud provider, release open-weight models meeting capability criteria, and serve U.S. government national security customers on any cloud infrastructure they choose. Microsoft can now independently pursue AGI development alone or with partners, and if using OpenAI's IP pre-AGI, must adhere to compute thresholds significantly larger than current leading model training systems. OpenAI has committed to purchasing $250 billion in Azure services, while Microsoft loses its right of first refusal as OpenAI's compute provider, seemingly shifting both companies towards more independent operations.
So that's a pretty big deal. And there are blog posts in the show notes from both Microsoft and OpenAI, so you can see how each of them is spinning this. OpenAI does mention in their version that the revenue sharing agreement continues through AGI verification, but payments will be distributed over a longer time frame, while Microsoft retains exclusive rights to the frontier models. So, you know, limiting that once AGI is achieved is an interesting choice.
[00:15:37] Speaker A: That is an interesting choice.
[00:15:40] Speaker C: I wonder if Microsoft doesn't believe that's going to happen very soon, or OpenAI doesn't, and that's why they're willing to agree on that term. And it's interesting, again, that it has to be independently verified by a panel. So OpenAI can't just come out and say, we've created AGI, and then, you know, end up in a legal dispute; it has to be agreed upon by others. That's all very interesting.
[00:16:01] Speaker A: I wonder if OpenAI is being, like, protective of it because they think, you know, it will be sort of game-changing and no one really knows how, so they're sort of laying out the boundaries for that. Or if it is just Microsoft making their choice. Yeah, I can see it either way. But I mean, I also look forward to this agreement changing in three weeks like it did last time. Like, I can't keep up with it.
[00:16:29] Speaker C: Well, I mean, I think this now allows them to IPO, doesn't it? As the publicly traded company, or the public benefit corporation, or whatever that is. So I'm sure there'll be lots of further discussions, and then there'll be a disclosure if they're going to try to go public or IPO. Although I think it's probably early for them to IPO, but we'll see.
And if they actually achieve true AGI, an IPO is kind of dumb, because no one will have any money to buy anything. Everyone will be unemployed, because AGI is
[00:16:55] Speaker A: Doing all the work if not converted to batteries that are just being powered. Just powering AI data centers.
[00:17:06] Speaker D: Too soon I think.
[00:17:08] Speaker A: I guess.
[00:17:08] Speaker D: Yeah, I'm trying to figure out where to go with that.
[00:17:11] Speaker A: I mean the movie came out in 99, man.
[00:17:14] Speaker C: Yeah, I'm not looking forward to that future, but okay.
AWS is announcing general availability of Web Grounding for Amazon Nova Premier, a built-in RAG tool that automatically retrieves and cites current web information during inference. The feature eliminates the need to build custom RAG pipelines, while reducing hallucinations through automatic source attribution and verification.
Web Grounding operates as a system tool within the Bedrock Converse API, allowing Nova models to intelligently determine when to query external sources based on prompt context. Developers simply add Nova Grounding to the tool config parameter, and the model handles retrieval, integration and citation of public web sources automatically. The feature is available to you in the US East (N. Virginia), Ohio and Oregon regions, with more coming soon. Primary use cases include knowledge-based chat assistants requiring current information, content generation tools needing fact-checking, research applications synthesizing multiple sources, and customer support where accuracy and verifiable citations matter.
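Since grounding is described as just an addition to the Converse tool config, the request construction is roughly a one-liner. A sketch of what that might look like; the system-tool field names and the model ID here are assumptions based on the announcement's description, so check the Bedrock documentation before relying on them:

```python
# Sketch only: enabling Nova's built-in web grounding on a Bedrock Converse call.
# The "systemTool"/"nova_grounding" names and the model ID are assumptions,
# not verified API fields.
import json

def build_grounded_request(prompt: str) -> dict:
    """Build a Converse API request with the grounding system tool enabled."""
    return {
        "modelId": "us.amazon.nova-premier-v1:0",  # hypothetical model identifier
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "toolConfig": {
            # Grounding is a built-in system tool, not a custom tool you implement;
            # the model decides when to query the web and returns cited sources.
            "tools": [{"systemTool": {"name": "nova_grounding"}}]
        },
    }

request = build_grounded_request("Summarize this week's AWS incident report.")
print(json.dumps(request["toolConfig"]))
# A live call would then be roughly:
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
```

The appeal, as the hosts note, is that this replaces a custom retrieval pipeline with a single config flag, with citation handling done by the model.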
I mean, this is the first time I've heard anything about NOVA in months.
So good to know unless they laid all those people off this week. But finally, glad to see some Nova enhancements happening.
[00:18:20] Speaker D: I forgot it existed.
[00:18:22] Speaker A: Yeah, I always forget it exists, because I don't use it day to day. I mean, I do like this feature, and I do think it's great that it's just, like, a config option you can turn on. I didn't like it enough to go research the pricing model, because I was too lazy, but it is extra on top of your original inference call. So it is sort of interesting there, and I wonder.
It seems odd to me, but I don't really know how the pricing model on these things works, per million tokens.
[00:18:53] Speaker C: That's how it all works.
[00:18:54] Speaker A: Still don't know. You know what.
[00:18:58] Speaker C: So hard to calculate. In Cloud Tools this week, Harness is introducing AI-powered database migration authoring that lets developers describe schema changes in plain English, like "create a table named animals with columns for genus and species," and automatically generates production-ready SQL migrations with rollback scripts and Git integration. The tool addresses the AI velocity paradox, where 63% of organizations ship code faster with AI but 72% have suffered production incidents from AI-generated code, by extending AI automation to database changes, which remain a manual bottleneck in most CI/CD pipelines. Or it results in you just dropping more database tables than you really wanted to and causing more outages, but, you know, whatever. This is built on Harness's Software Delivery Knowledge Graph and MCP server. It analyzes current schemas, generates backwards-compatible migrations, validates for compliance, and integrates with existing policy-as-code governance. Database DevOps is one of Harness's fastest growing modules, with customers like Athenahealth reporting they save months of engineering effort compared to Liquibase Pro or homegrown solutions, while getting better governance and visibility.
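The "migration plus rollback" output described above is essentially a forward/backward SQL pair generated from a plain-English description. A toy sketch of that idea; the spec format and function are invented for illustration, and Harness's actual output will differ:

```python
# Toy illustration of the forward-migration / rollback pair such a tool emits.
# The spec format here is invented for this example.

def generate_migration(table: str, columns: dict) -> tuple:
    """Return (forward SQL, rollback SQL) for a simple CREATE TABLE change."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    forward = f"CREATE TABLE {table} ({cols});"
    rollback = f"DROP TABLE {table};"  # the inverse of the forward change
    return forward, rollback

# "Create a table named animals with columns for genus and species"
forward, rollback = generate_migration("animals", {"genus": "TEXT", "species": "TEXT"})
print(forward)   # CREATE TABLE animals (genus TEXT, species TEXT);
print(rollback)  # DROP TABLE animals;
```

The hard part the hosts poke at is hiding in that rollback line: DROP TABLE is only a safe inverse before data exists. Real tooling has to generate data-preserving, backwards-compatible inverses for changes to live tables, which is where things go wrong.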
[00:19:56] Speaker A: What could go wrong?
[00:19:58] Speaker D: So I get to just say, migrate me from Microsoft SQL to Postgres, and it will magically do it?
[00:20:04] Speaker C: That's what they say.
Sure.
[00:20:07] Speaker D: Yeah.
[00:20:08] Speaker A: I mean, given how hard this is for humans to do, I look forward to AI doing it better.
But, you know, the devil's always going to be in the details of these things. Changing it, and then changing it at scale, is tricky. So it's, like, great for development, when you're creating something new and need to make little changes. But larger changes? Like, how would you do this?
[00:20:33] Speaker C: I mean, Liquibase is also not easy, so anything to make that sort of an easier process, I think, is a plus.
Again, you're very much drinking the Harness Kool-Aid at this point if you're in this model.
[00:20:46] Speaker A: Oh yeah, I forgot this is a Harness release. Yeah. So this is.
Yeah, yeah, scary.
[00:20:56] Speaker C: Moving on to AWS. An unverified report claims Amazon replaced 40% of their DevOps staff with AI systems capable of automatically fixing IAM permissions, rebuilding VPC configurations and rolling back failed Lambda deployments. Though AWS has not confirmed this, and skepticism remains high, the timing coincides with the recent outage that impacted major services across the Internet. And AWS officially laid off hundreds of employees in July, and also this week; 14,000 of them, in fact. The alleged 40% reduction would represent a significant shift towards AI-driven infrastructure management, if true. The incident highlights growing concerns about cloud providers' concentration of risk, as both the AWS outage and the 2024 CrowdStrike incident demonstrated how single points of failure can impact thousands of businesses.
I mean, in general, Amazon's been doing a lot of layoffs. They've had a lot of brain drain. I don't know that they automated 40% of their DevOps staff with AI systems. I also don't know that they actually have that many people who have a DevOps title. So this one is a little bit rumory and a little speculative, but I did find it fun that people were trying to blame AI for Amazon's woes last week.
[00:22:01] Speaker A: I was kind of getting the impression that some of the metrics were sort of blended between Amazon and AWS, because, like, I don't think AWS laid off 14,000 people.
I'm like, that's a little much.
[00:22:15] Speaker C: I mean, they have a massive workforce. I mean, they did lay off 2,300 per the WARN notice in Seattle. That's just one state.
[00:22:22] Speaker A: So, I mean, yeah, they have offices everywhere; thousands makes sense. But 14,000? Yeah, 14.
[00:22:26] Speaker C: I mean, the rumor is that they're going to do 30,000 total. Not for just AWS; for Amazon as a whole.
[00:22:33] Speaker A: No, that's what I'm saying. Like, I believe these are valid numbers for Amazon, just not AWS.
[00:22:38] Speaker C: Oh yeah, I see what you're saying. I mean, I do wonder.
I did see some rumors on Twitter. They're like, well, AWS wasn't really impacted because they don't want to risk re:Invent, so all those layoffs will happen after re:Invent. I was like, I hope not, because that's a terrible Christmas present.
So I hope that's not true. But.
[00:22:56] Speaker A: Well, it depends on, you know, what the packages are too. Right? You know, if you're just paying people to be off.
[00:23:01] Speaker C: Yeah, I mean, I don't know what the packages are.
[00:23:02] Speaker A: I mean, it doesn't help, right? Like, you know, with planning for the future and trying to buy presents when you're thinking you're not going to have a job come February or whatever it is.
[00:23:12] Speaker C: Yeah, no, that's not an ideal scenario. I mean, layoffs this year: 218 tech companies have laid off 112,000 people, allegedly, which is less than 2024 and less than 2023. But, you know, the numbers you see going out there are just crazy. So I am not entirely sure that Layoffs.fyi is even up to date on all the layoffs that happen. A lot of them are being done in ways that avoid WARN notices as well. Yeah.
[00:23:40] Speaker A: Or much quieter.
[00:23:41] Speaker C: Yeah, yeah. So it's going to be interesting to see. You know, with the government shut down right now, we don't get jobs numbers that we can trust for a while. So I'd be curious, once the government does reopen, what these numbers actually look like. I did see interest rates went down today, though, so that's a win.
Well, Amazon has produced a report that does not say that AI replaced the DevOps people, which is why they had an outage. Instead, they blame it on DNS.
[00:24:06] Speaker A: Duh.
[00:24:08] Speaker C: DynamoDB, they say, experienced a 2.5-hour outage in US East 1 due to a race condition in its DNS management system that resulted in empty DNS records, affecting all services dependent on DynamoDB, including EC2, Lambda and Redshift. A cascading failure pattern showed how tightly coupled Amazon services are: EC2 instance launches failed for 14 hours after DynamoDB's outage prevented lease renewals between EC2's DropletWorkflow Manager and physical servers. Network Load Balancers experienced connection errors from 5:30am to 2:09pm due to health check failures caused by EC2 network state propagation delays, demonstrating that infrastructure dependencies can create extended recovery times. AWS has disabled the automated DNS management system globally and will implement velocity controls and improved throttling mechanisms before re-enabling it, highlighting the challenges of balancing automation with resilience. The incident reveals architectural vulnerabilities in multi-service dependencies. Services like Redshift in all regions failed IAM authentication due to hard-coded dependencies on the US East 1 region, suggesting a need for better regional isolation, which is surprising considering how much Amazon touts "everything will fail" and "trust nothing outside the region."
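The failure mode in the summary above, a delayed "enactor" applying a stale DNS plan after a newer one, and a cleanup process then deleting the plan the live record pointed at, can be sketched deterministically. This is an illustration of the published mechanism only, not AWS's actual code, and all names are invented:

```python
# Illustrative check-then-act race: the active DNS record ends up referencing a
# plan the cleanup process has deleted, so resolution returns an empty answer.
plans = {1: ["10.0.0.1"], 2: ["10.0.0.2"]}   # plan id -> endpoint IPs
record = {"dynamodb.example.com": None}       # DNS name -> active plan id

def apply_plan(name, plan_id):
    """An enactor activates a plan WITHOUT re-checking it is still the newest."""
    record[name] = plan_id

def cleanup(latest_applied):
    """Cleanup deletes any plan older than the newest one it saw applied."""
    for pid in [p for p in plans if p < latest_applied]:
        del plans[pid]

def resolve(name):
    return plans.get(record[name], [])        # missing plan == empty DNS answer

apply_plan("dynamodb.example.com", 2)   # fast enactor applies the new plan
apply_plan("dynamodb.example.com", 1)   # delayed enactor applies its STALE plan
cleanup(latest_applied=2)               # cleanup removes plan 1 as obsolete...
print(resolve("dynamodb.example.com"))  # [] -- empty record, nothing resolves
```

The fix AWS describes, velocity controls and re-checking before acting, amounts to making `apply_plan` verify the plan is still the newest before writing, which this sketch deliberately omits.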
[00:25:11] Speaker D: I mean, the write-up was good. I assume you guys have read it, but walking through it, I do like the level of detail. You know, my wife yelled at me, because I was sitting there reading it when it first came out that night, and she's like, what are you doing? I was like, this is interesting.
But I liked how detailed they went. They go into it. It's like, look, there was a race condition that's been there for years around this internal service, and how each availability zone has its own little setup of it.
It's a good write up to show that look, even these large cloud providers that have these massive systems and have redundancy upon redundancy upon redundancy, it's all software under the hood.
Software will eventually have a bug in it. This just happens to be a really bad bug that took down half the Internet, you know.
[00:25:57] Speaker A: So, I mean, it does suck how big these things are, but it's kind of amazing how few and far between they are now, you know, versus the older days, where it seemed much more common, especially in US East 1. But yeah, there are still going to be issues here and there.
[00:26:14] Speaker C: Well, I like that they broke down each service: its failure, the problems they were seeing, and what it took to recover that service. It is a very detailed incident report. So if you were like, well, I don't use DynamoDB, but I was still impacted; my Network Load Balancer was down, or, you know, my Security Token Service API calls were impacted. They break it down by each service: what the impact was, and why, and what they're doing to hopefully avoid it in the future. So, you know, these are always important, to see what they're going to do to improve things, and they do detail that as well. And, you know, among the cloud providers, Amazon has always had a very strong desire to not be cross-region dependent, and so they are typically very single-region, where other cloud providers have global things, like global networks and other global front door services, that were down this morning, et cetera. Still down. Understand that these trade-offs in architecture design become trade-offs in your application architecture as you design your systems. And so the Amazon model of always designing for failure is still very important, and even relying on a single region, or multiple availability zones in a region, does open you up to some risk.
[00:27:32] Speaker B: There are a lot of cloud cost management tools out there, but only Archera provides cloud commitment insurance. It sounds fancy, but it's really simple. Archera gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days.
If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask, will you actually give me my money back? Archera will. Click the link in the show notes to check them out on the AWS Marketplace.
[00:28:11] Speaker C: All right, well, if you hated writing your incident reports, CloudWatch is now automatically generating your post-incident analysis reports by correlating telemetry data, investigation inputs and actions taken during an investigation, reducing the report creation time from hours to minutes. Now, most companies don't do those, so congratulations, you just got automated post-incident analysis reports, which is kind of nice. Reports include executive summaries, event timelines, impact assessments, and actionable recommendations, helping teams identify patterns and implement preventative measures for better operational resilience. It integrates directly with CloudWatch investigations, capturing operational telemetry and service configurations automatically without manual data collection or correlation. It's currently available to you in 12 regions. No specific pricing has been mentioned, although I assume it'll layer on top of the existing CloudWatch investigation costs. This addresses a common pain point where teams spend significant time manually creating incident reports instead of focusing on root cause analysis and prevention strategies.
[00:29:07] Speaker A: Every time I read these stories, I just thank my lucky stars that I make internal software and you know, the root cause is always known. It's me, I'm the root cause.
And if I, it would be fun to turn this type of service on for, for my internal services to be like, oh yeah, no, you did a deployment and then everything broke.
[00:29:25] Speaker C: Like, huh, yeah.
[00:29:26] Speaker A: Wonder.
[00:29:28] Speaker D: Yeah, I mean this is a great service, obviously, to be in the ecosystem. As somebody that has to write up postmortems and things like that for my day job, you know, anything that makes my life easier. But even then, it's not like you can immediately ship this; sure, a legal department or somebody has to review everything before it comes out the door.
But it's a good starting point. And I would take this over having to sit there and look at a blank sheet of paper going, all right, where do I start here, guys?
[00:29:57] Speaker A: Yeah, what are the five whys like?
[00:30:02] Speaker C: Yeah, I, I, I actually always get upset when people start with five whys because it's just like if it, it only works in a single root cause scenario. And the reality is that if you are designing software well, there's never a single root cause. It's always multiple root causes. Yeah.
[00:30:20] Speaker A: Multiple things should have to fail before the impacts are known.
[00:30:22] Speaker C: Right?
[00:30:23] Speaker D: Like, yeah, but that's why outages take longer to fix nowadays, which is tough.
[00:30:29] Speaker C: Especially in distributed systems. Things get real fun real quick.
The Customer Carbon Footprint Tool is expanding its capability by adding additional emission categories, including Scope 3, all available to you now.
The Scope 3 emissions data covers fuel- and energy-related activities, IT hardware lifecycle emissions, and building equipment impacts, giving customers a complete view of their carbon footprint beyond just direct operational emissions. The tool provides both location-based and market-based emission calculations, with 38 months of historical data recalculated using the new methodology, accessible through the AWS billing console with CSV export and integration options for QuickSight visualizations.
Scope 3 emissions are amortized over asset life cycles (6 years for IT hardware, 50 years for buildings) to fairly distribute embodied carbon across operational lifetime. All calculations are independently verified following GHG Protocol standards. Early access customers like Salesforce, SAP and Pinterest report that granular regional data and Scope 3 visibility helps them move beyond industry averages to make targeted carbon reduction decisions based on actual infrastructure emissions. The tool remains free to use within the AWS Billing and Cost Management console, providing emissions data in metric tons of CO2 equivalents to help organizations track progress towards sustainability goals and compliance reporting requirements. So I think we talked about this before, that they were kind of avoiding Scope 3, which sort of felt dirty, like they were cheating the system. And so it's nice to see them come back and say no, no, we now have Scope 3, and we've recalculated everything to get a true Scope 3 number across the 38-month window. So that's great.
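The amortization scheme described (straight-line over 6 years for IT hardware, 50 years for buildings) is simple enough to sketch. The carbon figures below are made up purely for illustration; only the lifetimes come from the announcement.

```python
# Straight-line amortization of embodied carbon, per the announcement:
# IT hardware over 6 years, buildings over 50 years.
AMORTIZATION_YEARS = {"it_hardware": 6, "building": 50}

def monthly_embodied_kgco2e(total_kgco2e: float, asset_class: str) -> float:
    """Spread an asset's total embodied carbon evenly across its lifetime, per month."""
    years = AMORTIZATION_YEARS[asset_class]
    return total_kgco2e / (years * 12)

# A hypothetical server with 1,440 kgCO2e of embodied carbon contributes
# 1440 / (6 * 12) = 20 kg to each month of its 6-year life.
print(monthly_embodied_kgco2e(1440, "it_hardware"))           # 20.0
print(round(monthly_embodied_kgco2e(60000, "building"), 2))   # 100.0
```

The point of the amortization is exactly this: a big one-time embodied-carbon number shows up as a small, steady monthly figure rather than a spike in the month the asset was built.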
[00:32:00] Speaker A: Yeah, I think that the, the detail they provide of, of how they're, you know, distributing the, the carbon impact is, you know, I think it was more or less about like cheating the system and more just it's hard and they had to figure out how to do it.
[00:32:16] Speaker D: I mean this is a difficult problem to solve. You know, once you have Scope three it's all your indirect costs.
So you know, let, I think if I remember correctly, like scope one's like your actual server. Scope two is like power and then scope three is all the things that have to get included in to generate your power and your servers, which includes shipping, et cetera. So like getting all that is not an easy task to do.
You know, even I look at the numbers and I'm like, I don't know what these mean half the time when I have to look at them I'm like we're going down. That seems positive.
[00:32:48] Speaker A: Yeah, it's so complicated, it's just going to turn into financial audits, right? Like your carbon audit is going to be the same, and it's going to be hard, and there are going to be people that are really deep into the forensic analysis of this data. Yeah.
[00:33:01] Speaker D: There's gonna be an ISO standard for it in the future that requires independent audits before you have your formal audit. And then we all hate our lives.
[00:33:08] Speaker A: Oh God, you're right.
[00:33:09] Speaker D: Oh, you started it.
[00:33:15] Speaker A: Yeah, but I don't have to do financial audits. I have to do ISO audits.
[00:33:24] Speaker C: AWS is launching the Secret-West region. Shh, it's a secret; I could tell you, but then I'd have to kill you all. It's the second region capable of handling secret-level classified workloads, expanding beyond the existing Secret-East region to provide geographic redundancy for intelligence and defense agencies operating in the western United States. The region meets stringent Intelligence Community Directive (ICD 503) and DoD Security Requirements Guide Impact Level 6 requirements, enabling government agencies to process and analyze classified data with multiple availability zones for high availability and disaster recovery. This expansion allows agencies to deploy latency-sensitive classified workloads closer to western US operations while maintaining multi-region resiliency, addressing a critical gap. AWS continues to operate in a specialized market segment with limited competition, as few competitors can meet the security clearance and infrastructure requirements necessary for secret-level classified hosting. Pricing information is not publicly available due to the classified nature of the service, but assume it's expensive.
[00:34:20] Speaker D: I feel like I know Azure has.
[00:34:22] Speaker C: They have a secret region too. Yeah, yeah.
[00:34:24] Speaker D: I thought they also have a higher level.
I think AWS does too. So does GCP? Because I feel like all three major cloud providers do.
[00:34:33] Speaker C: I do not believe they have an IL6 or. I think they have.
[00:34:36] Speaker A: That'll tell us.
[00:34:38] Speaker C: Yeah, they'll tell us about.
[00:34:41] Speaker D: GCP: Google Cloud achieves DoD IL6 for GVC.
[00:34:46] Speaker A: Oh, so they don't do that in dedicated sites. GCP handles their isolation at a very different level than Azure and AWS, in that they don't have separate regions for these. They do it by a sort of continuous audit analysis of existing workloads. The actual infrastructure on the Google side, I'm sure, is all in one place and isolated at some point, but from a customer perspective,
you just declare that your workload needs to meet this classification, and it sort of manages it all under the hood and displays your compliance, where you're good and where you're not. It's pretty crazy.
[00:35:35] Speaker C: Well, and it's interesting, because the government has basically pushed out FedRAMP 20x, which basically says the way Google does it is the way they prefer it. They don't want isolated GovClouds and secret regions; they want it to be part of the commercial offering, because what they identified is that FedRAMP being its own isolated environment means those costs go up and the government doesn't get features as fast, basically. They did the FedRAMP 20x program and it's better, but it still doesn't really handle rapid change in the environment. They still have really draconian patching requirements that even the government users hate; these companies have to take all this maintenance to hit a seven-day SLA on certain types of defects and issues. So there's a bunch of things the government's not happy about with FedRAMP, and there are attempts to make it better with FedRAMP 20x, but there's still a long way to go. These secret regions were probably built before 20x began, but again, the government's saying they don't really want that model. So it's interesting to see how that shakes out over the next decade.
[00:36:42] Speaker D: Yeah, it'll become a local zone or something like that. You know, at one point they'll have it moved out and become a local zone or something like that.
[00:36:50] Speaker A: Well, I don't know if I like that. I think some of these requirements are very strict network isolation, where it almost makes more sense to segregate it off. So I wonder. You know, I've never done anything, never even looked at anything that's top secret or anything along those lines. So maybe, I don't know. But it sounds hard.
[00:37:14] Speaker D: I mean if you get really detailed into. And this is, you know, years ago when I was talking to people, you know, they have like, you know, different fiber lines that run to your desk.
[00:37:23] Speaker A: Yeah.
[00:37:24] Speaker D: And stuff like that. But I don't know if all of that's changed over the last, you know, years.
[00:37:29] Speaker A: I don't think it has. Not for, I mean, not for the, the higher crazy levels.
Yeah.
[00:37:38] Speaker C: AWS Transfer Family now supports changing the identity provider type on a server.
This allows you to basically switch between service-managed, Active Directory, or custom IdP authentication on existing SFTP, FTPS and FTP servers without service interruption, eliminating the need to recreate servers during authentication migrations. This feature enables zero-downtime authentication migrations for organizations transitioning between identity providers or consolidating authentication systems, which is particularly useful for companies undergoing mergers or updating compliance requirements. The capability is available across all AWS regions where Transfer Family operates, with no additional pricing beyond the standard Transfer Family costs, which are expensive.
Organizations can now adapt their file transfer authentication methods dynamically as business needs evolve. Thank goodness, because this was dumb. This was always dumb to me.
If they can also make it so you can add SFTP or the other types of FTP, et cetera, to an existing Transfer Family server, that would also be great. But I'll take authentication first.
[00:38:35] Speaker A: Yeah, any kind of configuration change that requires you to destroy and recreate isn't fun. Like, I do believe we should architect for such things and be able to redirect traffic with DNS, which, you know, never goes wrong, never causes anyone any problems.
But you know, like it is, it is terrible when that happens because you're even when it works you're sort of nervously doing it the entire time.
[00:38:57] Speaker C: Well, and the thing about these legacy protocols: they all use key exchanges and known-hosts configurations and all these things that are sort of dumb, but made sense when they first came out and now, in a cloud world, don't make sense. But that's part of the protocol and part of the RFC spec. And so rebuilding an SFTP server that now all of a sudden is a new node with a new key breaks customers in a really awful way that you couldn't avoid, because you had to update to Active Directory integration versus service-managed authentication.
That's the part that sucked. So not having to do that is very nice.
AWS is introducing instance EBS IOPS exceeded check and instance EBS throughput exceeded check metrics that return binary values (0 or 1) to indicate whether an EC2 instance has exceeded its EBS-optimized performance limits, helping identify bottlenecks without manual calculations. These metrics enable automated responses through CloudWatch alarms, such as triggering instance resizing or type changes when I/O limits are exceeded, reducing manual intervention for performance optimization. Available at no additional cost with 1-minute granularity for all Nitro-based EC2 instances with attached EBS volumes, across all commercial and GovCloud regions, and China, even. This addresses a common blind spot where applications experience degraded performance due to exceeding instance-level I/O limits rather than volume-level limits, which many users overlook when troubleshooting.
Guilty as charged. Particularly useful for database workloads and high-throughput applications, where understanding whether the bottleneck is at the instance or volume level is critical for right-sizing decisions.
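Since the new metrics are just a 0/1 sample per minute, the alarm logic you'd layer on top is basically counting breach-minutes in a window. A minimal local sketch (the samples and threshold here are made up; a real CloudWatch alarm would do the SUM over the period for you):

```python
# Each sample is the per-minute value of an "EBS IOPS/throughput exceeded
# check" metric: 1 = instance hit its EBS-optimized limit that minute.

def breach_minutes(samples: list[int]) -> int:
    """Count minutes where the instance exceeded its EBS limit (metric == 1)."""
    return sum(samples)

def should_alarm(samples: list[int], threshold: int) -> bool:
    """Fire if the instance hit the limit at least `threshold` minutes in the window."""
    return breach_minutes(samples) >= threshold

# Hypothetical last hour: quiet for 40 minutes, pegged for the last 20.
hour = [0] * 40 + [1] * 20
print(breach_minutes(hour))    # 20
print(should_alarm(hour, 10))  # True
```

Three breach-minutes a day is probably noise; twenty in an hour, as discussed below, is a resize conversation.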
[00:40:26] Speaker D: This would have solved me a lot of headache when GP3 came out.
[00:40:31] Speaker C: Yes.
[00:40:32] Speaker D: And you had to convert over, and they're like, it's cheaper. You're like, great. Then you're like, wait, I have to play with these levers. I have to understand.
It used to be, what was it, like 3 IOPS per gigabyte up to whatever the cap was? And I was like, that was easy math, I can multiply by three. But now I have to do this complicated back-of-the-napkin to see if my throughput versus my IOPS and everything else matches. And that, I'm not gonna lie, bit me a few times pretty hard in customer environments.
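For reference, the "easy math" being half-remembered here is gp2's published formula: 3 IOPS per GiB with a 100 IOPS floor and a 16,000 IOPS ceiling, while gp3 decouples size from performance (3,000 IOPS and 125 MB/s baseline, provisioned independently above that). A quick sketch:

```python
# gp2 ties baseline IOPS to volume size: 3 IOPS/GiB, floor 100, ceiling 16,000.
def gp2_baseline_iops(size_gib: int) -> int:
    return min(max(3 * size_gib, 100), 16_000)

# gp3 baseline is flat regardless of size; extra IOPS/throughput are
# provisioned (and billed) separately.
GP3_BASELINE_IOPS = 3_000
GP3_BASELINE_THROUGHPUT_MBPS = 125

print(gp2_baseline_iops(20))      # 100   (floor)
print(gp2_baseline_iops(1000))    # 3000
print(gp2_baseline_iops(10000))   # 16000 (ceiling)
```

So a 1 TB gp2 volume and a stock gp3 volume both sit at 3,000 IOPS, which is exactly why blind gp2-to-gp3 conversions on larger volumes could quietly lose provisioned performance.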
[00:40:58] Speaker C: Oh yeah.
[00:40:59] Speaker A: I mean, it's funny because it's like, I hope it does exactly what it says, but it's also like they're, they're calculating this based off of that same math. So it's as, as long as that data is clean going in, it'll be fine.
[00:41:09] Speaker C: Yeah. Maybe just even having an indicator that you could. Hey, there's something wrong with the instance level.
Even as binary 1 or 0. That's still very helpful.
[00:41:18] Speaker D: Yeah, it's a starting point to at least look at. Okay, three times today we went over, so eh, maybe we're fine. But if it's done it 20 times in the last hour,
we may have a problem here, guys.
[00:41:30] Speaker C: We're on the bubble of a major problem, so.
[00:41:32] Speaker A: And maybe it's not the 27th thing I check.
[00:41:35] Speaker D: Right?
[00:41:35] Speaker A: It's. It's one of the top three things I check because there's an alarm for it now.
[00:41:38] Speaker C: Yeah, yeah, well, it was like, it was like, you know, you only had to go through, you know, how many T instance outages before you were like, oh, oh, check that. Check that quota first.
[00:41:49] Speaker D: That was one of those things I wrote a custom check for a long time ago that ran in a lambda that would go through all my T series in our environment once an hour to see if you went over it just to set the alarm. Because so many times I dealt with customers that did that.
[00:42:04] Speaker C: It has no cpu. I don't understand why it doesn't have cpu.
[00:42:07] Speaker A: It just, it's just stopped working randomly.
[00:42:09] Speaker C: Yeah, it was working just fine. Yeah, I've been there, done that. I have a similar thing with EFS right now that I'm trying to debug where like all these nodes attached to EFS volume all of a sudden just stop working at the same time. It's like the only thing common is efs.
Sure there's some limitation I'm missing I'm trying to find. So I'm gonna check this one out. Maybe it's. Maybe it's iops.
[00:42:27] Speaker A: Right.
[00:42:27] Speaker C: Instance type. But I doubt it because that wouldn't make sense for all the nodes.
[00:42:30] Speaker A: But you wish you're that lucky.
[00:42:32] Speaker C: Yeah, I know it's going to be.
[00:42:33] Speaker A: It's going to be some sort of underlying NFS lock terribleness.
[00:42:36] Speaker C: Yeah, exactly.
I know it's an NFS problem of some kind. I just don't know what it is yet. Like some operation is putting a lock on this thing and then screwing everything until it basically gets killed by the health check, and then it won't release it.
[00:42:48] Speaker A: Yeah, I hate these issues.
[00:42:52] Speaker C: All right. GCP this week. Google Cloud Parameter Manager provides centralized configuration management that separates application settings from code, supporting JSON, YAML, and unformatted data, with built-in format validation for the JSON and YAML types.
And this is a practical guide on how to use it. I had forgotten that this existed, so today I learned that Parameter Manager exists in Google. You should use this if you need non-secret parameters stored somewhere; this is the place to do it. And I'm a very heavy user of Parameter Store on AWS. I love it, and you should all use it for any of your dynamic configurations, especially if you're moving containers between environments. This is the bee's knees, in my opinion. Yeah.
[00:43:34] Speaker A: You know, our show title could have been Cloud Pod Realizes There's a Parameter Manager in GCP, because I had no idea it existed. I love this model and I've missed having it since migrating out of AWS. So I will probably be porting stuff over to it, because I'm doing stupid things like, back in the day, reading environment variables out of S3 objects. Not any fun.
[00:43:56] Speaker D: So many memories.
[00:43:57] Speaker A: And I do like that you can reference existing secrets in the config.
[00:44:01] Speaker C: Right.
[00:44:01] Speaker A: So you can, you can still put your secret in. In secret manager but then you're referencing it as part of their parameter.
[00:44:09] Speaker C: Yeah, which is a bit of an annoyance when you have both in AWS, because you have to use different SDK portions. So it's kind of nice that you can do a pass-through, or you just use parameters for secrets, which is also a thing you can do, because Amazon built it twice.
[00:44:23] Speaker A: Yep, sure did.
[00:44:27] Speaker C: I guess the question now is, Matt, does Azure have a parameter store?
[00:44:33] Speaker D: Hold on.
We don't use parameter store for that. We just use Key Vault and just put it all in Secrets and you know, leverage it in that way.
[00:44:45] Speaker A: Sounds expensive.
It is in Amazon. It's expensive because, I forget what it is, $0.40 per secret. But then in SSM it's like $0.10, so it's like 4x the cost.
[00:44:57] Speaker C: So from a Reddit post six years ago, Azure Key Vault secrets are really the equivalent to SSM parameter store. But if you're using app services, there's a whole section in there for storing parameters in the apps running in the app service.
[00:45:10] Speaker D: Don't do it.
[00:45:10] Speaker A: Yeah, don't do it.
[00:45:12] Speaker D: Don't do it. And there's an easy way, just like there is in aws, to link, you say here's the environment variable link directly to secret and it automatically pulls it so you don't have to deal with it. You know, it's the same thing. Just do that.
[00:45:27] Speaker C: Azure App configuration service.
[00:45:30] Speaker D: Yeah, don't do that.
[00:45:31] Speaker C: That's what they're saying: explore fast, scalable parameter storage for app configurations. But Azure, that's what they say it is.
[00:45:38] Speaker D: I just use key vault. It's $0.03 per 10,000 transactions and it's small enough that it's fine and I know everything's secure and I don't have to deal with the flip side of somebody accidentally putting a secret in clear text because they put it in the wrong location, which I definitely have had to deal with. So everything goes in there and it's just locked down and we have a, you know, we deal with it. We have a process to input data into there in a secure manner.
[00:46:04] Speaker C: What's the cost delta between that though? Is it like Amazon? Because secrets on Amazon are outrageous.
[00:46:10] Speaker D: Secrets are like 40 cents, I think, on Amazon, and then they charge you per call, I think. Versus this is like a dollar for the key vault, if I remember correctly, and that's for the entire vault, I think.
[00:46:23] Speaker A: Okay, because yeah, the $0.40 per secret is the rough part of AWS which.
[00:46:28] Speaker C: That price has never come down, which is just crazy to me. I thought that would come down over time. But so it says, Azure Key Vault. Thanks, Claude.
Always appreciated. Azure Key Vault pricing is $0.05 per key per month, with the first 10,000 operations free, for the software-protected ones. If you do the HSM ones, it's a dollar per key per month.
[00:46:50] Speaker D: Yeah, those are different. You only need those if you need the premium vault, guys. That's the one that's HSM-backed and everything else.
[00:46:57] Speaker C: And then the Azure App Configuration pricing is up to 1,000 requests per day per store for free, with 10 megs of storage free. And then $1.20 per day for the standard tier, which includes 200,000 requests per day and a gigabyte of storage; additional requests are $0.06 per 10,000.
And they say this is better for feature flag management, high-volume configuration reads, and dynamic configurations that change frequently, where low-volume secret changes are better with Key Vault. So there you go.
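To put the pricing back-and-forth in one place: using the per-secret and per-transaction figures quoted in this discussion (check the current pricing pages, these change), the gap is mostly the per-secret-month charge, not the API calls. The secret counts and call volumes below are made up for illustration.

```python
# AWS Secrets Manager: ~$0.40 per secret-month plus ~$0.05 per 10k API calls.
def secrets_manager_cost(n_secrets: int, api_calls: int) -> float:
    return 0.40 * n_secrets + 0.05 * (api_calls / 10_000)

# Azure Key Vault, software-protected secrets: ~$0.03 per 10k transactions,
# with no per-secret charge.
def key_vault_cost(api_calls: int) -> float:
    return 0.03 * (api_calls / 10_000)

# Hypothetical workload: 100 secrets, a million reads a month.
print(round(secrets_manager_cost(100, 1_000_000), 2))  # 45.0
print(round(key_vault_cost(1_000_000), 2))             # 3.0
```

Which is roughly Matt's point: at Key Vault's model you mostly pay for traffic, while Secrets Manager charges rent on every secret whether you read it or not.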
[00:47:26] Speaker D: Per Claude. What was that service they deprecated, that was in like AWS SSM, that was supposed to be like, hey, we can change the feature...
[00:47:34] Speaker C: Flagging service that I just killed.
[00:47:36] Speaker D: Yeah, yeah, that one.
[00:47:37] Speaker A: Oh, AppConfig, right?
[00:47:39] Speaker C: It's part of SSM.
[00:47:40] Speaker D: Yeah, that was like part of SSM, called AppConfig or something. Just confused everyone.
I mean, maybe, but AWS confuses everyone.
[00:47:47] Speaker A: I will never, I will never attest to keeping or SSM services separated or knowing what they are.
[00:47:56] Speaker C: I mean, there was a feature flagging feature in... oh, and wasn't there a feature flagging thing they put into CloudWatch? That was the weird one.
[00:48:03] Speaker D: Cloud Watch.
[00:48:03] Speaker A: Oh yeah, that was the one I did.
[00:48:05] Speaker D: And they released them within like three or four months of each other too which also.
[00:48:09] Speaker C: And then they killed the CloudWatch one, though. It only lasted a year; that one's already dead. So I don't think anyone used it, because everyone was like, why would you put that in CloudWatch?
That makes no sense.
But here we are anyways. Parameter store Google. Thanks. And apparently Matt says don't use Azure Key Vault.
[00:48:26] Speaker D: I mean it works for me.
I haven't gotten in trouble with my CFO yet, so we're good.
[00:48:31] Speaker C: Fair enough. Cross-Site Interconnect is now generally available, simplifying L2 connectivity between data centers using Google's global network infrastructure, eliminating the need for complex multi-vendor setups and reducing capital expenditures for WAN connectivity. The service offers consumption-based pricing with no setup fees or long-term commitments, allowing customers to scale bandwidth dynamically and pay only for what they use, though specific pricing details weren't provided in the announcement. Built on Google's 3.2 million kilometers of fiber and 34 subsea cables, Cross-Site Interconnect provides a 99.995% SLA that includes protection against cable cuts and maintenance windows, with automatic failover and proactive monitoring across hundreds of Cloud Interconnect PoPs. Financial services and telecommunications providers are early adopters, with Citadel reporting stable performance during their pilot program, highlighting use cases for low-latency trading, disaster recovery, and dynamic bandwidth augmentation for AI/ML workloads. The transparent Layer 2 service enables MACsec encryption between remote routers with customer-controlled keys, while providing programmable APIs for infrastructure-as-code workflows and real-time monitoring for latency, packet loss, and bandwidth utilization.
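For a sense of what that 99.995% figure actually buys you, the downtime budget works out like this (standard availability arithmetic, not anything from the announcement):

```python
# Convert an availability SLA percentage into an allowed-downtime budget.
def downtime_minutes(sla_percent: float, days: int = 30) -> float:
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

print(round(downtime_minutes(99.995), 2))        # 2.16  minutes per 30-day month
print(round(downtime_minutes(99.995, 365), 1))   # 26.3  minutes per year
```

About two minutes a month, which is why the SLA has to explicitly cover cable cuts and maintenance windows; you burn the whole budget on one short event.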
[00:49:37] Speaker D: I'm just still impressed by the number of millions of kilometers of fiber.
[00:49:42] Speaker C: It's a lot of fiber.
[00:49:45] Speaker A: I mean, I like this just because of the heavy use of infrastructure as code. Some of these deep-down network services across the clouds don't really provide that; it's all just sort of ClickOps, or if not that, a support case. So this is kind of neat, and I do like that you can dynamically configure this and stand it up, tear it down pretty quickly. Although your mileage does vary with Google, because if you were to recreate the same name, I bet you money it will fail.
[00:50:21] Speaker D: I assume they're able to give you the global aspect because of the global VPCs that they have, the way it's built out.
[00:50:27] Speaker A: Yeah.
Okay, so this is more like the equivalent of like a direct connect gateway in Amazon.
[00:50:34] Speaker D: Well, that's what I was thinking. It was Direct Connect, but global, which, you know, normally your Direct Connect is into a given region. But given that Google has the global model more than anything, I was trying to make sure I connected all those pieces in my head.
[00:50:50] Speaker A: You can put multiple regions in a direct connect gateway.
I forget the details because it's been too long, but I do remember defining a global architecture that only had three Direct Connect gateways. Yeah.
[00:51:08] Speaker D: Anyway, but I thought you paid a boatload to backhaul all over there at that point.
[00:51:14] Speaker A: Pricing? No, of course not.
[00:51:16] Speaker C: He doesn't deal with money.
[00:51:17] Speaker A: No, that's a leader thing.
[00:51:19] Speaker D: He's in security. He just spends it.
[00:51:21] Speaker C: Yeah.
[00:51:24] Speaker D: I feel like security was in your veins well before you emerged into the world.
[00:51:28] Speaker A: You can't put a price on security either. That's all.
[00:51:34] Speaker D: Right security person.
[00:51:37] Speaker C: I sure can put a price on it. And it's expensive and Justin does it all the time.
[00:51:42] Speaker D: Let me go show you the budget line item for you.
[00:51:44] Speaker A: Yeah, prove the risk.
[00:51:45] Speaker D: All right, but I just wanted the new tool to play with.
[00:51:52] Speaker C: Speaking of budget savings, Bigtable is introducing tiered storage that automatically moves data older than a configurable threshold from SSD to infrequent access storage, reducing storage costs by up to 85% while maintaining API compatibility and data accessibility through the same interface. The infrequent access tier provides 540% more storage capacity per node compared to SSD-only nodes, enabling customers to retain historical data for compliance and analytics without manual archiving or a separate system. Time series workloads for manufacturing, automotive or IoT benefit most: sensor data, EV battery telemetry and factory equipment logs can keep recent data on SSD for real-time operations, moving older data to cheaper storage automatically based on the age policy. Integration with Bigtable SQL allows querying across both tiers, and logical views enable controlled access to historical data for reporting without full table permissions, simplifying data governance for large datasets. Currently in preview, with pricing at approximately $0.026 (about two and a half cents) per gigabyte-month for infrequent access storage, compared to $0.17 per gigabyte-month for SSD storage. So that's roughly 6.5x cheaper, representing significant savings for organizations storing hundreds of terabytes of historical operational data. That's a big savings. It's huge.
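Using the roughly $0.026/GB-month infrequent-access and $0.17/GB-month SSD figures mentioned, the "hundreds of terabytes" claim is easy to sanity-check. The 100 TB workload split below is a made-up example, not from the announcement:

```python
# Preview prices quoted in the episode; check current pricing.
SSD_PER_GB_MONTH = 0.17
IA_PER_GB_MONTH = 0.026

def monthly_cost(hot_gb: float, cold_gb: float) -> float:
    """Storage cost for data split across the SSD and infrequent access tiers."""
    return hot_gb * SSD_PER_GB_MONTH + cold_gb * IA_PER_GB_MONTH

# Hypothetical: 100 TB of history, keeping only the most recent 5 TB hot.
all_ssd = monthly_cost(102_400, 0)
tiered = monthly_cost(5_120, 97_280)
print(round(all_ssd))                       # 17408
print(round(tiered))                        # 3400
print(f"{1 - tiered / all_ssd:.0%} saved")  # 80% saved
```

Note this is storage cost only; the infrequent access tier also bills for reads, so "move everything cold" is a policy decision, not a free lunch.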
[00:53:07] Speaker A: And it's, it's automatic, which is great. And this is my, this is my favorite part about these services.
And to illustrate that I'm still a cloud, cloud guy at heart. Whenever I'm in an application and I'm loading data and I go back like I want to see a year's data or whatever, and it takes that extra like 30 seconds to load, I actually get happy because I know what they're doing on the back end.
[00:53:29] Speaker D: Spending more money because you have to hit it. That infrequent access charge.
[00:53:33] Speaker A: Yeah, they're probably just referencing it from the colder storage. It doesn't necessarily mean it's probably more money.
[00:53:38] Speaker D: Well, no, frequent access has that charge per usage.
[00:53:42] Speaker A: Yeah, that's, that's not my problem. I'm, I'm a user in this case.
[00:53:46] Speaker D: See prior statements. Yeah, no, it's great. It's amazing the number of companies and platforms that don't really have that tiered structure. Everyone's like, we need it now, and, you know, you don't. It's like the number one thing: whenever I set up an S3 bucket, blob storage, whatever you want to call it, Google Cloud Storage, day one when I set up anything, you set up that tiering. It just makes your life better in the future, versus retroactively doing it and seeing a $300,000 bill come across because you didn't set it up. So set it up at the beginning; you know you're getting those savings and you know you built it correctly. So it's great to see a lot of other platforms adding that in. It always bothers me when it's not done day one, but I get it.
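Matt's "set up tiering on day one" advice, in S3 terms, is a lifecycle configuration attached at bucket creation. A minimal sketch; the structure matches what boto3's put_bucket_lifecycle_configuration accepts, but the 30/90-day thresholds and bucket name are illustrative, not a recommendation:

```python
# Build an S3 lifecycle configuration that tiers objects down as they age.
def tiering_lifecycle(ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    return {
        "Rules": [
            {
                "ID": "tier-down-everything",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

cfg = tiering_lifecycle()
# Applied at bucket-creation time, e.g.:
# s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=cfg)
print(cfg["Rules"][0]["Transitions"][0]["StorageClass"])  # STANDARD_IA
```

If you genuinely don't know the access pattern yet, S3 Intelligent-Tiering is the lazier day-one alternative, since it moves objects based on observed access rather than fixed day counts.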
[00:54:38] Speaker C: Google's giving us a new instance this week with the A4X Max instance, powered by Nvidia GB300 NVL72 with 72 Blackwell Ultra GPUs and 36 Grace CPUs, delivering 2x the network bandwidth compared to the A4X and 4x better LLM training performance versus A3 H100-based VMs. The system features 1.4 exaflops per NVL72 system and can scale to clusters twice as large as A4X deployments. GKE now supports DraNet, the Dynamic Resource Allocation Kubernetes network driver, in production, starting with A4X Max, providing topology-aware scheduling of GPUs and RDMA network cards to boost bandwidth for distributed AI workloads. This improves cost efficiency through better VM utilization by optimizing connectivity between RDMA devices and the GPUs. The GKE Inference Gateway integrates with Nvidia NeMo Guardrails to add safety controls for production AI deployments, preventing models from engaging in undesirable topics or responding to malicious prompts. The integration combines model-aware routing and autoscaling with enterprise-grade security features. Vertex AI Model Garden will support Nvidia Nemotron models as NIM microservices, starting with Llama Nemotron Super v1.5, allowing developers to deploy open-weight models with granular control over machine types, regions and VPC security policies, and Vertex AI Training now includes curated recipes built on the Nvidia NeMo framework. The A4X Max is available in preview through Google Cloud sales representatives and leverages Cluster Director for lifecycle management, topology-aware placement, and integration with Managed Lustre storage. Pricing details were not disclosed in the announcement.
That's a lot of cool hardware stuff that I do not understand, and it makes me sort of miss my hardware days.
[00:56:17] Speaker A: You know, I'm always the wrong guy for personal stuff, just because I downsize everything. I don't even have stuff that has fans anymore, because I don't really need more than, like, a Raspberry Pi. And so I just rolled out my first home lab GPU server, and it's tiny compared to this. Like, really, really tiny. Microscopic.
[00:56:41] Speaker D: Yeah, I'm with you. I'm like, how do I do this cheaper?
[00:56:44] Speaker A: Yeah.
[00:56:45] Speaker D: And this is, how do I spend more money?
[00:56:48] Speaker A: Well, hopefully they're making more money, although I don't think anyone's really doing that all that well with AI yet.
Somebody must be, though.
[00:56:58] Speaker C: Moving on to Azure, they also are giving us the GB300 NVL72 instances. They deployed the first production cluster with over 4,600 of them, in fact.
Featuring the Blackwell Ultra GPU, enabling AI model training in weeks instead of months and supporting models with hundreds of trillions of parameters, the ND GB300 v6 VMs deliver 1,440 petaflops of FP4 performance per rack, with 72 GPUs, 37 terabytes of fast memory, and 130 terabytes per second of NVLink bandwidth, specifically optimized for reasoning models, agentic AI, and multimodal generative AI workloads. Azure implemented 800-gigabit Nvidia Quantum-X800 InfiniBand networking with a full fat-tree architecture and SHARP acceleration, doubling effective bandwidth by performing computations in-switch for improved large-scale training efficiency. The infrastructure uses standalone heat exchanger units and new power distribution models to handle high-density GPU clusters, with Microsoft planning to scale to hundreds of thousands of Blackwell Ultra GPUs across global data centers.
OpenAI and Microsoft are already using these clusters for frontier model development, with the platform becoming the standard for organizations requiring supercomputing scale AI infrastructure.
[00:58:05] Speaker A: A lot of hardware.
[00:58:07] Speaker C: Yeah.
[00:58:07] Speaker D: Lots of hard work.
[00:58:07] Speaker A: I mean, that last line to me is like companies looking for scale now. Companies with a boatload of money.
That's what this is for.
[00:58:17] Speaker D: It's when you're coming up on the end of your MACC and you have to spend a lot of money. You tell your AI team to go spend some money.
[00:58:24] Speaker A: Yeah, I don't think you have to tell an AI team to spend money.
I think that's culture.
[00:58:31] Speaker D: Sorry. You hire an AI team. How about that?
[00:58:35] Speaker C: All right, our next story: Azure Database for PostgreSQL with high availability enabled can now scale in under 30 seconds, compared to the previous 2-to-10-minute window, reducing downtime by over 90% for database scaling operations. The feature targets production workloads that require continuous availability during infrastructure changes, particularly benefiting e-commerce platforms, financial services, and SaaS apps that can afford extended maintenance windows. Oh, that's right, it's cannot afford extended maintenance windows. That doesn't make any sense.
This near zero downtime scaling works specifically with HA enabled postgres instances leveraging Azure's high availability architecture. Pricing remains unchanged from standard PostgreSQL rates. The reduced downtime translates to lower operational costs by minimizing revenue loss during scaling events and reducing the need for complex maintenance scheduling.
This positions it well against its competitors. So nice.
[00:59:23] Speaker D: I mean they've had this for forever on Azure SQL, you know, which is their Microsoft SQL platform. So you know, it doesn't surprise me. It more surprised me that this, this was already a 2 to 10 minute window to scale. Seems crazy for a production HA service. Like how are you handling that in the past?
[00:59:45] Speaker A: Yeah, I was trying to look at like Aurora for postgres and like I thought they could do it dynamically without any downtime for a while.
[00:59:52] Speaker D: Yeah, they do.
[00:59:53] Speaker A: So like, I don't know if this is like more of like the, you know, the Azure model of their managing your postgres servers.
[01:00:01] Speaker D: For you. It's the equivalent of RDS. It just couldn't scale very well.
[01:00:11] Speaker C: OneLake is now bringing you APIs that support the Azure Blob Storage and ADLS APIs, allowing existing applications to connect to Microsoft Fabric's unified data lake without code changes. Just swap endpoints to OneLake. What could go wrong? This API compatibility eliminates migration barriers for organizations with existing Azure Storage investments, enabling immediate use of tools like Azure Storage Explorer with OneLake while preserving existing scripts and workflows. The feature targets enterprises looking to consolidate data lakes without rewriting applications, particularly those using C# SDKs or requiring DFS operations for hierarchical data management. Microsoft provides an end-to-end guide demonstrating open mirroring to replicate on-premises data to OneLake Delta tables, positioning this as a bridge between traditional storage and Fabric's analytical ecosystem.
No pricing was announced for OneLake API access; costs will likely follow the standard Fabric capacity pricing models, which are impossible to calculate or predict.
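The "just swap endpoints" claim can be sketched as a URL rewrite. This is illustrative, not from the announcement: the OneLake host names below are assumptions drawn from Microsoft's published endpoint conventions, and the workspace name is hypothetical, so verify both against the docs before pointing anything real at them.

```python
# Sketch of the "just swap endpoints" idea: rewrite a classic Azure Storage URL
# so an existing Blob/ADLS client targets OneLake instead. Host names are assumed.

from urllib.parse import urlparse, urlunparse

ENDPOINT_MAP = {
    # classic Azure Storage host suffix -> assumed OneLake equivalent
    ".blob.core.windows.net": "onelake.blob.fabric.microsoft.com",
    ".dfs.core.windows.net": "onelake.dfs.fabric.microsoft.com",
}

def to_onelake_url(storage_url: str, workspace: str) -> str:
    """Rewrite an Azure Blob/ADLS URL to target a OneLake workspace instead."""
    parsed = urlparse(storage_url)
    for suffix, onelake_host in ENDPOINT_MAP.items():
        if parsed.netloc.endswith(suffix):
            # OneLake paths are workspace-scoped rather than storage-account-scoped
            new_path = f"/{workspace}{parsed.path}"
            return urlunparse(parsed._replace(netloc=onelake_host, path=new_path))
    raise ValueError(f"not a recognized Azure Storage endpoint: {storage_url}")

print(to_onelake_url(
    "https://myaccount.dfs.core.windows.net/mylakehouse/Files/data.parquet",
    workspace="my-workspace",  # hypothetical Fabric workspace
))
```

Because the announcement says existing SDKs keep working, in practice you would hand the rewritten base URL to the same `BlobServiceClient` or DFS client you already use, rather than rewriting any application logic.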
[01:01:07] Speaker A: Yeah, that little kind of throwaway comment about like positioning this as a bridge between traditional storage and fabric caught my attention. So I was reading a little bit about this because it is sort of an interesting idea. Like are you, are applications just going to completely move away from traditional storage or you know, even accessing objects in buckets? So because this is sort of a move away from that. Not that I really understand fabric or how it works, but kind of a neat idea.
[01:01:38] Speaker C: Agreed.
And I have a cloud journey for you guys this week. There was an interesting article in InfoWorld. Not a source of a lot of good news or things that I care about, but it was good to see some things. This is "8 platform engineering anti-patterns." Basically, for those of you out there who are trying to create platform engineering teams, several people will talk about hitting all these issues: how do you get people to adopt them, a trough of disillusionment phase, et cetera.
And so InfoWorld's article basically says there are some anti-patterns that you should try to avoid, to hopefully avoid some of the pain that you're now suffering. So if you already did these, sorry, but if you haven't, or you're beginning the journey, these are great tips. The first one is building the front end first.
A big misconception in platform engineering is that the platform is the visual interface, i.e., that the platform and developer portal are one and the same.
Unfortunately, it's more than that. Focus on building a solid backend with APIs and orchestration before adding your UI is the recommendation from Luca Galante, core contributor at Platform Engineering, a popular platform engineering community.
Front end user interfaces like Backstage are a core element of platform engineering, but they aren't the full picture and developers may prefer interacting outside the GUI portal through a CLI or an API, which is definitely what I much prefer to do. So how do you guys feel about that one?
[01:02:57] Speaker A: Yeah, I couldn't agree more. I think I've seen several platforms go off the rails at exactly this level, because you can stand up a Backstage server pretty quickly, but the real meat, the logic, and all the value that a platform is going to give you is everything behind it: the tight integration with all of your existing tools.
So having that come from a UI-driven workflow does not make sense to me. It should always be API-driven, and in a lot of cases it shouldn't really be anything other than tight integrations and declarative patterns of how services work together, from your code repository to your deployment engine to your cloud APIs. Have stuff that's very declarative and very easy, and then you can put a UI in front of it. It's much like API-driven applications, right? It's just easier to develop it that way around.
[01:04:00] Speaker D: It's also what Amazon did when they first launched AWS. There were so many features that were only available in the API. You might have been able to see some of them in the UI, but some you couldn't even see; the front end was an afterthought. I do think that as your platform develops over time, you will eventually need to add a front end to it. But if you're just standing it up, it doesn't need to be something that you have day one. Maybe something that's there, but it doesn't need to be beautiful.
[01:04:32] Speaker A: It needs to have shiny pretty graph so that you can get the next set of features for it, fund it, because your executives will need new pretty graphs.
[01:04:39] Speaker D: Ooh, you stole where I was going. I thought you were going to do the lead-in there and I was going to really drive it home with the executive comment, but you went right for it, Justin.
[01:04:48] Speaker A: Oh yeah, now that AIs can write all my front ends, I'm much more like, oh yeah, no, totally, you get your pretty graphs, no problem. They're still terrible, but I love pretty charts.
[01:05:00] Speaker C: It's fine. All right. The second anti-pattern is lacking the product mindset. Adopting a product mindset has almost become a cliche tip in the world of platform engineering, but it's true. Not treating the platform as a product is a surefire path towards minimal use. Even the best products don't sell themselves, so you need someone out there evangelizing, driving requirements, getting feedback, and understanding what the user base requires.
Internal advocacy, storytelling and sharing early adopter success stories is critical to getting people to buy into your platform vision.
[01:05:30] Speaker A: I see your platform engineering and I raise you: every single internal thing ever should have this.
Yes, this, right? I run every one of my apps like this, you know, even though it's just me doing.
[01:05:44] Speaker C: Development. I'd say you don't listen to any feedback I give you on product.
[01:05:48] Speaker A: I listen to you, I just don't implement it.
[01:05:52] Speaker D: You might take a nap in the middle of you telling him about it, but that's a different story.
[01:05:56] Speaker C: Normally, normally he listens to me. He nods and goes, yeah, I'm not doing that.
[01:06:01] Speaker D: He tells you to his face. It's like a good product person.
[01:06:03] Speaker C: Yeah, exactly.
The problem is most people don't tell you to your face. They tell you, yeah, yeah, we'll put it on the roadmap. Which means never.
[01:06:10] Speaker D: Never.
[01:06:11] Speaker A: Do you want me to develop a feature for one person? No, thank you.
[01:06:14] Speaker D: But with anything you're doing, you need to have a product mindset. If you're really going to be successful with it, you really do need to build it out and sell it to people. Whether it's a platform that you're developing on, a new process that you're working on internally in your company, or anything else, show the value, show why it's going to make people's lives easier and better.
You know, even something as simple as a JIRA workflow, you still have to do this to make sure everybody's on the same page and you capture all the requirements and you work through it. If you're doing your job, for lack of a better term, and you're not just coming in, punching the clock, and being a worker bee who's just pumping out code every day, you have to think about these things.
[01:07:01] Speaker A: I've been a part of two major products at a huge scale internally, and one was treated like a product and one was not. The one that wasn't treated like a product was a containerization platform, and the feedback that we got upon releasing it internally was, oh, it's cool, I just have to reconstruct and redo everything about my application to use it. So it was a huge miss. It could have been the next Kubernetes, who knows. But no, it wasn't.
But you know, the inverse of that is running as a product, developing features that people want and making sure that you're developing towards solving their real use case.
You get so many advantages out of that. Like not only are you building something that someone will use but that you get internal evangelists who will go and sell your product and drive adoption for you.
So it's a fantastic way to treat everything.
[01:08:02] Speaker C: Moving on to our third anti-pattern: top-down mandates for new technologies can easily turn off developers. It does for Ryan, for sure. Especially when they alter existing workflows without the ability to contribute and iterate, the platform drifts from developer needs, prompting workarounds. Give the team some ownership, don't push them to adopt, is the recommendation from Tom Barkin Benkler, director of product management at Spotify, which is also the creator of Backstage.
One example is Soundcheck, a Spotify-made plugin to validate new code within a continuous development process. When engineers can build and customize the platform, they're part of the process and more likely to use that particular platform. He says the shared ownership model is proving successful, with 100% of engineering employees using the company's internal version of Backstage at Spotify. So again, if there's something I don't like about your thing, and it's internal open source, and I can contribute to it and make it work for me and everybody else, then I feel like I'm contributing to the success of the platform, and that's what you want a platform to be. So this is definitely a good one. Don't try to be restrictive, as in only the platform engineers can build the platform, darn it. No, don't do that. Don't be that guy.
[01:09:04] Speaker A: Yeah, but it is true, right? If you dictate a technology, like, you have to use Harness for CI/CD, right? If I got that sort of dictation, like we have to do it this way, you would rebel for.
[01:09:17] Speaker D: The sake of rebelling. If I said you had to use Terraform for infrastructure as code, you would say no, just because I told you to do something. Right? Correct. Let's be honest here.
[01:09:27] Speaker A: That is absolutely true. But if you give me the business deliverable or the problem to solve and make that my responsibility, then, you know, the technology.
Not only am I probably going to end up using the same recommended technology, it's going to be better integrated, because it's been thought through as part of the solution and not just a thing that I'm working around. It really changes the way that you design everything if you can come at it from that point of view. And then, if it's not the best option, it gives a venue for that conversation: I know that you really had your eyes on this solution, whatever it was. Kubernetes. So much Kubernetes.
But there's a better, there's a better option.
[01:10:11] Speaker D: I mean, the other real thing on this is also letting customers bug-fix or add features themselves. If you have built a platform and you say, here it is, go launch it, and here's the source code, I've seen teams internally, when you do something like that, say, oh cool, we need this feature, we need to modify this thing in this way, which wasn't something you as the platform engineer ever thought of. Or, hey, we need this simple extra thing that would save us a lot of time, which then gets fed back in. So now you're also empowering your end users to be your platform engineers.
[01:10:47] Speaker A: Absolutely.
[01:10:48] Speaker D: You just have to sanity check their code sometimes.
[01:10:50] Speaker A: I mean, Justin and I have long railed against service dashboards, right? Because that's the problem: if you don't define it in a way that allows other people to contribute, you're just gating everything through a single team and waiting for them to develop whatever you need, versus owning your own destiny. I'm a big advocate of internal open source for features like that, and of designing your system to be modular enough to adopt that without it being a major problem.
[01:11:22] Speaker C: Number four, not serving your users. This is kind of similar to the product one. They're running out of steam on this article. It's okay.
[01:11:27] Speaker D: Yeah.
[01:11:28] Speaker C: If you build something without knowing your target audience and how you can help them, you will not be improving engineering efficiency at all.
They said this is tricky because every organization is different, with unique requirements for different user subsets.
The feeling of being heard and understood is very important.
Users are more receptive to the portal once they know it's been built after someone asked about their problems. Again, product is important.
[01:11:51] Speaker A: It's the same. Yeah, exactly the same problem we already talked about.
[01:11:56] Speaker C: All right, next up is tracking the wrong metrics. Everything you build should be measurable, says Dynatrace's Grabner.
Grabner says the DORA metrics are lagging indicators from a platform engineering perspective. Similarly, a high percentage of onboarded developers using the platform may be only a surface-level indicator of success and not necessarily an accurate reflection of the ROI to the business. A successful platform should improve time to market, reduce costs, and increase innovation. The starting basis is a proper ROI calculation you should have for building the platform out. Everyone's building platforms because that's the hot new thing to do, without any metrics. So appreciate that insight; this is definitely a key thing. Look at what metrics you're trying to measure and what you're trying to fix with your platform.
[01:12:38] Speaker A: Yeah, this is the first one where I kind of disagree, but only kind of. The DORA metrics are more than just something that you track internally; they're a standardization across an industry to score something that's very difficult to score across a large volume. But if you pick apart the individual DORA metrics, maybe releasing 17 times a day doesn't really meet your business problem, right? You can deploy a new Docker container 17 times a day, but you can't really make it live, because it doesn't go through all the things that you need to launch a production app: performance testing, QA review, all those types of things. So you can solve one aspect of this problem and not really introduce any value. And I've seen that happen a lot of times across platform development.
[01:13:30] Speaker C: Matt's got nothing.
[01:13:31] Speaker D: Matt had nothing. I agree.
[01:13:33] Speaker A: Perfect.
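For listeners who want to put numbers on the DORA discussion above, here is a toy sketch of two of the four DORA metrics, deployment frequency and change lead time, computed from a deploy log. The data and function names are purely illustrative, not from any real tool.

```python
# Toy DORA-metric calculations over a fake deploy log.
# Each entry is a (commit_time, deploy_time) pair; the data is made up.

from datetime import datetime
from statistics import mean

deploys = [
    (datetime(2025, 10, 1, 9), datetime(2025, 10, 1, 17)),   # 8h commit-to-deploy
    (datetime(2025, 10, 3, 10), datetime(2025, 10, 4, 10)),  # 24h
    (datetime(2025, 10, 7, 8), datetime(2025, 10, 7, 12)),   # 4h
]

def deployment_frequency_per_week(deploys, window_days=7):
    """Deployments per week, averaged over the span of the log."""
    span_days = (deploys[-1][1] - deploys[0][1]).days or 1
    return len(deploys) * window_days / span_days

def mean_lead_time_hours(deploys):
    """Average commit-to-deploy latency in hours (DORA change lead time)."""
    return mean((d - c).total_seconds() / 3600 for c, d in deploys)

print(round(deployment_frequency_per_week(deploys), 2))
print(round(mean_lead_time_hours(deploys), 2))
```

As Ryan's point suggests, a number like "4.2 deploys per week" is only a lagging signal; the ROI question is whether those deploys actually shipped value past QA and performance gates.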
[01:13:34] Speaker C: All right, the next one is copying the platforms and approaches of others. Just because Spotify, Expedia, and American Airlines did it, did it well, and wrote a paper about it doesn't mean you should do it the exact way they did. That's what they're saying. You understand what your business requirements are, you're getting into the product mindset, you need to think about what the return on investment is going to be. Leverage them for insights, but don't necessarily copy them verbatim, because what they may need for CI/CD or Kubernetes is not what you will need for CI/CD or Kubernetes.
[01:14:03] Speaker A: Definitely. A really good example of this is Etsy, right? Etsy is also on the Backstage bandwagon, but, well, I don't know about a very different approach, but they definitely took the learnings from Spotify and customized them to their own use case and their own workflows. They talked about it and documented all that out, and it was very interesting to read about their use of it.
[01:14:31] Speaker D: I mean, I also say build the right tool at the right place at the right time.
You know, you are not Google, you are not Facebook, you are not Spotify. You know, the odds are you are a medium sized business. You probably don't need everything that these massive enterprises have. You know, there's probably some trimmed down version that you really need for your team. While it's always fun to say I built, you know, XYZ and it can do all these things but in reality if you're company's never going to use them or it's overly complicated or whatever, like you're not going to gain the value out of it. So you know, it goes back to build the plot, you know, all the prior ones. Understand what your product is and everything else and sell your product and treat like a product but leverage what you what your teams need and build the pro the platform for what you need at that time and know that there's incremental stuff. So maybe you build for 10x where you are or 5x where you are now or 10x where you are now, but don't build for 100x of where you are where you know and you over engineer it to the point when the platform's not useful. And I feel like a lot of people will just try to straight up copy these massive enterprises Kubernetes, hey, I'm a 12 person company and I need a full kubernetes stack and they could do everything in the world. I'm going to run my own and do all this. Is that a really good use of your time? You know, maybe if you're 100,000 person company you have a team dedicated to it and sure it makes more sense but maybe at your size you don't need that level of backstage for a three person development team.
[01:16:06] Speaker A: Just saying.
[01:16:08] Speaker C: All right, the next one for us is over engineering on day one. A problem that all three of us suffer from, which I think I just.
[01:16:16] Speaker D: Talked about too without looking at what the next one was. Sorry.
[01:16:20] Speaker C: No worries. Some platform teams attempt to boil the ocean from day one. Instead, Harness's Mishra recommends starting by streamlining basic, concrete developer touchpoints in CI/CD, introducing additional complexity incrementally, which is a good path for any software project in general. But definitely, over-engineering on day one. Don't get overly messy.
[01:16:40] Speaker A: Yeah, I've gotten a lot better in the last few years just learning new strategies on how I can build foundational stuff with room for future development.
[01:16:49] Speaker C: Right.
[01:16:50] Speaker A: Like, it's designed up front for my over-engineering, because I'm going to do that, and I make myself feel better: oh, I can plug in modules here later, and now I can focus on what that MVP value really is and getting a product out there, so that I can show value much faster than before. Three years of development for an internal platform is hard for people to understand, right? You can talk about it all you want, but when it takes that long, they're like, what can you do for me now? We've got business problems now.
Maybe you're, you know, maybe your efforts are better spent elsewhere.
And so just, you know, sort of building your MVP not over engineering, but also making sure that you design something up front where it can be extendable is super important.
[01:17:39] Speaker D: Yeah, the modular based approach I think is important, you know, and I mean in any software development I think it is. But specifically when you're building a platform, start with literally the most MVP you can and know where you want to go and make sure you haven't architected yourself into a corner day one. But, you know, at least set it up in a way that is, you know, where it is modular. And I think a lot of people miss that fact of it.
[01:18:04] Speaker C: The next one not to do is don't rebrand your operations team.
[01:18:10] Speaker A: You know, Paul, it's DevOps all over again.
[01:18:12] Speaker C: Yeah. Paul Kennedy, co-founder and chief operating officer at Tasso, said, I've seen teams simply being renamed from operations or infrastructure teams to platform engineering teams with very little change or benefit to the organization. Yeah, I agree on this one. Platform engineering requires a culture shift, and that shift shouldn't be underestimated. And again, you don't treat your operations team like a product team.
So don't treat your platform engineering team like it's just ops. Because it's not.
[01:18:40] Speaker A: Yeah, I mean, it's just such a recipe for disaster. You've already rebranded your operations teams to DevOps and then realized they were just doing operational stuff. So you rebranded them to SREs, and then you realized that there wasn't anyone doing release engineering or managing any kind of efficiencies there.
So you rebranded again, and now you're going to do that for platform engineering? You're just dancing around problems and not really hiring or having the right talent in the right places.
[01:19:14] Speaker C: All right, and our final tip, or anti-pattern, is thinking that platform engineering is finished. Platform engineering is never done. You can't just throw a tiger team at it and say you're going to build this one thing and then it's going to be our platform of the future.
No, you need to invest in it. And again, that product management mindset is key here: make sure you have a roadmap and continual iteration. What works today may not work two years from now, when your application has morphed from a monolith into a microservices behemoth on Kubernetes and you'll need a different platform strategy. If the platform's evolving as your product's evolving, that's your best scenario. And so building a platform engineering team with that mindset from day one is super critical.
[01:19:57] Speaker A: Yeah, I mean, it's a bit of a gimme, but technology is always changing. There are always new services, and if you've built something that's very tightly coupled to a specific configuration and can't adapt, you're going to have problems, because developer teams are going to move around. And if you're trying to enforce standards or best practices via your platform, you're just going to get worked around eventually if you don't update it and meet your customers where they are today.
[01:20:32] Speaker D: I mean, it goes back to really the first and second ones: it's a product, treat it like a product.
You know, be in that product mindset. Otherwise, your end users, in this case your other engineering teams, are going to be like, why would I use this? There's no innovation on it. You've got to keep improving it and everything else. Otherwise there's going to be another platform or another shiny object for the engineers to go look at.
[01:20:59] Speaker C: Well, that's it for this week in the cloud, you guys. Any other final thoughts on platform engineering before we move on or anything? They didn't mention as an anti pattern.
[01:21:08] Speaker A: Trying to think, I don't, I can't think of anything they didn't really cover or that we didn't talk through on the show. Like, I, it's just important to think about these things as more than just backstage, more than just whatever component or tool you're building around. It really is a focus on, you know, the value that you're getting, the efficiencies that you're offering the rest of the business.
And Matt's got nothing.
[01:21:32] Speaker C: Yeah, Matt's got nothing.
You should build a platform team, Matt. It's fun.
[01:21:36] Speaker D: It's the end of the day. It's been a long day. Azure threw me a few curveballs.
[01:21:41] Speaker C: All right, fair enough.
[01:21:41] Speaker A: Yeah.
[01:21:42] Speaker C: Well, we should wrap this up anyways. Yeah.
Ryan and I are gone next week, and so Matt has the task of herding the cat known as Jonathan to record next week's episode. If he gives up, I'll be back in two weeks, but hopefully he can herd the cats.
[01:22:00] Speaker D: And yeah, I'm normally pretty good, but I can herd Ryan, I think, a little bit easier than Jonathan.
[01:22:05] Speaker C: Yeah, Jonathan's an enigma. You know, he's there sometimes, so he.
[01:22:10] Speaker D: Made TCP talk and the prep calls. So you know we're in the right direction.
[01:22:16] Speaker C: Yeah, yeah. As long as you coordinate in advance, you should be fine.
Well, good luck to you, Matt. We will see you in two weeks. Ryan and I are going to go have fun killing brain cells elsewhere, and we will return hopefully with Azure predictions. Ryan, we'll have to brainstorm on that while we're gone on our trip together. This is going to go.
[01:22:36] Speaker D: Might be the best way to actually predict. I'm just saying.
[01:22:38] Speaker C: Yeah, yeah, like I don't know, maybe we can brainstorm over cocktails. Ryan, about what Azure might possibly announce. I have no idea.
[01:22:48] Speaker D: You might want to also prep your AWS one while you're at it.
[01:22:53] Speaker C: Yeah, AWS is coming up too.
[01:22:53] Speaker A: Yeah, it is that time of year.
[01:22:54] Speaker C: It is that time of year. So yeah. We will also talk about our recording schedule because we are taking some time off at Thanksgiving, a couple other things as we do here at the end of the year and we wrap things up so we got lots of episodes coming, but we're also taking some very well needed time off for our co hosts and myself.
And so we will see you next week. Well, I won't, and Ryan won't, but Matt and Jonathan hopefully will see you next week here in the Cloud.
[01:23:19] Speaker D: Wish me luck.
[01:23:19] Speaker C: All right. Yeah, good luck.
[01:23:21] Speaker A: Bye, everybody.
[01:23:22] Speaker D: Bye, everyone.
[01:23:26] Speaker B: And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services.
While you're at it, head over to our website at thecloudpod.net, where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.