307: The AI Assistant That Finally Understands Your Kubernetes Cluster (We are Doomed)

Welcome to episode 307 of The Cloud Pod – where the forecast is always cloudy! Who else is at a conference? Justin is coming to us this week from sunny San Diego where he’s attending FinOps – so we have that news to look forward to for next week. Matt and Ryan are also on hand today to share the latest news from Kubernetes, Salesforce acquisitions, and the strange case of Azure making AWS more cost effective.

Titles we almost went with this week:

The Great Redis Escape: One Year Later, Valkey is Living Its Best Life
Cache Me If You Can: How Valkey Outran Redis’s License Policies
Tier Today, Gone Tomorrow: AWS’s New Storage Class That Moves Your Data So
You Don’t
Hey AI, Deploy My App: AWS Makes It Actually Work
AWS Finally Calculates What You’ll Actually Pay
The Price is Right: AWS Edition
From List Price to Real Price: AWS Gets Transparent
Red Hat and AWS Sitting in a Tree, R-H-E-L-I-N-G
Dockerfile? More Like Dockefile-It-For-Me with Amazon’s New MCP Server
Elementary, My Dear Watson: Amazon Q Becomes Sherlock Holmes for AWS
CUD You Believe It? Red Hat Gets the Discount Treatment
Committed Relationship Status: It’s Complicated (But 20% Cheaper)
RHEL Yeah! Google Drops Prices on Enterprise Linux
Disk Today, Gone Tomorrow: Azure’s Vanishing OS Storage
ATL1: Where GPUs Meet Sweet Tea and Southern Hospitality
AWS Launches Operation Cloud Sovereignty
The Great Firewall of Europe: AWS Edition
Amazon Builds a GDPR Fortress in Germany

General News

01:46 What Salesforce’s $8B acquisition of Informatica means for enterprise data and AI | VentureBeat

Salesforce just dropped $8 billion to acquire Informatica.
This purchase was really about building the data foundation needed for agentic AI to actually work in enterprise environments – we’re talking about combining Informatica’s 30 years of data management expertise with Salesforce’s cloud platform to create what they’re calling a “unified architecture for agentic AI.”
This acquisition fills a massive gap in Salesforce’s data management capabilities, bringing in critical pieces like data cataloging, integration, governance, quality controls, and master data management – all the unsexy but absolutely essential plumbing that makes AI agents trustworthy and scalable in real enterprise deployments.
The timing here is fascinating, because Informatica literally just announced their own agentic AI offerings last week at Informatica World, so Salesforce is essentially buying a company that’s already pivoted hard into the AI space – rather than trying to build these capabilities from scratch.
There’s going to be some interesting overlap with MuleSoft, which Salesforce bought for $6.5 billion back in 2018, but analysts are saying Informatica’s data management capabilities are more comprehensive and updated – this could mean some consolidation challenges ahead as they figure out how to integrate these overlapping technologies.
For enterprise customers, this could be a game-changer because it promises to automate those painful, time-consuming data processes that typically take days or weeks. These AI agents can handle data ingestion, integration, and pipeline orchestration with minimal human intervention.
The $8 billion price tag is actually lower than the rumored $11 billion bid from last year, which might indicate either tough negotiations or perhaps some concerns about integration challenges. Remember, Salesforce has already spent over $50 billion on acquisitions including Slack, Tableau, and MuleSoft.

02:56 Justin – “Just keep your hands off slack, okay guys? That’s all I care about.”

Cloud Tools

05:13 Gomomento: Valkey Turns One How The Community Fork Left Redis In The Dust

Valkey has officially hit its one-year milestone as the community-driven fork of Redis, and it’s fascinating to see how quickly it’s gained traction after Redis Labs switched to a more restrictive license in March 2023.
The Linux Foundation stepped in to support this open-source alternative, and major players like AWS, Google Cloud, and Oracle have all thrown their weight behind it, essentially creating a unified response to Redis’s licensing changes.
What’s really impressive about Valkey is how it’s maintained complete compatibility with Redis while actually pushing innovation forward – they’ve already released version 8.0 with features like improved memory efficiency and better performance for large-scale deployments.
This shows the community isn’t just maintaining a fork, they’re actively improving upon the original codebase.
For developers and engineers, the practical impact is that you can continue using all your existing Redis tooling and client libraries without any changes, but now you have the peace of mind that comes with a truly open-source solution backed by the Linux Foundation. No more worrying about future licensing surprises or restrictions on how you can use your in-memory data store.
The performance improvements in Valkey 8.0 are particularly noteworthy – they’ve managed to reduce memory overhead by up to 20% for certain workloads while maintaining the same blazing-fast performance Redis users expect. This is crucial for companies running large-scale caching layers where even small efficiency gains can translate to significant cost savings.
Looking ahead, Valkey’s roadmap includes some exciting features like native support for vector similarity search and improved clustering capabilities, which suggests they’re not just playing catch-up but actually positioning themselves to lead in the in-memory database space.
The irony here is that Redis’s attempt to monetize through licensing restrictions may have actually accelerated innovation in the space by spurring the creation of a well-funded, community-driven alternative that’s now pushing the entire ecosystem forward faster than before.

06:37 Ryan – “I haven’t seen a lot of talk of Redis recently and every new greenfield application that I’ve seen or worked around now is looking at Valkey or using Valkey actively. So I feel like this is going to go the same way as Elasticsearch and the licensing change there where it just won’t be the go-to option anymore.”

07:59 The Harness MCP Server

Harness just released their MCP Server, which implements the Model Context Protocol – an open standard that lets AI agents like Claude Desktop, Windsurf, or Cursor securely connect to your Harness workflows without writing custom APIs or brittle glue code, essentially turning Harness into a plug-and-play backend for AI agents.
This addresses a major pain point where customers are excited about AI but struggle with giving their agents secure access to delivery data from pipelines, environments, and logs.
The MCP Server acts as a lightweight local gateway that translates between AI tools and the Harness platform while maintaining enterprise-grade security controls.
What’s clever here is that Harness is dogfooding their own solution – they’re using the same MCP server internally that they’re offering to customers, which means it’s battle-tested and provides consistency across different AI agents and environments without the maintenance headache of multiple adapters.
The security story is particularly strong – it uses JSON-RPC 2.0 for communication, integrates with Harness’s existing RBAC model, handles API keys directly in the platform, and ensures no sensitive data ever gets sent to the LLM, which should make security teams much more comfortable with AI integrations.
From a practical standpoint, this enables some interesting use cases like customer success engineers using AI to instantly check release statuses without bothering the dev team, or building Slack bots that alert on failed builds and surface logs with minimal setup time.

10:31 Justin – “The key success of being able to build a successful MCP though is to have APIs. So if you were already behind on getting to APIs, I think this is the struggle for you. Now you’re doubly behind – because you’re not only behind on the API spec, but you’re also behind on the MCP part as well.”

12:12 Hashicorp: Terraform Adds New Pre Written Sentinel Policies

HashiCorp has released a collection of pre-written Sentinel policies that automatically enforce AWS Foundational Security Best Practices within Terraform workflows, essentially giving teams a ready-made security guardrail system that prevents common misconfigurations before infrastructure gets deployed. This is huge for organizations struggling to balance developer velocity with security compliance requirements.
These policies cover critical security controls like ensuring S3 buckets aren’t publicly accessible, requiring encryption for EBS volumes and RDS instances, and enforcing proper IAM configurations – basically all those security checks that teams know they should implement but often get overlooked in the rush to ship features. The beauty is that these policies run during the plan phase, catching issues before any resources are actually created.
What’s particularly clever about this release is how it addresses the skills gap problem. Not every organization has security experts who can write complex policy-as-code rules, so having HashiCorp provide battle-tested policies out of the box dramatically lowers the barrier to entry for implementing proper cloud security governance.
Teams can literally copy-paste these policies into their Terraform Cloud or Enterprise setup and immediately start benefiting.
The timing of this release is perfect given the increasing focus on supply chain security and infrastructure compliance, with regulations getting stricter and breach costs rising, having automated policy enforcement that aligns with AWS’s own security recommendations gives organizations a defensible security posture they can point to during audits.
Plus, it shifts security left in the development process without requiring developers to become security experts overnight.

AWS

17:00 Amazon FSx for Lustre launches new storage class with the lowest-cost and only fully elastic Lustre ﬁle storage

Amazon just launched FSx for Lustre Intelligent-Tiering, which is essentially the first fully elastic Lustre file storage in the cloud – meaning it automatically grows and shrinks as you add or delete data, so you’re only paying for what you actually use instead of over provisioning storage like you would on-premises, and at less than $0.005 per GB-month, it’s claiming to be the lowest-cost high-performance file storage option available.
This is a game-changer for HPC workloads like seismic imaging, weather forecasting, and genomics analysis that generate petabytes of data – the service automatically moves your data between three tiers (Frequent Access, Infrequent Access after 30 days, and Archive after 90 days), potentially reducing storage costs by up to 96% compared to other managed Lustre options without any manual intervention.
For AI/ML teams trying to maximize their expensive GPU utilization, this is particularly interesting because it delivers up to 34% better price performance than on-premises HDD file systems, and with Elastic Fabric Adapter and GPU Direct Storage support, you’re getting up to 12x higher per-client throughput compared to previous FSx for Lustre systems.
The tiering is completely transparent to applications – whether your data is in the Frequent Access tier or has been moved to Archive, you can still retrieve it instantly in milliseconds, which means you can migrate existing HDD or mixed HDD/SSD workloads without any application changes.
The service is launching in 15 AWS regions including major hubs in North America, Europe, and Asia Pacific, and the pricing model is consumption-based – you pay for the data and metadata you store, operations when you write or read non-cached data, plus your provisioned throughput capacity, metadata IOPS, and SSD cache size.

18:28 Justin – “I imagine this is truly fantastic for people who have workloads where they’re getting the performance increase out of Lustre. So that’s pretty rad that it’s automatic. It feels a little strange that you can retrieve it at the same speed, but at different costs; I would just force everything to the lower tier, but I imagine you don’t have that option.”

19:45 Enhance AI-assisted development with Amazon ECS, Amazon EKS and AWS Serverless MCP server | AWS News Blog

AWS is bringing AI-powered development assistance to the next level with new Model Context Protocol servers for ECS, EKS, and Serverless, which essentially give your AI coding assistants like Amazon Q Developer real-time, contextual knowledge about your specific AWS environment instead of relying on outdated documentation.
Imagine having an AI that actually knows your current cluster configuration and can help you deploy containers in minutes using natural language commands.
The real game-changer here is that these MCP servers bridge the gap between what LLMs know from their training data and what’s actually happening in your AWS account right now, so when you ask your AI assistant to help deploy an application, it can configure load balancers, networking, auto-scaling, and monitoring with current best practices rather than generic advice from two years ago.
What’s particularly impressive is how these tools handle the entire development lifecycle – in the demo, they showed creating a serverless video analysis application using Amazon Nova models, then migrating it to containers on ECS, and finally deploying a web app on EKS, all through natural language prompts in the command line without writing deployment scripts or YAML files.
The troubleshooting capabilities are where this really shines for DevOps teams – when deployments fail, the MCP servers can automatically fetch logs, identify issues, and even fix configuration problems, turning what used to be hours of debugging into a conversational problem-solving session with your AI assistant.
This fits perfectly into AWS’s broader AI strategy by making their services more accessible to developers who might not be container or Kubernetes experts, essentially democratizing cloud deployment by letting you say “deploy this app to EKS and make it scalable” instead of learning the intricacies of Kubernetes manifests and AWS networking.

21:58 Ryan – “I want it to completely shield me from learning Kubernetes. I’ll never know it now – I’m just gonna ask the robot to do it.”

22:13 AWS Pricing Calculator, now generally available, supports discounts and purchase commitment – AWS

In news we’ve been waiting FOREVER for, AWS finally brings their Pricing Calculator into the console as a generally available feature, and it’s about time – this tool now lets you create cost estimates that actually reflect what you’ll pay after applying your existing discounts and commitments like Savings Plans or Reserved Instances, which is a game-changer for financial planning.
The big innovation here is that you can now import your historical usage data directly into the calculator to create estimates based on real-world patterns, or build estimates from scratch for new workloads – and it gives you three different rate configurations to see costs before discounts, after AWS pricing discounts, and after both discounts AND your purchase commitments are applied.
This is particularly valuable for enterprises doing their annual budget planning or preparing for board presentations because you can finally show realistic cost projections that account for your negotiated Enterprise Discount Programs and existing Reserved Instance coverage, rather than just list prices that nobody actually pays.
The ability to export estimates in both CSV and JSON formats with resource-level detail is a subtle but important feature that’ll make FinOps teams happy – you can now integrate these estimates directly into your internal financial planning tools or build automated workflows around cost modeling.
What’s interesting is that AWS is positioning this as both a workload estimator AND a full AWS bill estimator, which suggests they’re trying to help customers understand not just what a new project will cost, but how it impacts their overall AWS spend when layered onto existing infrastructure.
For organizations considering multi-year commitments or trying to optimize their Savings Plans strategy, this tool becomes essential because you can now model different commitment scenarios and see the actual impact on your bottom line before pulling the trigger on those purchases.
The fact that this is available in all commercial regions (except China) means most AWS customers can start using it immediately – and given that it’s free to use, there’s really no excuse not to be doing more sophisticated cost modeling for your AWS workloads.

23:58 Ryan – “I hope it’s not something terrible where you have to feed it all your discount data and your code usage.”

24:30 Announcing Red Hat Enterprise Linux for AWS

Red Hat is finally bringing RHEL 10 to AWS with deep native integration, marking a significant shift from just running RHEL on EC2 instances to having a purpose-built, AWS-optimized version that includes pre-tuned performance profiles and built-in CloudWatch telemetry right out of the box.
This isn’t just another Linux distro in the AWS Marketplace – they’ve baked in AWS CLI, optimized networking with Elastic Network Adapter support, and created AWS-specific performance profiles, which means enterprises can skip a lot of the manual optimization work they typically do when deploying RHEL workloads.
This comes as organizations are looking to standardize their Linux deployments across hybrid environments, and having RHEL with native AWS integration could simplify migrations for shops that are already heavy Red Hat users on-premises.
One of the more innovative aspects is the inclusion of “image mode using container-native tooling,” which suggests Red Hat is bringing their edge computing and immutable OS concepts from RHEL for Edge into the cloud, potentially making updates and rollbacks much cleaner.
While the announcement mentions flexible procurement options through EC2 Console and AWS Marketplace, the real question will be pricing – traditionally RHEL has commanded a premium, and it’ll be interesting to see if the AWS-optimized version carries additional costs beyond standard RHEL subscriptions.
This is available across all AWS regions including GovCloud, which signals that AWS and Red Hat are serious about capturing government and compliance-heavy workloads that have traditionally relied on RHEL’s security certifications and long-term support guarantees.

24:58 Justin – “Let’s be honest – no one does the manual optimization work.”

26:21 Introducing agentic capabilities for Amazon Q Developer Chat in the AWS Management Console and chat applications – AWS

Amazon Q Developer just got a major upgrade with new agentic capabilities that essentially turn it into your personal AWS troubleshooting detective – it can now break down complex problems into steps, consult multiple AWS services, and piece together answers from across your entire infrastructure without you having to manually dig through logs and configurations.
This is a game-changer for DevOps teams because instead of asking simple questions like “What’s an S3 bucket?”, you can now ask something like “Why is my payment processing Lambda throwing 500 errors?” and Q will automatically check CloudWatch logs, examine IAM permissions, investigate connected services like API Gateway and DynamoDB, and even look at recent changes to figure out what’s going wrong.
The multi-step reasoning capability is the real innovation here – Amazon Q now shows its work as it investigates your problem, asking for clarification when needed and explaining its reasoning process, which not only helps solve the immediate issue but also helps engineers understand their systems better and learn troubleshooting patterns.
What’s particularly impressive is that this works across 200+ AWS services through their APIs, meaning Q can pull together information from virtually any part of your AWS infrastructure to answer questions, making it incredibly powerful for organizations with complex, multi-service architectures.
The integration with Microsoft Teams and Slack is brilliant for enterprise teams because it brings this troubleshooting power directly into where engineers are already working and collaborating, eliminating the context switching between chat apps and the AWS console during incident response.

27:35 Ryan – “And, if you add in instructions for your agent to respond in a snarky and sort of condescending way, you really have automated me out of a job.”

**Show note editor note: Welcome to my world, Ryan.**

28:59 AWS cooks up Euro cloud outfit to soothe sovereignty nerves • The Register

AWS is launching a European Sovereign Cloud by the end of 2025, creating a legally independent entity based in Germany with EU-only staff, infrastructure, and leadership – essentially building a firewall between European customer data and potential US government reach under laws like the Cloud Act.
This move directly responds to growing European anxiety about data sovereignty, especially with the Trump 2.0 administration’s aggressive foreign policy stance, and follows similar announcements from Microsoft and Google Cloud who are also scrambling to address European concerns about US tech dependence.
AWS is creating a completely autonomous infrastructure with its own Route 53 DNS service using only European top-level domains, a dedicated European Certificate Authority, and the ability to operate indefinitely even if completely disconnected from AWS’s global infrastructure.
What’s really interesting is the governance structure – they’re establishing an independent advisory board with four EU citizens, including at least one person not affiliated with Amazon, who are legally obligated to act in the best interest of the European Sovereign Cloud rather than AWS corporate.
The timing couldn’t be more critical as European politicians are increasingly vocal about reducing dependence on US tech, especially after Microsoft reportedly blocked ICC prosecutor access to email in compliance with US sanctions, which really spooked EU officials about their vulnerability.
For AWS customers in Europe, this means they’ll finally have an option that addresses regulatory compliance concerns while maintaining AWS’s service quality, though it remains to be seen how pricing will compare to standard AWS regions and whether the Cloud Act truly has no reach here.
The bigger picture shows how geopolitical tensions are literally reshaping cloud infrastructure – we’re moving from a globally interconnected cloud to regional sovereign clouds, which could fundamentally change how multinational companies architect their systems.
While AWS promises “no critical dependencies on non-EU infrastructure,” the parent company remains American-owned, so there’s still debate about whether this truly protects against Cloud Act requirements – it’s a legal gray area that will likely need court testing to resolve.

GCP

37:07 Get committed use discounts for RHEL | Google Cloud Blog

Google Cloud is bringing committed use discounts to Red Hat Enterprise Linux, offering up to 20% savings for customers running predictable RHEL workloads on Compute Engine – this is a big deal for enterprises who’ve been paying full on-demand prices for their RHEL subscriptions in the cloud.
The way these RHEL CUDs work is pretty straightforward – you commit to a one-year term for a specific number of RHEL subscriptions in a particular region and project, and in exchange you get that 20% discount off the standard on-demand pricing, which really adds up when you’re running enterprise workloads 24/7.
What’s interesting here is Google’s positioning compared to AWS and Azure – while both competitors offer various discount mechanisms for compute resources, Google is specifically targeting the RHEL subscription costs themselves, which is a significant expense for many enterprises running traditional workloads in the cloud.
The sweet spot for these discounts kicks in when you’re utilizing RHEL instances about 80% or more of the time over the year, which honestly describes most production enterprise workloads – Google’s research shows the majority of RHEL VMs run 24/7, so this pricing model actually aligns well with real-world usage patterns.
One thing to watch out for is that these commitments are completely inflexible – once you purchase them, you can’t edit or cancel, and you’re on the hook for the monthly fees regardless of actual usage, so you really need to nail your capacity planning before pulling the trigger.

38:22 Justin – “So if I’m committing to the license, but I can move it between any type of instance class, I actually am okay with that – and if that’s something we’re going to see for other operating systems in the future, where maybe Windows has a discount if I’m willing to commit and things like that, this could be an interesting move by Google in general.”

39:11 Launching our new state-of-the-art Vertex AI Ranking API | Google Cloud

Blog

Google just launched their Vertex AI Ranking API, which is essentially a precision filter that sits on top of your existing search or RAG systems to dramatically improve result relevance – they’re claiming it can help businesses avoid that scary 82% customer loss rate when users can’t find what they need quickly, and it addresses the fact that up to 70% of retrieved passages in traditional search often don’t contain the actual answer you’re looking for.
Google is positioning this as a drop-in enhancement rather than a rip-and-replace solution – you can keep your existing search infrastructure and just add this API as a reranking layer, which means companies can get state-of-the-art semantic search capabilities in minutes instead of going through months of migration, and they’re offering two models: a default one for accuracy and a fast one for latency-critical applications.
The performance benchmarks are pretty impressive – Google’s claiming their semantic-ranker-default-004 model leads the industry in accuracy on the BEIR dataset compared to other standalone reranking services, and they’re backing this up by publishing their evaluation scripts on GitHub for reproducibility, plus they say it’s at least 2x faster than competitive reranking APIs at any scale.
This feels like Google’s answer to the reranking capabilities we’ve seen from players like Cohere and their Rerank API, but Google’s bringing some unique advantages with their 200k token context window for long documents and native integrations across their ecosystem – you can use it directly in AlloyDB with a simple SQL function, integrate it with RAG Engine, or even use it with Elasticsearch, which shows they’re thinking beyond just their own stack.

40:13 Justin – “Basically this is their answer to Cohere and Elasticsearch.”

41:02 Project Shield blocked a massive recent DDoS attack. Here’s how. | Google

Cloud Blog

Google’s Project Shield just proved its worth by defending KrebsOnSecurity against a staggering 6.3 terabits per second DDoS attack – that’s roughly 63,000 times faster than average US broadband and one of the largest attacks ever recorded, showing that even free services can provide enterprise-grade protection when backed by Google’s infrastructure.
Project Shield is completely free for eligible organizations like news publishers, government election sites, and human rights defenders. It’s essentially Google weaponizing their massive global infrastructure for good, letting at-risk organizations piggyback on the same defenses that protect Google’s own services.
The technical stack behind Project Shield is impressive – it combines Cloud Load Balancing, Cloud CDN, and Cloud Armor to create a multi-layered defense that blocked this attack instantly without any manual intervention, filtering 585 million packets per second at the network edge before they could even reach the application layer.
This is a great example of how cloud providers are differentiating beyond just compute and storage – while AWS has Shield and Azure has DDoS Protection, Google’s approach of offering this as a free service to vulnerable organizations shows they’re thinking about cloud infrastructure as a force for protecting free speech and democracy online.
For regular GCP customers, this attack validates Google’s DDoS protection capabilities – the same technologies protecting KrebsOnSecurity through Project Shield are available to any Google Cloud customer, with features like Adaptive Protection using machine learning to dynamically adjust rate limits in real-time.
The simplicity of implementation is noteworthy – organizations just change their DNS settings to point to Project Shield’s IP addresses and configure their hosting server info, making it easy to enable or disable protection with a simple DNS switch, which is crucial for organizations that might not have dedicated security teams.
This incident highlights the escalating DDoS threat landscape – attacks have grown from the 620 Gbps Mirai botnet attack in 2016 to this 6.3 Tbps monster in 2024, a 10x increase that shows why organizations need to think seriously about DDoS protection as attacks become more sophisticated and volumetric.

44:07 Cloud Run GPUs are now generally available | Google Cloud Blog

Google just made GPU computing truly serverless with Cloud Run GPUs going GA, and the killer feature here is that you only pay for what you use down to the second.
Imagine spinning up an NVIDIA L4 GPU for AI inference, having it automatically scale to zero when idle, and only paying for the actual seconds of compute time, which is a game-changer compared to keeping GPU instances running 24/7 on traditional cloud infrastructure.
The cold start performance is genuinely impressive – they’re showing sub-5 second startup times to get a GPU instance with drivers installed and ready to go, and in their demo they achieved time-to-first-token of about 19 seconds for a Gemma 3 4B model including everything from cold start to model loading to inference, which makes this viable for real-time AI applications that need to scale dynamically.
What’s really clever is how they’ve removed the traditional barriers to GPU access – there’s no quota request required for L4 GPUs anymore, you literally just add –gpu 1 to your command line or check a box in the console, making this as accessible as regular Cloud Run deployments, which democratizes GPU computing for developers who previously couldn’t justify the complexity or cost.
The multi-regional deployment story is strong with GPUs available in five regions including US, Europe, and Asia, and you can deploy across multiple regions with a single command for global low-latency inference – they showed deploying Ollama across three continents in one go, which would be a nightmare to set up with traditional GPU infrastructure.
At Next ’25 they demonstrated scaling from 0 to 100 GPU instances in just 4 minutes running Stable Diffusion, which really showcases the elasticity – this kind of burst scaling would cost a fortune with reserved GPU instances but makes perfect sense with per-second billing for handling viral AI applications or unpredictable workloads.
Early customers like Wayfair are reporting 85% cost reductions by combining L4 GPU performance with Cloud Run’s auto-scaling, while companies like Midjourney are using it to process millions of images – the combination of reasonable GPU pricing with true scale-to-zero capabilities seems to be hitting a sweet spot for AI workloads that don’t need constant GPU availability.

45:49 Ryan – “Anything that scales down to zero is ok in my book.”

46:50 GKE Volume Populator streamlines AI/Ml data transfers | Google Cloud Blog

Google just released GKE Volume Populator, and this is actually a pretty clever solution to a real pain point in AI/ML workflows – basically, if you’re storing your training data or model weights in Cloud Storage but need to move them to faster storage like Hyperdisk ML for better performance, you previously had to build custom scripts and workflows to orchestrate all those data transfers, but now GKE handles it automatically through the standard Kubernetes PersistentVolumeClaim API.
What’s really interesting here is that Google is leveraging the Kubernetes Volume Populator feature that went GA in Kubernetes 1.33, but they’re adding their own special sauce with native Cloud Storage integration and fine-grained namespace-level access controls – this means you can have different teams or projects with their own isolated access to specific Cloud Storage buckets without having to manage complex IAM policies across your entire cluster.
The timing on this is perfect for AI/ML workloads because one of the biggest challenges teams face is efficiently loading massive model weights – Abridge AI reported they saw up to 76% faster model loading speeds and reduced pod initialization times by using Hyperdisk ML with this feature, which is huge when you’re dealing with large language models that can be hundreds of gigabytes.
From a cost optimization perspective, this is actually quite smart because your expensive GPU and TPU resources aren’t sitting idle waiting for data to transfer – the pods are blocked from scheduling until the data transfer completes, so you can use those accelerators for other workloads in the meantime, which could save significant money on compute costs.

Azure

49:44 New AI innovations that are redefining the future for software companies | Microsoft Azure Blog

Microsoft is making a big push to turn every software developer into an AI developer with Azure AI Foundry, their new unified platform that brings together models, tools, and services for building AI apps and agents at scale.
What’s really interesting here is they’re positioning this as the shift from AI assistants that wait for instructions, to autonomous agents that can actually be workplace teammates.
The Azure AI Foundry Agent Service is now generally available, and it lets developers orchestrate multi-agent workflows where AI agents can work together to solve complex problems.
This is Microsoft’s answer to the growing demand for agentic AI that can automate decision-making and complex business processes, which AWS and GCP haven’t quite matched yet in terms of a unified platform approach.
Microsoft is seriously expanding their model catalog with some heavy hitters – they’ve got Grok 3 from xAI available today, Sora from OpenAI coming soon in preview, and over 10,000 open-source models from Hugging Face, all with full fine-tuning support, which gives developers way more choice than what you typically see in competing cloud platforms.
The real game-changer here might be what they’re calling “Agentic DevOps” – GitHub Copilot is evolving from just helping you write code to actually doing code reviews, writing tests, fixing bugs, and even handling app modernization tasks that used to take months but can now be done in hours, which could fundamentally change how software teams operate.
They’ve introduced a Site Reliability Engineering agent that monitors production systems 24/7 and can autonomously troubleshoot issues as they arise across Kubernetes, App Service, serverless, and databases – essentially giving every developer access to the same expertise that powers Azure at global scale, which is a pretty compelling value proposition for teams that can’t afford dedicated SRE staff.
For startups and ISVs, Microsoft is sweetening the deal with flexible Azure credits through Microsoft for Startups, and they’re reporting that AI and machine learning offer revenue in their marketplace grew 100% last year – companies like Neo4j have seen 6X revenue growth in 18 months through the marketplace, which shows there’s real money to be made here.

53:13 Ryan – “The way I hope AI rolls out is that it does stuff like this, but then it still requires supervision – the SRE engineers, the DevOps engineers that you already have – are now freed up to do more impactful things. So maybe it’s refining prompts for these agents, giving them those constraints by, you know, thinking about how they basically operate and all those like things that aren’t written down as intangibles and really getting that executed into prompts.”

54:05 Announcing dotnet run app.cs – A simpler way to start with C# and .NET 10 – .NET Blog

Microsoft just made getting started with C# dramatically easier with .NET 10 Preview 4 by introducing the ability to run a single C# file directly using `dotnet run app.cs`, eliminating the need for project files or complex folder structures – essentially bringing Python-like simplicity to C# development while maintaining the full power of the .NET ecosystem.
This new file-based approach introduces clever directives that let you reference NuGet packages, specify SDKs, and set MSBuild properties right within your C# file using simple syntax like `#:package [email protected]`, making it perfect for quick scripts, learning scenarios, or testing code snippets without the overhead of creating a full project structure.
What’s particularly brilliant about this implementation is that it’s not a separate dialect or limited version of C# – you’re writing the exact same code with the same compiler, and when your script grows beyond a simple file, you can seamlessly convert it to a full project using `dotnet project convert app.cs`, which automatically scaffolds the proper project structure and translates all your directives.
The feature even supports Unix-style shebang lines, allowing you to create executable C# scripts that run directly from the command line on Linux and macOS, positioning C# as a viable alternative to Python or Bash for automation scripts and CLI utilities – imagine writing your cloud automation scripts in strongly-typed C# instead of wrestling with shell scripts.
This addresses a long-standing pain point where developers had to rely on third-party tools like dotnet-script or CS-Script to achieve similar functionality, but now it’s built right into the core .NET CLI, requiring no additional installations or configurations beyond having .NET 10 Preview 4 installed.
The timing is perfect as more cloud platforms and services provide .NET SDKs, allowing developers to quickly prototype API integrations, test cloud service connections, or build automation scripts without the ceremony of setting up a full project – you could literally test an Azure Storage connection in a single file and run it immediately.
Visual Studio Code support is already available through the pre-release version of the C# extension, with IntelliSense for the new directives, and Microsoft is exploring multi-file support and performance improvements for future previews, suggesting this feature will only get more powerful as .NET 10 approaches release.
This democratizes C# development in a way that makes it accessible to beginners while still being useful for experienced developers who want to quickly test ideas or build utilities, effectively positioning C# as both a powerful enterprise language and a convenient scripting language in one package.

56:20 Ryan – “I’m very mixed on this, because it’s like, .NET development; the development patterns I see are already so detached from the running environment, so I feel like this is a further abstraction on top of all the leveraged libraries and frameworks that are part of .NET.”

57:45 Announcing General Availability: Ephemeral OS Disk support for v6 Azure

VMs | Microsoft Community Hub

Microsoft just made ephemeral OS disks generally available for their latest v6 VM series, and this is a big deal for anyone running stateless workloads because you’re getting up to 10X better OS disk performance by using local NVMe storage instead of remote Azure Storage – essentially eliminating network latency for your operating system disk operations.
The beauty of ephemeral disks is that they’re perfect for scale-out scenarios like containerized microservices, batch processing jobs, or CI/CD build agents where you don’t need persistent OS state – you can reimage a VM in seconds and get back to a clean state, which is fantastic for auto-scaling scenarios where you’re constantly spinning up and tearing down instances.
This puts Azure in a really competitive position against AWS’s instance store volumes and GCP’s local SSDs, though Microsoft’s implementation is particularly interesting because it specifically targets the OS disk placement on NVMe storage while still allowing you to use regular managed disks for your data volumes if needed.
The v6 VM series that support this feature – like the Dadsv6 and Ddsv6 families – are already Azure’s latest generation with AMD EPYC processors, so you’re combining cutting-edge CPU performance with blazing-fast local storage, making these ideal for performance-sensitive workloads that can tolerate the ephemeral nature of the OS disk.
From a cost perspective, ephemeral OS disks are essentially free since you’re not paying for managed disk storage – you’re just using the local storage that comes with your VM, which could lead to significant savings for large-scale deployments where you might have hundreds or thousands of VMs that don’t need persistent OS disks.
One thing to keep in mind is that these disks are truly ephemeral – if your VM gets deallocated or moved to different hardware for maintenance, you lose everything on that OS disk, so this isn’t for everyone – you really need to architect your applications to be stateless and store any important data elsewhere.
The deployment is surprisingly straightforward with just a few extra parameters in your ARM templates or CLI commands, and the fact that it works with marketplace images, custom images, and Azure Compute Gallery images means you can pretty much use it with any existing VM deployment pipeline you already have.
For DevOps teams and platform engineers, this feature is particularly exciting because it enables faster VM boot times, quicker scale-out operations, and better performance for temporary workloads like build agents or test environments where persistence is actually a liability rather than an asset.

1:03:22 Generally Available: Support for AWS Bedrock API in AI Gateway Capabilities in Azure API Management

Announcing expanded support for AWS Bedrock model endpoints across all Generative AI policies in Azure API Management’s AI Gateway.
This release enables you to apply advanced management and optimization features such as Token Limit Policy, Token Metric Policy, and Semantic Caching Policy to AWS Bedrock models, empowering you to seamlessly manage and optimize your multi-cloud AI workloads.
Key benefits include:
- Apply token limiting, tracking, and logging to AWS Bedrock APIs for better control
- Enable semantic caching to enhance performance and response times for Bedrock models.
- Achieve unified observability and governance across multi-cloud AI endpoints.

1:04:06 Justin – “Azure, we thank you for making AWS more cost effective and responsive with your capabilities and features.”

Other Clouds

1:07:20 Introducing ATL1: DigitalOcean’s new AI-optimized data center in Atlanta

DigitalOcean is making a serious play for the AI infrastructure market with their new ATL1 data center in Atlanta, which is their largest facility to date with 9 megawatts of total power capacity across two data halls.
It’s specifically designed for high-density GPU deployments that AI and machine learning workloads demand.
This marks a significant shift in DigitalOcean’s strategy from being primarily known as a developer-friendly cloud provider for smaller workloads to now competing in the GPU infrastructure space, deploying over 300 GPUs including top-tier NVIDIA H200 and AMD Instinct MI300X clusters in just the first data hall.
The timing of this expansion is particularly interesting as we’re seeing massive demand for GPU resources driven by the AI boom, and DigitalOcean is positioning themselves as a more accessible alternative to the hyperscalers for startups and growing tech companies that need GPU compute but don’t want the complexity or cost structure of AWS, Azure, or GCP.
By choosing Atlanta as their location and partnering with Flexential for the facility, DigitalOcean is strategically serving the Southern U.S. market where there’s been significant tech growth, offering lower latency for regional customers while maintaining their promise of simplicity and cost-effectiveness that made them popular with developers in the first place.
The integration of GPU infrastructure alongside their existing services like Droplets, Kubernetes, and managed databases creates an interesting one-stop-shop proposition for companies building AI applications, allowing them to keep their entire stack within DigitalOcean’s ecosystem rather than mixing providers.
With a second data hall planned for 2025 with even more GPU capacity, this represents a multi-year commitment to AI infrastructure, suggesting DigitalOcean sees this as core to their future rather than just riding the current AI hype wave.
This expansion brings DigitalOcean to 16 data centers across 10 global regions, which while still small compared to the hyperscalers, shows they’re serious about geographic distribution and reducing latency for their growing customer base.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod where you can join our newsletter, slack team, send feedback or ask questions at theCloud Pod.net or tweet at us with hashtag #theCloudPod

[00:00:00] Speaker A: Foreign. [00:00:06] Speaker B: Welcome to the Cloud pod where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker C: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker A: Episode 307, recorded for June 3, 2025. The AI assistant that finally understands her Kubernetes cluster. We are doomed. [00:00:27] Speaker D: Doomed. [00:00:28] Speaker C: Oh man. I mean, if anything, it's going to understand Kubernetes. It's going to be AI, and that can't be good. [00:00:35] Speaker D: The question is, is Gemini going to know it better than the rest of them? [00:00:39] Speaker A: I mean, it might or might not. I don't know. [00:00:42] Speaker C: Yeah, I mean, Q didn't seem to understand AWS when it first came out and I definitely asked Gemini some Google questions that I got just horribly wrong. [00:00:53] Speaker A: So I don't know, is that before Gemini 2.5 or. [00:00:56] Speaker C: It was. It was, yeah, Gemini and I haven't tested Q since it came out, but yeah, no, it was Gemini 1. [00:01:05] Speaker A: Yeah, I mean the early Gemini was definitely rough on cloud questions. Well, I'm down here in beautiful San Diego at the FinOps X conference. We'll cover that next week as well as the AWS or sorry, the Snowflake Summit, which is happening in San Francisco at the same time. Apparently we're in the thick of conference season because everyone's got conferences they're going to everywhere. But definitely first day here. Good so far, but I'll share my commentary on the conference for when I get back. But nice to see all my finops friends down here. Saw Joe Daly, saw Rob Martin down here yesterday and today, and many others as well. So exciting times. But we can get into the news here folks. First off, general news. Salesforce is buying $8 billion of Informatica and this is really about building the data foundation needed for agentic AI to actually work in enterprise environments. So they're taking Informatica's 30 year old database technology and data management expertise and combining with Salesforce Cloud to create what they're calling a unified architecture for GenTech AI. The acquisition fills a massive gap in Salesforce's data management capabilities, bringing critical pieces like data cataloging, integration, governance, quality controls and master data management. All the unsexy but absolutely essential plumbing that makes AI agents trustworthy and scalable in real enterprise deployments. It's interesting here at the timing because it's just announced their own agentic AI offering, or Salesforce did, sorry, Informatica did last week at Informatica World. And so Salesforce essentially buying a company that's already pivoted hard into AI space rather than trying to build these capabilities from scratch, which they've already been building their own capabilities as well. So I don't know. I'm going to try to say they've been building it from scratch. There'll be some interesting overlap with Mulesoft, which Salesforce bought for six and a half billion dollars back in 2018. But analysts are saying Informatica's data management categories are more comprehensive and updated. And this can mean some consolidation challenges ahead as they figure out how to integrate these overlapping technologies. Just keep your hands off slack, okay, guys? That's all I care about. [00:02:59] Speaker C: Yeah, I mean, considering someone. You guys had to explain to me what Informatica was while we were going over the notes. I'm curious to see, like, what sectors does this use? It's $8 billion. It's not small. [00:03:12] Speaker A: Well, I mean, apparently it's down. There apparently was a rumored $11 billion bid last year that's not wheedled down to $8 billion. So I think it's a very successful company that's shrinking in size. Based on what I understood about them and some conversations I had with them a few years ago. Informatica as a database is definitely not leading the charge anywhere as a database technology. So the other things they built out around data management, big data, some of those technologies are really what the value's at. And then, yeah, they're under tremendous pressure from Snowflake and databricks, et cetera. So the question will be, can Snowflake, sorry, can Snowflake, can Salesforce take advantage of this in a way that'll get them to be successful? And there was some conversation in the analysts talk about this, allow them to help going after Snowflake for some of the data management, data governance, policy stuff. So it'd be interesting to see if they can, you know, pivot this into their own version of Snowflake for Salesforce. [00:04:08] Speaker C: Interesting. Okay, so it's more, it's closer to, say, Snowflake than it is like Oracle. [00:04:15] Speaker A: Yeah. I mean, the database is where it started, but at this point it's really moved into big data and to larger parts of technology. [00:04:22] Speaker D: It's a 32 year old company. I feel like it's mainly like data warehousing at this point. Like so redshift or one of those. More of that, I think. [00:04:29] Speaker C: Okay. Yep. [00:04:31] Speaker D: Yeah. I was surprised when I was. When I saw the announcement, I looked it up, I was like, I know this company. I looked it up. I was like, it's 32 years old. I'm pulling you up right now to remind me, but it's like 1999 they IPO'd. This company's been around and seen a ton over the years. [00:04:45] Speaker A: Yeah, they're actually based out of San Diego here I think too. Not too far from where I'm at for the FinOps Foundation. But yeah, they've been around for a very long time. [00:04:56] Speaker C: Might see some new boats then. Like $8 billion maybe. [00:05:01] Speaker A: We'll see. All right, moving on to Cloud Tools Valky turns one Basically there's a great article here by gomenvento. He basically talks about how the community forklift Redis in the dust, basically pointing out that it's quickly grained traction. After Redis Lab switched to more restrictive licensing in March 2023, Linux foundation stepped in to support the open source alternative, and major players like aws, Google Cloud and Oracle have all thrown their weight behind it, essentially creating a unified response to Redis licensing changes. One of the more impressive things about it has been that it's maintaining complete compatibility with Redis while actually pushing innovation forward. They've already released version 8 with features like improved memory efficiency and better performance for large scale deployments. This shows that the community isn't just maintaining a fork, they're actively improving upon the original code base. And for developers and engineers, the practical impact is that you can continue use all your existing Redis tooling and client libraries without any changes. But now you have to have peace of mind that you come up with a truly open source solution backed by the Linux foundation and no more worrying about future licensing surprises or restrictions on how you can use your in memory data store in the future. Those performance improvements were about 20% increase for certain workloads while maintaining the same blazing fast performance rate as users have always expected. This is crucial for companies running large scale caching layers. Even small efficiency gains can run into significant cost savings. The roadmap for Valkey has some exciting features ahead, including native support for vector similarity searching and improved cluster capabilities, which suggests that they're not just playing catch up, but actually positioning ourselves a lead in the in memory database space. [00:06:26] Speaker C: Yeah, I haven't seen a lot of talk about this recently and every new greenfield application that I've seen or worked around now is really is looking at Valky or using Valkey actively. So I feel like this is sort of. I think they, you know this is going to go the same way as elasticsearch and the licensing change there, where it just won't be the go to option anymore. [00:06:51] Speaker A: I mean, I was working on my bot. I was asking Claude for some refactoring advice and it was like well, maybe a caching layer would make sense. And it was said it actually recommended Valky to me because I'm on aws. It wasn't even say redis. So I was like wow, even the AIs were going from Valky and so interesting choice. I love the we're hearing the docile tones of lullabies in the background from Matt's baby going to sleep. So our listeners will love it. I'm sure. [00:07:24] Speaker D: It on and off. Sorry. [00:07:26] Speaker A: No, it's just my good news. [00:07:28] Speaker D: The microphone's really good. [00:07:29] Speaker A: The microphone's great. It's very soothing. [00:07:32] Speaker D: It's what you told me again, so I listened. Apparently it's too good. [00:07:36] Speaker A: Apparently at that tone, that tone range, it picks it up perfect. Autumn Servers will love that Harness is releasing an MCP server which implements the Model Convex Protocol, an open standard that let AI agents like Cloud, Desktop, Windsurf or Cursor securely connect to your Harness workflows without writing custom APIs or brutal glue code, essentially turning Harness into a plug and play backend for AI agents. This addresses the major pain point where customers are excited about AI but struggle with giving their agents secure access to delivery data for pipelines, environments and logs. And MCP Server acts as a lightweight local gateway that translates between AI tools and the Harness platform while maintaining enterprise grade security controls. What's clever here is that Harness is dogfooding their own solution and they're using the same MCP server internally that they're offering to their customers, which means it's battle tested and provides consistency across different AI agents and environments that maintenance headaches of multiple adapters. Security for this is it's using the JSON RPC 2.0 for communication, integrating with Harness to the existing RBAC model, handles API keys directly in the platform, ensures no sensitive data ever gets sent to the LLM, which would make security teams much more comfortable with AI integrations. And from an overall standpoint, this enables some interesting use cases like Customer success engineers using AI to instantly check release statuses without bothering the dev team, or building Slack bots that will run failed builds and surface logs with minimal setup time. [00:08:56] Speaker C: Yeah, this is pretty neat. Like, I mean I'm all into the gentic flows right now and doing several POCs, testing building agents and now that you know MCP is around, it's much easier to get AI to do things, which is what I'm most, you know, like interested in. Right. Try to make Automate My daily life somehow. And so these types of MC server offerings are great, right? I love it. [00:09:26] Speaker D: Yeah. I mean, I feel like the, you know, MCPS is going to be the new API where when I been writing it even I've been using everything else. I've been telling to hit the APIs and now I have to change my thought process. Say, hey, hit the MCP server for Hashicorp, hit the MCP server for Harness, and it's going to be, for me, at least it's a change in thought of like, hey, write me a script that hit, that hits an API to leverage the MCP server that's going to be more native to it. [00:09:55] Speaker A: Yeah. [00:09:55] Speaker D: So it's a little bit of going to be a change in mind for the end user too when you're leveraging these things. Because I don't always think, hey, let me go check. Especially now since they're all new, let me go check to see if Service A has an mtp. But I'd say in the next six months if you don't. If a company doesn't have mcp, they're falling behind quickly. [00:10:13] Speaker A: Yeah, well, I mean the key success of being able to build a successful MCP though, APIs to have APIs. So if you were already behind on getting to APIs, I think this is the struggle for you. Now you're doubly behind because you're not only behind on the API spec, but you're also now behind on the MCP part as well, right? [00:10:30] Speaker C: Yeah, yeah. I don't think APIs are going to go away because it's, you know, the interface still needs to be there for applications to talk to one another. This is really just something you can put for the AIs to talk to. [00:10:43] Speaker A: I think if you try to make MZP the default for all communication, you would end up having a pretty slow experience. And So I think APIs are still significantly faster for data sets that you understand and needs. But if you have very basic English conversational conversations you want to have, MCP works out really well for you. In those use cases or in the case of a Google Docs, the MCP interface is much better than the APIs because it actually understands what you're trying to do from a document perspective. But those use cases are going to be different than others. So again, I think there's a human use case that's involved. I think MCP makes sense that if it's a computer to computer system, I think you're going to be looking more at APIs. But then you'll have kind of the overlay of both of them with agent to agent does. [00:11:28] Speaker D: I mean I made this assumption. I'm wondering now if it's false. I assume the MCP server just hit APIs under the hood. Most of them, yeah. [00:11:36] Speaker A: It does mostly hit as. Yeah, I mean it adds understanding and context from the LLM, but yeah, it's using APIs to do most of its access in the backend. Azure Corp is releasing a collection of pre written sentinel policies that automatically enforce AWS foundational security best practices within Terraform workflows. Essentially giving teams a ready made security guardrail system that prevents common misconfigurations where infrastructure gets deployed. A huge organization struggling to balance developer velocity with security compliance requirements. These policies cover critical security controls like ensuring S3 buckets aren't publicly accessible, requiring encryption for EVS volumes and RDS instances, and enforcing proper IAM configurations. Basically all those security checks that your teams know they should implement but often get overlooked in the rush to ship the feature. The beauty is that these policies can run during the plan phase, catching issues before any resources are actually created. It's also good that this addresses the skills gap problem has security experts who can write complex policies code rules. So having Hashing Corp provide battle tested policies out of the box dramatically lowers the barrier to entry for implementing proper cloud security governance. Teams can literally copy paste these policies into their terraform cloud or enterprise setup and immediately start benefiting. [00:12:43] Speaker D: I think we're going to start a lottery pool or like the center box and bingo for battle tested. I feel like that's the new buzzword in like the last like couple months. I feel like it's just come out strong. This is a battle tested this and that, you know. [00:12:57] Speaker A: Yeah, Hatchet Corp's definitely been hitting it pretty hard on their blog posts. I've noticed it might be an AI thing too, using AI to help clean it up and it's using that stuff but. [00:13:05] Speaker C: Oh, I wonder. That'd be funny. Oh, the uniformity starts. [00:13:10] Speaker D: No, but I've started to hear it even in day to day conversations with like third party vendors and things like that. This is our battle tested X, Y and Z. And you're like great, so you've properly tested it. Anyway, we've digressed down. I mean this is great because you're, you know, for lack of better buzz to buzzword bingo as we're already on that conversation, this is shifting everything left so it's getting it in the hands of the developers because I've done these things a Lot with config and other tools and policy and Azure policies where you put these in place, which is good later on, but you just get your end users errors on the buckets or on the storage accounts when they try to get created. So here at least it gives them it before it even tries to run so before it hits the API. So it's a great thing, honestly, for a service like Sentinel that's been out for. I don't want to try to figure out how many years at this point Sentinel's been ga'd. This feels like something they should have had years ago for people. [00:14:05] Speaker C: Yeah, I'm really surprised. I. You know, how many times has this been written in how many companies? You know, is crazy to me. And having, you know, the direct interaction with the person committing the code is super important, you know, and so much better experience than having, you know, a backend cloud tool block things and. Or getting permission denied or what have you. So this is great. [00:14:30] Speaker A: Or backend Ryan yelling at you about, you know, doing something unsecure. So there. [00:14:35] Speaker C: Well, there is no. Nothing better than me coming and yelling at you for your insecure infrastructure. I continue to maintain that that's a benefit that I offer. Not, Not a curse. [00:14:47] Speaker A: Depends on which side of that conversation you're on. But sure, okay. But yeah, I think being able to do it at the time of Terraform Apply prevents the problem from even coming out in the beginning, which is the most important part because yeah, every dev is like, oh, I'm already off that project. We shift to production, we're done. It's like, yeah, yeah, but you just introduced a security problem. And so now either you have to escalate on them or you have to pull away capacity for other things. And people get super cranky about those type of things. [00:15:15] Speaker C: And if you do the, you know, the output's right. It's a teaching tool. Right? [00:15:19] Speaker D: Like, right. And that's the thing. And I, I don't know what this is, but it shouldn't just say, you know, S3 error. That's not gonna be useful. It needs to say Bucket is public. Try to do XYZ like it. It. I don't know, I haven't looked too closely. But like, it needs to be a useful message. We're not like blocked. [00:15:39] Speaker C: The Sentinel syntax kind of leads itself to. To having that sort of post message, much like you would set up in like Amazon config or something, where it's like it. It tells you what's wrong in code form and then kind of gives you a plain language definition. [00:15:55] Speaker D: So you haven't written enough custom policies in config where you just tell things, automatically delete and then people get mad at you because over here they're like I deployed it. Here it is that you're like, oh well, it got deleted. And you have to obviously find that in the config logs that I fired and that whole mess that I've definitely had to deal with a couple times in my life. That's why I don't like automatic deletions of that. It just caused problems. [00:16:21] Speaker A: All right, moving on to aws, Amazon is launching FSX for Lustre Intelligent Tiering, which is essentially the first fully elastic Lustre file storage in the cloud, meaning it automatically grows and shrinks as you add or delete data. So you're only paying for you actually use instead of over provision of storage like you would on premise and at less than 0.005 cents per gigaby per month, is claiming to be the lowest cost high performance storage option available to you. This is apparently a game changer for HPC workloads with seismic imaging, weather forecasting and genomics analysis that generate petabytes of data. And the service automatically moves your data between three tiers frequent access, infrequent access after 30 days and archive after 90 days, potentially reducing your storage cost by up to 96% compared to other managed lister options for AI ML teams trying to maximize their expensive GPU utilization. This is particularly interesting because it Delivers up to 34% Better price performance than on premise HDD file systems. With Elastic fabric adapter and GPU direct storage support, you're getting up 12 times higher per client throughput compared to previous FSX for Lustre systems. The TERA is completely transparent to applications where either your data is in the frequent access tier or has been moved to archive, you can still retrieve it instantly in milliseconds, which means you can migrate existing hard drive or mixed hard drive SSD workloads without any application changes. The service is launching in 15 AWS regions, including major hubs in North America, Europe and Asia Pacific, and the pricing model is consumption based, paying for the data and metadata you store operations when you write or read non cache data plus your provision throughput capacity, metadata, IOPS and SSD cache sizes. [00:17:51] Speaker C: I mean, I imagine this is truly fantastic for for people who have, you know, workloads where they're getting the performance increase at Illustre, so that's pretty rad that it's automatic. Feels a little strange that you can retrieve it at the same speed and but at different costs and like I would just force everything to the lower tier. But I imagine you don't have that option. [00:18:16] Speaker A: I imagine there's some of a quota involved in it. Like yeah, you can, you can get that I access from archive for up to a certain amount of burst capacity simply. I've seen other models of this done. [00:18:27] Speaker D: Yeah, I assume it's like the, for lack of a better example, the T series like it will give you some level of burst sacrifice the infrequent access and less of a burst to archive access. [00:18:39] Speaker C: It's kind of weird to apply that model to storage tiers, but. [00:18:45] Speaker A: I mean if it worked everywhere else, why not have been on storage? [00:18:47] Speaker C: I guess. [00:18:47] Speaker A: Yeah. I guess is the answer. Yeah, yeah. I just double checking in the article it does still it does say at milliseconds even for archive. So yeah. All right. You can enhance your AI system development with ecs, EKS and AWS serverless MCP servers bringing Amazon is bringing AI powered development assistance to the next level with new model context protocol servers for ecs, EKS and Serverless, which essentially gives your AI coding assistance like Amazon Q Developer real time contextual knowledge about your specific AWS environment. Instead of relying on outdated documentation, imagine having an AI that actually knows your current cluster configuration and can help you deploy containers in minutes using natural language commands. The real game changer here is that these MTV servers bridge the gap between what LLMs know from their training data and what's actually happening in your AWS account right now. So when you ask your AI assistant to help deploy an application, it configured load balancers, networking, auto scaling and monitoring with current best practices rather than generic advice from two years ago. What's particularly impressive is how these tools handled the entire development lifecycle. In the demo they showed creating a serverless video analysis application using Amazon Nova models, then migrating to containers on ecs, and finally deploying a web app on EKS all through natural language prompts in the command line without writing deployment scripts or YAML files. I think it was last week or two weeks ago I said that MCP is the new click ops, so to be careful. Yep, the troubleshooting capabilities are where this really shines for DevOps teams. When deployments fail, the MCP server can automatically fetch logs, identify issues, and even fix configuration problems, turning what used to be hours of debugging into a conversational problem solving session with your AI assistant. Very nice. [00:20:20] Speaker C: Yeah, it is funny, you know, like the repeatability is going to be a thing like I love how easy these things are and you know I always think about you know they have you know, their the demos, their workflows for deployment and actual useful things. But I want demos for how I'll actually use it. Like where did I deploy that container and then have it go look across all my Amazon accounts and clusters. [00:20:41] Speaker D: When you have the 15 million EKS clusters that they had last week as the overview of the EKS across all your accounts. Maybe this will work cross count and tell you where it's located. [00:20:50] Speaker C: Yeah, I hope so. I hope it works cross count. How frustrating would that be? [00:20:56] Speaker A: I mean I just hope it can help me understand kubernetes really the key the key outcome. I hope out of this because or bring order to the chaos. [00:21:05] Speaker C: Yeah I want it to completely shield me from learning kubernetes. [00:21:09] Speaker A: Yes, exactly. [00:21:10] Speaker C: I'll never know it. Now I'm just going to ask the robot to do it. Exactly. [00:21:15] Speaker A: AWS Pricing Calculator has received the feature we've all been asking for forever. They're finally bringing to their pricing calculator into the console as a generally available feature that this tool now lets you create cost estimates that actually reflect what you'll pay after applying your existing discounts and commitments like savings plans or reserve instances, which is a game changer for financial planning. The big thing here is that you can now import your historical usage data directly into the calculator to create estimates based on real world patterns or build estimates from scratch for new workloads and gives you three different rate configurations to see cost before discounts, after AWS price discounts, and after both discounts and your purchase agreements are applied. It's particularly valuable for enterprises doing their annual budget planning or preparing for board presentations because you can finally show realistic cost reductions that account for your negotiated enterprise discount program and existing reserve instance coverage, rather than just list prices that nobody actually pays. The ability to export estimates in both CSV and JSON formats with resource level detail is subtle, but an important feature that makes FinOps teams happy and you can now integrate these events directly into your internal financial planning tools or build automated workflows around your cost modeling. Thank God. [00:22:21] Speaker C: Yeah like long overdue feature like this is the amount of clued jobs I've done taking pricing calculator data and trying to turn it into air quotes. Real data is crazy like so many bad Excel formulas. So this is I'm great to see it just have has it in a tool natively. I hope this is so a supported pattern within the API for cost management as well. This is fantastic. [00:22:50] Speaker A: Can all the cloud providers get this right? [00:22:54] Speaker C: I hope it's not like something terrible where you have to feed it all your discount data and your code usage or something. [00:22:59] Speaker A: No, I mean it uses your actual bills to inform the, you know, to basically build a small model. I'm sure we just have a rag in the background to do a lot of it. [00:23:07] Speaker C: I hope so. I mean that's, that would be fantastic because then you're, then, you know, you're getting, you know, specific data for your workload. That is great. As you know, it's better than a table lookup, which is what I had envisioned. [00:23:22] Speaker A: Red Hat is finally bringing Red Hat 10 to AWS with deep native integration, marking a significant shift from just running red hat on EC2 instances to having a purpose built AWS optimized version that includes pre tuned performance profiles and built in CloudWatch telemetry right out of the box. Basically they baked in the AWS CLI optimized networking with Elastic Network adapter support and created AWS specific performance profiles. Which means enterprise can skip a lot of the manual optimization work they typically do when deploying Red Hat workloads. Let's be honest, no one does the manual optimization work. Basically, this comes at an interesting time where organizations are looking to standardize their Linux deployments across hybrid environments. And having Red Hat with native AWS integration could simplify migrations for shops that are already heavy Red Hat users on premise. One of those innovative aspects in this inclusion of image mode using container native tooling, which suggests Red Hat is bringing their edge computing and immutable OS concepts from Red Hat for Edge into the cloud, potentially making updates and rollbacks much cleaner in the future. The announcement does mention flexible procurement options through the EC2 console and AWS marketplace. The real question will be pricing. Traditionally, Red Hat has commanded a premium and it'll be interesting to see if AWS optimized version carries additional cost beyond standard Red Hat Enterprise Linux inscriptions. We'll keep an eye on that for you here at the Cloud podcast. [00:24:34] Speaker C: Yeah, you know, can you bring your own license? Like that's. Yeah, it's an important thing, right? Because it's. I know in some, I think some software you can't. So it's. You're just paying the premium on top of it and you can't work your negotiated license if you have it in there. But it's, it's interesting that it's got that it mentions both the EC2 console and marketplace separately, and I'm guessing that's for private marketplace deals. Guessing? Yeah, I assume so. It's kind of like bringing your own license. [00:25:02] Speaker A: Amazon Q developer has gotten a major upgrade with new agenda capabilities that essentially turn into your personal AWS troubleshooting detective. You can now break down complex problems into steps, consult multiple AWS services and piece together answers from across your entire infrastructure without you having to manually dig through logs and configurations. This could be a game changer for your DevOps teams because instead of asking simple questions like what's an S3 bucket? You can now ask something like why is my payment processing lambda throwing a 500 error? And Q will automatically check CloudWatch logs, examine IAM permissions, investigate connected services like API Gateway and DynamoDB, and even look at recent changes to figure out what's going wrong. The multi step reasoning capability is the real innovation here and Amazon Q now shows its works as it investigates their problem, asking for clarification when needed and explaining its reasoning process, which not only helps solve the media issue, but also helps engineers understand their systems better and learn troubleshooting patterns. This now covers 200/ AWS services through their APIs, meaning Q can pull together from virtually any part of your AWS infrastructure to answer questions, making it incredibly powerful for organizations with complex multi service architectures. It also integrates with Microsoft Teams and Slack to allow you to basically ask these questions directly in a interface that you're used to, or allow you to do it in your chat app or chatbot capabilities. [00:26:17] Speaker C: And if you add in like instructions for your agents to respond in a snarky and sort of condescending way, you really have automated me out of a job. It's awesome. [00:26:26] Speaker D: I thought you were going to say you automated the cloud pod from existing. Now the question is can you use this combined with the prior announcement of the Sentinel to then say why is this failing? And why is you told me to create the S3 bucket and put the two things against each other and somehow tie Sentinel. Does Sentinel have an MCP yet or is it coming? Maybe have Sentinel talk to Q to have them figure out what your issue because it told you to build an. [00:26:57] Speaker C: S3 bucket public basic as a society we can't help it to but to pair the the robots against each other. It was the first thing that everyone did when like the Alexas and the Google home devices came out is I'm. [00:27:09] Speaker A: Like talking to for for a funzi project I created a a chatbot that basically has like you know, mimics people like myself or Ryan and it argues with itself which is super fun it. [00:27:24] Speaker D: Is really fun to be fair. That's what I do on my daily basis too. I want to do this, but this isn't secure enough. How do I want to handle that? Go back and forth. [00:27:40] Speaker B: There are a lot of cloud cost management tools out there, but only Archera provides cloud commitment insurance. It sounds fancy, but it's really simple. Archera gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? Our chair A will click the link in the show notes to check them out on the AWS Marketplace. [00:28:19] Speaker C: Well, following. [00:28:20] Speaker A: In the footsteps of Google and Azure, AWS is also launching a European sovereign cloud by the end of 2025, creating a legally independent entity based in Germany with EU only staff, infrastructure and leadership, essentially building a firewall between Europe customers, customers, data and potential US Government reach under laws like the Cloud Act. This move directly responds to growing European anxiety about data sovereignty, especially within the Trump 2.0 administration aggressive foreign policy stance, and follows similar announcements from Microsoft and Google, who are also scrambling to address European concerns. Nintendo architecture is fascinating. AWS is creating a completely autonomous infrastructure with its own Route 53 DNS service using only European top level domains, a dedicated European Certificate authority and the ability to operate indefinitely even if completely disconnected from AWS's global infrastructure. The government structure is establishing an independent advisory board with four EU citizens, including at least one person not affiliated with Amazon, who are legally obligated to act in the best interest of the European sovereign cloud rather than AWS corporate. The timing is critical as European politicians are increasingly vocal about reducing the dependence on US tech, especially after Microsoft reportedly blocked ICC prosecutor access to email in compliance with the US sanction, which really spooked EU officials about their vulnerabilities. For AWS customers in Europe, this means they'll finally have an option that addresses regulatory compliance concerns while maintaining AWS service quality, though it remains to be seen how pricing will compare to standard AWS regions and whether the Cloud act truly has no reach there. The bigger picture shows how geopolitical tensions are literally reshaping cloud infrastructure or moving from a globally interconnected cloud to regional sovereign clouds, which could fundamentally change how multinational companies architect their systems. [00:29:57] Speaker C: Yeah, I can't. I get why data sovereignty is a concern I and I, I just really hate that it's, it is so tied to our current politics, political situation. Not just in the US Either. Like there's, there's a lot of conservative, you know, governments going up when. And there, there's a lot of isolation that's going to happen because of it. And so it sucks. [00:30:23] Speaker A: Well, and you know, some of the regimes like the Kingdom of Saudi Arabia are driving, you know, some of these policies and some of these changes too. You know, like they're requiring this data sovereignty for their data. And you know, there are big companies there who are some of the largest in the world and they can dictate a lot of policy to a lot of foreign companies too. So it is definitely interesting times. [00:30:44] Speaker C: And if I'm a, you know, an app provider and I basically have to produce 20 different versions of my application to deal with all the rules, I'm probably not unless there's enough of a potential market. [00:30:58] Speaker D: Right. And you know, you'll have companies that over expand into these markets and launch more of them than they need and kind of pull back over time. So it's going to be interesting, you know, over time of like, okay, we're going to launch in country A, B and C. Okay, maybe A and B can be merged into one, you know, and kind of we'll see. I think over time. Like we said with the geopolitical, like will the UK be happy in Germany where Germany won't be happy in the uk so like it's going to be very specific to customers and what everyone or every customer's, you know, risk tolerance is. And you know, I don't know about your guys's day job, but like Germany is very strict on a lot of stuff. So like they seem to be one country where other countries are more okay with there. But I feel like one flip of a switch and everything's going to change and you're going to have a mass exodus, you know, at least from my customer base thus far. [00:31:53] Speaker C: Yeah. And migrating giant data sets, you know, like say the regional politics change and you need to change your primary region to just some other European country site and you have so much data like that's, it's, it's slow, it's expensive, like it sucks. [00:32:10] Speaker D: And it'll be interesting to see, you know, how you architect these things, you know, on Azure, a lot of the premium services that you need in order to get those SLAs or get those security features have a high cost. So starting clouds in those areas can be a higher cost. You know, start if you have to deal with the fixed costs of an environment. So it's going to be, you know, there's going to be a lot of business decisions around, okay, are we willing to eat some profit margins to get these customers or do we just charge more? If you're in this country where we have less customers and less scale, you know, Frank shared infrastructure or anything along those lines. So it's gonna really change the way the world operates. I think at much deeper level than people really think about. [00:32:56] Speaker C: I think I wouldn't care as much except for the fact that implementation and, versus what the actual restrictions are, are not the same. And so like the, you know, GDPR was the sort of first example and you know, you could have, you know, used architecture and you know, intelligence within the application to manage that. But no, what do people do? They, they built a whole other farm in a different location and they restricted all access here and they made, they doubled their operational cost and they made it really cumbersome to operate. And then, you know, all of that opportunity is just waste. So it's, I feel like this is going to magnify that same sort of concern. [00:33:40] Speaker D: Now if you want to spin this on a positive note, it will force companies to make sure that everything's actually in code, it actually has a proper pipeline to deploy it. Because if you're deploying it to one place, someone click UPS the Route 53 entry or something like that. Once you have to do it to 20 places, you really need to have that scale. You really need to have that infrastructure as code. You need to have that pipeline to deploy deployments. It will force companies to do what we all think. I'm going to put words in both of your mouths of do the right thing and have everything in code and have everything automated to a level that is acceptable versus being like, okay, that one thing that took five minutes to click in five in 20 in one place wasn't a big deal. And the rate of changing it, even, let's say in two or three, was low. If they do it wrong now when you have to in 20 places, the rate, the risk of something being a hot, of Messing up that one change becomes much higher on that 20 minute task per location now costs you from 60 minutes up to, I don't know, 600 minutes. Clearly I made up numbers. You can't do math on the fly at 10 o' clock at night. But you know, you get the point of it where you know, you're starting to get that level of effort. And now all I can Think of is the XKCD grid about it of like, you know, how long it takes to do versus radio scripting and yeah. [00:34:59] Speaker C: I mean it depends on like. I know one of the problems I'm having with these regional things is that it's one thing to automate but then different sovereignty rules make the automation different. And so you're supporting Cookie, you're you know, snowflake deployment infrastructures and or logic and I'm just. [00:35:18] Speaker A: Which hurts your soul at a fundamental level. [00:35:20] Speaker C: And I got to stop complaining because I'm going to bring everyone down. [00:35:24] Speaker D: Well I keep going on that one where like not every service is in every cloud. So do you say hey, in this region we use service A, but in this region they haven't deployed the updated SKU of this one so I have to use this one instead. And then this can only talk to these underlying pieces and you have essentially two completely different infrastructures running in different regions because the services aren't available there. Or God forbid you try to standardize your your, you know, SKUs instance types, whatever they're called on Google and definitely not everything is available in every region and you have other sorts of problems. Feel like we took this conversation sideways. [00:36:06] Speaker A: Yeah, yeah. Let's get back on track and talk about Google here, shall we? [00:36:10] Speaker C: Yeah, yeah. [00:36:12] Speaker A: So Google's also getting into some Red Hat love this week by bringing committed use discounts to your Red Hat enterprise Linux licensing, offering up to 20% savings for customers running predictable Red Hat workloads on Compute Engine. This is a big deal for enterprise who've been paying full on demand prices for their Red Hat subscriptions in the cloud. The way these Cubs work is pretty straightforward. You commit to a one year term for a specific number of Red Hat subscriptions in a particular region and project, and in exchange you get that 20% discount off a standard on demand price, which really adds up when you're running enterprise workloads all the time. What's interesting here is Google's position compared to AWS and Azure. While both competitors offer various discount mechanisms for compute resources, Google is specifically targeting the Red Hat subscription costs themselves, which is a significant expense for many enterprises running traditional workloads in the cloud. These discounts kick in when you're utilizing Red hat listings about 80% or more of the time, which is most production enterprise workloads, to be honest. One thing to watch out for though is that these commitments are completely inflexible and once you purchase them, you can't edit or cancel and you're on the hook for the monthly fees regardless of actual usage. So you really do need to nail your capacity planning before pulling the trigger on these. The other big interesting thing is this, that it's the first one I've seen. It's actually targeted the license. It doesn't talk about the compute instance itself. So if I'm committing to the license, but I can move it between any type of instance class, I actually am okay with that. And if that's something we're going to see for other operating systems in the future where maybe Windows has a discount if I'm willing to commit and things like that, this could be interesting move by Google in general. [00:37:42] Speaker C: Yeah, no, it's definitely fascinating and you know, it'd be the, the implementation details will be, you know, make or break, right. Because you know, there's certain things that are very tied to your Red Hat subscription like satellite patching and some of the management things. And so it'll be interesting now that the, the sort of relationship is Google owned. How does that work? And it'll be fascinating. I'm, I'm excited. All right. [00:38:07] Speaker A: Google is launching their Vertex AI ranking API which is essentially a precision filter that sits on top of your existing search or rag systems to dramatically improve result relevance. Google's claiming it can help businesses avoid that scary 82% customer loss rate when customers can't find what they need quickly and addresses the fact that up to 70% of retrieved passages and traditional search often don't contain the actual answer that they're looking for. Google is using this as a drop in enhancement rather than rip and replace solution. And you can keep your existing search infrastructure and just add this API as a RE ranking layer, which means companies can get state of the art semantic search capabilities in minutes instead of going through months of migration. They're offering two models, a default one for accuracy and a fast one for latency. Critical apps performance benchmarks are pretty good. Google's claiming their semantic ranker default 004 model leads industry and accuracy on the BEIR dataset compared to other standalone RE ranking services. And they're bagging up by publishing their value scripts on GitHub for reproducibility. Plus they're saying it's at least 2x faster than competitive re ranking APIs at any scale. Basically this is their answer to Cohere and elasticsearch. [00:39:10] Speaker C: Well, and they're cheating, right? Like this is, I'm sure that this is technology that leverage towards the Google search indexes on the back end as well, right? Managing, you know, the priority to get relevant responses. It's kind of, it's kind of neat. [00:39:26] Speaker D: I wonder if you could take this and start to really get a better, deeper understanding of how Google actually has their algorithm under the hood. I doubt it. This is probably so abstracted from there, but I agree with you. It probably came out of their core algorithm. [00:39:45] Speaker A: Google's Project Shield has proved its worth by defending Krebson Security, which is a leading information security news website against a staggering 6.3 terabytes per second DDoS attack. That's roughly 63,000 times faster than average US broadband and one of the largest attacks ever recorded showing that even free services can provide enterprise grade protection when backed by Google's infrastructure. Project Shield is completely free for eligible organizations, typically news publishers, government election sites and human rights defenders. Google weaponizing their massive global infrastructure for good letting at risk organizations piggyback on the same defenses that protect Google's own services. The technical stack behind Product Shield is impressive. It combines cloud load balancing, cloud CDN and cloud armor to create a multi layered defense that blocks this attack instantly without any manual intervention filtering 585 million packets per second at the network edge before they could even reach the application layer. You know, this is a great way for them to differentiate versus things like AWS Shield and Azure DDoS protection Google's approach is offering. This is the free service of vulnerable organization shows they're thinking about cloud infrastructure as a force protecting free speech and democracy online. So yeah, if you're a Google customer and you are not in those groups, you can get all the same capabilities from their paid services. But if you are in the protected classes of organizations, you can get access to this great technology to protect your website. Today you can sign up for that at the Project Shield website. Again on the DDoS threat part of it which I'm most interested in. The attacks have grown from a 620 gigabyte Mirai bootneck in 2016 to a 6.3 terabytes per second monster in 2024, which is a 10x increase. So if you don't have DDoS production, it's only a matter of when you're going to get DDoS at this point. [00:41:24] Speaker C: Yeah, no kidding. It's interesting because yeah, for a second I thought that Google was releasing a new product, but I'm glad to see this is a project like a, an impact to the community and they're focusing on doing no evil websites and corporations that you know could very frequently be targeted in order to suppress whatever speech is on those sites. So this is kind of. It's nice and to give back to the community. [00:41:54] Speaker D: Yeah, I mean it goes back to Google's original of do no evil and at least they're trying to help people of but of any sorts, you know, and trying to get that freedom of speech out there. It's amazing how they actually do it, you know. And looking at how they do it with the technical stack, you know, hopefully gives people an idea of how they can implement it for their organization. You know, if they might not need all the layers and depending on your size and everything of your organization, but shows you kind of at least the schematic of how to do it. If you do expect a large DDoS attack. [00:42:31] Speaker C: Reading through details, I don't think you're going to get any of that because it's like once you apply they accept it. [00:42:36] Speaker D: You just change the services. [00:42:37] Speaker C: You just change the DNS to point at them. [00:42:41] Speaker A: Yeah, well Google has brought GPU compatibility to serverless with Cloud Run GPUs going generally available killer feature here is that you only pay for what you use down to the second. So you can spin up an Nvidia L4 GPU for AI inference, have it automatically scale to zero when idle, and only pay for the actual seconds of computing time, which is a game changer compared to keeping GPU instances running 24. 7 on traditional cloud infrastructure. According to Google, cold start performance is pretty good. They're showing a sub 5 second startup time to get a GPU instance with drivers installed and ready to go. In a demo they achieved time to first token of about 19 seconds for a Gemma 3.4B model, including everything from cold start to model loading to inference, which makes it viable for real time AI applications that need to scale dynamically. What's clever is that they've removed the traditional barriers to GPU access. There's no quota request required for L4 GPUs anymore. You literally just add GPU GPU1 to your command line or checkerbox in the console, making this as accessible as regular cloud run deployments. Watch democratizes GPU computing for developers who previously couldn't justify the complexity or the cost. Multi regional deployments are capable or capabilities are strong with this. GPU is available in five regions including us, Europe, Asia and you can deploy across multiple regions with a single command for global low latency inference. They show deploying Olana across three continents in one go and this would be a nightmare to set up a traditional GPU infrastructure. Early customers included wayfair or reported 85% cost reductions by combining L4 GPU performance with cloud runs. Auto scaling while companies like Midjourney are using it to process millions of images. And the combination of a reasonable GPU pricing with true scale to zero capabilities seems to be hitting a sweet spot for AI workloads that don't need constant GPU availability. [00:44:22] Speaker C: I mean anything that scales down to zero is okay in my book. So that's pretty rad. [00:44:26] Speaker D: I always like when they give you numbers, 85% cost reduction, you're like okay, what was that from $100,000 down to $15,000? Or was it like what are the real metrics behind it? I would like to know that talking. [00:44:40] Speaker C: GPUs, it's a higher number than that. [00:44:43] Speaker D: True. Well, I thought Cloud Run is more like their container orchestration engine. [00:44:53] Speaker A: It's like Knative is one of those. Yeah, it's basically a Knative wrapper. I mean it's meant for serverless, so I would definitely, you know, you can either run the Knative cloud Run infrastructure yourself or you can have them manage it for you. And they're basically giving you boxes that have GPUs and you're enabling that capability and then they're handling the load balancing and placement of that container inside of their fleet. All right. Google is releasing GKE Volume Populator to help you with all your AIML data transfers. If you're storing your training data or model weights in cloud storage but need to move them to faster storage like hyper disk machine learning for better performance. You had to previously build custom scripts and workflows to orchestrate all those data transfers. But now GKE they can handle automatically through the standard Kubernetes persistent volume claim API. Basically this one GA with Kubernetes 1.33 and they're adding their own special SaaS with native cloud storage integration and fine grained namespace level access controls. Meaning you can have different teams or projects with their own isolated access specific cloud storage buckets without having to manage complex IAM policies across your entire cluster. For AI and overloads, this is a good timing because one of the biggest challenges teams face is efficiently loading massive model weights. Abridge AI reportedly saw up to 76% faster model loading speeds and reduced POD initialization times by using Hyperdisk ML with this feature. So huge when you're dealing with large language models that can be hundreds and hundreds of gigabytes. [00:46:16] Speaker C: This is kind of interesting. I never really thought about like from Kubernetes like accessing storage via API so that you can sort of Have, I guess. I mean, they say no IAM policies. I was, I was wondering where, you know, how they still got to have some sort of authorization somewhere. [00:46:33] Speaker A: But yeah, it's just happening at the Kubelet setup phase, I assume. [00:46:39] Speaker C: Awesome. [00:46:40] Speaker A: I mean, I hadn't really thought about the idea that, you know, if you're trying to spin up an LLM cluster to learn, you know, you're paying for expensive GPU and TPU while you're waiting for data to transfer. That's a pretty big cost savings potentially in some workloads. [00:46:55] Speaker C: Oh yeah, I hadn't even thought of that. Right. Like you get the storage pre populated and pre. Pre in place. That's not how you say that, but yeah, that's cool. [00:47:06] Speaker A: Pre provision perhaps? [00:47:09] Speaker D: Yeah. If your container, if you load it, if you put that in your container, built a container to run it on per thing. You also would have like two larger containers. So yeah, they need a way to cut that, you know, for lack of a better term, pre. Preload time up. So you can. You're leveraging the compute for as much of the time frame as you can. [00:47:33] Speaker A: To. [00:47:34] Speaker D: Problem I've not run into in my life yet and I'm kind of happy about that. [00:47:38] Speaker A: It'd be a fun story to have, I guess. [00:47:40] Speaker D: Yeah. Cold boot for ML Funk for ML Logic. Yeah. [00:47:47] Speaker A: Yeah. All right, let's move on to Azure, who's on the quest to turn every software developer into an AI developer. With Azure AI Foundry, they're a unified platform that brings together models, tools and services for building AI apps and agents at scale. And they're positioned this as the shift from AI assistants that wait for instructions to autonomous agents that can actually be workplace teammates. They're introducing the Azure AI Foundry Agent service as now generally available and lets developers orchestrate multi agent workflows where AI agents can work together to solve complex problems. And this is Microsoft's answer to the growing demand for agentic AI that can automate decision making and complex business processes, which both AWS and GCP haven't quite matched yet. In terms of a unified platform approach. Microsoft is also expanding their model catalog with some heavy hitters, including Grok 3 from Xai Sora from OpenAI coming soon in preview, and over 10,000 open source models from Hugging Face, all with fine tuning support, which gives developers way more choice than what you typically see in competing cloud platforms. The AgentIC DevOps capability they're talking about, including GitHub Copilot, is evolving from just helping you write code to actually doing code reviews. Writing tests, fixing bugs, and even handling app modernization tasks. That used to take months, it can now be done in hours, which could fundamentally change how software teams operate. They've also introduced a site reliability engineering agent that monitors production systems 24. 7 and can autonomously troubleshoot issues as they arise across kubernetes, app services, serverless and databases, giving every developer access the same expertise that powers Azure at a global scale, which is a pretty good value compared to paying for dedicated SRE staff for startups and ISVs. Microsoft is sweetening the deal with flexible Azure credits through Microsoft for startups. And they're reporting that AI and machine learning offer revenue in their marketplaces grew 100% last year. Companies like Neo4J have seen a 6x revenue growth in 18 months through the marketplace, which shows there's real money to be made in the marketplace. [00:49:37] Speaker D: I mean, a lot of these are just, I feel like re announcements from the keynote and from build. Most of these things in here, they kind of just like summed up in a different way. Again, it goes back to MCPS, AgentIC, AI, whatever is the new clickups for all these things. Great, let's have it automatically fix the problem with production by scaling the SQL database. And then, hey, it happens again and it scaled the SQL database and you come in and you now have a 128vcore SQL database. You're going to really love that SRE bot that went in and it fixed the problem overnight and created the bug in GitHub that told you that there was a problem and updated your terraform code that you know to put the right new value in. And you're not even realized it till you come up in the morning. So I feel like there's going to need to be constraints on these things. But I mean, the general concept of, you know, the SRE bot and all these things. Yes. Can drive down costs. Yes. We know AI can make it people be more efficient. It's just a matter of how people leverage them. [00:50:43] Speaker C: I mean, I don't know how, how that's different from, you know, developer teams, you know, running their own production services like that, scaling up without, you know, any regard to budget or performance. Yeah, no, it's the same thing. They just automated all the same thing. Going to be great. [00:51:01] Speaker D: But I need the iO3, you know, for my database that's running. Why would I need that guaranteed performance on my SQL Server that I'm running? What could possibly go wrong with that? [00:51:12] Speaker C: Yeah, so I mean, the way I hope AI rolls out is that it does stuff like this, but then it still requires supervision. So the, the SRE engineers and the DevOps engineers that you already have are now freed up to do more impactful things, right? And so maybe it's refining prompts for these agents, giving them those constraints by, you know, thinking about how they basically operate and all those like things that aren't written down, those intangibles and really getting that executed into prompts. Because that's how I hope this goes down. I mean, I know that business is ruthless and the bottom dollar is always important, so you'll see some places where they just sort of like replace humans with AI, but I think that would be a waste. [00:51:59] Speaker A: Well, if you are a C developer, your life may be dramatically easier. With the. NET 10 Preview 4 they've now introduced the ability to run a single C file directly using. NET RunApp CS that rolls. I'll remember that for sure. Eliminating the need for project files or complex folder structures, essentially bringing Python like simplicity to C development while maintaining a full power of the. NET ecosystem. The new file based approach introduces clever directives, let you reference a nuget packages to the SDKs and set MSBUILD properties right within your C file using simple syntax like package Humanizer at 2.1.14.1, making it perfect for quick scripts, learning scenarios or testing code stimpas without the overhead of creating a full project structure. Particularly brilliant about this invitation is that it's not separate dialect or limited version of C and you're writing the exact same code with the same compiler. When your SH grows beyond a simple file, you can seamlessly convert to a full product using. NET Project Convert app cs, which automatically scaffolds the proper project structure and translate all of your directives. The feature even supports Unix style shebang lines, allowing you to quickly create executable C scripts that run directly from Command line on Linux and macOS, positioning C as a viable turner to Python or Bash for automation scripts and CLI utilities, you could write your automation scripts and strongly type C sharp. Ryan, like I'm sure you'll love to. [00:53:17] Speaker C: Do what. [00:53:22] Speaker D: Come on, scripts in C sharp. What could possibly go wrong? Also, I just saw Ruby Gem hell in my head when I saw humanizer at 2.14.1. [00:53:32] Speaker A: It does requirements not text like in your Python scripts. I don't know if I'm that scared of that. [00:53:36] Speaker D: Yeah, I just. I don't know. It brought me to Ruby Gems for some reason. [00:53:44] Speaker A: Rake, you know, all's lost. [00:53:46] Speaker D: So it's really. [00:53:48] Speaker A: I. I don't have a problem with Ruby Gems. It's really when you get into Rake where things go wrong. [00:53:53] Speaker D: I think Rake was one of those things. I didn't know what it actually did. I just knew I had to run it and then I yelled at it a lot. [00:54:00] Speaker A: Definitely I will not be writing any scripts in C, but I'm sure someone is very happy about this feature. [00:54:05] Speaker C: I'm very mixed on this because it's sort of like. NET development and by extension C sharp. The development patterns I see are already so detached from the running environment and so I feel like this is a further abstraction on top of all the leverage libraries and frameworks that are part of. Net. And so I get if you're a C developer how this feels great. But on the other hand I feel like it's just going to mean that making changes to logging output or the fields to to enrich data is, you know, which is seemingly impossible to work with developer teams today is they're just not even going to be even in the ballpark anymore because the A unless you can make AI do it. But then if you can make AI do it, you don't need this. I don't know, maybe this would be good. I don't know. I feel like I need to go let like have a lie down or get a hug or something. Today. [00:55:05] Speaker D: We'Re especially dark today I'm going to find a. [00:55:08] Speaker A: NET developer and have them tell you how happy is to have this capability so you feel better about it. [00:55:13] Speaker C: Okay, good. [00:55:16] Speaker D: We're going to have Q develop C, net 10, pass it to Gemini, and then run that automatically in app services on Azure. What could possibly go wrong? [00:55:28] Speaker A: I could go wrong. Could be fine. [00:55:30] Speaker D: Yeah. [00:55:32] Speaker A: All right. Microsoft is announcing the general availability of ephemeral OS disks for the latest V6 VM series. This is apparently a big deal for anyone running stateless workloads because you're Getting up to 10x better OS disk performance by using local NVME storage instead of remote Azure storage, eliminating network latency for your operating disk system Disk operations. The beauty of ML Disk is that they're perfect for scale out scenarios like containerized microservices, batch processing jobs or CI CD build agents where you don't need persistent OS state and you can imagine a VM in seconds and get back a clean slate, which is fantastic for auto scaling scenarios where you're constantly spinning up and tearing down instances. Or this is a really great beauty of this to have your developers accidentally delete all their data that they put onto the server because they didn't realize they were ephemeral, but yeah, that's just me. Just me. [00:56:19] Speaker C: Yeah. Yeah. It's never happened to any of us, right? We've never seen that. [00:56:22] Speaker A: Never. Never. [00:56:23] Speaker D: No, no. [00:56:24] Speaker A: This puts Azure in a competitive position against Amazon Instant store volumes and GCPs local SSDs. Though Microsoft implementation is particularly interesting because it specifically targets the OS disk placement on NVMe storage while still allowing you to use regular managed disks for your data volumes if needed. The V6 VM series supports a feature like the DADSv6 and the DDSv6 family, so that rolls off a ton. Azure's latest generation of AMD EPYC processors for combining cutting edge DPU performance with blazing fast local storage. AMD is ideal for performance sensitive workloads that can tolerate the ephemeral nature of the OS disk. For our cost perspective, thermal OS disks are essentially free since you're not paying for managed disk storage and you're just using the local storage that comes with your vm, which could lead to significant savings for large scale deployments where you might have hundreds or thousands of VMs that don't need persistent OS disks. One thing you might is the disks are truly ephemeral, meaning if you delete it or deallocate your vm it is gone forever. So do keep that in mind. And the deployment is surprisingly straightforward. Just a few extra parameters in your ARM template or CLI commands to make yourself either very happy with the cost savings or very sad because you just deleted production. So be careful. [00:57:30] Speaker C: Yeah. Is this persistent across reboots? Because that's always the one that gets people like that's the one I'm like. [00:57:35] Speaker A: Oh I do, I do think it is the earliest cross reboots if you do the, if you deallocate that's when you screwed or moved to a different hardware class right? [00:57:46] Speaker D: So it's if you do review at the OS it's fine. If you do shut down start, you're normally not okay because it deallocates it just like on you know, stop versus stop versus reboot. But if I'm already on a scale set so I'm already serverless or ephemeral feel like I would be fine with this and I definitely have some workloads and seen some workloads where the root volume or like you have to set stuff up to go to the ephemeral disk instant storage on Azure or on AWS where if it's just on the root volume, what do I care? But normally the other benefit is the OS volume is small and the other one's larger because you're getting local storage to the local disk. So like I feel like it's a little bit of a paradigm shift where you then don't have to start splitting stuff out between OS volumes like your C drive, your D drive or you know, your wherever/data versus the rest of it. So it can definitely lead to you accidentally crashing stuff too and have. Because you know you're just going to fill up your root drive versus you fill up your other drive. And sure, delete has hits a health, you know, might hit your health check and deal and delete it because it's failing your health check but here it's just going to crash. So you, you might have a better chance, less likely to catch the error. [00:59:12] Speaker C: Yeah, it's interesting because you know, like it's funny that they tout serverless on this and stateless workloads, but I'm like, wait a second, like if, if I need, if I'm doing that, what do I need this VM for? Like machine type. But I think it's for like hosting my own kubernetes cluster I guess something like that. [00:59:33] Speaker D: If you don't want to set up your drive properly with a data drive. [00:59:38] Speaker C: I mean I've used, I've used like you know, stateless disks for like building giant like local caches that I can, that can be easily refreshed. Yeah. So like you utilizing that cheap storage across many, many different machines as long as your hydration process. Like there's definitely use cases that I like on here for this type of thing. [01:00:00] Speaker D: I had a customer that ran a MongoDB on ephemeral drives. It was like the I instance types or whatever it was on aws and they ran six of them and they figured out that they needed three at any given time. So they could have a zone and a half go down without a problem and, and they would just have performance segregation and they just had N minus 3. So even versus having the secondary drives with the size that they needed in there and with the way their MongoDB was replicated ended up being a good cost savings for them. That sounds with a decently low risk. [01:00:36] Speaker C: Sounds very resilient across like outages that are external, but outages that come from external, like bad rollouts or stuff like that. Sounds like that's scary. But you know, you know, if they've figured it out and they've got enough safeguards, why not? I'm going to say something, I'm going to stay positive. From now on for the last stories. [01:00:55] Speaker D: Don't worry, the last one's a fun one. [01:00:57] Speaker A: Oh, this one's going to. You're going to love this one then. [01:01:00] Speaker D: All right, Ryan, take. [01:01:02] Speaker A: I accused Matt of putting it in the wrong section earlier because this is an Azure story. I'm going to start it out with Azure is expanding support for AWS Bedrock model endpoints across all generative AI policies in Azure API Management's AI gateway. This release enables you to apply advanced management and optimization features of token limit policies, token metric policies and semantic caching policies to AWS Bedrock models, empowering you to seamlessly manage and optimize your multi cloud AI workloads. The key benefits of this feature are applying token limiting, tracking and logging to bedrock APIs, enabling semantic caching to enhance performance and response times for Bedrock models, and achieve unified observability and governance across multi cloud AI endpoints. So Azure, we thank you for making AWS more cost effective and responsive with your capabilities and features. [01:01:51] Speaker C: I mean, do I just not know how these API platforms work? Like why would you pay one at another? Like I really don't get that. Like you can use the models, right? [01:02:04] Speaker D: But if your workload's in Azure and you want to use a service, you want to use an LLM like Claude that isn't in Azure. They had to give you a good way to do it. And the API gateway already has these same features against hitting Azure OpenAI. So somebody must have said, well you don't have the models we want, but we want these features. Therefore we're using Claude like GitHub does for your for their. For on AWS for their Claude model. Therefore we want to go through here. [01:02:39] Speaker C: Do they not have a model garden. [01:02:40] Speaker D: That supports they don't support Claude? [01:02:44] Speaker A: No. Anthropic's not there with. Well I think, I think the 4.0 model is supposed to end up there, but the 3.7 model and the 3.5 models were not. They were only on AWS previously. So for Azure they had to adopt it. But also this is your API gateway if you want to maybe a customer is coming in and using OpenAI and they also want to talk to Bedrock for some reason. Or there's other multi cloud stories where you need to have multiple AI endpoints. So you have one AI gateway in Azure for your entire solution and you're using different models on Gemini or on Azure or aws. Then you have the flexibility by being a gateway and it's a proxy at the end of the day. [01:03:24] Speaker D: Yeah. And it does some decent stuff around, you know, like limiting token usage and things like that. And with the API management apam, you're able to do things like hey, I have multiple Azure OpenAI resources behind it and like round robin between them too. So you essentially get more tokens, you know, so you're not hitting the token limit for each one of them. So. And that's where like some of the rate limiting and some of the other features kind of come in. So it's actually a decent concept to do. I just found it interesting that they were linking it over to Bedrock. Felt like this was like a pretty good like hey doctor Situation within the region or anything else, you know, within multiple regions. Because there definitely hasn't been multiple Azure OpenAI outages where their doctor solution is just have it set up to go to another region. But don't worry that we took down all of America's last week. Don't worry about those details, but it gives you the ability to at least hit other clouds. Which I thought was kind of an interesting story, but I also just thought we'd make fun of Azure. [01:04:32] Speaker C: Well, I mean you never hear about Amazon Bedrock, so I'm going to say something positive and kudos for the Bedrock team. They got a huge win. All those, all that Azure traffic. [01:04:43] Speaker A: Speechless, right? [01:04:45] Speaker D: Speechless. [01:04:47] Speaker A: We should move on. [01:04:48] Speaker C: Yeah. [01:04:49] Speaker A: So I'll be here all week. Our final story for this week is DigitalOcean is making a serious play for AI infrastructure with their new Atlanta 1 data center in Atlanta, Georgia. This is their largest facility to date with 9 megawatts of total power capacity across two data halls, especially designed for high density GPU deployments that AI and machine learning workloads demand. This is a significant shift in DigitalOcean strategy from being primarily known as a developer friendly cloud provider for small workloads to now competing in the GPU infrastructure space, deploying over 300 GPUs including top tier Nvidia H200 and AMD Instinct Mi300X clusters in just the first data hall. The timing of this expansion is particularly important as we're seeing massive demand for GPU resources driven by the AI boom and DigitalOcean is positioning themselves as a more accessible alternative to the hyperscalers for startups and growing tech companies that need GPU compute but don't want the complexity or cost structure of AWS, Azure or GCP. By choosing Atlanta as a location and partnering with FlexCentral for the facility, DigitalOcean is strategically serving the Southern US market where there has been simulated tech growth Offering lower latency for regional customers while maintaining their promise of simplicity and cost effectiveness that made them popular with developers in the first place. Integration of GPU infrastructure alongside their existing services like Droplets, Kubernetes and managed database creates an interesting one stop shop proposition for companies building AI applications, allowing them to keep their entire stack within DigitalOcean's ecosystem rather than mixing their providers. A second data hall is planned for 2025 with even more GPU capacity. This represents a multi year commitment to AI infrastructure, suggesting DigitalOcean sees this as a core to their future rather than just riding the current AI hype wave. This expansion brings DigitalOcean to 16 data centers across 10 global regions which while still small compared to other hyperscaler shows, are serious about geographic distribution and reducing latency from their growing customer base. [01:06:37] Speaker C: That's cool. I mean I never, I've always been like, you know, cheering for the underdog that is Digital Ocean, but I've never really thought about why and this, this article sort of solidified my thoughts, I guess, which is like why did I, why do I like Digital Ocean as a, you know, as a concept? And it is that like the, the hyperscalers are trying to get enterprise businesses and so it's like if you just want a simple little environment, you know. [01:07:03] Speaker D: Would you call it a droplet? [01:07:05] Speaker C: I mean I wouldn't personally because sounds. [01:07:08] Speaker A: I mean it was a cute idea. Digital Ocean and a droplet, like I give them props for cutesiness. I mean we know the other cloud providers use dumb names too, so they're excused. [01:07:20] Speaker D: But like you're saying, look, the. If you're not a massive enterprise or even a small enterprise, if you're a small company, you just want a server. Having a DigitalOcean server was always cheaper than running a light sail or any of the other ones and had a lot less complexity to it. So if you were just spinning up a server to go run a cron job or something simple and there the three large hyperscalers, it was too complex to get into and you were more likely to shoot yourself in the foot. So that's kind of where I always thought DigitalOcean was, was that startup, you know, to medium sized business where you didn't need that massive infrastructure. [01:08:01] Speaker C: Yeah. You know, hopefully they stay to that model. [01:08:04] Speaker A: Right. [01:08:04] Speaker C: They stay consistent to it. So they, you know, because as they get bigger the, the impulse will be to get more complicated. [01:08:11] Speaker D: Larger. Right. [01:08:12] Speaker A: Yeah. [01:08:12] Speaker C: And so because it's like more customers means more people asking for their special little thing and, and you know, and Then you get, you know, pricing discounts and all that kind of stuff. And so we'll see, like, I think it'll, we'll, we'll get to see the full life cycle. [01:08:26] Speaker A: I mean, I think they've been smart in what they have invested in. Like they haven't gone, you know, after a lot of databases. I think they're doing MySQL and Postgres or maybe only one of those two. Yes, they're supporting Kubernetes, but again in a highly managed way. You know, again with GPUs, like I don't think elevator gonna have a wide selection of different GPUs. But hey, if you need some GPUs, they got some H2 hundreds, they got some Mi300X's for you. I think there's a way to carve out a niche that is valuable for them that they can continue to use and develop. That doesn't make them into, I mean maybe they do become the next big hyperscaler and then they lose that underdog edge that you called it. But I think what I've seen them do is just smart investments in areas where their customers are asking. They're good growth levers. It helps them with investment because they're getting GPUs and other vendors may be interested in buying GPUs from them. So I think there's advantage to DigitalOcean. But I like them a lot. I use them in the past. I used to use Linode as well. Now Linode is owned by akamai. So really DigitalOcean is the last standalone in this space. I would say the Snowflake's trying to kind of become a cloud on its own, but they're built on top of aws, so you're just paying a markup to them. But maybe they'll come up with their own hardware at some point or maybe you'll see them buy somebody like DigitalOcean in the future to get them the footprint into hyperscaling. But, but I think it's interesting in. [01:09:50] Speaker C: General, I do agree with you. I think the AI purchase is very strategic and I think well placed, you know, because. Well, I say, I think that both realities exist where this is like a wave of new hotness and completely a fad. But I also feel like there's so many root elements of AI that are going to systematically change how we do business where you're going to need some capacity to build this into your applications and so all their customers are going to need a little bit of it. So this is, I think, very smart for them to do. It's cool. [01:10:23] Speaker A: Well, gentlemen, I'm going to enjoy the rest of Finops X this week, and I'm looking forward to reporting back what I find here at the show. But I hope you guys have a fantastic week in your clouds and we'll see you next week. [01:10:34] Speaker D: I'll go listen to some lullabies. I'll talk to you later. [01:10:36] Speaker C: Yeah. Bye, everybody. [01:10:38] Speaker A: Bye. [01:10:39] Speaker D: Bye. [01:10:43] Speaker B: And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening and we'll catch you on the next episode.

Show Notes

Titles we almost went with this week:

General News

Cloud Tools

AWS

GCP

Azure

Other Clouds

Closing

Chapters

Episode Transcript

Other Episodes

Episode 143

143: It’s Chaos in the Cloud Pod Studio

Episode 309

309: Microsoft tries to give away cloud services for free, sadly, it's only SQL

Episode

67 – BigQuery Simulates The CloudPod March Madness