316: Microsoft’s New AI Agent Has Trust Issues (With Software)

Welcome to episode 316 of The Cloud Pod, where the forecast is always cloudy! This week we’ve got earnings (with sound effects, obviously) as well as news from DeepSeek, DocumentDB, DigitalOcean, and a bunch of GPU news. Justin and Matt are here to lead you through all of it, so let’s get started!

Titles we almost went with this week:

Lake Sentinel: The Security Data Monster Nobody Asked For
Certificate Authority Issues: When Your Free Lunch Gets a Security Audit
Slash and Learn: Gemini Gets Command-ing
DigitalOcean Drops Anchor in AI Waters with Gradient Platform
The Three Stages of Azure Grief: Development, Preview, and Launch
E for Enormous: Azure’s New VM Sizes Are Anything But Virtual
SRE You Later: Azure’s AI Agent Takes Over Your On-Call Duties
Site Reliability Engineer? More Like AI Reliability Engineer
Azure Disks Get Elastic Waistbands
Agent Smith Would Be Proud: Google’s Multi-Agent Matrix Gets Real
C4 Yourself: Google Explodes Into GA with Intel’s Latest Silicon
The Cost is Right: GCP Edition
Penny for Your Cloud Thoughts: Google’s Budget-Friendly Update
DocumentDB Goes on a Diet: Now Available in Serverless Size
MongoDB Compatibility Gets the AWS Serverless Treatment
No Server? No Problem: DocumentDB Joins the Serverless Party
Stream Big or Go Home: Lambda’s 10x Payload Boost
Lambda Response Streaming: Because Size Matters
GPT Goes Open Source Shopping
GPT’s Open Source Awakening
When Your Antivirus Needs an Antivirus: Enter Project Ire
The Opus Among Us: Anthropic’s Coding Assistant Gets an Upgrade
Serverless is becoming serverful in streaming responses

General News

02:08 It’s Earnings Time! (INSERT AWESOME SOUND EFFECTS HERE)

02:16 Alphabet beats earnings expectations, raises spending forecast

Google Cloud revenue hit $13.62 billion, up 32% year-over-year, with OpenAI now using Google’s infrastructure for ChatGPT, signaling growing enterprise confidence in Google’s AI infrastructure capabilities.
Alphabet is raising its 2025 capital expenditure forecast from $75 billion to $85 billion, driven by cloud and AI demand, with plans to increase spending further in 2026 as it competes for AI workloads.
AI Overviews now serves 2 billion monthly users across 200+ countries, while the Gemini app reached 450 million monthly active users, demonstrating Google’s scale in deploying AI services globally.
The $10 billion increase in planned capital spending reflects the infrastructure arms race among cloud providers to capture AI workloads, which require significant compute and specialized hardware investments.
Google’s cloud growth rate of 32% outpaces its overall revenue growth of 14%, indicating the strategic importance of cloud services as traditional search and advertising face increased AI competition.

03:55 Justin – “I don’t know what it takes to actually run one of these large models at like ultimate scale that like a ChatGPT needs or Anthropic, but I have to imagine it’s just thousands and thousands of GPUs just working nonstop.”

04:31 Microsoft (MSFT) Q4 earnings report 2025

Microsoft reported Q4 fiscal 2025 earnings with revenue of $76.44 billion, up 18% year-over-year and beating expectations, marking the fastest growth in over three years.
Azure revenue grew 39% in Q4, significantly exceeding analyst expectations of 34-35%, with Microsoft disclosing for the first time that Azure and cloud services exceeded $75 billion in annual revenue for fiscal 2025.
Microsoft’s AI investments are showing returns with 100 million monthly active users across Copilot products, driving higher revenue per user for Microsoft 365 commercial cloud products.
Capital expenditures reached $24.2 billion for the quarter, up 27% year-over-year, as Microsoft continues aggressive data center buildout for AI workloads alongside peers like Alphabet ($85B annual) and Meta ($66-72B annual).
Microsoft’s market cap crossed $4 trillion in after-hours trading, becoming only the second company, after Nvidi,a to reach this milestone, driven by strong cloud and AI momentum.

06:33 Amazon earnings key takeaways: AI, cloud growth, tariffs

Things weren’t quite as great for Amazon…
Amazon’s capital expenditure could reach $118 billion in 2025, up from the previous $100 billion forecast, with spending primarily focused on AI infrastructure alongside competitors Meta ($66-72B) and Alphabet ($85B).
AWS revenue grew 18% year-over-year, trailing Microsoft Azure’s 39% and Google Cloud’s 32% growth rates, though AWS maintains a significantly larger market share with the second player at approximately 65% of AWS’s size.
Amazon’s generative AI initiatives are generating multiple billions in annualized revenue for AWS, with potential monetization through services like Alexa+ at $19.99/month or free for Prime members.
Despite initial concerns about tariffs impacting costs, Amazon reported 11% growth in online store sales and 12% increase in items sold, with no significant price increases or demand reduction observed.
The company expects Q3 revenue growth of up to 13%, suggesting tariffs have been absorbed by suppliers and customers, though uncertainty remains with the U.S.-China trade agreement deadline on August 12.

08:08 Justin – “They’re not there yet. And they, they haven’t been there for a while, which is the concerning part. And I don’t know, you know – I haven’t really heard much about Nova since they launched. They talk a lot about their Anthropic partnership, which makes sense. But I don’t feel like they have the swagger in AI that the others do.”

AI Is Going Great – or How ML Makes Its Money

11:23 Gemini 2.5: Deep Think is now rolling out

Google’s Gemini 2.5 Deep Think uses parallel thinking techniques and extended inference time to solve complex problems, now available to Google AI Ultra subscribers in the Gemini app with a fixed daily prompt limit.
The model achieves state-of-the-art performance on LiveCodeBench V6 and Humanity’s Last Exam benchmarks, with a variation reaching gold-medal standard at the International Mathematical Olympiad, though the consumer version trades some capability for faster response times.
Deep Think excels at iterative development tasks like web development, scientific research, and algorithmic coding problems that require careful consideration of tradeoffs and time complexity.
The technology uses novel reinforcement learning techniques to improve problem-solving over time and automatically integrates with tools like code execution and Google Search for enhanced functionality.
Google plans to release Deep Think via the Gemini API to trusted testers in the coming weeks, signaling potential enterprise and developer applications for complex reasoning tasks in cloud environments.

13:02 Justin – “…these deep thinking models are the most fun to play with, because you know, you don’t need it right away, but you want to go plan out a weekend in Paris, or I want you to, uh, go compare these three companies products based on public data and Reddit posts and things like that. And it goes, it does all this research, then it comes back with suggestions. That’s kind of fun. The more in depth it is, the better it is in my opinion.So the deep thinking stuff is kind of the coolest, like heavy duty research stuff.”

14:17 Introducing Gpt OSS

OpenAI is releasing the new GPT-OSS-120b and GPT-oss-20b open weight language models that deliver strong real-world performance at low costs.
They’re both available under the flexible Apache 2.0 license; these models on reasoning tasks demonstrate strong tool use capabilities and are optimized for efficient deployment on consumer hardware.
Gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU.
The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inferenc,e or rapid iteration without costly infrastructure.
They’re also both compatible with the responses API and are designed to be used within agentic workflows with exceptional instruction following, tool use like web search or Python code execution, and reasoning capabilities.

15:30 Matt – “I’m still stuck on the 16 gigabytes of memory on your video card. I still remember, I bought my video first video card, it had 256 megabytes. It was a high end video card. And now I’m like, God, these things got so much bigger and faster. Okay, I’m officially old.”

16:43 Project Ire autonomously identifies malware at scale – Microsoft Research

Microsoft Research developed Project Ire, an autonomous AI agent that reverse engineers software files to determine if they’re malicious, achieving 0.98 precision and 0.83 recall on Windows driver datasets. The system uses LLMs combined with decompilers, binary analysis tools, and memory sandboxes to analyze code without human assistance.
The technology addresses a significant cloud security challenge where Microsoft Defender scans over 1 billion devices monthly, requiring manual review of suspicious files by experts who face burnout and alert fatigue.
Project Ire automates this gold-standard malware classification process at scale.
The system creates an auditable “chain of evidence” for each analysis, using tools like angr and Ghidra to reconstruct control flow graphs and identify malicious behaviors like process termination, code injection, and command-and-control communication. It was the first reverse engineer at Microsoft (human or machine) to author a conviction case for blocking an APT malware sample.
In real-world testing on 4,000 hard-target files that couldn’t be classified by other automated systems, Project Ire achieved 0.89 precision with only 4% false positives, demonstrating potential for deployment alongside human analysts.
The prototype will be integrated into Microsoft Defender as Binary Analyzer for threat detection.
This development represents a practical application of agentic AI in cybersecurity, building on the same foundation as GraphRAG and Microsoft Discovery, with future goals to detect novel malware directly in memory at cloud scale.

19:15 Justin – “I can think of all the things that can make us more efficient at and more productive with, and it’s like wow, that’s a great use case… it just takes away all of the noise.”

27:22 Claude Opus 4.1 \ Anthropic

Claude Opus 4.1 achieves 74.5% on SWE-bench Verified coding benchmark, with GitHub reporting notable improvements in multi-file code refactoring and Rakuten praising its precision in debugging large codebases without introducing bugs
The model is available across major cloud platforms, including Amazon Bedrock and Google Cloud’s Vertex AI, at the same pricing as Opus 4, making it accessible for enterprise cloud deployments
Opus 4.1 uses a hybrid reasoning approach with extended thinking capabilities up to 64K tokens for complex benchmarks, while maintaining simpler scaffolding for coding tasks using just bash and file editing tools
Windsurf reports the upgrade delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, comparable to the performance leap between Sonnet 3.7 and Sonnet 4
For cloud developers, the immediate upgrade path is straightforward – simply switch to claude-opus-4-1-20250805 via the API with no pricing changes or major integration modifications required

AWS

29:09 Announcing general availability of Amazon EC2 G6f instances with fractional GPUs – AWS

AWS launches G6f instances with fractional GPU capabilities, offering 1/8, 1/4, and 1/2 GPU partitions powered by NVIDIA L4 Tensor Core GPUs, enabling customers to right-size workloads and reduce costs compared to full GPU instances.
The instances target graphics workloads, including remote workstations for media production, CAD engineering, ML research, and game streaming, with configurations ranging from 3-12 GB GPU memory paired with AMD EPYC processors.
This represents AWS’s first GPU partitioning offering, addressing the common challenge of GPU underutilization where workloads don’t require full GPU resources but previously had no smaller options.
Available across 11 regions with On-Demand, Spot, and Savings Plan pricing options, requiring NVIDIA GRID driver 18.4+ and supporting Amazon DCV for remote desktop access.
The fractional approach could significantly reduce costs for organizations running multiple smaller GPU workloads that previously required dedicated full GPU instances, particularly beneficial for development, testing, and lighter production workloads.

30:15 Matt – “The fractional GPUs is an interesting concept; most people probably don’t need a massive GPU… so of you’re just doing one off things or you need it for a specific project, then you can get that small usage. “

31:07 Amazon DocumentDB Serverless is now available | AWS News Blog

Amazon DocumentDB Serverless automatically scales compute and memory using DocumentDB Capacity Units (DCUs), where each DCU provides approximately 2 GiB of memory plus corresponding CPU and networking resources, with a capacity range of 0.5-256 DCUs.
The service offers up to 90% cost savings compared to provisioning for peak capacity and charges a flat rate per second of DCU usage, making it cost-effective for variable workloads, multi-tenant environments, and mixed read/write scenarios.
Existing DocumentDB clusters can add serverless instances without data migration by simply changing the instance type, requiring DocumentDB version 5.0 or higher, with the ability to mix provisioned and serverless instances in the same cluster.
Key use cases include handling traffic spikes for promotional events, managing individual database capacity across multi-tenant SaaS applications, and building agentic AI applications that leverage DocumentDB’s built-in vector search capabilities.
The service maintains all standard DocumentDB features, including MongoDB-compatible APIs, read replicas, Performance Insights, and AWS service integrations, while automatically tracking CPU, memory, and network utilization to scale without disrupting availability.

33:04 Justin – “I mean, the one thing about the DCU model – and I see it a bunch of places, because I’ve been doing a lot more serverless with Valkey, and this DCU model comes up a lot. I actually just moved the CloudPod database to serverless Aurora for MySQL. And so I’ve been getting a little more exposed to the whole, whatever that one’s called; something like DCU as well. And it’s a little bit opaque. I definitely don’t love it as a model, but it is so much cheaper.”

35:18 Introducing Amazon Application Recovery Controller Region switch: A multi-Region application recovery service | AWS News Blog

Amazon Application Recovery Controller (ARC) Region switch provides automated orchestration for multi-Region application failover, addressing enterprise concerns about untested recovery procedures and unknown dependencies during Regional outages.
The service supports nine execution block types, including EC2 Auto Scaling, Aurora Global Database failover, Route 53 health checks, and EKS/ECS resource scaling, enabling coordinated recovery across compute, database, and DNS services.
Region switch uses a Regional data plane architecture where recovery plans execute from the target Region, eliminating dependencies on the impacted Region and providing more resilient recovery operations.
Continuous validation runs every 30 minutes to check resource configurations and IAM permissions.
The service costs $70 per month per plan supporting up to 100 execution blocks or 25 child plans.
Organizations can balance cost and reliability by configuring standby resource percentages, though actual capacity depends on Regional availability at recovery time, making regular testing essential for confidence in disaster recovery strategies.

36:23 Matt – “I like the note here: ‘to facilitate the best possible outcomes, we recommend you regularly test your recovery plans and maintain appropriate service quotas in your standby region’ because the amount of times I’ve seen people try to do DR testing and then they his a service quota limit is comical at this point.”

38:42 AWS Lambda response streaming now supports 200 MB response payloads – AWS

AWS Lambda response streaming now supports 200 MB response payloads, a 10x increase from the previous 20 MB limit, enabling direct processing of larger datasets without compression or S3 intermediary steps.
This enhancement targets latency-sensitive applications like real-time AI chat interfaces and mobile apps where time to first byte directly impacts user experience and engagement metrics.
The expanded payload capacity opens new use cases, including streaming image-heavy PDFs, music files, and real-time processing of larger datasets directly through Lambda functions.
Response streaming is available on Node.js managed runtimes and custom runtimes across all AWS regions where the feature is supported, with the 200 MB limit now set as default.
This update reduces architectural complexity by eliminating workarounds previously required for payloads exceeding 20 MB, potentially lowering costs associated with S3 storage and data transfer fees.

GCP

40:26 Gemini CLI: Custom slash commands | Google Cloud Blog

Gemini CLI now supports custom slash commands through .toml files and Model Context Protocol (MCP) prompts, allowing developers to create reusable prompts for common workflows like code reviews or planning tasks.
This brings GitHub Copilot-style command functionality to Google’s AI assistant in the terminal.
Commands can be scoped at the user level (available across all projects) or the project level (checked into Git repos), with namespacing support through directory structures.
The implementation uses minimal configuration requirements – just a prompt field – making it accessible for quick adoption.
The MCP integration enables Gemini CLI to automatically expose prompts from configured MCP servers as slash commands, supporting both named and positional arguments. This positions Google to leverage the growing ecosystem of MCP-compatible tools and services.
Key use cases include automating code reviews, generating implementation plans, and standardizing team workflows through shared command libraries. The shell command execution feature (!{…}) allows integration with existing CLI tools and scripts.
While this is a developer productivity tool rather than a cloud service, it strengthens Google’s developer ecosystem play against GitHub Copilot and Amazon Q Developer.
The feature is available now with a simple npm update, requiring only a Gemini API key to get started.

37:18 Matt – “I still like the VS Code plugin, and making it interact more that way. I find that a little bit better from the little bit I’ve played with Claude Code, but recently I’ve been talking to people who say Claude Code has gotten better since the initial release so I have to go back and play with it and see.”

42:40 Agent2Agent protocol (A2A) is getting an upgrade | Google Cloud Blog

Google releases A2A protocol version 0.3 with gRPC support, security card signing, and Python SDK improvements, positioning it as an open standard for multi-agent AI systems that can communicate across different platforms and vendors.
The protocol now has native support in Google’s Agent Development Kit (ADK) and offers three deployment paths: managed Agent Engine, serverless Cloud Run, or full control with GKE, giving developers flexibility in how they scale their agent systems.
Over 150 organizations, including Adobe, ServiceNow, and Twili,o are adopting A2A, with real implementations like Tyson Foods and Gordon Food Service using collaborative agents to share supply chain data and reduce friction in their operations.
Google is launching an AI Agent Marketplace where partners can sell A2A-enabled agents directly to customers, while Agentspace provides a governed environment for users to access these agents with enterprise security controls.
The protocol was contributed to the Linux Foundation in June 2024, making it a vendor-neutral standard that could become the HTTP of agent-to-agent communication, though adoption will depend on whether competitors embrace an open approach.

44:18 Justin – “Agent to Agent is basically how you make MCP to MCP work in the cloud.”

44:38 C4 VMs based on Intel 6th Gen Xeon Granite Rapids now GA | Google Cloud Blog

Google launches C4 VMs on Intel Xeon 6 processors (Granite Rapids) with up to 30% better general compute performance and 60% improvement for ML recommendation workloads compared to the previous generation, making them the first major cloud provider to offer Xeon 6.
New C4 shapes include Titanium Local SSD variants delivering 7.2M max read IOPS (3x higher than comparable offerings from other hyperscalers) and 35% lower access latency, targeting high-performance databases, big data processing, and media rendering workloads.
C4 bare metal instances provide direct CPU/memory access for commercial hypervisors and SAP workloads, achieving 132,600 aSAPs – the highest of any comparable machine – with 35% performance improvement over C3 bare metal.
The expanded C4 series maintains existing CUD discounts and integrations with managed instance groups and GKE custom compute classes, available in 19 zones with shapes ranging from 4 to 288 vCPUs.
Key use cases include AI inference with FP16-trained models using Intel AMX-FP16, financial services requiring microsecond-level latency improvements, and visual effects rendering with reported 50% speedups over n2d instances..

46:24 Announcing Cloud Hub Optimization and Cost Explorer for developers | Google Cloud Blog

Google launches Cloud Hub Optimization and Cost Explorer in public preview, providing application-centric cost visibility across multiple projects without additional charges, addressing the challenge of tracking expenses for applications that span dozens of GCP projects.
The tools integrate Cloud Billing cost data with Cloud Monitoring utilization metrics to surface underutilized resources like GKE clusters with idle GPUs, showing average vCPU utilization at the project level to identify optimization candidates.
Unlike traditional cost dashboards that show aggregate Compute Engine costs, Cost Explorer breaks down spending by specific products, including GKE clusters, Persistent Disks, and Cloud Load Balancing for more granular cost attribution.
Built on AppHub Applications framework, the solution reorganizes cloud resources around applications rather than projects, competing with AWS Cost Explorer and Azure Cost Management by focusing on application-level cost optimization.
MLB’s Principal Cloud Architect reports that the tools help monitor costs across tens of business units and hundreds of developers, with particular value for organizations shifting left on cloud cost management.

47:26 Justin – “And if you’ve ever used the Google Cloud Optimization Hub and Cost Explorer previously, you’d know they’re hot garbage. So this was a very appreciated announcement at Google Next.”

Azure

49:10 Introducing Microsoft Sentinel data lake | Microsoft Community Hub

Microsoft Sentinel data lake enters public preview as a fully managed security data lake built directly into Sentinel, allowing organizations to store all security data in one place with cost-effective long-term retention while eliminating the need to build custom data architectures.
The service integrates with 350+ existing Sentinel connectors including Microsoft 365, Defender, Azure, AWS, and GCP sources, storing data in open formats that support both Kusto queries and Python notebooks through a new Visual Studio Code extension for advanced analytics.
Pricing separates data ingestion/storage from analytics consumption, enabling customers to store high-volume, low-fidelity logs like network traffic cost-effectively in the data lake tier while automatically mirroring critical analytics-tier data to the lake at no extra charge.
Key differentiator from AWS Security Lake is the native integration with Microsoft’s security ecosystem and managed compute environment – security teams can run scheduled analytics jobs and retroactive threat intelligence matching without managing infrastructure.
Target use cases include forensics analysis, compliance reporting, tracking slow attacks over extended timeframes, and running ML-based anomaly detection on historical data, with results easily promoted back to the analytics tier for investigation.

51:40 Matt – “Kusto is their proprietary time series database. So all of Azure metrics. And you can even pay for teh service and leverage it yourself as Azure data explorer.”

38:01 Announcing General Availability of Azure E128 & E192 Sizes in the Esv6 and Edsv6-series VM Families | Microsoft Community Hub

Azure launches E128 and E192 VM sizes with up to 192 vCPUs and 1832 GiB RAM, targeting enterprise workloads like SAP HANA, large SQL databases, and in-memory analytics.
These new sizes use Intel’s 5th Gen Xeon Platinum processors and deliver 30% better performance than the previous Ev5-series.
The VMs feature Azure Boost technology providing 400K IOPS and 12 GB/s storage throughput with 200 Gbps network bandwidth, plus NVMe interface delivering 3X improvement in local storage IOPS. This positions them competitively against AWS’s memory-optimized instances like X2iezn and GCP’s M3 series.
Intel Total Memory Encryption (TME) provides hardware-based memory encryption for enhanced security, addressing enterprise concerns about data protection in multi-tenant environments. The isolated VM option (E128i and E192i) offers dedicated physical hosts for compliance-sensitive workloads.
Currently available in 14 regions including major markets like East US, West Europe, and Japan East, with expansion planned for 2025. Pricing follows standard Azure VM models with both diskful (Edsv6) and diskless (Esv6) options to optimize costs based on storage needs.
These sizes specifically target customers running memory-intensive applications who need to scale beyond traditional VM limits without moving to specialized services. The combination of high memory capacity, enhanced networking, and improved storage performance makes them suitable for consolidating multiple workloads.

56:12 Announcing a flexible, predictable billing model for Azure SRE Agent | Microsoft Community Hub

Azure SRE Agent is a pre-built AI tool for root cause analysis and incident response that uses machine learning to analyze logs and metrics, helping site reliability engineers focus on higher-value tasks while reducing operational costs and improving uptime.
The billing model introduces Azure Agent Units (AAU) as a standardized metric across all Azure agents, with a fixed baseline cost of 4 AAU per hour ($0.40/hour) for continuous monitoring plus 0.25 AAU per second for active incident response tasks.
As part of Microsoft’s Agentic DevOps strategy, SRE Agent represents a shift toward AI-native cloud operations where intelligent agents handle routine tasks automatically, competing with AWS DevOps Guru and Google Cloud’s Operations suite.
The dual-flow architecture keeps the agent always learning from normal behavior patterns while ready to activate AI components instantly when anomalies are detected, providing 24/7 intelligent monitoring without manual intervention.
Target customers include organizations managing complex cloud workloads who want predictable operational costs – the usage-based pricing means you only pay for active incident response time beyond the baseline monitoring fee.

57:25 Matt – “I really want to play with this. I’m a little terrified of what the cost is gong to be.”

59:02 Generally Available: Live Resize for Premium SSD v2 and Ultra NVMe Disks

Azure’s Live Resize feature for Premium SSD v2 and Ultra NVMe disks enables storage capacity expansion without downtime, addressing a common pain point where disk resizing traditionally required VM restarts and application disruption.
Hasn’t Amazon had this forever?
This positions Azure competitively against AWS EBS volume modifications and GCP persistent disk resizing, though Azure’s implementation specifically targets their high-performance disk tiers used for latency-sensitive workloads like databases and analytics.
The feature supports cost optimization by allowing customers to start with smaller disk sizes and scale up only when needed, avoiding overprovisioning costs that can add thousands of dollars monthly for enterprise workloads.
Target use cases include production databases, real-time analytics platforms, and high-transaction applications where both performance consistency and zero-downtime operations are critical requirements.
Implementation requires no code changes and works through standard Azure portal, CLI, or API commands, making it accessible for both manual operations and automated infrastructure-as-code deployments.

1:00:03 Justin – “I’m just mad this didn’t exist until today.”

1:01:20 Generally Available: Agentless multi-disk crash consistent backup for Azure VMs

Azure Backup now supports agentless multi-disk crash consistent backups for VMs in general availability, eliminating the need to install backup agents or extensions on virtual machines while maintaining data consistency across multiple disks.
This feature addresses a common pain point for enterprises running multi-disk applications like databases where crash consistency across all disks is critical for successful recovery, competing directly with AWS’s EBS snapshots and GCP’s persistent disk snapshots.
The agentless approach reduces VM overhead and simplifies backup management by leveraging Azure’s infrastructure-level capabilities rather than guest OS agents, making it particularly valuable for locked-down or legacy systems where agent installation is problematic.
Target use cases include SQL Server, Oracle databases, and other multi-disk applications where maintaining write-order consistency across volumes is essential, with pricing following standard Azure Backup rates based on protected instance size.
This positions Azure Backup closer to feature parity with native hypervisor-level backup solutions while maintaining cloud-native scalability and integration with Azure Recovery Services vault for centralized management.

1:01:56 Justin – “I’ll tell you – if you are running this on SQL Server or Oracle; things like asset compliance are very, very important and you need to test the crap out of this, because my experience has been that if you are not quiescing the data to the disk, it doesnt matter if you snapshotted all the partitions together – you are still going to have a bad time.”

Other Clouds

1:04:18 Introducing Gradient: DigitalOcean’s Unified AI Cloud | DigitalOcean

DigitalOcean is consolidating its AI offerings under a new unified platform called Gradient, combining GPU infrastructure, agent development tools, and pre-built AI applications into a single integrated experience for developers.
The platform includes three main components: Infrastructure (GPU compute for training and inference), Platform (tools for building intelligent agents with upcoming Model Context Protocol support), and Applications (pre-built agents for common use cases).
DigitalOcean is expanding GPU options with AMD Instinct MI325X available this week and NVIDIA H200s coming next month, providing more choice and flexibility for different AI workload requirements.
Existing DigitalOcean AI users won’t need to change anything as all current projects and APIs will continue working, with the rebrand focused on improving organization and documentation.
The platform targets digital native enterprises looking to build AI applications from prototype to production without managing complex infrastructure, competing with larger cloud providers in the AI space.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

[00:00:00] Speaker A: Foreign. [00:00:08] Speaker B: Forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker C: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker A: Episode 316 recorded for August 5, 2025 Microsoft's new AI agent has trust Issues with Software Good evening Matt. How's it going? [00:00:29] Speaker C: I mean, I have trust issues with a lot of things including AI and software, so I totally agree. [00:00:34] Speaker A: And Microsoft, that's my definitely have trust issues with Microsoft. [00:00:37] Speaker C: There's too many ptsd. [00:00:39] Speaker A: Just say yeah, for sure, for sure. Well, it's another fantastic week. Ryan is still out on vacation, but we'll be back next week and we have some rumors that there might be someone coming in from the British countryside at some point soon. So we'll we'll see if that happens or not. [00:00:56] Speaker C: But we really need the smart person back. [00:01:00] Speaker A: Let's be honest, we really need the guy who knows all the AI stuff because the AI market is just continuing to explode and I'm doing my best to try to keep up with it, but he's more into it than all of us combined. It's time for earnings once again. I knew it was coming. [00:01:21] Speaker C: You normally wait like a half a second more and I was moving my hands up. I didn't quite get there fast enough. [00:01:26] Speaker A: Well, it's partially because I forgot if I clicked it, if it hit play, or if I had to click it and then hit play. So that was. There would have been a pause, but it just played. So there you go. But earning season came and has gone once again and it was relatively good earnings for two out of three, which, hey, that's not bad. That's a majority. Alphabet, or Google as we like to call them, beat earnings expectations and raised their spending forecast. Google cloud revenue hit 13.62 billion, up 32% year over year with OpenAI now using Google's infrastructure for ChatGPT, signaling growing enterprise confidence in Google's AI infrastructure capabilities. Alphabet is raising its 2025 capital expenditure forecast from 75 billion to 85 billion, driven by cloud and AI demand, with plans to increase spending further in 2026 as they complete compete for AI workloads. AI overviews now serve 2 billion monthly users across 200 countries, while Gemini app reached 450 million monthly active users. JonesTrain Google's scale in deploying AI services global globally and the $10 billion increase in planning capable spending reflects the infrastructure arms race among cloud providers to capture AI workloads, which requires significant compute and specialized hardware investments. Google Cloud's growth rate of 32% outpaces its overall revenue growth of 14%, indicating the strategic importance of cloud services as traditional search and advertising faces increased AI competition, mostly driven by AI, which is sort of ironic. [00:02:51] Speaker C: Yeah, I mean, it's obvious that everyone's going to be spending more money with all these AI workloads and some, if not many CFOs I'm sure are looking at the bills going, oh my God, are we actually getting paid enough to cover these costs? So I think you'll see that theme across most of the providers, but nothing here really shocks me that much when. [00:03:13] Speaker A: You'Re talking about GPUs. It definitely doesn't shock me at all that these are billions of dollars of investment and I don't know what it takes to actually run of these large models at like ultimate scale that like a chatgpt needs or anthropic, But I have to imagine it's just thousands and thousands of GPUs just working nonstop. I mean, considering all the rate limiting that I get all the time seems to be quite a bit. [00:03:37] Speaker C: Yeah. And even just training the new models that are all coming out, the 4.5, the 4.5s, the 4.1, you know, and all these other ones that just the initial compute is in the billions too, I feel like. So it doesn't surprise me. [00:03:51] Speaker A: Yeah, well, Alphabet actually right now, these earnings almost, almost three weeks ago for some reason they decided to get away from Microsoft and Amazon. So they weren't all compared at the same time. They have a big delay and then Microsoft came up and they reported their Q4 fiscal 2025 earnings day after recording last week with revenue of $76.44 billion, up 18% year over year and beating expectations, marking the fastest growth in over three years. Azure revenue grew 39% in Q4, significantly exceeding analyst expectations of 34 to 35% with Microsoft disclosing for the first time that Azure and cloud services exceeded 75 billion in annual revenue for fiscal 2025. Microsoft's AI investments are showing returns with 100 million monthly active users across Copilot products driving higher revenue per user for Microsoft. 365 commercial cloud cloud products. Capital expenditures reach 24.2 billion for the quarter, up 27% year over year. As Micro continues aggressive days are build out for AI workloads alongside peers like Alphabet and Meta. Microsoft's market cap crossed 4 trillion after hours trading after the announcement, becoming only the second company after Nvidia to reach that milestone, driven by strongcloud and AI Momentum. And yeah, if you're doing well, in the AI space, you are ren making a lot of money on the stock market right now. [00:05:08] Speaker C: Yeah. And Microsoft was, you know, ended up inadvertently being first seen with backing OpenAI back, you know, years ago. And it doesn't surprise me. And they've gone all in on AI and the copilot theme. You know, there was a last keynote or whatever it was, last conference. I was like, okay. My team and I had a running bet of how many times they were going to say copilot area. That was our, that was our tiebreaker was essentially that we had internally was how many times they say. Because they're shoving it into everything. And you know, just like we just said, and I'm sure we'll repeat in three in a minute now is you need those GPUs. They cost lots of money, but they're making good money on them too. [00:05:51] Speaker A: So that meant that Amazon came up last and they announced earnings last Thursday and things weren't quite as great for them. That's basically how I would describe it. Basically. Their key takeaways were AI, cloud growth and tariffs. Of course, they're heavily impacted by the retail in, you know, infrastructure imports from China. So they're a little bit more impacted on that side than either Microsoft or Google is. But their capital expenditures could reach 118 billion in 25, 2025, up from the previous $100 billion forecast. The spending primary focus of course on AI as well. So capital is king right now. Aws revenue grew 18% year over year, trailing Microsoft's 39% and Google's 32% growth rates, though AWS maintains a significantly larger market share with the second player at approximately 65% of AWS's size. Amazon's generative AI initiatives are generating multiple billions in annualized revenue for AWS, with potential monetization through services like Alexa at $19.99 per month or free for prime members. Despite initial concerns about tariff impacting costs, Amazon reported 11% growth in online store sales and 12% increase in items sold. That was with their prime day in the quarter as well. That company expects Q3 revenue up 13%, suggesting tariffs have been absorbed by suppliers and customers. The uncertainty remains of the US China Trade Agreement. But the big news there, slowest growth, but also the biggest number to grow. So it makes sense that it's a little bit slower, but I also think it's a reflection of the fact that their AI stuff is not as good. [00:07:15] Speaker C: Yeah. And they're trying across the board to be, to get, get it to get it up there, you know, and they're just not quite there yet, I feel like. [00:07:26] Speaker A: I mean not there yet and they, they haven't been there for a while, which is the concerning part. And I don't know, you know, Nova, I haven't really heard much about Nova since they launched. You know, they talk a lot about their anthropic partnership, which makes sense but you know, I, I just don't see. I don't feel like they have the swagger in AI that the others do. [00:07:47] Speaker C: Yeah, but I compare Nova also to like was it Microsoft's like 5 fee model? Whichever one it is, like their internal one. I don't feel like everyone I talk to out there really uses GPT or Claude. A few people I know play with, I call it the specialty ones of like Gemini, like the deep research ones, you know, the analyzation, stuff like that. But I don't think that either Microsoft or Amazon, their own models really are anything standing. [00:08:22] Speaker A: Yeah, I mean Microsoft's not trying though. So I mean like that's the. [00:08:25] Speaker C: They have a few. [00:08:27] Speaker A: Yeah, they're definitely a lot of open source. They're trying some things, trying to get it to be, you know, a thing. But it's just, it's not what you'd expect. So it's interesting. I'm trying to see where Nova, Nova Micro, I, I mean they're middle of the pack on LM arena and they don't even, they don't even make the top 10 on most of the major categories on Leaderboard. They are only in the overall overview. [00:08:50] Speaker C: Which is quite crazy to somebody that they were telling me and I don't know how you know, accurate is, but I trust the source of that. They were about like, they were like 52 in the top models, you know. So like I don't think that they are very high up in there in them. [00:09:06] Speaker A: Yeah. So again they're wanting to be competitive with those other models. They're just not. And where's Fi on this list? Let me see. I mean Phi is a little bit below Nova. So. Yeah, I mean you're close. They're maybe not as far off. Well, we'll see. I think this quarter Amazon has to start really driving an AI story that people get excited about. People need to think bedrock serious. They need to start using different ways. I mean, I know I'm using Claude through AWS natively a lot because I get away from the API limits and I can get a bill with my Amazon bill and so that's kind of handy for a lot of things I'm doing where I don't want to worry about my API limits. So there's definitely things you can do on Amazon that are cool. But then other areas, like I was trying to integrate it into root code the other day and like getting the Bedrock connector to work with it was a little screwy and I was just sort of frustrated with it and I gotta come back to her and to her I'm like, just use my profile. And it's like, okay, here's your profile. It's like, nope, doesn't work. So I had to go figure out how to like provision a bedrock API and do a bunch of things and then I'll, I'll get there. But you know, sort of interesting that it's not as plug and play as like Vertex where you just, you know, plug it right in or even some of the other solutions from ChatGPT or Anthropic's API directly. All right, well, in other ways that AI is how ML makes money. Gemini 2.5 Deepthink has now come out using parallel thinking techniques and extended inference time to solve complex problems. Now available to Google's AI Ultra subscribers in the Gemini app with a fixed daily prompt limit model achieves state of the art performance on live code bench V6 and humanity's last exam benchmarks, with a variation reaching gold medal standard at the International Mathematical Olympiad. Though the consumer version trades some capability for faster response times. Deep Think excels at iterative development tasks like web development, scientific research and algorithmic coding, problems that require careful consideration of the trade offs and time complexity. The technology uses novel reinforcement learning techniques to improve problem solving over time and automatically integrates with tools like code execution and Google Search for enhanced functionality. Google plans to release DeepThink via the Gemini API to trusted testers in coming weeks, signaling potential enterprise and developer applications for complex reasoning tasks in cloud environments. [00:11:21] Speaker C: I mean the video they also show within this article is interesting of like how they are with the parallel thinking abilities, how it's able to think like in a hundred different locations at once and try to build out those use cases where you and I can think a couple paths at the same time and you know, parallel path and issue. They talk about how it was able to kind of work through this mathematical equation and kind of build it out in multiple ways until it found which way was the right way. So it's, it's going to be interesting to see how a compare process and how it will compare with other models over time. [00:12:00] Speaker A: Yeah, we'll have to see you know, these deep thinking models are the most fun to play with because you, some of those, you know, you don't need it right away but you want to like hey, I want you to go plan out a weekend in Paris or I want you to go compare these three companies products based on public data and Reddit posts and things like that. And it goes up results, research, it comes back with suggestions like that stuff that's kind of fun and it takes, you know, a long time and like if it's the more in depth it is, the better it is in my opinion. So the deep thinking stuff is kind of the coolest, like heavy duty research stuff. It's not my most common use of AI, but like when I get to use it, I'm like always excited to use it. [00:12:35] Speaker C: I've definitely used it for like, you know, hey, I needed a sprinkler for my yard. I don't want to think about even where to start for this, you know, $60 item. So like I definitely used it in those ways which I definitely enjoy. [00:12:48] Speaker A: Yeah, I'm definitely looking forward to seeing what it does for a trip I have coming up soon. I've got a bunch of things my wife liked to do on the trip and I want to come up with an itinerary and all the different things. And so I'm going to give it a bunch of parameters and let probably try it with deep research and not deep research because I want to see what it kind of comes with and like ask for more like more, more interesting ideas than the typical tourist traps. [00:13:10] Speaker C: Yeah. [00:13:13] Speaker A: OpenAI is releasing the new GPT OSS120B and OSS20 open weight language models that deliver strong real world performance at low costs. Both available under the flexible Apache 2.0 license. And these models on reasoning. These models on reasoning tasks demonstrate strong tool use capabilities and are optimized for efficient deployment on consumer hardware. The 120B model achieves nearly near parity with OpenAI 04 mini on core reasoning benchmarks while running efficiently on a single 80 gigabyte GPU. Yeah, I have one of those around. The 20B model delivers similar results on the to the the OpenAI 03 mini on common benchmarks and can run on Edge devices with 16 gigs of memory, making it ideal for on device use cases, local inference or rapid iteration without costly infrastructure. I definitely will be trying to play with this if it's in hugging face on my laptop because I like to have a little model on my laptop that I can just kind of quickly prompt Especially if I'm on an airplane or something. I don't use it for coding because it's a little slow for that but just fixing text, helping me write stuff, that kind of thing. So those local models are really nice. Files are both compatible with the Responses API and are designed to be used within agentic workflows with exceptional instruction following tool use like web search or python code execution and reasoning capabilities. [00:14:26] Speaker C: I'm still stuck on the 16 gigabytes of memory on your video card. I just still remember I bought my video first video card. It had 256 megabytes. It was like a high end video card and now I'm like God, these things got so much bigger and faster. Okay, I'm officially old. Got it. [00:14:43] Speaker A: Yeah, actually it's on 16 gigabyte memory. I think that was mem was ram. But you're right, it's probably GPU memory which I don't know if my Mac has that so maybe that's not the one that. But my, my desk, my gaming desktop has 16 gigs of memory for sure in the video card. [00:14:55] Speaker C: Yeah, I, I mean I still run an M1 for my personal and I don't think it has anywhere close but I definitely run same thing like you. I keep you know the models local and I play with them a little bit just to kind of test them out, you know and as I'm doing stuff but less of a use case for planes recently. [00:15:13] Speaker A: Yeah, when I upgraded to my M4 MacBook Pro for my M1 I didn't max out memory thinking about it. I had 64 because of my last one. 64 be playing on this one now and I'm like in hindsight now I was like I probably should have gone for more memory just because these models are getting nothing but bigger, heavier and more hungry. So well, Microsoft Research has developed my favorite project code name in a while. Project ire, an autonomous AI agent that reverses engineer software files to determine if they're malicious, achieving a 0.98 precision and 0.83 recall on Windows driver data sets. System uses LLMs combined with decompilers, binary analysis tools and memory sandboxes to analyze code without human assistance. The technology addresses a significant cloud security challenge where Microsoft Defender scans over 1 billion devices monthly, requiring manual reviews of suspicious files by experts who face burnout and alert fatigue. Project IRE automates this gold standard malware classification process at scale system creates an auditable chain of evidence for each analysis using tools like Anger and Ghidra to reconstruct Control flow graphs and identify malicious behaviors like process termination, code injection and command and control communication. It was the first reverse engineer at Microsoft to author a conviction case for blocking an APT malware sample and real world testing on 4000 hard target files that could be classified by other automated systems. Project IR achieved 0.89 precision with only 4% false positives, demonstrating potential for deployment alongside human analysts. And the prototype will be integrated to Micro Defender as binary analyzer for threat detection later this year. This development represents a practical application of agentic AI and cybersecurity. Building on the same foundation as Graph Rag and Microsoft Discovery, with future goals to detect novel malware directly in memory at cloud scale. [00:16:57] Speaker C: I mean this is pretty cool. I mean, yeah, obviously they had to go back to the large fatigue and burnout to like give it the business use case but being able to kind of run through it with that level of precision is pretty remarkable. You know, especially if you've worked with security people, there's always false positives and whatnot. So 4, 4% false positives means that you're not, you know, then sending it to your human, human analysts to do too much. But they're not missing that much either. If you tackle this with a security and layers, you know, type of methodology then it's a great solid mill piece to kind of hopefully make your security team have to do less day to day operations work and let them get really through their project work which will make everyone be better and safer. [00:17:50] Speaker A: Yep. I mean I just. The idea of using AI in all these use cases, I mean I've got like a bunch of ideas that I'm kicking around for the day job and things I want to do next year and using Agentic and like it's, it's cool. Like it's like I can think of like all the things that can make us more efficient at and more productive with and it's like wow, that's just a great use case. And so you know, there's a lot to do still. I think it's, you know, we're just scratching the surface of what it can do and then you know, there's other areas where I'm using it to like help my friend with his front end development of his website. And it's like I don't know anything about front end code and this thing is broken but his developer is no longer available to us and so it's like, you know, I'm having AI go figure it out and like, nope, it's not working still. You can see it in the screenshot and make, make AI, generate the screenshots and compare them and say, here's the working site, here's the new test site. Like, do they match? It's like, oh no, they don't match. Okay, why? And like, we can work through all these details. So like. And like that's just me. I mean, I know enough coding to be a good backend developer. I could do APIs, I can do all the back end stuff. But like, when it comes to front end, you know, JavaScript, I learned HTML and then I learned CSS and I said screw this noise. So, you know, to be able to have those kinds of flexibilities of these type of things is just hugely valuable. And you know, so being able to have that with an agentic type capability for security people who are technically always overworked, you know, is definitely a big benefit. And I can just see, you know, as we get more sophisticated with AI that you know, yes, there's some job threat to it for sure, but there's also like, hey, this takes away a lot of noise of the busy work that a lot of us had to do beforehand that now we don't do. Like, I, you know, adding a server to a terraform file now is like so easy. I just tell it, hey, I need a server looks like this. Just take my existing template that I always use and modify it versus me copy pasting and doing, you know, you know, little twiddly stuff. AI just does it so much faster, more efficient. [00:19:44] Speaker C: Yeah. I was talking with somebody the other day, I was like, all that, you know, mundane work that, you know, would take four, six, eight hours of just like, hey, go add tags to all my terraform. Because if you're not aws, you can't just do it the default, it just does, you know. And I was working on something, I was like, oh, I want to know where the path of this is so if I see it, I know where to go debug it. I was like, hey AI, go add this tag of source and then just give it the module path so I can find it all later. And it was like in three minutes it had done all that. I was like, and it definitely didn't, you know, I would have missed a lot more than I'm sure it missed just by going through and summing up. Because this was on Azure, you can't just add the top level. [00:20:25] Speaker A: Well, it's sort of funny because, you know, a lot of times where I'll start with a project is I'll Say like, hey AI, take a look at this code base. Here's what it does, here's the basic functionality of it, here's how it was designed and look at it with a well architected framework perspective. Look at it as an architect and how would you make this app more scalable, more secure, more reliable and then just let it kind of come up with recommendations and a pretty detailed to do list of things that it would do and the reasons why and what I think the benefits would be of it. And then it typically tells you how long it thinks it would take you to do that. And some of the estimates are this is a six week project. I'm like, cool, let's do it AI, let's build this together. And it's like, okay, then I'm done in two days with like a major refactoring work was kind of like that's not quite the same as what you said. But I mean it's definitely still not as scalable or as fault tolerant as some of the things that you know, true engineering team could do. But for a lot of like the small little projects that I have at my house, like they're not, you know, a. It's my shitty code to begin with and so it's not great code and anything is better than what I could write most of the time by myself. So. [00:21:37] Speaker C: Yeah, we can definitely go on with this conversation for a while because now we're into how we leverage the AI and we've definitely bypassed the malware at scale conversation. [00:21:47] Speaker A: Yeah, well it would be interesting to you know, like you and Ryan have that repo the shitty script repo. Like let AI say, how do you, how would you make these shitty scripts better? [00:21:58] Speaker C: I don't want to. [00:21:59] Speaker A: I bet it would have really interesting recommendations. [00:22:02] Speaker C: I've definitely given it some old code. I was like, read this and make it be, you know, upgraded to modern because like I have a few scripts that were still like python2.7 that I haven't used. But I was like, just for the fun of it, I was like upgrade this to three and go refactor, you know, and make it be better. It's like, I mean the fact that AI, which I found is a decent way to do it with, but you also kind of end up in loop sometime of like, you know, test driven development. [00:22:30] Speaker A: Well, that's where you and I as people know what we're doing right, don't necessarily can recognize the loops and, or give it better recommendations. I will say, you know, using it a lot now and doing A lot of coding work and doing different things, like, it makes a lot of bad assumptions sometimes or like it'll implement something in a really weird way. And I'm like, whoa, what are you thinking? Like, that's just a terrible way to do this. And so, like, as someone, if you're just purely vibe coding, I could see how this. You could get in trouble where you just make spaghetti code forever. But if you have a little bit of concept context to it, I think you can do some magic and do some really cool stuff. But you have to have enough context and enough awareness how software works and how these practices and benchmarks and things are, and that you can tell the AI what you do. And, you know, it's very similar to, like, if you go to, you know, a visual, you know, image creation AI, uh, I'm trying to forget when the name is one of the popular ones at the moment, but like, stability or whatever. But if you were to tell it, like, draw me a picture, the way that you would describe that picture as a person who's not artistic in that way is probably very different than how an artist would actually describe what he wants it to draw. And the results of you and that other person writing prompts to draw the basically the same picture, they'll make a better picture because they understand artistic things. They understand shading, they understand light and perspective and all these things that you're probably not really thinking about. And so, again, this is where I think, even as a coder, I know code, I know how it works, I know the patterns, I know how the systems work. And that's why I'm good at using prompts versus other people who are not. [00:24:02] Speaker C: Yeah, I mean, you have to still know enough to debug the setup, you know, and what it's doing. It's not at the point where you can just be like, go run wild. You know, I kind of equate it to like an intern or like a just out of college person. They might have a general understanding and can do it, but at one point, you still need to help them out along the way and give it that guidance. Otherwise, I'm terrified of what it's getting built because it's not going to work the way you think it is. [00:24:27] Speaker A: Yeah, Again, that's why you had to have good qa. You have to have good things. And we've talked about a lot of this in cloud journeys in the past about building proper qa. Like Bolt, who's our friendly AI driven chatbot on the cloud Pod Slack channel. You can go and chat with him if you want to. He'll respond to you with his HAL type personality. But you know there's things that Matt and us and us we use it for they're only allowed to us. But you know that like our show notes and how it formats our show notes and it does that it has has actually a Google Doc which I have called the QA doc and that doc is literally how it it basically self tests itself. So when it makes changes to the insert logic for the Google sheet, it basically validates that it didn't insert it incorrectly and it validates against the the the sheet and basically if it doesn't have a certain level of confidence that it's improved or done as well as the sheet is, it will reject the. [00:25:18] Speaker C: Change the joy of test driven development. But you also have to make sure it does decent testing because at one point the test it did I don't remember what I was doing but I looked at like what the test was and I was like this isn't actually what you're supposed to be doing here. So like I had to go fix the test that it developed. [00:25:35] Speaker A: Yeah, and that happens too. Again you had to. Good, good testing practices because like even in my Google Docs when it first wrote the first test it was like every bullet should be these text. I'm like no, no no, no, the bullets are going to change, the text is going to change. I need you to focus on formatting only because that's the part that I care about, not about the actual text itself. And so I was like oh, I got it now. And then it changed the way it wrote the test. But anyways. Well we should move to other AI news versus personal project story time Claude Opus 4.1 has achieved a 74.5 on the SWE Benchmark Verified Coding benchmark with GitHub reporting notable improvements in multifile code refactoring and Rakuten praising its precision and debugging large code bases without introducing bugs. The new model launched this morning, so I have not had time to play with it. It is already available on major cloud providers, including Bedrock Google's Cloud Vertex AI. All the same pricing of the Opus 4. So if you were already paying through the nose for Opus 4, you can now continue to pay that same price for Opus 4.1, which makes it accessible for all your enterprise cloud needs. Opus 4.1 uses a hybrid reasoning approach with extended thinking capabilities up to 64,000 tokens for complex benchmarks while maintaining simpler scaffolding for coding tasks using just bash and file editing tools. Windsurf, which is a very popular vive coding tool, reports that the Upgrade delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, comparable to the performance leap between Sonnet 3.7 and Sonnet 4. For cloud developers, the immediate upgrade path is straightforward. Simply switch to Cloud Opus 4.1202585 via the API with no pricing change or major integration modifications required, which is really the beauty of this. I did enable it in Claude earlier but I didn't want to take the risk of it messing a bunch of stuff up so I turned it back but it was literally slash model picked it, I ran one command test it and then I moved it back to Sonnet 4 or I was using and I'll get back to it next week when I have more time to test it because again it came out this morning and it's been a busy day. [00:27:35] Speaker C: Yeah, I'm looking forward to kind of playing with this like you. I haven't had time today to play with it, so we'll have to compare notes next week. [00:27:44] Speaker A: Yeah, definitely. Let's move on to our friends at aws. You know, maybe didn't have the best earnings, but they still are announcing a ton of features so always they're launching the G6F instance with fractional GPU capabilities offering you 1-8140/1 GPU partitions powered by the Nvidia L4 tensor core GPU, enabling customers to right size workloads and reduce costs compared to full GPU instances. Instances target graphic workloads including remote workstations for media production, CAD engineering, ML research and game streaming. The configurations ranging from 3 to 12 gigabyte GPU memory paired with AMD EPYC processors. This is AWS's first GPU partitioning offering addressing a common challenge of GPU underutilization. Workloads don't require a full GPU resource, but previously had no smaller option available. Available across 11 regions with on demand spot and savings plan pricing options requiring nvidiate grid driver 18.4 plus and supporting Amazon DCV for remote desktop access. The fractional approach could significantly reduce costs for your organization running multiple smaller GPU workloads. Think workstations in particular for all those CAD use cases. [00:28:53] Speaker C: I mean the fractional GPUs is an interesting concept where you probably for most people don't need that massive of a gpu. So it's almost like reminds me of like the T series where it's not burstable but you get these really small sizes. So if you're just doing kind of one off things or you need it for a specific project, then you can get that small usage or even like for your developers, if you're wanting to give them each a server and they don't need a full GPU at all times, you can kind of give it that way. So I think this is pretty interesting of how to kind of build it out and I think it's a cool feature. [00:29:31] Speaker A: Any any immediate use cases that come to mind for you for it or just playing right now? [00:29:37] Speaker C: Playing right now. [00:29:39] Speaker A: Very good. All right, Amazon DocumentDB has got the serverless treatment this week. They'll now automatically scale, compute and memory usage using the DocumentDB capacity unit or DCU, where each DCU provides approximately 2 gigabytes of memory plus corresponding CPU and networking resources. The capacity from a half a DCU to 256 DCUs. The service offers up to 90% cost savings compared to provisioning for peak capacity and charges a flat rate per second of DCU usage, making it cost effective for variable workloads, multi tenant environments and mixed read write scenarios. Existing documentdb clusters can add serverless instances without data migration by simply changing the instance type requiring documentdb version 5 or higher with the ability to mix provisioned and serverless instances in the same cluster. Key use cases include handling traffic spikes for promotional events, managing individual database capacity across multi tenant SaaS applications, and building agentic AI applications that leverage documentDB's built in vector search capabilities. Service maintains all standard documentDB features including MongoDB compatible APIs, read replicas, performance insights and AWS service integrations while automatically tracking cpu, memory and network utilization to scale without disrupting your availability. [00:30:50] Speaker C: I mean it's great that they ga'd this and it's, you know, kind of nice if you're playing with everything. I always just get worried when I see oh it's this DCU model. It's kind of back to the tokens. Everything else they've abstracted out, how these pricing is so much that your FinOps team is going to have to learn kind of another methodology and making sure that as a developer, as a infrastructure person or sre, that you're leveraging and building these things to the correct scale that you need and having that mixture to be able to burst up. [00:31:27] Speaker A: I mean the one thing about the DCU model, and I see it a bunch of places I've been doing a lot more serverless with like Valky and and you know this DCU model comes up a lot. I actually just moved the cloud POD database to Serverless Aurora for MySQL and so I've been getting a little more exposed to the whole whatever that one's called something like DCUS as well and it, it's a little bit opaque. I definitely don't love it as a model but it is so much cheaper that it's so much worth it. [00:31:57] Speaker C: Well especially for small workloads like like ours like it doesn't really need much so being so minimal is perfectly fine. Like I've done it with like dev environments too is great have it in there if you really want said to scale up at 8am and turn off at 6pm and then let let the auto scaling kind of handle it after that if it needs to if a developer's working late or whatever. But that's where I find it. Really cheap and really valuable. I see feel like a lot of times for production workloads at that point I've seen more people move back to actual full metrics on it. [00:32:32] Speaker A: Yeah, I mean I think you, you make the judgment call. What. What makes sense. So like I was using my SQL box running on a Docker container and ever since I moved the back end of the cloud PAL website to to EFS instead of local disk I wasn't really happy with a bunch of the performance aspects of it. And so it just you know moving it to Aurora was the right natural thing and I didn't want to spend a fortune on it because you know like the cost of my container is a lot less money than the cost of rds but also the cost of efs that provision capacity is very high. So you know you start doing the, the, the calculation what makes the most sense. But I'm so pretty happy with it and again like a lot of the, a lot of the cloudbot stuff for our website is you know we don't actually host the RSS feeds for the for the actual shows. It's used by our our podcast hosting partner. Just really think it's there is the show notes themselves that's mostly static content that gets cached by Cloudflare anyways so we don't have a lot of stuff hitting the back end of our database that often so it's been surprisingly good. [00:33:30] Speaker C: Glad to hear. [00:33:34] Speaker B: There are a lot of cloud cost management tools out there, but only Archera provides cloud commitment insurance. It sounds fancy but it's really simple. Archera gives you the cost savings of a one or three year AWS savings plan with a commitment to short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? Our chair will click the link in the Show Notes to check them out on the AWS Marketplace. [00:34:13] Speaker A: All right, Introducing the Application Recovery Controller Region Switch, which is a multi region application recovery service from aws. It provides automated orchestration for multi region application failover, addressing enterprise concerns about untested recovery procedures and unknown dependencies during regional outages. The service supports nine execution block types including EC2 auto scaling, Aurora Global database failover, Route 53 health checking and EKS ECS resource scaling, enabling coordinated recovery across COMPUTE database and DNS services. Region Switch uses a regional data plane architecture where recovery plans execute from target region, eliminating dependencies on the impacted region and providing more resilient recovery operations. Continuous validation runs every 30 minutes to check resource configurations and IAM permissions. While the service costs $70 per month per plan supporting up to 100 execution blocks or 25 child plans, organizations can balance costs and reliability by configuring standby resource percentages through the actual capacity depends on regional availability at the recovery time, making radio testing essential for confidence in disaster recovering strategies. [00:35:17] Speaker C: I like the note here. To facilitate the best possible outcomes, we recommend you regularly test your recovery plans and maintain appropriate service quotas in your stamp at region because the amount of times I've seen people try to do doctor testing and then they hit a service quota limit. It's comical at this point but you know as you look through the document in this a little bit more detail. It's kind of like its own like they made a UI to kind of build out your test plan and everything else. So like it feels like a good like workflow engine which is really what it is here and it solves a lot of those use cases if you're not running an overly complex app. But you do need something like this set up, it doesn't. It feels like a good way to kind of start the process out. I think eventually over time as they add more resources to it it'll be good. But for a lot of like medium sized businesses this will check a lot of boxes for you. [00:36:10] Speaker A: I mean doesn't Azure have very similar technology and some of their regional testing? I know this is one of their big claim to fames if you will. [00:36:19] Speaker C: Yeah, I know they released a bunch of stuff earlier this year we were talking about A little bit before, I meant to try to do some live research beforehand. So I know they added a bunch of testing in there, but I don't think it did this like every 30 minute kind of synchronizing and testing. It was kind of specific services. It did, I thought. And this feels like more like a summation of a bunch of services altogether. [00:36:44] Speaker A: Got it. Okay. I mean, I like the idea of having, you know, easier. Dr. There's a bunch of SaaS vendors who make software to help make, you know, applications, which is a lot easier on AWS or any cloud provider, honestly. But if you're primarily in this ecosystem, I could see this being something very valuable. I, you know, the costing for $70 per month for plans for up to 100 execution blocks. I mean, I don't know about you, but my apps are bigger than that in many cases, you know, in the enterprise space. So that could get pretty expensive pretty quickly. But I do expect they have probably volume discounts and things you can get. If you're that massive and you want to use it, you become their biggest customer and they'll love you forever. [00:37:23] Speaker C: You'd be a great QA tester for them, I'm sure. [00:37:26] Speaker A: Yeah. AWS Lambda is getting a little bit more thick from the serverless side with Response streaming now supporting 200 megabyte response payloads, which is a 10x increase from the previous 20 megabyte limit, nailing direct processing of larger data sets without compression or S3 intermediary steps. This enhancement targets latency sensitive applications like real time AI chat interfaces and mobile apps, where time to first byte directly impacts user experience and engagement metrics. The expanded payload capacity opens new use cases including streaming image, heavy PDFs, music files, and real time processing of larger data sets directly through the Lambda function. Response Streaming is available on Node js, managed runtimes and custom runtimes across all AWS regions where the feature is supported with the 200 megabit limit now set as the default. This update reduces architectural complexity by limiting workarounds previously required for payloads exceeding 20 megabytes, potentially lowering costs associated with S3 storage and data transfer fees. [00:38:22] Speaker C: They've solved the problem here of you know, they mentioned with the S3 intermediary and in theory hopefully it would just simplify your workflow. But again, Lambda at 200, especially if you're processing stuff if it's zipped or already compressed, you're going to run into limits. Your other limits on the Lambda, you know, associated with like the temp storage size and things like that. So I would use this with carefulness to make sure that you're not like gonna blow stuff up on the other side for sure. [00:38:55] Speaker A: Moving on to gcp, Gemini CLI is now supporting custom slash commands through TOML files and Model context protocol prompts, allowing developers to create reusable prompts for common workflows like code reviews or planning tasks. This brings GitHub copilot style command functionality to Google's AI assistant in the terminal. Directly commands can be scoped at user level, available across all projects or project level, checked into GIT repos with name spacing support throughout the directory structure. The implementation uses minimal configuration requirements, just like a prompt field, making it accessible for quick adoption. The MCP integration enables Gemini CLI to automatically expose prompts from configured MCP servers as slash commands, supporting both named and positional arguments, and this positions Google to leverage the growing ecosystem of MCP compatible tools and services. Key use cases include automating code reviews, generating implementations, implementation plans, and standardizing team workflows through standard common command libraries. The Shell command execution feature allows integration with existing CLI tools and scripts, and while this is a developer productivity tool rather than a cloud service, it strengthens the Google developer ecosystem. Play against GitHub, Copilot and Amazon Q Developer. The feature is available now. The simple NPM update requiring only a Gemini API key to get started. [00:40:10] Speaker C: I mean they're copying, you know, the cloud code, you know, and hopefully it becomes a nice way to interact with it. I still like the VS code plugin and making it kind of interact more that way. I find that a little bit better from the little bit I played with cloud code. But recently I've been talking to people who say cloud code's gotten better than since the initial release. So I have to go back and kind of when I find some free time in my day, which doesn't really exist to go play with this and see if the slash commands are all what they say they are now. [00:40:46] Speaker A: Yeah, definitely worth checking out. I played the Gemini CLI when it first came out, the first couple days and then I haven't honestly gone back to it. Mostly I've been busy with other projects and things to do. But yeah, these are interesting ideas but I'm curious to see how they work. Agent to Agent protocol is getting an upgrade today. Google releases the protocol version point 3 with GRPC support, security card signing and Python SDK improvements. Positioning as an open standard for multi agent AI systems that can communicate across different platforms and vendors. The protocol now has native support in Google's Agent Development Kit ADK and offers three deployment paths, managed agent engine, serverless cloud runs or full control with GKE giving developers flexibility in how they scale their agent systems. Over 150 organizations including Adobe ServiceNow and Twilio are adopting A2A with real invitations like Tyson Foods and Gordon Foodservice using collaborative agents to share supply chain data and reduce friction in their operations. Google is launching an AI Agent marketplace where partners can sell agent to agent enabled agents directly to customers. While agent space provides a governed environment for users to access these agents with enterprise security controls. The protocol was contributed to the linux foundation in June 2024 making it a vendor neutral standard that could become the HTTP of Agent to Agent comms. [00:42:09] Speaker C: I mean I feel like Agent to Agent and really getting that going is going to be and really getting to revolutionize the space. So you know, any of these upgrades hopefully should be good to help with that. And I do like that. You know I forgot about this actually that it was the open source standard that they did open source it. So it'd be nice to see as that grows, you know what else they add to it over time and how these Agents to agent communication helps to continue to improve things. [00:42:39] Speaker A: Yeah, Agent to agents basically how you make MCP to MCP work in the cloud because one of the things you run into MCP very quickly is oh, this exists on my local box and that isn't as helpful. So definitely having that ability is, is definitely a good thing. All right. Announcing the C4 based VM on the Intel Xeon 6 processor. Granite Rapids with up to 30% better general compute performance and 60% improvement for ML recommendation workloads compared to previous generations, making them the first major cloud provider to offer the Xeon 6. The C4 shapes include titanium local SSD variants delivering 7.2 million max read IOPS and 30% lower across access latency targeting high performance databases, big data processing and media rendering workloads. C4 Bare metal instances provide direct CPU memory access for commercial hypervisors and SAP workloads, achieving 132,600 ASAPs the highest of any comparable machine with 35% performance improvement over the C3 bare metal. I don't know what an ASAP is, but sounds fancy. [00:43:44] Speaker C: I was just about to ask you if you knew what that was. [00:43:46] Speaker A: The expanded C4 series maintains existing cut discounts and integrations with managing instance groups and GK custom compute classes. Available in 19 zones with shapes ranging from 4 to 288 VCPUs and the key use cases include AI inference with FP16 trained models using the Intel AMX FP16 financial services companies requiring microsecond labeled latency improvements and visual effects rendering with reported 50% speed ups over in 2D instances. For the visual effects work world, I'm. [00:44:15] Speaker C: Still stuck on what ASAP stand for and all I can find is asap. Thank you Google for not understanding what I'm trying to have you do. Accelerated SAP is a structured methodology in developing SAP to guide and accelerate the implementation of SAP systems. [00:44:35] Speaker A: Lovely Google is launching Cloud Hub Optimization and Cost Explorer into Public Preview. It was announced at Google Next in Private Preview. The public preview now provides application centric cost visibility across multiple projects without additional charges, addressing the challenge of tracking expenses for applications that span dozens of GCP projects. The tool integrates cloud building cost data with cloud monitoring utilization metrics to serve as underutilized resources like GKA clusters with idle GPUs showing average VCPU utilization at the project level to identify optimization candidates. Unlike traditional cost dashboards that show aggregate compute engine costs, Cost Explorer breaks down spending by specific products including GK clusters, persistent disks and cloud load balancing for more granular cost attributions. Built on App Hub Application framework, the solution reorganizes cloud resources around applications rather than projects competing with AWS Cost Explorer and Azure Cost Management by focusing on application level cost optimizations. Major League Baseball's principal cloud architect reports the tool helps monitor costs across tens of business units and hundreds of developers. The particular value of organization shifting left on cloud cost management and if you'd ever had used the Google Cloud Hub Optimization and Cost Explorer previously, you know they're hot garbage. So this was a very well appreciated announcement at Google Next that they were finally going to make a major upgrade to it. I've played with it. We are trying to get into the Public Preview, which I don't know if I'll be able to talk about publicly until it goes ga, but we are very interested in this for our day job as well. [00:46:03] Speaker C: I like how all these things are adding the GPU metrics and GKE idle to kind of really show that as you are leveraging these as a company, it's important to also handle the finops side of it so that you're not burning cash faster than you thought you were. So it's interesting how to see that slowly grow over time. I mean the aws Sorry, the AWS and Azure Cost Management both have a lot to to give. I mean, I don't really see it a lot at the application level optimizations for either of them. I mean it's been a been a couple years. I really dove into AWS costs for But Azure to me doesn't really get you down to the application level without a bunch of manual work. [00:46:47] Speaker A: Yeah, I mean Amazon's sort of similar. You have to you had to tag things a certain way to get that visibility, but you do get it after that, which is helpful. [00:46:56] Speaker C: Yeah, well, out of the box would be nice, but probably don't have enough knowledge I guess. [00:47:02] Speaker A: Could be. Let's move on to your best friend Microsoft, who's got a bunch of announcements for us this week. You don't want to defend yourself for being a Microsoft fanboy? [00:47:14] Speaker C: No, I'm good. [00:47:15] Speaker A: Okay. [00:47:15] Speaker C: I'm good, I'm good. Definitely not a fanboy, but take what you can get at times. [00:47:20] Speaker A: Yeah, they're introducing Microsoft Sentinel Data Lake. It's entering Public Preview as a fully managed security data lake built directly into Sentinel, allowing organizations to store all security data in one place with cost effective long term retention while eliminating the need to build custom data architectures. The Service integrates with 350 plus existing sentinel connectors including Microsoft 365, Defender, Azure, AWS and GCP sources, storing data in open formats that support both Kusto queries and Python notebooks through a new Visual Studio code extension for advanced data analytics. Because everything goes into Visual Studio code pricing separates data ingestion storage from analytical consumption, enabling customers to store high volume low fidelity logs like network traffic cost effectively in the Data Lake tier, automatically mirroring critical analytics tier data to the lake at no extra charge. The key differ from AWS Security Lake is the native integration with Microsoft Security Ecosystem and Managed Compute environment. Duh. Target use case includes forensic analysis, compliance reporting and tracking slow attacks over extended time frames and running ML based anomaly detection on historical data with results easily promoted back to the analytics tier for investigation. [00:48:28] Speaker C: What I do like about this is the multi streams into the data lake. You know so many of these things, you know, like they said vpc VNET flow logs are so chatty but you have to have them and God forbid a security incident. But on the flip side, so many contracts say that you have to have these things so but throwing those into real time log analysis, CloudWatch log analytics, anything else like that immediately blows up your bill, you know. So having that ability to kind of take that in in multiple paths is kind of critical. Otherwise it's not effective then being able to kind of retroactively look at that. I mean AWS has had it for years, hey, throw your VPC flow logs into S3. But pulling them out required, you know, Athena magic. She'll leave with that because even following their examples half the time I swear I never could get it to work without many swearings at it. The negative side of all this is you got to use Kusto queries. So if you're already in the AWS or sorry in the Azure landscape and you are using log analytics and other things in Kusto, you know, but if you're not and you're just a security person, it's another queer language kind of learn. [00:49:42] Speaker A: I feel like I don't know what Kusto is. I know what a bunch of other ones are. But what could you maybe fill me in on that one? [00:49:49] Speaker C: So Kusto is their proprietary time series database. So all of Azure metrics, everything else and you can actually pay for the service and leverage it yourself as Azure Data Explorer. It's literally cusco under the hood. Um, so then from there it's like writing your own CloudWatch log queries, you know, and knowing the formats and everything else like that. And it's fast, you can get tons of data back but you really have to kind of know what you're doing. And I have a few people that I can do magical things with it, I swear. But it's you know, like, like most things it's SQL. Like you know, you still have your, your starting points but like you'll start with Trace and then you'll pipe that into a bunch of wears and you can do joins and these crazy things like that got is powerful though. Like it is a pretty, it's, it's a good piece of technology. [00:50:42] Speaker A: Well Microsoft saw Google's new big server and said we want a big new server too. And so they announced the E128 and E1092 VM size up to 992 VCPUs and 1.8 terabytes of RAM targeting enterprise workloads like of course SAP, HANA, large SQL databases and in memory analytical systems. These new sizes use Intel's 5th gen Xeon Platinum processors and deliver 30% better performance than the previous EV5 series which there's never been the same. Like I thought the naming conventions were the similar between the different classes but that the old. This is the replacement for the EV5. This is an E128 or E192 like. [00:51:20] Speaker C: That'S just confusing their entire sizing though we were talking about in the pre show read too of like all the clouds starting to require decoder ring. So like the first letter is family, then there's sub family and then there's features in it. [00:51:36] Speaker A: So aka like, you know, C5GN. Yeah, yeah. [00:51:40] Speaker C: So like, then you have like if there's local data and then if it supports premium or not storage is another. And then you have the version at the end. So like if you take your decoder ring and read all these, it makes sense. But you know, every time I'm like, oh, crap, okay, E is memory. I think I could be wrong. Now I'm actually gonna double check. Yeah, so like E is memory. So that's what the E is. And then you kind of gotta work your way down to the rest letter. So D, I think means there's data. [00:52:07] Speaker A: It's sort of funny because it's the E128 or E192 with up to 192 VCPUs, which means the. The 192 is actually the number of CPUs, not the amount of memory, but it's the E class, which is memory. [00:52:20] Speaker C: Yeah, okay. Yeah, it's always fun. And then good luck getting capacity for any of these. [00:52:25] Speaker A: Oh yeah. [00:52:26] Speaker C: Trying to get capacity for any of the modern ones. [00:52:29] Speaker A: And they say they're available in 14 regions, including East, US, West, Europe and Japan. East. [00:52:35] Speaker C: So what I've learned as part of my day job is that yes, they are available there, but it might only be in one of those zones. [00:52:41] Speaker A: Three availability zones. Yeah, I have. Same thing happens in GCP too. [00:52:45] Speaker C: Which means, cool, you can do it, but then you can't actually have ha. And maybe for SAP, Hana, you don't need ha because. [00:52:51] Speaker A: No, you do. You want HA in your ERP system. [00:52:56] Speaker C: Right? So like, okay, cool. So it's all in one zone. Maybe you're like, it doesn't work and it's always problematic. So like, even working with Microsoft directly and saying, hey, you know, I had this conversation with my account reps, hey, what do you recommend we use? Here's all the regions we're in. Because my products in 14 regions, you know. So we had to kind of looked at it. It's like, okay, here, use this. Here, use this. Then when we went to request the quotas, it was like, well, you can't actually get that. You can only get in two of three zones. And I was like, that's not really going to work for us because that means I have to have everything at larger scales in order to handle a single zonal outage. I don't have a third. It's a 50% I gotcha. [00:53:42] Speaker A: Okay. A couple other things with these to note. They do support the Intel Total Memory encryption, so if your data needs to be protected, you're worried about VM noisy neighbors stealing your data from memory, which there have been some attacks. This is isolated and won't be impacted by that. And then also these have the Azure boost technology providing 400,000 IOPS and 12 gigs of network throughput per second. So that's not too bad of a box if you need the size and the space. [00:54:12] Speaker C: So much data, so much data going through there so quickly. [00:54:17] Speaker A: Azure is announcing a flexible, predictable billing model for the Azure SRE Agent. The SRE Agent is a pre built AI tool for root cause analysis and incident responses that use machine learning to analyze logs and metrics, helping site reliability engineers focus on on higher value tasks while reducing operational costs and improving uptime. The billing model introduces Azure Agent units because of course we need another way to measure them as a standardized metric across all Azure agents with a fixed baseline cost of 4 AAU per hour at $0.40 for that hour for continuous monitoring plus 0.25 AAU per second for active incident response tasks. As part of Microsoft's AgentIC DevOps strategy, SRE agent represents a shift towards AI native cloud operations where intelligent agents handle routine tasks automatically, competing with AWS DevOps Guru and Google Cloud's Operation Suite. The dual flow architecture keeps the agents always learning from normal behavior patterns while reading, ready to activate AI components instantly when anomalies are detected, providing 24.7intelligent monitoring without manual interventions. Target customers include organizations managing complex cloud workloads who want predictable operational costs and the usage based pricing means you only pay for active incident response time before the baseline monitoring fee. [00:55:29] Speaker C: I really want to play with this. A little terrified of what the cost. [00:55:32] Speaker A: Is going to be. Yeah, I don't know how to calculate those costs. [00:55:34] Speaker C: Yeah, I mean even with all the fin everything with finops everyone has their own compute units now and it's a stra out we which we already talked about so I'll get off that high horse. I do want to join the weight list and kind of play with this because I think it could be pretty cool, you know, given that, you know I run a SaaS which is 24. 7 and kind of having that data and having that help our team and to me this is like augmenting the team would be pretty cool to have out there. I don't know if the cost is going to be worth it once you kind of figure out all those pieces, but it's a cool feature as long as its first response isn't, hey, scale up everything 12 fold. Which is what I assume it might do. [00:56:17] Speaker A: I mean, I like the idea of having these DevOps agents as well. I mean, in general, if you does the SRE agent just restart IIS for you because your app has a memory leak? I mean, does it start band aiding a bunch of bad things that you're doing in your app, or is it actually helpful in fixing the problem? I guess that's my one fear of some of these very sre operationally heavy driven AI agents I've seen is that they're very fix a band aid, not actually fix a root. [00:56:49] Speaker C: Yeah, and but if it stops you from getting paged at 2am on a. [00:56:53] Speaker A: Saturday night, I mean, I appreciate that. [00:56:54] Speaker C: Might be worth the fixed amendment, but. [00:56:56] Speaker A: Again, like that's all it's ever doing. [00:56:59] Speaker C: Yeah, then it might not be worth it. [00:57:01] Speaker A: Yeah. Well, Azure is finally giving us live resize feature for Premium SSD v2 and Ultra NVMe disks which enable storage capacity expansion without downtime, addressing a common pain point where disk resizing tradition requires VM restarts and application disruption. It does. What hasn't Amazon had this forever and you're telling me those fancy Ultra NVME disks I paid a fortune for didn't have this? Okay, I was hoping maybe they were better than AWS and that they took so long to release this because they also allow you to shrink the VM disk size and that is not the case. So this is only to grow, just like Amazon has had forever and GCP persistent disk resizing has had for quite a while. This supports cost optimizations by allowing you to provision smaller disks like we've been doing for decades within provisioning and scale up only when needed and target use cases include production databases, real time analytics platforms, and high transaction applications where both performance density and zero downtime operations are critical requirements. I just mad this didn't exist until today. [00:58:03] Speaker C: I I actually asked when I first started here why some of our stuff was so big and they were like well we can't grow it. I was like what do you mean you can't grow this thing? Like, well you got to start here, that's all you can have. And I was like hold on. Like you said, there are some features on Azure where I'm dumbfounded. They just don't exist yet. Like this and potentially the next conversation we're going to have. And it's amazing that it took so long to get there. [00:58:35] Speaker A: Yeah, I mean years late at this. [00:58:38] Speaker C: Point, I mean I remember I wrote a script because they wanted to thin provision. Alma was one of my clients at one point wanted to thin provision something and essentially what we did was we rated the disk at the Windows level and like we wrote a whole script to do it and then like in true Microsoft fashion about or in AWS fashion like a year later you could then scale up disk but that was like 10 years ago and then I realized how long I've been on the cloud and we're going to bypass that conversation but like moving one of those things like why was this not here guys? Yeah. [00:59:10] Speaker A: Wow, that's crazy. Well if you were excited about disk growth, you also get Azure Backup now supporting agentless multi disk crash consistent backups for VMs and general availability. The need to install backup agents or extensions on virtual machines while maintaining data consistency across multi multiple disks. This feature addresses a common pain point for enterprises running multi disk applications like databases or crash consistency across all disk is critical for successful recovery. Competing directly with snapshots From EBS and GCP's persistent disk snapshots a approach reduces VM overhead and reduces simplified backup management by leveraging Azure's infrastructure level capabilities rather than guest OS agents. Now I'll tell you that if you are running this on SQL Server or Oracle, things like asset compliance are very very important and you need to probably really test the crap out of this because my experience has been that if you are not qsing the data to the disk, it doesn't matter if you snapshot it all of the partitions together, you still are going to have a bad time. So make sure that you have some testing here. Now I think you can get away with Agentless because built into Windows you have shadow disk copies and things like that. So it's probably a little bit better than some of the Linux implementations of this feature. But again your experience will vary and I highly recommend you test this under a high load situation. So get SQL load run against the database and test this and see if you can recover cleanly without doing major data recovery or validations. [01:00:38] Speaker C: Yeah, this also qualifies as what do you mean this just wasn't here category. [01:00:43] Speaker A: I mean I already covered that they're way behind so yeah, I mean Amazon and GCP had this forever ago so. [01:00:49] Speaker C: Yeah, I mean like you said test it, make sure that it is qusing all the data you I've been burned by this before. We set up something with I think it was like postgres back in the day on AWS before it was in rds, but test it, make sure it works because you're going to have a bad day with your backups otherwise. [01:01:08] Speaker A: I mean in general like idea of striping disks and things like that that make this problematic are also kind of terrifying to me. Like you know, to have multi disk operations that require right order consistency in the cloud. I mean now with things like Azure San, I would hope that you could just get an ISCSI attachment and not have to do this craziness. But I mean these things will happen to people who didn't have that option before. So it's a nice maybe future option for you. [01:01:34] Speaker C: I mean there are things that you just can't do on the managed services too. So like they're talking about SQL. You can't run ssrs. I'm on Azure SQL, which still dumbfounds me. You know you have to run a managed instance in order to run ssrs. [01:01:49] Speaker A: You can't run clr, you can't run in memory databases. There's all kinds of limitations of things you can't do. [01:01:54] Speaker C: So you know, there are use cases for this. And don't get me wrong, I would like to think that there's always a managed service that will get all that soil away, but there are times that you need these things. [01:02:05] Speaker A: Well, in our final story for tonight, Matt DigitalOcean is back in our other cloud section. They're consolidating its AI offerings under a new unified platform called Gradient, combining GPU infrastructure, agent development tools and pre built AI applications into a single integrated experience for developers. Platform includes the infrastructure gpu, compute and inference, the platform which are those tools and then applications or pre built agents for common use cases. DigitalOcean is expanding the GPU options with the AMD Instinct Mi 325X available this week along with Nvidia H2 hundreds coming next month, providing more choice and flexibility for different AI workload requirements using Digital Ocean. AI users won't need to change anything as all current projects and APIs will continue to work with the rebrand focusing on improving organization and documentation implementation in general. If you are not a huge AI use case where you need those $10 billion in capital investment worth of compute capacity, Digital Ocean might be a great choice for you for some of your smaller workloads. Highly recommend them. I'm a fan of the underdog that is known as DigitalOcean even though I stopped using them for my own personal projects a long time ago because I needed more Amazon playing. But if I didn't do the podcast and I didn't care about all my listeners. And knowing what those features do, I would move right back Digital Ocean in a heartbeat. Because I do love their support and service and highly recommend. [01:03:24] Speaker C: Yeah, I mean, you're a small, medium sized business, you don't need a lot. They're a great option. They're, you know, like you've said, their support's good. I played with them in the past. I've actually done a few migrations off Digital Ocean. Sorry, guys. But they definitely help every step along the way, so. And if you don't need a massive thing or you're already there, it's an easy way to kind of step into your AI application building endeavors. [01:03:50] Speaker A: Indeed. Well, Matt, we've reached another end of the podcast. It's had to go. [01:03:57] Speaker C: We made it with two of us again. [01:03:58] Speaker A: Yeah. And we cut 20 minutes off this one so we did much better at the editing cropping. [01:04:04] Speaker C: It is a lot harder because I'm always like, ooh, this is kind of interesting. We can talk about that. [01:04:08] Speaker A: But yeah, there's been a bunch of really good stories that again, we're trying to give you guys a really good high value level content, but sometimes it's just a bit fun to make some of these announce. Some of the Azure ones we just talked about, I was like, really? Like we gotta talk about that. [01:04:21] Speaker C: I gotta make fun of them a little bit. [01:04:23] Speaker A: I'm sorry. Exactly. So anyways, I will talk to you next week here in the Cloud, Matt, and hopefully we'll be joined by our co hosts as well. [01:04:33] Speaker C: See you then later. Bye. [01:04:38] Speaker B: And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback and ask any questions you might have. Thanks for listening and we'll catch you on the next episode.

Show Notes

Titles we almost went with this week:

General News

AI Is Going Great – or How ML Makes Its Money

AWS

GCP

Azure

Other Clouds

Closing

Chapters

Episode Transcript

Other Episodes

Episode 168

168: The Cloud Pod Celebrates GCP Madrid Region With Sangria

Episode 201

201: The CloudPod is assimilated and joins the Azure Collective

Episode 173

173: Oracle Begins Its Invasion of Sovereign Nations