344: Amazon’s Coding Bot Bites the Hand That Runs It

Welcome to episode 344 of The Cloud Pod, where the forecast is always cloudy! Justin is out of the office at a World of Warcraft Tournament (not really), and Ryan is pursuing his lifelong dream of becoming a roadie for The Eagles (maybe?), so it’s Jonathan and Matt holding down the fort this week, and they’ve got a ton of cloud news for you! From security to AI assistants, we’ve got all the news you need. Let’s get started!

Titles we almost went with this week

Zero Bus, All Gas, No Kafka Brakes
AI Coding Bot Bites the Hand That Runs It
When Your Robot Developer Goes Rogue on AWS
Kubernetes VPA Finally Stops Evicting Your Database Pods
Google Trains 100 Million People, Still No One Reads the Docs
MCP Walks Into a Bar Not Enterprise Ready Yet
No More Pod Evictions Kubernetes 1.35 Scales In Place
No Keys No Drama Just IAM and Cloud SQL
One Agent to Rule Them All in Kubernetes
IAM Tired of Writing Policies Manually
When Your AI Coding Tool Has Delete Permissions
One Dashboard to Rule All Your GPU Clusters
Serverless Reservations Prove Nothing Is Truly Free Range
Kiro Takes the Wheel on AWS IAM Policies
Stop Blaming Backups for Your Bad Architecture
AI Agent Goes Rogue, Takes AWS Down With It
Everything is Bigger in Texas Except the Water Usage
OpenAI launches the college basketball of Inference. Pro service – low cost

General News

1:05 Code Mode: give agents an entire API in 1,000 tokens

Cloudflare‘s Code Mode MCP server reduces token consumption by 99.9% compared to a traditional MCP implementation, exposing the entire Cloudflare API (over 2,500 endpoints) through just two tools, search() and execute(), using roughly 1,000 tokens versus 1.17 million for a conventional approach.
The architecture works by having the AI agent write JavaScript code against a typed OpenAPI spec representation, rather than loading tool definitions into context, with code executing inside a sandboxed V8 isolate (Dynamic Worker) that restricts file system access, environment variables, and external fetches by default.
This approach addresses a fundamental constraint in agentic AI systems: adding more tools to give agents broader capabilities directly competes with the available context space for the task at hand.

01:41 Jonathan- “It’s good. I’m not sure I could imagine 2 ½ thousand MCP tool definitions in a context window and still actually use it for anything.”

AI Is Going Great – Or How ML Makes Money

03:58 OpenClaw creator Peter Steinberger joins OpenAI

Peter Steinberger, creator of viral AI assistant OpenClaw (formerly Clawdbot/Moltbot), has joined OpenAI to lead development of next-generation personal agents.
OpenClaw gained attention for its ability to perform real-world tasks like calendar management, flight booking, and autonomous social network participation.
OpenAI will maintain OpenClaw as an open source project through a foundation structure, allowing the community to continue development while Steinberger focuses on building similar capabilities into OpenAI’s product suite.
This acquisition-to-open-source model differs from typical tech company acquisitions, where projects are absorbed or shut down.
The move signals OpenAI’s strategic focus on agentic AI systems that can execute multi-step tasks autonomously rather than just responding to prompts. Steinberger’s experience building practical automation workflows could accelerate OpenAI’s development of agent capabilities that compete with offerings from Anthropic, Google, and Microsoft.
For developers, this represents a shift in how personal AI assistants may be deployed, moving from standalone applications to integrated agent frameworks within larger platforms.
The open source continuation of OpenClaw provides a reference implementation for building task-oriented AI systems.

04:19 Matt – “This is kind of where I see Anthriopic Cowork slowly going to, being your personal assistant, and having this be your ability to manage your real-world tasks. It’s great, and if they can build that into OpenAI, then it becomes a lot more of a personal assistant than just a general tool that you’re using.”

09:11 Making frontier cybersecurity capabilities available to defenders

Anthropic launched Claude Code Security in a limited research preview for Enterprise and Team customers, with free expedited access for open-source maintainers.
Unlike traditional static analysis tools that match known vulnerability patterns, it reasons through code contextually, the way a human security researcher would, catching logic flaws and access control issues that rule-based tools miss.
The tool uses a multi-stage verification process where Claude re-examines its own findings to filter false positives, assigns severity ratings, and provides confidence scores.
Critically, no patches are applied without human approval, keeping developers in the decision loop.
For cloud and enterprise teams, this integrates directly into Claude Code on the web, meaning security review happens within existing developer workflows rather than requiring separate tooling. The dashboard surfaces validated findings alongside suggested patches for team review.
Want to request access? You can do that here.

09:35 Preview, review, and merge with Claude Code

Claude Code on desktop now closes the full development loop by adding live app preview, inline code review, and GitHub PR monitoring in a single interface, reducing the need to switch between tools during development.
The new auto-fix and auto-merge features allow Claude to monitor PRs in the background, automatically attempt to fix CI failures, and merge PRs once all checks pass, letting developers move on to new tasks without manually tracking PR status.
The inline code review feature via the Review Code button lets Claude examine local diffs and leave comments directly in the desktop diff view before any code leaves the machine, functioning as an automated pre-push review step.
Session portability is now built in, allowing developers to start a session in the CLI using /desktop to bring context into the desktop app, or push local sessions to the web or Claude mobile app using the Continue with Claude Code on the web button.
These updates are available now to all users and represent a shift toward agentic, background-running development workflows where the AI continues working on tasks like CI remediation while the developer focuses elsewhere.

11:20 Jonathan – “It’s a very human way of going back and self-reflecting on the work that you’ve just done.”

18:08 Announcing General Availability of Zerobus Ingest, part of Lakeflow Connect

Databricks has announced General Availability of Zerobus Ingest, part of Lakeflow Connect, a serverless streaming service that pushes data directly into Delta tables without intermediate message buses like Kafka.
It supports thousands of concurrent connections and achieves over 10GB per second of aggregate throughput with data landing in under 5 seconds.
The core architectural difference is a single-sink design versus Kafka’s multi-sink approach, reducing a traditional five-system streaming stack down to two components.
This eliminates dedicated compute and storage for the message bus itself, along with the engineering overhead to manage it, at a fraction of the cost per gigabyte compared to self-managed Kafka.
Developers can integrate via gRPC, REST APIs, or language-specific SDKs, and every write is automatically governed through Unity Catalog for lineage tracking and access control.
This means streaming data gets the same governance treatment as the rest of the lakehouse from the moment it arrives.
Real-world deployments include Toyota using it to detect factory overheating conditions in minutes rather than hours, and Joby Aviation reducing aircraft telemetry resolution latency from days to minutes.
Both cases highlight manufacturing and IoT as strong use cases where low-latency ingestion has a direct operational impact.
Zerobus Ingest is now GA on AWS and Azure, with Google Cloud support coming soon, priced under the Lakeflow Jobs Serverless SKU with a 6-month promotional pricing period currently active.

20:05 Jonathan – “I’m not a fan of Kafka in general, but I am a fan of doing things at massive scale, so it’s kind of cool.”

07:27 OpenAI prepares new ChatGPT Pro Lite tier at $100 monthly

OpenAI appears to be preparing a ChatGPT Pro Lite tier at $100 per month, slotting between the existing Plus plan at $20 and the full Pro plan at $200, based on findings from engineer Tibor Blaho, who has a consistent track record of uncovering unreleased features.
The new tier would address a notable pricing gap for users who regularly hit Plus rate limits but cannot justify the full Pro cost, with freelancers, researchers, and developers as the likely target audience.
The plan may be structured around compute-heavy use cases, including Codex and persistent agentic workloads, where background-running agents carry substantially higher infrastructure costs than standard chat interactions.
OpenAI recently hired Peter Steinberger, creator of the open-source agent framework OpenClaw, and has signaled a multi-agent direction for ChatGPT, suggesting the Pro Lite tier could serve as an entry point for always-on agentic capabilities rather than just increased chat limits.
No release date or confirmed feature set exists yet, but the addition of a mid-tier option would create competitive pressure on Google, which currently lacks an equivalent individual plan at this price point.

21:56 Matt – “I just think they needed a different naming convention.”

Cloud Tools

23:11 HCP Packer adds SBOM vulnerability scanning

HCP Packer now includes SBOM vulnerability scanning in public beta, allowing platform teams to scan software bills of materials against MITRE’s CVE database and classify findings by severity directly within the artifact registry.
The feature builds on last year’s SBOM storage capabilities, which are now generally available, meaning teams can generate, store, and now actively scan SBOMs for known vulnerabilities in a single workflow.
This addresses a supply chain security gap by surfacing vulnerability data at the image level, covering AMIs, Docker containers, and virtual machines before they reach production environments.
Teams can see which specific package versions are affected and when vulnerabilities were detected, giving them the information needed to prioritize remediation without leaving the HCP Packer interface.
The feature is available in public beta at no cost through the free HCP Packer tier, making it accessible for teams looking to add CVE scanning to their image management process without additional tooling.

24:15 Jonathan – “It’s only as current as the time you built it though…”

25:43 Why Kubernetes 1.35 is a game-changer for stateful workload scaling

Kubernetes 1.35 brings two notable autoscaling milestones: In-Place Pod Resize graduating to GA and Vertical Pod Autoscaler’s InPlaceOrRecreate update mode reaching beta, allowing VPA to adjust CPU and memory on running pods without evicting them.
The practical benefit for stateful workloads is substantial.
Previously, VPA had to evict and recreate pods to apply new resource requests, which caused disruption for databases, caches, and other restart-sensitive applications. In-place resizing preserves the pod UID, container ID, and restart count throughout the adjustment.
VPA operates in three stages worth understanding: a recommendation-only mode for passive observation, an InPlaceOrRecreate mode that attempts live resizing first and falls back to eviction only when node resources are insufficient, and configurable policies using minAllowed and maxAllowed to bound what VPA can actually set.
VPA controllers are not bundled with Kubernetes itself.
Engineers need to clone the kubernetes/autoscaler repository and run the vpa-up.sh script to deploy the Recommender, Updater, and Admission Controller components alongside the mutating

26:09 Jonathan – “I think the practical benefit for stable workloads are fairly substantial, if you’re one of those crazy people who like to run databases or SQL server on Kubernetes (like Cody) because previously those pods would be evicted and new resources requested, which would obviously cause disruption, stale caches, and other issues.”

AWS

31:20 Amazon service was taken down by AI coding bot

Listener note: paywall article
Amazon’s Kiro AI coding tool caused a 13-hour outage of an AWS cost exploration service in December after engineers granted it broad permissions, and it autonomously decided to delete and recreate the environment rather than patch it.
A second outage involved Amazon Q Developer, though Amazon says neither event impacted core customer-facing AWS services.
Amazon’s official position is that both incidents were user error stemming from improper access controls, not failures of the AI tools themselves.
Kiro is designed to request authorization before acting, but the engineer involved had been granted broader permissions than intended, bypassing that safeguard.
The incidents highlight a practical risk with agentic AI tools in production environments: when an AI agent is given the same permissions as a human operator without requiring peer review, it can take destructive autonomous actions that a second set of eyes might have caught. AWS has since added mandatory peer review and staff training as corrective measures.
AWS is pushing for 80 percent of its developers to use AI coding tools at least once weekly, which means these tools are being adopted at scale internally before the risk patterns are fully understood.
Listeners running their own AI agents in production should treat permission scoping and human-in-the-loop approval gates as non-optional controls, not optional defaults.
Kiro launched in July 2025 and is positioned as a specification-driven coding assistant meant to go beyond simple vibe coding.
The December incident was limited to mainland China, and the second incident had no customer-facing impact, but the pattern of two production disruptions in a few months is worth tracking as agentic tools become more common in enterprise workflows.

33:24 Matt – “…if you’re letting the AI tool start to do things inside of production environments, that’s where you need to watch it, and you need to probably have it be a little bit more specific, so the human needs to kind of be watching what’s going on and peer reviewing it.”

35:49 Amazon pushes back on Financial Times report blaming AI coding tools for AWS outages

Amazon issued a public rebuttal to a Financial Times report claiming its Kiro AI coding tool caused multiple AWS outages, acknowledging one limited incident in December but attributing it to a misconfigured access control role rather than a flaw in the AI tool itself.
The confirmed disruption affected only AWS Cost Explorer in a single China region for roughly 13 hours, with no customer inquiries received, and did not touch core services like compute, storage, or databases.
Amazon’s core defense is that the issue was user error, not AI error, noting that a misconfigured role could result from any developer tool or manual action, AI-powered or not.
In response to the incident, AWS has added safeguards, including mandatory peer review for production access, which is a practical governance consideration for any organization deploying agentic AI tools in production environments.
The broader takeaway for AWS customers is that agentic AI tools capable of autonomous actions, like deleting and recreating environments, require clear human oversight policies and access control guardrails before being used in production systems.

37:00 AWS IAM Policy Autopilot is now available as a Kiro Power

AWS IAM Policy Autopilot, an open source static code analysis tool launched at re:Invent 2025, is now available as a Kiro Power, allowing developers to generate baseline IAM policies directly within the Kiro IDE without manual policy writing.
The integration uses a one-click installation model that removes the need for manual MCP server configuration, streamlining how developers access policy generation tools during AI-assisted development workflows.
Key use cases include rapid prototyping of AWS applications, baseline policy creation for new projects, and keeping developers in their coding environment rather than switching to the IAM console or documentation.
This fits into the broader trend of embedding security and permissions tooling earlier in the development cycle, helping teams start with least-privilege policies that can be refined over time rather than retrofitting permissions after the fact.
The tool is open source and available on GitHub at github.com/awslabs/iam-policy-autopilot, with no additional cost mentioned beyond standard Kiro and AWS service usage, making it accessible for teams already using the Kiro IDE.

38:18 Jonathan – “I’m really on the fence about this. Because on one hand, I know the pain, especially with things like deployment policies…and just trying to figure out every permission that has to be added so that Terraform can just do a deployment – it becomes very complicated. At the same time, if you have a machine that looks at your code and says ‘this is the policy you need for it,’ I don’t think that’s any security at all unless there’s another check at the end.”

-Honorable Mentions-

41:52 Amazon Redshift Serverless introduces 3-year Serverless Reservations

Amazon Redshift Serverless now offers 3-year Serverless Reservations, providing up to 45% cost savings compared to standard on-demand RPU pricing while maintaining the serverless model’s flexibility.
The reservations are managed at the AWS payer account level and can be shared across multiple AWS accounts, making this useful for organizations running Redshift Serverless workloads across linked accounts.
-stop
Billing runs 24/7 on an hourly basis, metered per second, meaning you pay for reserved RPUs continuously, regardless of actual usage, so this option makes most sense for consistently active workloads rather than sporadic ones.
Any RPU consumption beyond the reserved amount falls back to standard on-demand rates, so customers need to size their reservations carefully to avoid negating the savings.
Reservations can be purchased through the Redshift console or via the create-reservation API and are available in all regions where Redshift Serverless is currently supported.
More information is available on the Amazon Redshift Management Guide, which you can find here.

42:03 Amazon Says It Will Spend $12 billion On Louisiana Data Centers

Amazon has announced a $12 billion investment in data center campuses in Louisiana, aimed at expanding infrastructure capacity for AI and cloud computing workloads.
A notable aspect of the deal is Amazon’s commitment to covering its own power costs directly, working with regional utility Southwestern Electric Power Company to avoid passing energy expenses onto local consumers.
Amazon is pairing the infrastructure investment with solar energy projects in Louisiana, which aligns with its broader sustainability commitments and addresses concerns about grid strain from large-scale data center operations.
This announcement reflects a broader industry trend where cloud providers are proactively addressing public and political concerns about data center energy consumption, following a similar commitment from Microsoft last month regarding higher electricity rate payments.
For AWS customers, this expansion signals continued investment in US-based infrastructure capacity, which could translate to improved regional availability and lower latency for workloads in the southern United States over time.

42:18 Announcing AWS Elemental Inference

AWS Elemental Inference is a fully managed AI service that automatically generates vertical video crops and highlight clips from live and on-demand broadcasts in parallel with encoding, targeting broadcasters who need to distribute content across TikTok, Instagram Reels, YouTube Shorts, and similar platforms without dedicated production staff.
The service uses an agentic AI approach with no prompts or human-in-the-loop intervention required, handling both vertical video cropping and metadata-based highlight detection automatically, which reduces the manual workflow overhead typically associated with multi-platform content distribution.
Beta testing with large media companies showed 34% or more cost savings on AI-powered live video workflows compared to using multiple point solutions, making this a notable consolidation option for media organizations already using AWS Elemental encoding services.
A practical sports broadcasting use case is highlighted where highlight clips can be identified and distributed to social platforms during live games rather than hours after the fact, addressing a real operational gap in live content workflows.
The service is available in four regions at launch: US East N. Virginia, US West Oregon, Asia Pacific Mumbai, and Europe Ireland.
Pricing details are not specified in the announcement, so listeners should check the AWS Elemental Inference documentation at docs.aws.amazon.com/elemental-inference for current pricing information.

GCP

57:25 Managed MCP servers for Google Cloud databases

Google Cloud expanded its managed MCP server support to cover AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, allowing AI agents to interact with these databases through natural language without requiring infrastructure deployment or complex configuration.
The security model relies entirely on IAM for authentication rather than shared keys, and all agent actions are logged in Cloud Audit Logs, which addresses a practical concern for teams worried about giving AI agents access to production databases.
A new Developer Knowledge MCP server connects IDEs directly to Google’s official documentation, letting agents reference best practices in real time during tasks like database migrations or app development troubleshooting.
Because these servers follow the open MCP standard, they work with third-party clients like Anthropic’s Claude in addition to Gemini, which broadens the practical appeal beyond teams already committed to Google’s AI tooling.
Google has signaled plans to extend managed MCP support to Looker, Memorystore, Pub/Sub, Kafka, and migration services in the coming months, suggesting this is an ongoing buildout rather than a one-time release.
Pricing is not separately listed for MCP access and likely falls under existing database service costs.

44:12 Matt – “Anything that makes databases easier, I’m all for.”

45:12 Gemini 3.1 Pro: Announcing our latest Gemini AI model

Gemini 3.1 Pro is now available in preview for developers via Google AI Studio, Gemini CLI, Vertex AI, and Android Studio, with enterprise access through Vertex AI and Gemini Enterprise. Pricing details have not been publicly announced for the preview period.
The model scores 77.1% on the ARC-AGI-2 benchmark, which tests reasoning on novel logic patterns, representing more than double the score of the previous Gemini 3 Pro model.
This positions it as a stronger option for complex problem-solving tasks compared to its predecessor.
Practical use cases highlighted include generating animated SVGs from text prompts, building live data dashboards by connecting to public APIs, and prototyping interactive 3D interfaces with hand-tracking and generative audio. These examples suggest the model is particularly suited for developers working on data visualization and creative coding projects.
Consumer access is rolling out through the Gemini app and NotebookLM, but the 3.1 Pro tier is restricted to Google AI Pro and Ultra plan subscribers. This tiered access model means free-tier users will not have access during the preview phase.
Google notes the model is still in preview while they validate performance for agentic workflows before a general availability release. GCP customers evaluating it for production use should factor in that capabilities and pricing may shift before the full release.

46:23 Matt – “It’s just amazing to me how fast these models are improving. This one is saying it scored a 77%, where models a year ago where 40 and 50%. Seeing how fast everything is moving is insane.”

47:36 Understanding the Firefly clock synchronization protocol

Google’s Firefly is a software-based clock synchronization protocol that achieves sub-10-nanosecond NIC-to-NIC synchronization across data center hardware, without requiring specialized or expensive dedicated timing equipment.
The protocol uses a distributed consensus algorithm built on random graphs rather than a traditional hierarchical time server model, which improves convergence speed, scalability, and resilience to network path asymmetries.
Firefly decouples internal synchronization from external UTC synchronization, meaning external time server jitter does not degrade the precision of clock alignment within the data center fabric itself.
Financial services workloads are a primary beneficiary, as regulatory requirements mandate sub-100 microsecond external UTC synchronization and sub-10 nanosecond internal synchronization, both of which Firefly meets on standard cloud infrastructure.
Beyond finance, the protocol has practical implications for distributed database consistency, ML workload coordination, and fine-grained network telemetry, potentially enabling workloads that previously required on-premises dedicated hardware to run on cloud infrastructure instead. No specific pricing details were provided in the announcement.

48:52 Jonathan – “The fact that you need to guarantee sub-hundred microsynchronization for financial systems is crazy.”

-Honorable Mentions-

50:32 America-India Connect infrastructure connects four continents

Google is investing $15 billion in AI infrastructure in India and launching America-India Connect, a multi-continent subsea cable initiative that establishes new fiber-optic routes connecting the United States, India, Singapore, South Africa, and Australia.
The project creates Visakhapatnam as a new international subsea gateway on India’s east coast, adding network diversity beyond existing Mumbai and Chennai landing points.
The infrastructure combines multiple subsea cable systems, including Equiano, Nuvem, Bosun, Tabua, TalayLink, and Honomoana, to create redundant high-capacity routes between American coasts and India through both African and Pacific paths.
This approach provides network resilience for over 1 billion people in India while improving connectivity across the Southern Hemisphere.
Google Cloud is serving as the primary cloud infrastructure provider for India’s iGOT Karmayogi platform, which delivers training to over 20 million public servants across 800+ districts.
The platform will use AI to digitize legacy training content and enable access in 18+ Indian languages, supporting the government’s Mission Karmayogi initiative for civil service modernization.
The announcement positions these subsea cables as critical infrastructure to prevent an AI divide, with documented evidence that subsea cable connectivity improves internet affordability and reliability while driving productivity and economic growth.
The initiative builds on Google’s existing infrastructure investments in Africa, Australia, and the Pacific region.
Added this one just for you, Justin.

52:20 Wilbarger County data center

Google is building a new data center in Wilbarger County, Texas, expanding its existing infrastructure footprint in the state.
This is primarily an infrastructure capacity announcement rather than a new GCP service or feature.
The facility will use air-cooling technology instead of traditional water cooling, limiting water consumption to only essential campus operations like kitchens. This is a notable operational choice given ongoing concerns about data center water usage in drought-prone regions.
Google has contracted to add more than 7,800 MW of net-new energy generation and capacity to the Texas electricity grid, with the Wilbarger facility co-located alongside new clean power developed in partnership with AES.
Google announced a $30 million Energy Impact Fund in November to support energy affordability, school weatherization, and energy workforce development across Texas. Details on the fund are available here.
For GCP customers, additional Texas-based infrastructure generally signals potential improvements in latency and redundancy for workloads serving the south-central US region, though Google has not announced specific new GCP regions or zones tied to this facility.

52:55 Use Lyria 3 to create music tracks in the Gemini app

Google DeepMind’s Lyria 3 model is now available in beta within the Gemini app, letting users generate 30-second music tracks with lyrics, custom cover art, and style controls from text prompts or uploaded photos and videos.
This is available to users 18 and older in 8 languages, with higher usage limits for Google AI Plus, Pro, and Ultra subscribers.
Lyria 3 improves on previous versions by auto-generating lyrics from prompts, offering more control over style, vocals, and tempo, and producing more musically complex outputs without requiring users to provide their own creative assets.
All generated tracks are embedded with SynthID, Google DeepMind’s imperceptible watermark, and the Gemini app now extends its AI content verification to audio files, allowing users to upload audio and check whether it was generated by Google AI.
The feature is also rolling out to YouTube creators via Dream Track for Shorts soundtracks, connecting Lyria 3 to a broader content creation workflow beyond the Gemini app itself.
On the responsible AI side, Google states Lyria 3 was trained with copyright and partner agreements in mind, artist-specific prompts are treated as stylistic inspiration rather than direct mimicry, and output filters check against existing content, though Google acknowledges this approach is not guaranteed to catch all issues.

Azure

57:25 A milestone achievement in our journey to carbon negative

Microsoft has achieved its 2025 goal of matching 100 percent of global electricity consumption with renewable energy, contracting 40 gigawatts of new renewable capacity across 26 countries since 2020.
This represents enough energy to power approximately 10 million US homes, with 19 GW currently online and the remainder coming online over the next five years.
The renewable energy procurement has reduced Microsoft’s reported Scope 2 carbon emissions by an estimated 25 million tons and mobilized billions in private investment through over 400 contracts with 95 utilities and developers. This directly impacts Azure datacenter operations globally, supporting the infrastructure that runs customer workloads while advancing toward the company’s 2030 carbon negative commitment.
Microsoft is expanding beyond renewable energy to include nuclear power and other carbon-free technologies, including a 50 MW fusion project with Helion in Washington state and restarting the 835 MW Crane Clean Energy Center in Pennsylvania with Constellation Energy. The Climate Innovation Fund has allocated $806 million to 67 investees, with 38 percent directed toward energy systems innovation.
The company is deploying AI-driven tools to accelerate clean energy deployment, including collaborations with Idaho National Laboratory for nuclear licensing and the Midcontinent Independent System Operator for grid optimization.
These tools aim to streamline the design, permitting, and deployment of new power technologies to expand grid capacity more efficiently.
Azure customers benefit indirectly through more sustainable cloud infrastructure, though Microsoft notes the shift to an all-of-the-above decarbonization strategy recognizes that rising electricity demand from datacenters, AI workloads, and digital services requires diverse carbon-free energy sources beyond renewables alone.

55:58 Generally Available: Quota and deployment troubleshooting tools for Azure Functions Flex Consumption

Azure Functions Flex Consumption now has generally available quota and deployment troubleshooting tools built directly into the platform, giving developers clearer visibility into quota limits and constraints without needing to dig through documentation or support tickets.
The quota troubleshooting experience surfaces Flex Consumption-specific limits in context, which is useful for teams hitting scaling walls and trying to understand why deployments are behaving unexpectedly.
This is a quality-of-life improvement aimed at developers and platform engineers who use Flex Consumption for its per-execution billing model and fast scaling, helping reduce time spent diagnosing deployment failures.
Pricing for Flex Consumption remains consumption-based, so there is no additional cost for these troubleshooting tools themselves. More details are available at the Azure updates page here.
Teams already invested in Azure Functions should note this reduces reliance on external monitoring or support escalations for common quota-related issues, keeping troubleshooting within the Azure portal workflow.

56:32 Matt – “This is a great quality of life improvement because you can see why things are breaking when you’re using flexible consumption.”

-Honorable Mentions-

1:01:07 Public Preview Announcement: Empower Real-Time Security with Microsoft Sentinel’s CCF Push Feature | Microsoft Community Hub

Microsoft Sentinel’s CCF Push feature, now in public preview, allows security data providers to send logs directly to a Sentinel workspace without the traditional setup overhead of manually configuring Data Collection Endpoints, Data Collection Rules, Entra app registrations, and RBAC assignments. Pressing Deploy handles all resource provisioning automatically.
The feature is built on Sentinel’s Log Ingestion API, which supports high-throughput data ingestion, pre-ingestion data transformation, and direct targeting of system tables, making it more flexible than the older polling-based connector model.
For partners and ISVs building Sentinel integrations, CCF Push reduces time to market by consolidating connector deployment through the Content Hub as a single interface, rather than requiring customers to configure multiple Azure resources independently.
Early adopters include security vendors like Obsidian Security and Varonis, suggesting the feature is already being validated in real-world security workflows.
Developers can reference the MS Learn documentation here to get started.
No specific pricing details were provided in the announcement, but since CCF Push feeds data into Sentinel workspaces, standard Sentinel and Log Analytics ingestion costs would apply.
Organizations evaluating this feature should factor in their existing Sentinel pricing tier when estimating costs.

1:01:24 Microsoft Sovereign Cloud adds governance, productivity and support for large AI models securely running even when completely disconnected

Azure Local disconnected operations are now generally available, allowing organizations to run mission-critical infrastructure with full Azure governance and policy enforcement even when completely isolated from cloud connectivity. This targets government, defense, and regulated industries where external dependencies are either unacceptable or prohibited.
Microsoft 365 Local disconnected brings Exchange Server, SharePoint Server, and Skype for Business Server into fully air-gapped sovereign environments running on Azure Local, with Microsoft committing support for these workloads through at least 2035.
This keeps productivity tools available under the same governance boundary as infrastructure workloads.
Foundry Local now supports large multimodal AI models running on-premises hardware, including NVIDIA GPUs, within fully disconnected sovereign environments. This extends local AI inferencing capabilities beyond the smaller models Foundry Local previously supported, with Microsoft providing deployment, update, and operational health support.
The three components together form a full-stack sovereign private cloud covering infrastructure, productivity, and AI inferencing, all manageable through consistent Azure governance tooling regardless of connectivity state.
Pricing is not publicly listed and appears to vary based on deployment scale and customer qualification, so organizations should contact Microsoft directly for specifics.
Target customers include public sector agencies, classified environments, and regulated industries in regions where data residency and operational autonomy are legal or contractual requirements.
Azure Local is disconnected, and Microsoft 365 Local is available worldwide, while large model support on Foundry Local is currently limited to qualified customers.

Emerging Clouds

1:03:04 Introducing Command Center: The unified operations platform for AI workloads

Crusoe Command Center is a unified operations platform that consolidates GPU cluster monitoring, orchestration, and support into a single interface, addressing the common problem of engineers context-switching between fragmented dashboards during AI training runs.
The platform integrates with Crusoe Managed Kubernetes and supports Managed Slurm, allowing long-running multi-week training jobs to operate continuously across large GPU clusters without manual intervention.
AutoClusters is a key component that automatically detects GPU performance degradation, evicts compromised nodes, and replaces them with healthy instances from a reserve pool, reducing the need for around-the-clock manual oversight.
On the observability side, Command Center supports multiple access methods, including a UI, Grafana via PromQL API, and a Prometheus endpoint, while a Telemetry Relay feature streams infrastructure metrics directly to external tools to reduce data silos.
The Crusoe Watch Agent, paired with Telemetry Relay, extends visibility to custom application-level metrics, allowing teams to correlate workload performance with underlying GPU health data for more precise troubleshooting.

1:04:04 Matt – “The whole stack here is what I kind of find nice. The smaller clouds are trying to attack that whole vertical a lot more, where they’re giving you that depth all the way down, so if you are training your own model, you get the CPU, you get the GPU, you can see that whole stack of what’s going on, and really start to fine-tune.”

1:05:09 Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct MI350X GPUs

DigitalOcean is adding AMD Instinct MI350X GPUs to its GPU Droplets lineup, built on the CDNA 4 architecture and optimized for inference workloads, including prefill phase compute, low-latency token generation, and larger context windows.
The platform has demonstrated measurable results with existing customers, including a 2x increase in production request throughput and 50% reduction in inference costs for Character.AI, giving potential adopters concrete performance benchmarks to evaluate.
DigitalOcean is positioning these offerings toward AI-native companies and developers who need enterprise features like HIPAA eligibility and SOC 2 compliance without the complexity of larger cloud providers, with provisioning available in a few clicks.
The GPUs are currently available in the Atlanta datacenter, with AMD Instinct MI355X GPUs planned for next quarter, which will introduce liquid-cooled rack infrastructure to support larger models and datasets.
For smaller businesses and developers, the predictable usage-based pricing and simplified deployment model represent a meaningful alternative to the more complex pricing and configuration requirements typical of hyperscaler GPU offerings.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

[00:00:07] Speaker A: Welcome to the Cloud pod where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker B: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker A: Episode 344Amazon's coding bot Bites the Hand that Runs It Hey Matt, quite in here today. [00:00:25] Speaker B: Hey Jonathan, how are you? [00:00:28] Speaker A: I'm good, good. Long week of project managing and various other things. How about you? [00:00:35] Speaker B: Long week just in general, life, work, personal, family, you know, everything all at once this week. So doing some work, travel, doing some personal travel, all merged together makes a very long week. But you got two of four of us, so here's our podcast. [00:00:52] Speaker A: Yep, not quite a quorum, but quorum enough. [00:00:57] Speaker B: It's better than one of us. [00:00:59] Speaker A: It is. All right, so first we got general news. Cloudflare's code mode MCP server reduces token consumption by 99.9% if you have a bunch of tools, I guess compared to a traditional MCP implementation. By exposing the entire Cloudflare API, which would be two and a half thousand endpoints through only two tools, search and execute. I think this is kind of like what Anthropic did with, with Claude. And instead of putting all the tools into a single context, they, they create skills. And so you have a search for a skill tool and then the skill drops just important pieces into the, into the chat. So it's good. I mean, not sure I could imagine two and a half thousand MCP tool definitions in a in a context window and still actually use it for anything. So the architecture works by having the AI agent write JavaScript code against the typed OpenAI spec, rather than loading all the definitions into context and the code executes inside a sandboxed worker that restricts file system access, environment variables and external fetches by default. [00:02:04] Speaker B: I mean, they would have had to do something because like you said, the context window were for 2,500 endpoints just isn't logical. So but the ability to do it in a thousand tokens and get that type of data in there is actually really impressive. So, you know, being able to achieve this hopefully sets starts to say a better model than having like 15 MCP servers, all for, you know, one company for one product. You know, if they can get into one, it then becomes, you know, here's your API endpoint, here's your MCP, here's your 8. Like you get back to a better state of being able to work with something than having to know how to set up all these things in order to get everything working for you. [00:02:45] Speaker A: Yeah, I kind of question why they would Put it all into a single MCP service in the first place. I mean they have such a diverse range of products and services. You would think that they would just break it up into some more simple. [00:02:59] Speaker B: But then it's like what MCP server are you hitting? Are you hitting their. Great, now I'm going to try to make something up on the fly. Do you hit their worker nodes, you know, and their, their edge compute that they have or you hitting their general like set me up Cloudflare or you know, and you, you start to have too many MCPs at that point you don't know where to go with anything. So having it at one place, it's more centralized. But you know, you could also break more things. [00:03:25] Speaker A: Yeah, I mean it seems like a good candidate for infrastructure as code. I mean, I guess this is kind of like getting at. They're writing JavaScript against the spec which then runs. It's. It's like having infrastructure as co except it's fed in through an agent instead. Well, unlike Cloudflare, OpenClaw is not known for its security. But OpenClaw creator Peter Steinberger has joined OpenAI. He's a creator of the viral AI assistant OpenClaw, formerly Claudebot Maltbot and has joined OpenAI to lead development of a next generation personal agents. [00:04:02] Speaker B: This is kind of where I see Anthropic cowork kind of too slowly going to being your personal assistant and whatnot and having this be your ability to kind of manage real world tasks is great. And if they can build that into OpenAI then it becomes a lot more of a personal assistant than just a general tool that you're using for what we all use, you know, Claude and you know, OpenAI for. So I get where they're going with it. They're trying to kind of build out that different product segments like Anthropic is going. Curious to see if this is a good way to go on this. [00:04:45] Speaker A: It's. I find it strange because OpenAI could never released a tool like that. They could not have released such an immature, unsecure product out to the market. But yet here they are hiring the guy who did this. I mean it's great press for them. Although it was named after Claude in the first place, I think he was shunned a little bit by the fact that Anthropic told him that he had to rename it. Yeah, but, but they could be building [00:05:12] Speaker B: out like what I like that Anthropic is doing is having that true R D wing where they're playing with like the Chrome plugin and the cowork and all, all these other things, the we'll talk about later, the security, npr, which, you know, if I was pre planned this, I would nicely segue into those. But we'll talk about those in 30 seconds. You know, so they're, they're kind of working on the different. How do you leverage the generalness of this into different product segments? And that's not something really that OpenAI I feel is doing yet. [00:05:47] Speaker A: I mean everybody wants Jarvis at home. Yeah, everybody, everybody wants, you know, Demolition man where the guy walked in, was it early 90s? Walked into the room is like illuminate and the lights come on the house. Everyone wants that kind of home automation, but everyone also wants the privacy that goes with it. So I don't think we're going to see that at least with these hosted AI tools. I think they need to be local things, but at least for me to be comfortable with that kind of thing. But I don't know, coworks, it's not the same kind of interface. I mean, I like the flexibility. I like the idea that I can chat with it in Slack or Discord or. I think there's a lot of, there's a lot of fake news, a lot of stuff that was made up around openclaw. People wanted to sort of. [00:06:30] Speaker B: It was a hot thing. [00:06:32] Speaker A: Yeah, yeah, it was. I guess it happens on the Internet, but it's definitely pushing people's perspectives on what AI could, could actually do for us as a, as a civilization or on a personal basis rather than just being. You can organize my Excel files, you know. [00:06:51] Speaker B: But like I was talking with a friend and he was saying, yeah, they have open cloud at work. And they, he was like, somebody literally pinged it to add somebody to a channel. And it's like you could have just so at the same time to tell the AI assistant to do that, to add them to a Slack channel to just do it yourself. Like that's not a hard task to do. [00:07:17] Speaker A: So yeah, I think it's, it's gonna make people lazy. You know, I see people typing, edit this, you know, comment this out in this firewall. You know, that's just a mouse click and a key press and it's done yourself. But now you have this. I think people like the idea that they can delegate work to somebody else, even if it's a machine. [00:07:36] Speaker B: I oversee all the machines, do what I say. Adult think. No, I, I use AI a lot for thinking. For me, I'm like, go do an analysis between these six things and tell me it, build me charts and give me a cross comparison of it or, you know, how do I handle migrations from, you know, A to B inside of Azure when it's supposed to be supported but it's not, you know, because it's Azure. [00:08:03] Speaker A: Yeah, I think they still lack, they lack sort of the human context. I know they're trained on, you know, almost the entirety of human knowledge that's documented and scanned, but they kind of lack the, they lack some really common sense. Sometimes you can give it a goal and it will work to achieve the goal, but it doesn't the way it may achieve the goal. It's like a genie, you know, you wish for something and you get it, but not quite what you intended. So I think, I think I want a little more alignment to the way people do things in these tools rather than just achieve this goal for me [00:08:43] Speaker B: anyway, so in ways that Anthropic is playing with new features. Anthropic has released CLAUDE Code Security and Preview for enterprises and teams. The tool is a multi stage verification process where CLAUDE reexamines its own findings, filters out false positive, assigned severity ratings and confidence scores. Additionally, they've also released the CLAUDE code on desktop which is a fully automated loop for live PRs, inline code reviews and previews of changes. So in this case you can really see Anthropic is trying to attack kind of that developer life cycle, you know, here, go write me something. Go do a PR before I go do a pr, have a security senior dev. This is kind of like my workflow that I have a lot too, which is, hey, go do a thing and then pass it through to two or three agents, you know, depending on the severity of what I'm doing. But like, you know, pass it through to a senior developer, then to a security than to, you know, whatever else and, and kind of review it and they're kind of taking that whole loop and building it out for me, which I'm. I think this is a great tool because I can, you know, shorten my agents MD or cloud MD files down or whatnot, which will be nice. [00:10:04] Speaker A: Yeah, I watch. I don't know how many people watch CLAUDE Code do its work, but I tend to, I tend to sit and watch its thought process as much as I can. I just find it interesting. There's definitely been a shift in the way that it works or maybe it's just a shift in what's being presented on the screen, I'm not entirely sure, but there's definitely been Times working on complex projects where it's gone down a path and it's sort of had this realization that actually no, it made a mistake and it goes back and it redoes it or it considers something or it must bring context in from somewhere else and it reconsiders itself or it double checks itself. It's kind of strange to say a machine saying hang on, let me check that. Then it's like oh no, it's okay, I'll move on now. It's a very human way of going back and self reflecting on the work you just done. But the security tool has been quite devastating for the security industry. Stock price in general, I think it's just going to keep happening. Software is down in general, security is down now. I think every, every corner of the market now where they build some tooling is going to, is going to have more and more of an impact on the economy. [00:11:19] Speaker B: What was it? IBM? I don't know if this is before or after because we're recording it day late this week but there was, there was an article saying that IBM stock went down after. Was it Claude or OpenAI announced like massiveness in Cobalt which then I'm confused why IBM is spending so much, has so much market share associated with Cobalt but maybe there's a piece on that like mainframes, I'm not really sure. [00:11:46] Speaker A: Yeah, a lot of, lot of legacy software so running on Cobalt on mainframes. IBM is really the only supplier of those of that hardware anymore and they hold the expertise in that and it's a cool, it's not a complex language, it's just not well known anymore. So now you can, now you can hire somebody to write COBOL or fix COBOL or come up with a migration plan. I think there's a lot of businesses, even companies I've worked at in the past 15 years who still have COBOL running on these huge monoliths. And the migration path away from that is just horrendous. And so using AI now to figure out migration paths or re implement exactly the same functionality, which I think when it comes to financial services and banking is the big concern concern. How can you be sure that this new thing you write will do exactly the same thing as cobol including you know, all the bugs that turned into features and things like that. So yeah, it's good. I mean who wants to use COBOL anymore? For a start it's not the best. And who wants to spend money on mainframes when there's far better compute now it's the people who are trapped in those situations with impossible migration plans. I wouldn't be surprised if the government still has, you know, the US government still has stuff running in cobol, huge batch jobs that do things, taxes. [00:13:14] Speaker B: What was it during when Covid first started? I think it was like, was it like Utah or Nevada? Like they had a massive outage in the root cause of their unemployment system. It was because their Cobalt based system, backend system wasn't able to handle the scale of it. You know, like so, you know, this stuff's running everywhere. It's just a matter of can you migrate? You know, talking with people about upgrading Python versions becomes a big deal. But like now you're switching languages and the level of effort and the testing and the regression testing and the validation, you know, especially in the financial services, government or anything else, that it's just such a high level of barrier, level of, what's the phrase? Barrier to entry, and a level of effort that people aren't willing to take it on. But if we can leverage a system like this to do it, and then all you have to do is the testing afterwards and maybe you automate your testing too. With it before you move, go write a plugin for Cobalt that checks and watches all API calls or whatnot between the different parts of the system and then can monitor on the other side here and make sure it works the same. You could be able to build something that works effectively and have a high level of confidence it's going to work the same way. [00:14:34] Speaker A: Yeah, yeah. I think there's a lot of tech debt, which isn't difficult tech debt. It's just hard to prioritize because there are features, features, drive sales, other things, drives, performance, drive sales. And so I think a lot of tech debt just gets left to rot even longer than it probably should. And I think maybe this is the best use for AI, is that it can come along and it can build something, it could build something offline, it can build the test harness, it can build ways of running software alongside a running system to model what goes in and out and sort of keep refining itself over a period of weeks or months until you're confident that it can take over. And then you flip a switch and all of a sudden you've done the migration and it didn't, didn't require an outage, it didn't require downtime and sort of proven itself over a length of time. I think it's something that people do themselves manually, but I think having an AI that could just orchestrate the entire Thing would be great. I haven't used have I used clock code on desktop. I have used clock head on desktop. They added clock codings there. I prefer the web interface for it personally or just. Or just the cli. I didn't find the desktop to be great because it doesn't seem to persist the sessions. So you can, you can have it work on stuff but then it doesn't seem to have like a history of. At least on Windows anyway. A history of what you did. [00:15:55] Speaker B: I use Claude Code desktop for just general things, you know, like just talking with the editing documents, things along those lines. I. Sorry, just straight Claude. It does have a cloud code tab on a Mac but I've never used it because at that point if I'm going to go code, I'm in VS code. I haven't used the web. I did see something which we'll probably talk about next week where there's like they have the mobile app that can now link to Claude code which also just sounds like a security nightmare. But we'll talk about that next week. [00:16:27] Speaker A: Yeah, I'm excited to test that because it's like it's a work life balance nightmare. But it's such a great feature. [00:16:35] Speaker B: Yeah. [00:16:36] Speaker A: Because I want to start something going. You do a plan. You install the plugin so that it presses yes, allow this action. But you walk away and you're never quite sure if it's going to stop and ask you for something. Something new, kind of pointless. Yeah, like a shadow stopper. Like hey, are you sure I can CD into this random directory? I'm like, yeah, go ahead and do it. I think they're that agent permissions thing needs a bit of love. But having, you know, having a remote control over your active Claude code session from your phone, wherever you are, it's great. I mean that could be out with a family, get distracted by the phone, answer some questions. You know, not. Not good work life balance but. But a great feature for sure. [00:17:16] Speaker B: All right, on to databricks. Databricks announced the general ability of zero bus ingest part of the lakeflow Connect, a serverless streaming service that pushes data directly into delta tables without immediate message buses like Kafka supports thousands of concurrent connections achieving over 10 gigabytes per second aggregate throughputs with data landing under 5 seconds. The core architectural differences is a single sync design versus Kafka multi sync approach reducing traditional five system streaming stacks down to two components, eliminating the dedicated compute and storage meets. Hmm. [00:18:03] Speaker A: I. I'm kind of curious. I should have read this and Figured out how it worked. It sounds, it sounds amazing, but. Okay, so what kind of. I guess what, what kind of, what kind of use cases? [00:18:14] Speaker B: I guess I think it's just supposed to take. I mean, I think it's Kafka, but without the overhead and everything. They've built their own internal system that can handle the same data flow without having to you to set up. Because what you used to traditionally have to do, at least the way I've done it, is you would dump your, your data into Kafka, onto the stream and then it would process into databricks at that point and into your data lake. Here is just. But you had to manage Kafka at that point. So it's just taking a piece of toil out of the middle of it. [00:18:46] Speaker A: Okay, so it, so it kind of eliminated the, the array of brokers that you have to have in the middle is what I guess is what they're referring to by the multi sync design. You have to send it to a specific broker after you find out which one's responsible for the stream. Okay. [00:19:01] Speaker B: So they just, they got rid of a good chunk of toil in the process. [00:19:07] Speaker A: Not a fan of Kafka in general, but I am a fan of doing things at massive scale. So this is kind of cool. [00:19:14] Speaker B: I feel like all of us are not a fan of overly complex tools like simplicity, but like to be able to scale stuff a lot because we all kind of have a love hate relationship with Kafka Kubernetes, you know, all these really complicated tools because we just don't think most people need them. Or I'll speak for all of us right now, but like we all still know that we need them when we use them when we have to. We might begrudge using them, but we're still using them. [00:19:45] Speaker A: Yeah, that's a very fair assessment. I think the problem is the tools grow features to support many, many customers use cases. And perhaps they should be three different tools or three different flavors of the same tool, which is a simple tool rather than one massive complex thing which is very hard to get a grasp on. All right, well, OpenAI appears to be preparing ChatGPT Pro Lite tier at only $100 a month, which sits between the existing plus plan which is 20, and the full pro plan which is 200. Anthropic have had their max. Max 10. [00:20:23] Speaker B: Yeah, I'm on the max. I'm on the $100 plan. [00:20:25] Speaker A: Yeah, so Anthrop had that for a while. I think there was a call for chat GPT or OpenAI at least to do the same thing because there's a huge gap between $20 and $200 and managing multiple accounts is a bit of a pain. So that's, that's kind of nice. [00:20:44] Speaker B: I just think they needed a different naming convention. They had plus they have Pro. Why is it Pro Light? I understand this isn't the point of the podcast, but like, you could have named it something different. I don't know what to name. I'm not a naming person. I name things X or Y. But like you got plus you got Pro. Why Pro Light? Is it like, to make you feel like you're a professional, but you're not all the way there yet? [00:21:14] Speaker A: I mean they, they would. The other option was Pro cheapskate, but they just went with life instead? I don't know. [00:21:21] Speaker B: I, I, I look at Take Pro Exec. [00:21:23] Speaker A: Yeah. I look at my, my, the usage I get out of my, my max plan with Anthropic and it's just phenomenal. I, I just constantly churning through coding tasks and various other things and I think even, even $200 and you know, slightly privileged to be able to afford something like that. But the value I get out of it is enormous. Probably, probably in the thousands of dollars worth of labor savings, if you like, if I was to delegate the workout or spend time on it myself. [00:21:53] Speaker B: Onto cloud tools, Packer adds SBOM vulnerability scanning. And as somebody that manages an application, really, SBOMs are really big right now and very important in life. So if you don't know what an SBOM is, it's the software bill of materials. So Packer now supports SBOM vulnerability scanning in public beta, allowing platform teams to scan and get their SBOM against MITRE CVE databases, classifying findings and severity directly with the artifact registry. So basically, as you build your artifact, it will then scan it and immediately put in the registry that you have a critical. And you really should probably fix that before you sell it or deploy it out or nicely ignore it. The reason why this is so important is it addresses the supply chain security gap, servicing vulnerability data at the image level, covering AMI's Docker containers and virtual machines before they reach production environments. [00:22:56] Speaker A: I guess it's, it's only as current as the time you built it though. [00:23:00] Speaker B: That's always the problem. [00:23:02] Speaker A: Yeah, it is. Yeah. Makes me think actually a little agent that runs inside. Oh, I mean, there are tools out there. Yeah. You say there's tons of tools out there that, yeah. [00:23:16] Speaker B: Palo Alto Carbon CrowdStrike will scan your images. Carbon Apollo Also has multiple tools for per container. You know, with either your sidecar or load the container. In theory, you would then do the rescan because even like aws, ECR has this where it scans your containers every day and tells you if there's vulnerabilities in it. So while you are correct, it's only. It's. This is only as good as you build it because Packer is a building tool. But then if you put this into the next system, then that can give you more real time analysis also. But it's better to get it before it gets into that system, which I think is the point of this. [00:23:59] Speaker A: Yeah, I mean it's shift left, I guess even better if it. If it didn't have to build it and it just told you ahead of time. Although I suppose if you don't specify it by versions of packages and things, it pulls the latest or it pulls whatever's available in the. In artifactory or wherever you're staging copies of packages. Yeah. All right. Well, Kubernetes 1.35 brings two new auto scaling milestones. We have in place pud resize, which is graduated to generally available and vertical pod autoscalers in place or Recreate Update mode, which allows vertical pod auto scaling to adjust CPU memory on running pods without affecting them. That sounds kind of nice. The practical benefit for stateful workloads is fairly substantial if you're one of those crazy people who like to run databases or SQL Server on Kubernetes because previously those pods will be evicted and new resources requested, which would obviously cause disruption, stale caches and other issues. [00:25:05] Speaker B: I both really like this tool and really hate this tool. I'm not gonna lie. Like, I like the idea. I like the flexibility, you know, hey, versus going redeploying all my nginx containers, just add 200 megabytes of memory, add another CPU so I don't have to redeploy it all and you know, have to watch if I have websockets or something like that, watch the connections kill off and restart. But at the same point it's giving you the ability to not use containers properly and making a container be a vm. And I just dislike that ability. [00:25:43] Speaker A: It's a very blurry line. I appreciate that. No, I don't know. I think I'm surprised this feature didn't exist before because on a host where there's free resources, where there's free CPU memory, do you really want to put that limit on? Is it really important that you requested, you know, two whole cores and 128 meg of memory or however much you're assigning. Is it really important to enforce that? If the host has got spare capacity, [00:26:12] Speaker B: I guess I would say CPU less than memory because memory you, if it can over, you know, memory, you can crash the host a little bit more than CPU just makes it slow down. [00:26:24] Speaker A: Yeah, I mean memory, memory is. You can't share memory with somebody. If you're using the memory, you can't share it. At least CPU you can time slice and everything just gets a little slower. But at the same time, I mean the Linux hosts that run Kubernetes, they have a huge amount of disk cache. [00:26:43] Speaker B: And so that's if you're leveraging like a swap or something like that or are you thinking like physical hard drive caches? [00:26:50] Speaker A: Yeah, just, just the, just the file system has, will use as much memory as it can in Linux it doesn't just sit there empty, it uses, uses it while it can, then it makes it later. So I don't know it was always. I'm just surprised the feature didn't exist. It seems like an obvious one. So I assume there was some good reason why it didn't exist. I guess if you've got the spec for a pod and the pod says this is what it should be, all of a sudden you've now got to say well I've scaled this one so this one's okay at the bigger size but still bring up the new ones at the new size. I don't know, it's all a little strange. [00:27:27] Speaker B: I'm now trying to picture like my Argo CD deployment where I've like told to scale stuff up but one pod it didn't scale up so it's going to stay there like who's watching? But like I just the thought process rather this like always goes in and changes it and ARGOC then goes deploys it and I'm like but what's it deploying it on is you gotta scale it back down. Like it's scale up features but like so then it can't scale down problems. Are you gonna end up with like orphaned containers? Like I get why it took a while for them to build this feature. [00:28:03] Speaker A: Yeah, I mean you could always scale CPU down, that's easy to change but you can't scale memory down once it's [00:28:08] Speaker B: been assigned memory and a hard drive space. Two things you can't scale down easily. [00:28:13] Speaker A: Yeah, I, you know, I guess I do agree with you though that it does kind of start blurring the lines between Maybe this should be a VM with some memory contention or CPU contention on a large hypervisor at this point, rather than a part in Kubernetes. But it's like Kubernetes came along and solved the orchestration for containers problem and now he's sort of forgotten why it was created in the first place and now it's kind of become the de facto standard. This is how you deploy things. And then we realized that actually, no, that model doesn't work for a lot of workloads. And so now we're sort of retrofitting these things to make it much more VM like than it was ever really made for. I don't know. I guess when I see a pod being live migrated to another host, I'll agree with you a little more there. [00:29:00] Speaker B: Please don't. But were we on. We're on 35 now, so probably about. Maybe that'll be like a 2.0 feature. Like that'll be like a require a massive architectural change. [00:29:12] Speaker A: That'd be kind of a fun project to try to try and figure out, actually. Yeah, if you can do live kernel patching, you can live migrate a part to another host. Okay, well, Amazon onto aws. Amazon. An Amazon service was taken down by AI coding bot, apparently. Although Amazon are saying it wasn't the bot itself, it was, it was a person. Just like guns don't kill people, people kill people. Amazon's Kiro AI coding bot caused a 13 hour outage in the Cost Explanation service in December after engineers granted it two broad permissions and it autonomously decided to delete and recreate the environment rather than patch it. Who knows, maybe, maybe, maybe we don't know what it was doing. Maybe you thought that patching it was a higher risk. There may be some logic behind that. It's hard to tell from the outside. A second outage involves Amazon Q developer, though Amazon says neither event impacted core customer facing AWS services. Amazon's position is that both incidents were user error stemming from improper access controls, not failures of the tools themselves. I kind of question that. What's the point in having these tools if you can't trust them to do stuff? [00:30:25] Speaker B: I mean, you could say that about most things though. Why have a car if you don't trust, you know, the car? You trust the driver of the car? [00:30:35] Speaker A: Maybe not entirely convinced. [00:30:38] Speaker B: I mean, I think a lot of it comes down to, you know, the permissions and the users watching it. You know, a lot of the developers that I know just we were talking about it before. I Do it on more of my personal projects. Yeah, yeah, just do whatever you want. I don't really care. You know, I had to build me a PowerPoint and there was playing with the Cloud, Cloud PowerPoint feature and there's one that's just like do what you want. It's essentially the equivalent of what is it like the dangerously you launch cloud with like dash, dash allow dangerous. And essentially it was the version of that. But like if you're letting the AI tool start to do things inside of production environments, that's where you need to watch it and you need to probably have it be a little bit more specific. So the human needs to kind of be watching what's going on and peer reviewing it, not just saying, yeah, yeah, it's not like Matt building a PowerPoint where I'm like, here's my template, go do what you want, here's the content I want, go actually go do things with it. [00:31:39] Speaker A: Maybe I'm drawing a distinction though between the model itself and the agent. The model. The model is. The model is just the model. The model is just the train set of weights. The agent is an agent that they've created to perform a specific task. And if they didn't give it the right information, if they didn't give it the situational awareness so that it understood where it was working or what the consequences would be for performing certain actions, then sure, it's the person who, who wrote the agent prompts. But I was kind of lumped that into. It was the problem with the agent. The agent should have been more aware about the situation in the environment and what was permitted and what was not permitted, whether or not it had the actual permissions to go and do something. [00:32:24] Speaker B: I mean, the piece of this puzzle that I also think is interesting is that in response to the incident, AWS added safeguards including mandatory peer review for production access, which meant there wasn't peer review. You could just go into production and then I have questions of SOC and ISO and all the other standards. Like I guess certain people just had access, which is fine, not really, but we'll go with that. Or anyone could just allow anyone access to production. So you know, Jonathan would just put a request in and I can self approve it. So I guess that's where I'm like trying to understand like they're compliance controls. But that's my compliance and security hat I'm wearing a little bit there. [00:33:10] Speaker A: Yeah, maybe I'd like to think that they have different types of production environment, you know, environments where there's customer data and environments which are just internal services that process like like this service was cost information for example. But yeah, very very very poor show. So the Financial Times report blamed the AI coding tool for the for the outage and Amazon issued a rebuttal the Financial Times they confirmed that the disruption only affect cost Explorer in a single region in China for roughly 13 hours and that the issue is user error. So I don't know. Poor show. [00:33:49] Speaker B: So onto ways to actually limit your permissions in the ability to do things AWS IAM policy Autopilot is now available as a KIRO power and all I can think of is our power. Oh that's what we should have done. We should have done a Captain Planet Power Hero Power Thief. Okay, it could have been. [00:34:11] Speaker A: It could have been he man. I guess it could have been by the power of Kiro. [00:34:15] Speaker B: Yeah. [00:34:16] Speaker A: Anyway, we always think of the best titles halfway through. [00:34:20] Speaker B: Anyway, I feel like we need to have the show title at the beginning and at the end revote on the show title and see how often it changes. Anyway, as AWS IAM Policy Autopilot, an open source static code analysis tool launches at Reima 2025 is a now a CURA power allowing developers to generate baseline IAM policies within their CURA ID without manually writing policies. CURA use includes rapid prototyping for AWS applications, baseline policy creation for new projects, and keeping developers encoding environments rather than switching to IAM console documentation. So the base thing that was announced that reinvent the AWS IAM policy autopilot is something that I've seen 50 different ways to do over the years. I played with this one at one point. It's actually pretty decent. Adding as a cure of power where as you're developing it automatically generates your you know, your IAM policy is actually pretty cool too and it's kind of that real time loop generate the policy. You know, if you put this into your workflow where it automatically generates the policy on every commit, you can then really start to see hey, this feature I did requires this ability which I thought is actually a pretty cool way to handle stuff. [00:35:44] Speaker A: I'm really on the fence about this because on one hand I know the pain about especially with things like deployment policies if you want to have a specific policy for a deployment worker in Terraform, let's say and just trying to figure out every permission that needs to be added so that you can so the terraform which is very non transparent in itself can just do deployment. It becomes very complicated. At the same time if you have a machine that looks at your code and says, this is the policy you need for it, then I don't feel like that's any security at all. Unless there's another check at the end, there should be a sanity check, you know, so you have the spec for the tool you're writing or the app you're writing, and it should be intelligent about the types of permissions that should be required to run that or to deploy that, rather than just blindly going ahead and saying, oh, I see you're calling this endpoint. Let me give you permission to do that, because that, that is no security at all. [00:36:47] Speaker B: Well, in theory, one, you should be reviewing what it's producing. Two, you should have SCPs or anything else along those lines associated, you know, depending on how you want to manage your permissions, deny policies, whatever associated with these things. So maybe you don't want Route 53 delete, you know, register. Like you could put that at a higher level. And even if you do try to delete it, it won't work. So there are many checks and balances you can do. You just have to build that into the human process, not necessarily the coding, development, AI process. [00:37:26] Speaker A: Yeah, I mean, I get it. I do see the value in it. SCP is a real pain though, because all of a sudden you get these very untransparent messages about, oh, this isn't working, but it's in my policy, which has been blocked by something else or an ORG policy or something else. There's no easy solution to security and there shouldn't be an easy solution to security. I think perhaps the best next iteration of it will be, you know, we already have user behavioral analysis, watching people typing on their laptops to make sure they're not doing their laundry or, you know, having fun during the day when they should be working at home. I think we absolutely need machine behavioral analysis. And actually sort of, yes, we have to still provide some kind of policies, but maybe we can be a little looser with the policies. If we have tools that are constantly watching to see what's going on, to see, to see what unexpected actions are being taken. It's going to be a combination of things. I just, I just, I know people will use this and not check the policies that it generates. And thinking about things like supply chain attacks of NPM modules or Python modules, it would be super easy to slip some code in and then have this generated policy which gives an attacker privileges in your cloud environment without realizing it. [00:38:40] Speaker B: Yep. So Jonathan and I decided we were going to do something a Little bit different today. There's all these other stories that we want to talk about, but they're more just things that are happening than conversation pieces. So we made an honorable mention section. Do you want to kick us off, Jonathan, with the honorable mention number one? [00:38:57] Speaker A: Kick us off. So Amazon Redshift Serverless has introduced three serverless reservations, which doesn't sound very serverless, but there you go. Amazon also says it will spend 12 billion on a Louisiana data center. It's the only place they can get enough electricity, I believe at this point. [00:39:14] Speaker B: Consuming power everywhere and announcing AWS Elemental interface, an interface for fully managed AI servers that generates vertically cropped images and videos for things like TikTok, Instagram Reels, YouTube short and other similar platforms without dedicated production stuff. [00:39:34] Speaker A: What could be worse than TikTok videos than AI generated TikTok in their mind, they already are mostly. [00:39:40] Speaker B: I was gonna say. [00:39:42] Speaker A: Yeah, I'll just leave that one. Okay, let's. Let's move on to gcp. Okay. Google Cloud expanded its managed MCP service support to cover Alloy, db, Spanner, Cloud, SQL, bigtable and Firestore, allowing AI agents to interact with these databases through natural language without requiring infrastructure deployment or complex configuration. And I think this is absolutely the way to go for everything. I think, you know, MCP Server will feature as a, as an API to every service that every cloud provider has within the next 12 months. If they don't, they're behind the times. So the security model relies entirely on IAM for authentication rather than shared keys and all agent actions logged in the cloud audit logs. You would hope so. A new Developer Knowledge McP server connects IDEs directly to Google's official documentation. Okay, good luck with that. Let the agents reference best practices in real time during tasks like database migrations. Can you just imagine you're halfway through a database migration? There's like, hey, it's like Clippy pops up. Hey, you should have done something different anyway, because the servers now follow open MCP standards. They work with third party clients like Anthropic's Claude in addition to Gemini, which broadens the practical appeal beyond teams already committed to Google's AI tooling. [00:41:01] Speaker B: Anything that makes databases easier, I am all for. I just dislike databases. They're always the problem. There's always a problem in them. They always make my life a living hell. But I understand they're a necessary evil because we have to store data in a relational, relational or non relational database scheme. And this is what you end up with. 17 tools that help you understand your database. [00:41:31] Speaker A: Yeah, it's it's got me interesting. Imagine testing a service like this that uses natural language to talk to a database and trying to debug why it didn't give you the answer you expected. It's like, oh, you didn't say please. Yeah, you got three queries and if you don't say please, you don't get a fourth one anyway. Gemini 3.1 Pro is now available in preview for developers via Google AI Studio, Gemini CLI, Vertex AI and Android Studio with enterprise access through Vertex AI and Gemini Enterprise. Pricing details aren't publicly publicly announced yet. The model scores very highly, 77.1% on the ARC AGI2 benchmark, which tests reasoning on novel logic patterns, representing more than double the score of the previous Gemini 3 Pro model. That sounds like a huge architectural change they've made. And the fact that it's a point release is crazy. Probably indicating that they already built this into Gemini 3, but it hadn't enabled the feature. Practical use cases highlighted, including sorry, include generating animated SVGs from text prompts, building live data dashboards by connecting the public APIs, and prototyping interactive 3D interfaces with hand tracking and generative audio. The example suggests the model is particularly suited for developers working on data visualization and creative coding projects. Sell your Unity stock now. [00:42:57] Speaker B: I mean it's just amazing to me how fast these models are improving. This one's saying it's a 77 score of 77% where I mean models a year ago were like 40s and 50%. So like just seeing how fast everything is moving is still insane. I mean, I love that they are rolling out to everywhere. You know, I play with Notebook LLM for various little projects I have, so it's nice they're rolling it out not just to, hey, you know, Android and all these other places, but they're rolling out across the suite, which is beneficial too. [00:43:32] Speaker A: Yeah, I'm wondering when we're going to have this sort of price adjustment of AI though. It's very cheap, very, very cheap right now. Everyone complains about the price and yes, yes it's expensive, but also for what you get, it's incredibly cheap. And so I, I think there will come a point where the price will adjust up significantly and people will actually have to pay for the value they're getting. Instead of sort of funding, the building of data centers is pretty much what you're paying for right now. Pay somebody's electric bill. [00:44:01] Speaker B: Google's Firefly is a clock synchronization protocol, is a software based clock synchronization protocol that achieves sub 10 nanosecond for nick to nic synchronization across data center hardware without requiring specialized or expensive dedicated timing equipment. The protocol uses a distributed consensus algorithm built on random graphs rather than a traditional hierarchical time server model, which improves convergence speed, stability, scalability and reliance on network path asymmetries. Firefly decouples internal synchronization from UTC time synchronization, meaning external time server jitter does not degrade the precision of the clock alignment within the data center fabric itself. This is highly important for finance ML workloads and other systems that require that level of detail. Yeah, I think it's pretty cool. [00:45:01] Speaker A: It is cool. And thinking about what they did with the global replication of Spanner that was very much based around a very accurate time synchronization. But the fact that you need to guarantee sub 100 microsecond synchronization for financial systems is crazy. [00:45:20] Speaker B: And they're at 10 nanoseconds. They're so far beyond that, my brain can't even compute realistically what a nanosecond is. But doing it from server to server, I mean, it's a complete paradigm shift of like, you know, your strata zeros, strata ones, you know, and knowing that as you get down further, you get further and further away from the accurate type to having kind of a mesh network manage itself and the complexity of that and deciding what's accurate. But I guess they built the algorithm with enough intelligence that it's able to kind of keep that in line. I just wonder that if it ever like strays too far, is that we're going to be able to kind of come back. So if it's always expecting, all these are going to be within let's say a hundred nanoseconds and all of a sudden it's at, you know, four seconds off reality. How do you kind of pull that back in? Like, do you slowly have to do it? So when we have leap seconds, is that can be a problem for these systems? Not sure. [00:46:26] Speaker A: Yeah, that's interesting. I kind of wonder if they have their own timer which is counting up, on which they map external time would seem to make sense. [00:46:37] Speaker B: Like the US Epoch. [00:46:38] Speaker A: Yeah. Okay, so we have a few honorable mentions for Google as well. Google's investing 15 billion in AI infrastructure. Surprise, surprise in India and launching America India Connect, a multi continent subsea cable in initiative that establishes new fibrotic routes connected the United States, India, Singapore, South Africa and Australia. Now Justin loves the subsea cable stories. It's gonna be very Disappointing to miss this one and we kept it in just for you. Justin. [00:47:06] Speaker B: I like these stories. I always. I don't know why I find it so fascinating, but I do. [00:47:10] Speaker A: I have another story which we're not gonna cover in this. It was unrelated random thing that popped up on my Google newsfeed because it knows I read nerd stories. It was the. The very first transatlantic fiber optic cable and is being pulled up from the sea surface. And I couldn't believe how recently, relatively speaking, it was placed. It was only in the 80s when that thing was unrolled and sort of laid on the seabed. I guess they're pulling it up to clear routes for newer, bigger, faster cables. Also, it contains an enormous amount of copper, which is probably the main reason they're going to reclaim it. [00:47:49] Speaker B: The space doesn't feel like in the vastness of the ocean. That's a good reason. The copper on the other hand, feels like a really good reason because it's not going to be cheap to lift that thing up. Because God knows if it's in the 80s, what's laying on top of the cable as they're trying to pull it up at this point? What, rocks and sediments and everything else? [00:48:08] Speaker A: Yeah, who knows? That's a lot of copper. 2,000 miles of shielded fiber optic cable. [00:48:16] Speaker B: Somebody had to do the ROI on that to determine that. Makes sense too. [00:48:21] Speaker A: Google's also building a new data center in Willwage County, Texas, expanding its existing infrastructure footprint in the state. This is primarily an infrastructure capacity announcement rather than a new GCP service or feature. I wonder what they're going to put in that data center. [00:48:37] Speaker B: Well, what's interesting here is they're using air cooling though, versus water cooling. So it'll be interesting to figure out how they cool things in Texas with the air. Just saying. [00:48:51] Speaker A: Yeah. [00:48:52] Speaker B: And our final honorable mention is the Lyra 3 to create music tracks in the Gemini app, you can now generate 30 second music tracks with lyrics, custom cover art and style controls from prompt text, uploaded pictures and videos. More things that can be used in TikTok, Facebook, Instagram and other social medias. Hence a 30 second limit on it. [00:49:18] Speaker A: All right, let's move on to Azure, a milestone achievement in our journey to carbon negative. From the official Microsoft blog. Microsoft has achieved its 2025 goal. It's 2026 now, guys. Sorry. Of matching 100% of global electricity consumption with renewable energy. Contracting 40 gigawatts of new renewable capacity around across 26 countries since 2020. It represents enough energy to power approximately 10 million US homes with 19 gigawatts currently online and the remainder coming online over the next five years. [00:49:53] Speaker B: I always kind of try to understand carbon neutral and carbon negative because you're buying renewable energy and you're talking about the level twos but you're still using the energy. So you're really negative because somebody still had to build that power play to build these things that made it. Which I understand is also why the level 3 or the strata 3, whatever the third level layer is that there is where it talks about where you get your power, where you get your materials and everything feels like there's just different stories that you spin based on what you want to tell people at the time with it. But I mean it's, it's a great thing that they are doing that they're trying to buy renewable energy and we're trying to make the world die from all these, you know, data center usages. [00:50:45] Speaker A: Yeah, it's a fine line, I think, I think really it's fossil fuels is, is the issue. If you, if you to go plant a forest full of trees, let them grow for 10 years, harvest them, dry them, burn them for energy and then plant some more trees that's still carbon neutral because you are not releasing any more CO2 into the atmosphere than the, than the trees already absorbed in their growth cycle. It doesn't sound very environmentally friendly burning a bunch of trees. But it's still carbon neutral. It's only not carbon neutral if you're burning energy faster than the earth is sequestering it. Which is a bit shocking for some people. You know, burning, you know, burning trash, burning paper, that's all carbon neutral. Because we planted those trees. [00:51:31] Speaker B: Well, yeah, because you're essentially saying I plant trees and therefore I can do this. [00:51:35] Speaker A: Exactly. [00:51:36] Speaker B: So now generally available quota and deployment tool troubleshooting tools for Azure flexible consumption plans. You now have the ability directly in the platform to see that you're hitting your quota limits, constraints without digging through documentation or opening support tickets and praying that somebody on the other end of the support ticket knows what you're saying when you point them exactly at the issue. The quota troubleshooting experiences surfaces, flexible consumption, specific limits in context, which is useful when you hit a limit and you don't know why your application all of a sudden started breaking. This is a great quality of life improvement because you can see why things are breaking when you're using flexible consumption as far as per execution billing model and fast scaling abilities reducing potential failures. [00:52:28] Speaker A: Yeah, I kind of wonder if it really makes sense again. It's like it's this thing we build, a serverless thing. And yes, you can use it and it's per second or per gigabyte second or however they bill it in Azure. But if you're operating at the kind of scale where this becomes a serious concern, either because, you know, you put limits in place because of cost concerns, then maybe you shouldn't be using this in the first place. [00:52:51] Speaker B: Well, yes and no. I've seen it where you've hit limits on lambda when you do nightly processing or things like that, and you scale out a lot more horizontally than you normally do. So you kind of hit this limit, but so random things fail. So I've kind of seen that where you in theory could launch something, but you want that horizontally scalable because you're running it all day at lower scales. But potentially you need to run it at a much more horizontally scalable as like certain times or if you're a web page and you all of a sudden become a fad of the day and you know, your website essentially gets DDOs from real people, so you kind of need that. Seeing those things I think is important and seeing. And otherwise as a developer, a DevOps, an SRE, whatever your role is, when you're the one debugging this, you can now actually see that you're being throttled and you can see the failures occurring where before you would be yelling at Microsoft support. [00:53:52] Speaker A: Yeah, I guess, I guess the observability piece into what's causing it. I assume it's not just telling you that you've hit a limit, it's tracing back the call through the stack to tell you exactly why you've reached these limits or you've exceeded the limits. You were. [00:54:09] Speaker B: I think it would tell you you exceed the limits, you know, or you don't have enough quota, you don't have enough ability to launch this in the tenant subscription. Sorry. [00:54:19] Speaker A: Yeah, I guess the other thing is if it's built into the platform, then it's less. It's less tool than you have to buy. You know, you don't need the new relics and the datadogs to monitor this stuff for you. [00:54:29] Speaker B: Yep. [00:54:30] Speaker A: If it's built into the platform, so it's kind of cool. And the next thing will be auto, auto adjusting quotas based on, based on [00:54:37] Speaker B: the new metrics, based on reality of the metrics. And that's kind of where I would love to see quotas be a lot more dynamic, where it's like you've launched two servers and by default you have a four server limit. Okay, maybe at four you really should get an eight quota limit. Therefore if you launch two more because you're launching a second scale set or auto scaling group, you can get two more servers and not be at your limit without knowing it it. So if there was more dynamic in these limits it would be nice. But I understand they're there for multiple [00:55:10] Speaker A: reasons that that makes a lot of sense. Actually you reminded me of the burstable compute on Amazon. You know they're t. Was it the TT instances? [00:55:20] Speaker B: Tc? [00:55:21] Speaker A: Yeah. [00:55:21] Speaker B: Come on Jonathan. It hasn't been that long since you've been on aws. [00:55:24] Speaker A: It has. It's been fuzzy. It's been far too long. It kind of reminds me of that because you know the way you set your limits on that, you still build up some spare capacity while you're not using full capacity and then you can burn it in bursts when you have the traffic but then you go back down to baseline. I'd like to see more quotas be like that really and have some kind of time awareness. And so instead of saying I want this Fixed limit of 2000 invocations per hour or something for my serverless function, you should be like my target is around 2000 per hour. But if you've been at 1000 per hour for four or five hours and then you have a peak up to 4000 that should be okay. Does that on average over the time period you're still below your average limit. It's not costing you more than you thought if that usage had been higher earlier. So I think a little more flexibility. I think you're right around the way quotas are implemented. A bit more time awareness would be kind of cool. All right, some honorable mentions. I think this is the last couple of stories. Microsoft Sentinel's CCF push feature is now in public preview which allows security data providers to send logs directly to Sentinel workspace without the traditional setup overhead of manually configuring data collection endpoints and Microsoft [00:56:40] Speaker B: Sovereignty Cloud adds governance, productivity and large language models securely when running when even completely disconnected. Azure Local, which is their local zones, which is their local outpost and things like that. Disconnected operations are now generally available allowing organizations to run mission critical infrastructure with full Azure governance policy enforcement even when they're completely isolated from the cloud, AKA they don't have Internet. This targets governments, defense and other regulatory industries that require that independence. [00:57:14] Speaker A: This is actually a bigger story than I realize I think. So the Foundry local stuff is on prem hardware so they're actually supporting Nvidia GPUs in the on prem hardware so you can perform isolated inference at that's fairly large scale. That's kind of cool. [00:57:30] Speaker B: I think they've had that for a little bit, but this is really about it being disconnected. So if the outpost, if the Azure local what's Google? I don't remember what Google's is called, but if they are not connected to the cloud they're still able to continue to operate. [00:57:46] Speaker A: Yeah, yeah, I know that was a big problem with outposts and I, you know I'm sure there's some large organizations and military who want to put these things in containers and drop them in various places around the world. Makes sense that you want local AI in that case to run whatever untoward things or toward things as the case may be. [00:58:08] Speaker B: And finally in our emerging cloud section, introducing Command center, the unified operations platform for AI. Kouser Command and Control. Unified Platform consolidates GPU clusters, monitoring, orchestration and support into a single interface. It supports managed Kubernetes and managed Slurm, allowing for long running multi week training jobs to operate continuously across large GPU clusters. Autoclusters is a key component that automatically detects GPU degradation, evicts compromised nodes and replaces them with healthy ones from a reserve pool. On the observability side, it supports multiple access methods including via UI, including Grafana, Prometheus endpoints and APIs. The watch agent paired with the telemetry relay sends visibility to custom application metrics allowing teams to correlate workloads performance with underlying GPU health for more precise troubleshooting. The whole stack here is what I kind of find Nice. You know, the smaller clouds are really trying to, at least from what I can see, attack kind of that whole vertical a lot more where they're giving you that depth all the way down to if you are training your own model with them. You get the cpu, you get the gpu, you can see that whole stack of what's going on and really start to fine tune and say oh, we slowed down our training because we lost two GPUs, you know. And seeing that is not something you would see on the larger hardware scales. [00:59:47] Speaker A: Yeah, and Unlike most hardware, GPUs don't last very long in a production data center, especially if they're heavily utilized. A couple of years at most I think. And the hardware is pretty much scrap. Neat. All right. DigitalOcean is also making some changes. They're adding AMD Instinct Mi 350X GPUs to its droplets lineup are built on the CDNA 4 architecture and optimized for inference workloads including a pre fill phase compute, low latency token generation and larger context windows. The platform demonstrated measurable results with existing customers, including a two times increase in production request throughput and a 50% reduction in inference costs for carrier AI. Giving potential adopters concrete performance benchmarks to [01:00:31] Speaker B: evaluate new GPUs is always good. So if you're on DigitalOcean, here's your new GPU. Go burn some money. [01:00:42] Speaker A: Cool. And that is it. [01:00:44] Speaker B: Well Jonathan, it's always great just chatting with just us, but hopefully next week we'll have a few more of us here. [01:00:50] Speaker A: Yep, I don't think we quite reached our goal of 30 minute show today, but it's been good fun. [01:00:56] Speaker B: Always fun to chat. See ya. [01:00:59] Speaker A: See you later, Matt. And that's all for this week in Cloud. Head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening and we'll catch you on the next episode.

Show Notes

Titles we almost went with this week

General News

AI Is Going Great – Or How ML Makes Money

Cloud Tools

AWS

GCP

Azure

Emerging Clouds

Closing

Chapters

Episode Transcript

Other Episodes

Episode

Episode 23: Unlock the podcast with your Android device

Episode 326

326: Oracle Discovers the Dark Side (And Finally Has Cookies)

Episode 321

321: The Cloud Pod is in Tears Trying to Understand Azure Tiers