343: AWS CloudWatch Finally Hits Snooze

Welcome to episode 343 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio this week bringing you all the latest in Cloud and AI news, including some of the smaller clouds like Cloudflare and Crusoe Cloud, as well as announcements from the big guys like Google’s Gemini DeepThink, Anthropic’s big pay day, and Microsoft’s Notepad problem. We’ve got all this plus Matt screwing up his outro AGAIN, so let’s get started!

Titles we almost went with this week

Chrome’s WebMCP Protocol: Teaching AI Agents to Stop Doom-Scrolling the DOM and Actually Get Work Done
Claude Enterprise Self-Service: Because Sometimes You Just Want to Buy AI Without Small Talk
AWS EC2 Goes Inception Mode: Now You Can Virtualize Your Virtualization Without Going Broke
Amazon EC2 Nested Virtualization: Because Your Virtual Machine Was Lonely and Needed Its Own Virtual Machine
CloudWatch Alarm Mute Rules: Because Your Deployment Doesn’t Need a Standing Ovation at 3 AM
Anthropic’s $380 Billion Valuation Proves AI Funding Has Gone Claude Nine
AWS EC2 Nested Virtualization Finally Escapes the Expensive Hardware Jail
Cloudflare Teaches AI Agents the Magic Words: Accept text/markdown and Save 13,000 Tokens
Crusoe Cloud’s MCP Server: Teaching AI Assistants to Stop Asking for the Manager and Just Fix Your Infrastructure
Azure’s New Agentic Copilot: Because Manually Clicking Through Dashboards Was So 2023
Chrome’s WebMCP Gives AI Agents a GPS for Websites Because Apparently They’ve Been Lost in the HTML This Whole Time
Anthropic Cuts Out the Middleman: Claude Enterprise Now Available Without the Enterprise Sales Dance
AWS Gives CloudWatch the Silent Treatment: New Mute Rules Let Alarms Sleep Through Maintenance Windows
AWS CloudWatch Hits Snooze: Mute Rules End On-Call Nightmares
AWS Gives CloudWatch the Silent Treatment

General News

00:45 Bloat Risk? Microsoft’s Notepad Upgrade Also Introduced a Vulnerability | PCMag

Microsoft’s recent Notepad modernization introduced CVE-2026-20841, a vulnerability in the new Markdown support feature that allows malicious links in files to execute remote code.
The flaw has been patched in the February 2026 security updates, but it highlights the security trade-offs when adding features to historically simple applications.
The vulnerability exploits Notepad’s Markdown rendering capability, which Microsoft added in May to support lightweight markup language formatting. When Notepad opens a specially crafted Markdown file, embedded malicious links can trigger unverified protocols that load and execute remote files on the system.
This incident raises questions about feature bloat in core Windows utilities, particularly as Microsoft continues adding network-dependent capabilities like AI-powered text writing to Notepad. Security researchers are debating whether basic text editors should have network functionality at all, given the expanded attack surface.
The vulnerability demonstrates how modernization efforts can introduce security risks in previously low-risk applications.
Organizations using Windows need to ensure their systems receive the February 2026 security updates to address this specific flaw in Notepad’s Markdown implementation.

02:04 Matt – “I’m just confused why they didn’t use Copilot on their pull request in order to identify this as a potential bug. I feel like it should have found it. Just sayin’…”

03:13 WebMCP is available for early preview

Chrome is introducing WebMCP, a standardized protocol that lets websites expose structured tools and actions directly to AI agents, eliminating the need for agents to parse raw HTML and DOM elements.
This addresses a key reliability problem in agentic workflows where AI agents currently struggle with inconsistent web interactions.
The protocol offers two interaction modes: a declarative API for simple HTML form-based actions and an imperative API for complex JavaScript-driven workflows. This dual approach lets websites define exactly how agents should interact with features like booking systems, support ticket forms, and checkout processes.
Early use cases focus on high-value transactional workflows, including e-commerce product configuration, travel booking with complex filtering requirements, and automated customer support ticket creation with technical details. These scenarios benefit most from structured interactions versus unreliable DOM manipulation.
The early preview program requires sign-up for access to documentation and demos, indicating this is still in experimental stages.
Developers interested in making their sites agent-ready will need to implement these new APIs to participate in the agentic web ecosystem Chrome is building.
This represents Chrome’s attempt to standardize how AI agents interact with websites before the market fragments with competing approaches. Sites that adopt WebMCP early may gain advantages as browser-based AI agents become more prevalent.
Interested in signing up for the preview? You can do that here.

04:41 Ryan – “It makes a lot of sense why they want to standardize on a specific protocol, but I can’t help but feel like this is the beginning of the end of human interaction; where you’re going to have an AI agent-to-agent protocol.”

AI Is Going Great – Or How ML Makes Money

07:27 Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation \ Anthropic

Anthropic closed a $30 billion Series G at a $380 billion post-money valuation, reaching $14 billion in run-rate revenue with 10x annual growth for three consecutive years.
The company now serves eight of the Fortune 10, with over 500 customers spending more than $1 million annually.
Claude Code, made generally available in May 2025, has grown to $2.5 billion in run-rate revenue and now accounts for 4% of all public GitHub commits worldwide. Business subscriptions quadrupled since early 2026, with enterprise customers representing over half of Claude Code’s revenue.
Opus 4.6 launched last week as the latest model release, leading the GDPval-AA benchmark for economically valuable knowledge work in finance and legal domains. The model powers agents capable of generating professional documents, spreadsheets, and presentations autonomously.
Anthropic expanded its product portfolio in January with over thirty launches, including Cowork, which extends Claude Code capabilities to broader knowledge work with eleven open-source plugins for specialized roles.
Claude for Enterprise is now HIPAA-compliant and available for healthcare and life sciences organizations.
Claude remains the only frontier AI model available across all three major cloud platforms through AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.
The company trains on diversified hardware, including AWS Trainium, Google TPUs, and NVIDIA GPUs, to optimize workload performance and resilience.

08:10 Matt – “Those numbers are insane. I just want to make sure we’re all clear about that.”

15:16 Introducing Sonnet 4.6 \ Anthropic

Claude Sonnet 4.6 is now generally available across all Claude plans, API, and major cloud platforms at the same pricing as Sonnet 4.5 ($3/$15 per million tokens), with a 1M token context window in beta.
The model now serves as the default for Free and Pro plan users, bringing Opus-class performance to a mid-tier price point.
Computer use capabilities have improved substantially, with Sonnet 4.6 scoring 94% on insurance benchmarks and showing human-level performance on tasks like navigating complex spreadsheets and multi-step web forms.
The model demonstrates better resistance to prompt injection attacks compared to Sonnet 4.5 and performs similarly to Opus 4.6 on safety evaluations.
Coding performance has advanced significantly, with early users preferring Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time and even choosing it over Opus 4.5 59% of the time.
Users report better instruction following, less overengineering, fewer hallucinations, and more consistent follow-through on multi-step tasks, with one customer reporting an 80.2% score on SWE-bench Verified.
Several features have reached general availability on the API, including code execution, memory, programmatic tool calling, tool search, and tool use examples.
Web search and fetch tools now automatically write and execute code to filter search results, improving response quality and token efficiency.
The model supports both adaptive thinking and extended thinking modes, with context compaction in beta that automatically summarizes older context as conversations approach limits.
Claude in Excel now supports MCP connectors, allowing users to pull data from external sources like S&P Global, LSEG, and PitchBook directly within spreadsheets.

17:42 Ryan – “I haven’t played with Sonnet because it’s just released, but playing around with Opus, you can see that it’s another major improvement in these steps, and it is pretty fantastic to use.”

19:44 Token Anxiety – by Nikunj Kothari – Balancing Act

This article describes a cultural shift in San Francisco’s tech scene where developers are prioritizing AI agent management over social activities, with people leaving parties early to check on overnight code generation and spending weekends running 12-hour build sessions with AI assistants like Claude and Codex.
The piece highlights how AI coding tools have created a new productivity anxiety where developers feel compelled to keep agents running continuously, even during sleep, to maximize output and stay competitive as new model capabilities and context windows are released weekly.
Developers are adopting new vocabulary around AI models, discussing them like sommeliers evaluate wine and using animal training metaphors like keeping Claude on a tight leash for code review while giving it more slack for creative work.
The constant stream of benchmark improvements and new AI capabilities is creating pressure to continuously optimize workflows, as each advancement makes previous methods feel outdated and multiplies the sense that competitors are already leveraging these improvements.
This represents a broader shift in developer culture where traditional leisure activities are being replaced by AI-assisted building, with the primary social metric changing from what you accomplished to how many agents you have running in parallel.

24:25 Ryan – “I still don’t know how everyone has these overnight workloads; I guess I don’t trust AI at all; I’m not going to let it run unsupervised.”

31:48 Alibaba Launches New LLM as China’s AI Battle Heats Up

Qwen 3.5 is out. No industry freakouts (like with DeepSeek) so far

33:06 Seed News – ByteDance Seed Team

ByteDance officially launched Seedance 2.0, a next-generation video creation model with a unified multimodal audio-video architecture supporting text, image, audio, and video inputs.
The model can process up to 9 images, 3 video clips, 3 audio clips, and natural language instructions simultaneously for comprehensive content referencing and editing.
The model delivers substantial improvements in complex motion rendering and physical accuracy, particularly excelling at multi-subject interactions like competitive figure skating with synchronized movements, mid-air spins, and precise landings that follow real-world physics.
Industry evaluations show Seedance 2.0 achieves leading performance in motion stability, instruction following, and visual aesthetics compared to competing models.
Seedance 2.0 introduces dual-channel stereo audio generation with multi-track parallel output for background music, ambient effects, and voiceovers synchronized to visual rhythm.
The model supports 15-second high-quality multi-shot audio-video output suitable for commercial advertising, film VFX, game animations, and explainer videos.
New video editing capabilities allow targeted modifications to specific clips, characters, actions, and storylines, plus video extension functionality for generating continuous shots based on user prompts.
The model demonstrates improved instruction-following for complex scripts and open-ended prompts while maintaining subject consistency across extended sequences.
The unified multimodal architecture enables professional-grade content creation workflows where users can reference composition, motion, camera movement, visual effects, and audio elements from input assets, significantly lowering barriers to industrial-level video production without requiring specialized technical expertise.
https://www.instagram.com/reel/DUm4zSvEn76/ – John Wick cat video as mentioned.

34:53 Justin – “I’m surprised Hollywood stock didn’t crash today over this; very very impressive. Crazily so.”

AWS

36:47 Announcing new Amazon EC2 general purpose M8azn instances

AWS launches M8azn instances powered by fifth-generation AMD EPYC Turin processors running at 5GHz, the highest CPU frequency available in the cloud. These general-purpose instances deliver 2x compute performance over M5zn and 24% better performance than M8a instances, with 4.3x higher memory bandwidth and 10x larger L3 cache.
The instances target latency-sensitive workloads like high-frequency trading, real-time financial analytics, and simulation modeling for automotive and aerospace industries.
Built on sixth-generation Nitro Cards, they provide 2x networking throughput and 3x EBS throughput compared to M5zn instances.
M8azn instances come in nine sizes from 2 to 96 vCPUs with up to 384 GiB memory at a 4:1 memory-to-vCPU ratio, including two bare metal variants. Available in US East Virginia, US West Oregon, Tokyo, and Frankfurt regions through On-Demand, Spot, and Savings Plans pricing models.
The high-frequency positioning fills a specific niche for workloads requiring maximum single-threaded performance rather than just core count.
This complements AWS’s broader M8a lineup by offering customers a choice between standard frequency instances and these premium high-frequency variants for specialized use cases.

37:03 Announcing Amazon EC2 C8i, M8i, and R8i instances on second-generation AWS Outposts racks

AWS is bringing C8i, M8i, and R8i instances to second-generation Outposts racks, delivering 20% better performance and 2.5x more memory bandwidth compared to the previous C7i, M7i, and R7i generation. These instances also provide 20% more compute capacity within the same physical rack space and power consumption, improving density for on-premises deployments.
The new instances run on custom Intel Xeon 6 processors exclusive to AWS and target workloads that need enhanced on-premises performance, including large databases, memory-intensive applications, real-time analytics, high-performance video encoding, and CPU-based ML inference.
This addresses the gap for customers who need cloud-class compute but must keep workloads on-premises due to latency, data residency, or regulatory requirements.
Second-generation Outposts racks continue AWS’s hybrid cloud strategy by extending the latest EC2 instance types to customer data centers with the same APIs and tooling as the public cloud.
The availability varies by region, so customers should check the Outposts rack FAQs page for current country and territory support before planning deployments.
The performance improvements come primarily from the memory bandwidth increase and processor generation upgrade, which should benefit database operations, in-memory caching, and data-intensive applications that previously hit memory bottlenecks on Outposts.
The power and space efficiency gains matter for customers with constrained data center capacity or energy budgets.

37:08 Amazon EC2 Hpc8a Instances powered by 5th Gen AMD EPYC processors are now available

AWS launches Hpc8a instances powered by 5th Gen AMD EPYC processors, delivering 40% higher performance and 42% greater memory bandwidth than the previous Hpc7a generation, while offering up to 25% better price-performance for tightly coupled HPC workloads like computational fluid dynamics and weather modeling.
The instances come in a single 96xlarge size with 192 cores, 768 GiB memory, and 300 Gbps Elastic Fabric Adapter networking, featuring customizable core counts at launch and sixth-generation AWS Nitro cards for offloaded virtualization functions. Simultaneous Multithreading is disabled by default to optimize HPC performance.
Available now in US East Ohio and Europe Stockholm regions, with support for AWS ParallelCluster, AWS Parallel Computing Service, and Amazon FSx for Lustre integration to simplify cluster management and provide sub-millisecond storage latencies. Customers can purchase as On-Demand Instances or through Savings Plans, with specific pricing available on the EC2 pricing page.
The 1:4 core-to-memory ratio and high core density target compute-intensive simulation workloads requiring rapid time-to-results, including crash simulations and high-resolution weather modeling within tight operational windows. The customizable core count feature allows right-sizing based on specific HPC workload requirements without paying for unused capacity.

39:20 Ryan – “I’m sure they use a subcontractor for actual maintenance, things. But I’m sure that you have to give them access and manage them just like you would any other remote hands for your data center.”

39:37 MSK simplifies Kafka topic management with new APIs and console integration

Amazon MSK now provides native AWS APIs for Kafka topic management, eliminating the need to set up and maintain separate Kafka admin clients. The three new APIs (CreateTopic, UpdateTopic, and DeleteTopic) work alongside existing ListTopics and DescribeTopic APIs through AWS CLI, SDKs, and CloudFormation, letting teams manage topics using standard AWS tooling and IAM permissions.
The MSK console now consolidates all topic operations in one interface with guided defaults for creating and updating topics. Users can configure properties like replication factor, partition count, retention policies, and cleanup settings while viewing comprehensive partition-level metrics and configuration details directly in the console.
These capabilities are available at no additional cost for MSK provisioned clusters running Kafka version 3.6 and above across all regions where MSK is offered. Organizations need to configure appropriate IAM permissions to use the new APIs, with setup instructions available in the MSK Developer Guide.
The update addresses a common operational pain point where teams previously had to maintain separate Kafka admin tooling outside the AWS ecosystem. This integration brings Kafka topic management into standard AWS workflows, improving consistency with existing infrastructure-as-code practices and centralized access control through IAM.

40:47 Ryan – “I suspect this has more to do with Kafka than AWS because Kafka is notoriously hard to administer, so in a lot of cases there’s just not the ability…so I’m really happy to see this.”

42:40 Amazon Bedrock adds support for six fully-managed open weights models

Amazon Bedrock now supports six new open weights models, including DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next, providing frontier-class performance at lower inference costs than proprietary alternatives.
These models cover different enterprise needs from advanced reasoning and agentic tasks to autonomous coding with large output windows and lightweight production deployments.
The models run on Project Mantle, a new distributed inference engine that accelerates model onboarding to Bedrock while providing serverless inference with quality of service controls and automated capacity management. Project Mantle includes native OpenAI API compatibility, allowing customers to switch from OpenAI endpoints without code changes.
The addition of these open weights models gives AWS customers more flexibility in model selection based on specific workload requirements and cost constraints.
DeepSeek V3.2 and Kimi K2.5 handle complex reasoning tasks, while GLM 4.7 and MiniMax 2.1 support coding workflows with extended context windows, and Qwen3 Coder Next and GLM 4.7 Flash offer cost-efficient options for high-volume production use.
Project Mantle’s unified capacity pools and higher default quotas address common scaling challenges customers face when deploying large language models.
The serverless architecture eliminates infrastructure management overhead, while the automated capacity management helps prevent quota limitations during peak usage periods.

44:05 Matt – “I like how they made it all compatible with OpenAI. It’s kind of like S3 compatibility; I feel like we’re slowly kind of coming to a standard, which means you can go play with it and see which model makes sense.”

46:02 Amazon EKS Auto Mode Announces Enhanced Logging for its Managed Kubernetes Capabilities

EKS Auto Mode now integrates with CloudWatch Vended Logs to automatically collect logs from its managed Kubernetes capabilities, including compute autoscaling, block storage, load balancing, and pod networking.
This gives customers centralized visibility into Auto Mode’s infrastructure management operations without manual configuration.
The integration uses CloudWatch Vended Logs, which provides lower pricing than standard CloudWatch Logs while maintaining built-in AWS authentication and authorization.
Customers can route logs to CloudWatch Logs, S3, or Kinesis Data Firehose, depending on their retention and analysis requirements, with standard destination charges applying.
Each Auto Mode capability can be configured independently as a log delivery source through CloudWatch APIs or the AWS Console.
This granular control allows teams to monitor specific components like the Karpenter-based autoscaler or VPC CNI networking without collecting unnecessary log data.
The feature addresses a common operational challenge where Auto Mode’s automated infrastructure management previously operated as a black box. DevOps teams can now troubleshoot issues like pod scheduling failures, storage provisioning problems, or load balancer configuration errors by examining the actual logs from Auto Mode’s control plane operations.
Available immediately in all regions where EKS Auto Mode operates, this logging capability helps bridge the observability gap between customer workloads and AWS-managed Kubernetes infrastructure components.

47:05 Justin – “All I have to say is, some lovely CloudWatch PM just made their bonus this year by turning this one, as this is a lot of logging context that you now need to parse and pay for.”

49:26 AWS CloudWatch Alarm Mute Rules eliminate alert fatigue

CloudWatch Alarm Mute Rules let you temporarily silence alarm notifications during planned maintenance windows, deployments, or off-hours without disabling the underlying monitoring.
The feature supports up to 100 alarms per rule with one-time or recurring schedules, and automatically triggers any suppressed actions once the mute period ends if the alarm state persists.
This addresses a common operational pain point where teams either ignore alerts during maintenance windows or use risky script-based workarounds that can be forgotten and leave monitoring disabled.
The native integration eliminates the need for custom automation to manage notification states during planned activities.
The feature is available today across all AWS regions that support CloudWatch alarms at no additional cost beyond standard CloudWatch pricing.
Configuration is done through the CloudWatch console or API, with support for all alarm states, including OK, ALARM, and INSUFFICIENT_DATA.
Primary use cases include silencing non-critical alerts during scheduled deployments, muting development environment alarms outside business hours, and suppressing known issues during maintenance windows.
This helps reduce alert fatigue while maintaining full visibility into system state and metrics collection.
The automatic re-triggering of muted actions ensures teams don’t miss persistent issues that started during a mute window, providing a safety mechanism that manual notification management typically lacks.

50:49 Ryan – “This is much nicer. Basically, set it for ignore for an hour and then have it kick back in. Glad to see this, but strange that it took this long.”

52:48 Amazon EC2 supports nested virtualization on virtual Amazon EC2 instances

AWS now supports nested virtualization on standard EC2 instances, not just bare metal, allowing customers to run KVM or Hyper-V hypervisors inside virtual machines. This expands flexibility for development and testing scenarios that previously required more expensive bare metal instances.
The feature launches on the latest generation C8i, M8i, and R8i instance families across all commercial AWS regions.
Customers can now run mobile app emulators, automotive hardware simulators, and Windows Subsystem for Linux on Windows workstations directly on virtual instances.
This capability addresses a long-standing limitation where nested virtualization required bare metal instances, which carry higher costs and longer provisioning times compared to standard virtual instances.
The change makes nested environments more accessible for development teams and testing workflows.
Common use cases include software vendors who need to test their products across multiple operating systems, automotive companies simulating vehicle hardware environments, and mobile developers running Android or iOS emulators at scale.
These workloads can now run on more cost-effective instance types with faster deployment.
The feature requires enabling hardware virtualization extensions in the instance configuration, with full documentation available in the EC2 user guide. Pricing follows standard EC2 rates for the C8i, M8i, and R8i instance families without additional charges for the nested virtualization capability itself.

54:13 Ryan – “These kinds of announcements are usually preceded or quickly followed with Nitro…and it’s neat. It’s neat how they isolate the hardware layer to match these workloads.”

54:50 Announcing Amazon SageMaker Inference for custom Amazon Nova models

AWS now lets customers deploy custom-trained Amazon Nova models on SageMaker Inference with production-grade controls over instance types, auto-scaling, context length, and concurrency settings.
This addresses customer requests for the same deployment flexibility they get with open-weight models, enabling full-rank customized Nova Micro, Nova Lite, and Nova 2 Lite models trained via SageMaker Training Jobs or HyperPod.
The service reduces inference costs by supporting more cost-effective EC2 G5 and G6 instances instead of requiring P5 instances, with auto-scaling based on 5-minute usage patterns and configurable inference parameters.
Customers pay only for compute instances used with per-hour billing and no minimum commitments, following standard SageMaker pricing.
Deployment works through SageMaker Studio UI or SDK, supporting both real-time streaming and asynchronous batch inference modes. The service includes advanced configuration options for context length up to 8000 tokens, max concurrency settings, and inference parameters like temperature and top-p for optimizing latency-cost-accuracy tradeoffs.
Currently available in US East N. Virginia and the US West Oregon regions, with support for Nova models with reasoning capabilities.
Instance type requirements vary by model size, with Nova Micro supporting g5.12xlarge and up, Nova Lite requiring g5.48xlarge minimum, and Nova 2 Lite needing p5.48xlarge instances.

56:47 Ryan – “It’s not an open-source model, and so it is kind of crazy that Nova offers that customization.”

GCP

57:25 Gemini 3 Deep Think: AI model update designed for science

Google has released a major update to Gemini 3 Deep Think, a specialized reasoning mode designed for complex scientific and engineering problems where data is messy or incomplete, and solutions aren’t straightforward.
The model achieved notable benchmark results, including 48.4% on Humanity’s Last Exam, 84.6% on ARC-AGI-2, and gold medal performance on the 2025 International Math, Physics, and Chemistry Olympiads.
Early adopters are using Deep Think for practical applications like identifying logical flaws in peer-reviewed mathematics papers, optimizing semiconductor crystal growth fabrication methods, and converting sketches into 3D-printable files with generated code.
The model combines deep scientific knowledge with engineering utility to move beyond theoretical work into applied research.
The updated Deep Think is available now to Google AI Ultra subscribers through the Gemini app, with pricing following the existing Ultra subscription model.
For the first time, Google is offering API access through an early access program for select researchers, engineers, and enterprises who can apply through a Google form.
The release targets scientific research institutions and engineering teams working on complex problems in physics, chemistry, materials science, and advanced mathematics, where traditional AI models struggle with ambiguous requirements.
Deep Think’s ability to work with incomplete data and generate executable code for physical modeling makes it particularly relevant for R&D workflows.

1:00:19 New global queries in BigQuery span data from multiple regions

BigQuery global queries now allow users to run a single SQL statement across datasets stored in multiple geographic regions without requiring ETL pipelines or data replication.
The feature automatically handles cross-region data movement in the background while respecting existing security controls like VPC Service Controls and requiring explicit opt-in at both the project and user level.
The primary use case targets multinational organizations that need to analyze distributed data for compliance or performance reasons, such as joining US customer data with European transaction logs and Asian operational data in one query.
EssilorLuxottica is using this to perform cross-region aggregated analysis while maintaining data residency requirements for security and compliance. (DOES IT THOUGH?)
Users maintain control over where queries execute and can specify the processing location to meet data residency requirements, though cross-region data transfers will incur additional egress costs that organizations need to factor into their analytics budgets.
The feature is currently in preview with documentation available here.
This addresses a longstanding limitation in cloud data warehousing, where geographic data distribution required complex engineering solutions, now replaced by standard SQL queries that any authorized analyst can run directly from the BigQuery console. The feature respects governance controls by default and prevents accidental data movement through required permissions and explicit enablement.

1:01:36 Matt – “I feel l ike it is compliant… if you’re running local and you’re not collecting anything that could be confidential. So it depends on how your lawyer at your company interprets it.”

Azure

1:03:47 Agentic cloud operations and Azure Copilot for AI‑driven workloads

Microsoft introduces agentic cloud operations through Azure Copilot, which uses AI agents to automate and coordinate cloud management tasks across the full infrastructure lifecycle. Instead of adding another dashboard, Azure Copilot provides a unified interface accessible through natural language, chat, console, or CLI that connects directly to a customer’s actual Azure environment, including subscriptions, resources, and policies.
Azure Copilot includes six specialized agents that handle migration discovery and dependency mapping, deployment with infrastructure-as-code generation, continuous observability across the full stack, cost and performance optimization with carbon impact analysis, resiliency management including ransomware protection, and troubleshooting with root cause diagnosis.
These agents work as a connected system rather than isolated tools, correlating signals and taking action within existing RBAC and policy controls.
The service maintains governance through built-in oversight features, including Bring Your Own Storage for conversation history, which keeps operational data within the customer’s Azure environment for compliance and sovereignty requirements.
All agent-initiated actions are reviewable, traceable, and auditable while respecting existing security policies and role-based access controls.
Target customers are organizations running modern applications and AI workloads at scale, where traditional manual operations cannot keep pace with rapid deployment cycles and infrastructure changes.
The approach addresses environments where workloads move from experimentation to production in weeks and where telemetry streams continuously from every layer of the stack.
Pricing details were not disclosed in the announcement, though the service builds on existing Azure Copilot capabilities introduced at Microsoft Ignite. Organizations can access resources and get started at azure.microsoft.com/products/copilot.

1:05:39 Matt – “Also, a developer actually understanding what they want and telling you what they want and actually being useful? I would love to see too, because how many times have we built something, deployed it, day before the release – we actually need these 16 other things that we didn’t tell you about that we manually did in our dev environment, which is why it’s working… and the release is tomorrow. Good luck. Why is it not done yet?”

1:06:18 General Availability: Instant access support for incremental snapshots of Azure Premium SSD v2 and Ultra Disk

Azure now offers instant access to incremental snapshots for Premium SSD v2 and Ultra Disk storage, eliminating the previous wait time when restoring disks from snapshots.
This addresses a significant operational pain point for customers running high-performance workloads that require rapid disaster recovery or quick environment provisioning.
The feature specifically targets enterprise customers using Azure’s highest-tier storage options, Premium SSD v2 and Ultra Disk, which are typically deployed for mission-critical databases, SAP HANA, and other latency-sensitive applications.
Previously, customers had to wait for snapshot data to fully hydrate before using restored disks, creating delays in recovery scenarios.
Incremental snapshots only capture changes since the last snapshot, reducing storage costs and backup windows compared to full snapshots.
With instant access now available, customers can immediately mount and use restored disks while background hydration completes, improving recovery time objectives for business continuity planning.
This capability brings Premium SSD v2 and Ultra Disk snapshot functionality closer to parity with standard Azure managed disk snapshots.
The feature is now generally available across Azure regions where Premium SSD v2 and Ultra Disk are supported, though specific pricing for snapshot storage follows existing Azure snapshot pricing models based on stored data volume.

1:06:25 Justin – “Welcome to what Amazon and Google have been doing for quite a while, so thanks, Azure!

Emerging Clouds

1:08:16 Introducing the Crusoe Cloud MCP server

Crusoe Cloud released an MCP server that connects AI coding assistants like Claude Code and Cursor directly to cloud infrastructure, but unlike typical API wrappers, it returns filtered responses designed specifically for LLM consumption to avoid flooding context windows with unnecessary data.
The server includes composite tools like get_resource_relationships that map entire infrastructure topologies in a single call by fetching 11 resource types in parallel and resolving cross-references, something that doesn’t exist in their CLI or any single API endpoint.
The cluster_health_check tool provides pre-analyzed node-level health metrics organized by InfiniBand pod placement, returning structured summaries with problem nodes flagged rather than raw metric time series that would require additional processing.
This approach addresses a key limitation of AI agents working with cloud infrastructure: most MCP implementations just wrap CLI commands and return the same JSON a human would see, forcing the AI to parse through irrelevant metadata and empty fields.
The implementation reflects a broader trend of cloud providers releasing MCP servers, but Crusoe’s focus on response filtering and burst-heavy access patterns specific to AI agents suggests infrastructure management tools are being redesigned around LLM capabilities rather than human interaction patterns. For developers already using AI coding assistants, this enables natural language infrastructure queries and troubleshooting without manual scripting or console navigation.

1:10:16 Ryan – “This is gonna be chaos.”

1:10:21 Introducing Markdown for Agents

Cloudflare now automatically converts HTML to markdown for AI agents using content negotiation headers, reducing token usage by up to 80 percent.
When agents request pages with Accept: text/markdown, Cloudflare’s network performs real-time conversion at the edge, eliminating the need for downstream processing and reducing costs for AI systems.
The feature addresses a fundamental inefficiency where AI agents waste tokens parsing HTML markup, navigation elements, and styling that have no semantic value.
A simple heading that costs 3 tokens in markdown can consume 12-15 tokens in HTML, and this blog post example shows 16,180 tokens in HTML versus 3,150 in markdown.
Cloudflare includes an x-markdown-tokens header with converted responses to help developers calculate context window sizes and chunking strategies. The service also automatically adds Content-Signal headers indicating the content can be used for AI training, search results, and agentic use, integrating with their Content Signals framework from Birthday Week.
The feature is available in beta at no cost for Pro, Business, and Enterprise plans, with Cloudflare already enabling it on their own blog and developer documentation.
Popular coding agents like Claude Code and OpenCode already send the appropriate accept headers, positioning this as infrastructure for the shift from traditional SEO to AI-driven content discovery.
Cloudflare Radar now tracks content type distribution for AI bot traffic, allowing analysis of how different agents consume web content over time. This data is accessible through public APIs and shows early adoption patterns like OAI-Searchbot requesting markdown content.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

[00:00:00] Speaker A: Foreign. Welcome to the Cloud Pod, where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker B: Episode 343 recorded for February 17, 2026 AWS Cloudwatch finally learns to hit Snooze Good evening Matt and Ryan. How you guys doing? [00:00:29] Speaker C: I want to be snoozing right now. [00:00:32] Speaker B: I would love to be snoozing. I was. I spent the weekend in Vegas and was up way too late gambling multiple days but had a good time. So nice. This morning I was definitely hating myself. I was like why do I do this to myself? I'm over 40 now. I can't. I can't get this kind of. Not in my 20s. What am I thinking? Yeah, well we have a couple cool stories this week, so first up, if you've updated Notepad recently or you probably have a virus so Microsoft, which we would think consider Notepad to be fully mature until AI came along, has decided they need to be modernized and by modernizing they introduced CVE2026 20841, which is a vulnerability in the new markdown support feature that allows malicious links and files to execute remote code. Seems like a problem and the flaw has apparently been patched as of February 2026. So if you have not updated on your Windows update to that asap. So you you can always add features, but you're always adding security trade offs. So keep that in mind. Literally exploits Notepad's markdown and rendering capability, which Microsoft added in May to support lightweight markup language formatting. And when Notepad opens a specifically crafted markdown file, embedded malicious links can trigger the unverified protocol, which it seems like a pretty obvious like duh. Yeah, anytime I'm putting a URL processor in something I'm always wrapping it in a lot of protection. This incident raises questions about feature bloat and core Windows utilities, particularly as Microsoft continues adding network dependent capabilities like AI powered text writing to Notepad, which does not need that Security researchers are debating whether basic tech editors should have network functionality at all given the expanded attack services. And the onlybody demonstrates how modernization efforts can introduce security risk and previously low risk applications. Organizations using Windows need to ensure their Systems receive the February 2026 update. [00:02:06] Speaker C: As I mentioned, I'm just confused why they didn't use Copilot on their pull requests in order to identify this as a potential bug. I feel like it should have found it. Just saying. [00:02:17] Speaker A: Yeah, you know it's one of those things I I kind of agree with the sentiment of like, why does my Notepad really need to make web calls? [00:02:27] Speaker C: The answer is it doesn't and it shouldn't. Yeah, it should be simple. [00:02:32] Speaker A: Yeah, I mean, I understand how, you know, you make a mistake in the rendering and, and I can see how, you know, that's just sort of like it seems obvious in retrospect, but the other bits I'm a little less excited by. But I don't use Notepad a whole lot. I don't use a lot of Windows ecosystem in general. So hopefully there hasn't been a whole lot of markdown usage in Notepad in the last, what, six months longer. [00:02:57] Speaker B: Oh yeah, the vulnerability might have been there. It doesn't mean I was exploited. That's. Yeah. So it was a key difference between found vulnerability versus actually actively out in the wild. So. [00:03:06] Speaker A: Yeah. [00:03:09] Speaker B: Well, in a really interesting article this week, Chrome is apparently introducing Web mcp, which is going to be a standardized protocol that lets websites expose structured tools and actions directly to AI agents, eliminating the need for agents to parse raw HTML and DOM elements. This addresses a key reliability problem in agentic workflows where AI agents currently struggle with inconsistent web interactions. Protocol offers two interaction modes. A declarative API for simple HTML form based actions and an imperative API for complex JavaScript driven workflows. This dual approach lets websites define exactly how agents should interact with features like booking systems, support ticket forms and checkout processes. Early use case focus early use cases focused on high value transaction workflows including E commerce, product configuration, travel booking with complex filtering requirements and automated customer support ticket creation with technical details. These scenarios benefit most from structured interactions versus unreliable DOM manipulations. Early preview program requires signup for access to documentation and demos indicating this is is still in experimental stages. Developers interested in making their sites agent ready will be need will need to implement these new APIs and participate in the GenTech Web Ecosystem Chrome is building out. And this is their attempt to standardize how AI agents interact with websites before the market fragments with competing approaches. Which we'll talk about later. [00:04:20] Speaker A: Yeah, I mean, you know, I've, I look at this as indeed, you know, because it is sort of one of those things. A lot of the AI interactions are go, you know, go do a whole bunch of research or go, you know, configure you know, like travel sites, you know, request tickets, find tickets, that kind of thing. So it makes a lot of sense and why they want to standardize on, on a specific protocol. But I can't help but like feel like this is the beginning of the end of human interaction. Right. Where it's like, you know, they're going to have, it's a, you know, AI agent to agent specific protocol. You know, how that'll have to be developed alongside the, you know, JavaScript sort of CSS HTML stuff that you have today. It's kind of, kind of interesting. [00:05:04] Speaker C: Yeah. Like you said, it's the beginning of the end of humans doing anything. It's soon going to be a prompt to cloud or to copilot or whatever your choice is of go buy me a flight to Florida. And it's going to go through everyone and say, okay, based on your parameters, this is the best flight time and the best prices via Expedia and they will go book it for you. And that will be nice at one point and also slightly terrifying. [00:05:34] Speaker A: Yeah, well, and think of, you know, like you're no longer displaying advertisements, so where's, you know, how are things monetized? You know, it's just going to be a radical change to so many different elements. [00:05:49] Speaker C: I never thought about the advertisement because now I'm like, okay, are they cannibalizing part of their business model because you don't have to see ads by going to the website for Google Ads and whatnot too. Yep, interesting thought. [00:06:02] Speaker B: I mean there, there was an article I was reading about, you know, basically SEO and, and how much organic traffic has dropped off and you know, it ties directly back to when Google added AI summaries to the search results. And so it's, it was interesting, but they were saying it's, it's not actually a bad thing. It is definitely reducing organic traffic, but the traffic does come is typically higher quality because it's actually people buying something or doing, doing something. So there's definitely risk to their business model unless they can figure out how to insert ads in a way that isn't like totally invasive in the process. So it's definitely. The web is changing as we know it. It was designed to sell your eyeballs and get you to buy stuff. And the reality is now is AI doesn't care about that, what your techniques are. They don't care about signing up for a newsletter or giving you a phone number to get 10% off. They're just going to find the best price and they're going to report that back to the user who's going to say, yes, put buy that thing and they're just going to do it through an API. So the web will change, which might be a good thing for us because I do kind of Hate the e commerce driven web that we have today. [00:07:01] Speaker A: Yeah, I, I fear what replaces it but I agree it is sort of terrible. [00:07:07] Speaker B: Well, I mean we all know have jobs by that point so it won't matter. [00:07:11] Speaker C: So I'll be broke so it won't matter. [00:07:12] Speaker B: We won't be commercing anyways. Yeah, that's true. We'll see. All right, well moving on to happier things. AI is how ML makes money. And Anthropic is making a lot of money by raising 30 billion in their Series G funding which puts them at 380 billion post money valuation with a 14 billion in run rate revenue with 10x annual growth for three consecutive years in the row, company now serves 8 of the Fortune 10 with over 500 customers spending more than a million dollars annually. With Anthropic cloud code made generally available on May 25, has grown to $2.5 billion in run rate revenue and now accounts for 4% of all public GitHub commits. Worldwide business subscriptions quadrupled since early 2026, with enterprise customers representing over half of the cloud code revenue. [00:07:56] Speaker C: Hold on, hold on, hold on. Before you keep going, I'm going to interrupt you. Yes, those numbers are insane. I like it. Like, like I just want to make sure we're all clear about that. Like, it's one of those things I've known but like I obviously didn't have time to sew prep because I have a few sick kids upstairs. But I'm like sitting here going, oh my God, one cloud code has only been out for nine months. [00:08:18] Speaker B: Right? [00:08:18] Speaker C: Like and how much it's revolutionized everything I do and 4% of all public get up. Everything we do get up commits. Then my bigger question is Copilot also can do commits and everything else can do commits. How many other tools are actually is there and how many real commits are actually out there nowadays versus AI commits besides mattress swearing at anatomy objects. That's easy debugging stuff, which is probably like 1% of all human commits at this point. [00:08:45] Speaker A: Yeah, I mean you'll be able to tell because the commit messages are actually going to be meaningful instead of. [00:08:49] Speaker B: Yeah, they're more than just like oh my God, it's broken again. I an idiot. I can't write code. But what mine look like? Typically I'm like, oh my God, I can't believe I missed that paragraph. I missed the return, the space is missing the comma. All of that's my commits. Yeah, and then there's typically like 100 of them as I figured out that I'm like, yay, feature's done. [00:09:09] Speaker C: Yeah. [00:09:09] Speaker B: Then I, then I figured to squash. So I just push all that right up to GitHub and people, and my embarrassment lives on forever. [00:09:15] Speaker C: Forever. [00:09:16] Speaker B: So I do squash. That makes no sense. [00:09:18] Speaker A: Yeah. [00:09:19] Speaker B: I'm just glad it's working by now. I'm so frustrated. [00:09:21] Speaker A: Yeah, exactly. [00:09:22] Speaker C: And you close your laptop, you walk out, you don't care what people think about you, Justin. [00:09:27] Speaker B: Exactly. That's why I'm in management. All right. [00:09:31] Speaker C: And now you can go on. Sorry. [00:09:32] Speaker B: Okay. They point out the Opus 4.6 launched last week as the latest model. And we'll talk about the new model that came out today. And Anthropic expanded its product in January with over 30 launches, including cowork with extends cloud code capabilities to broader knowledge work with 11 open source plugins for specialized roles. Cloud for Enterprise is now HIPAA compliant and available for all healthcare and life sciences organizations. And it remains the only frontier AI model available across all three major cloud platforms through Bedrock, Vertex AI and Azure Foundry. And the company trains on a diversified hardware including AWS Trainium, Google TPUs and Nvidia GPUs to optimize workload performance and resilience. So, yeah, it definitely seem like they're capturing a lot of hearts and minds in the enterprise space right now. Yeah. [00:10:16] Speaker A: Which is, you know, interesting because, because it's, you know, I feel like, you know, Microsoft's copilot had a, like a head start and, you know, there's a lot of O365 subscriptions kind of built in. You know, I've seen definitely Google Play catch up to that. And then, you know, like, this seems like Anthropic is going to catch up to both of them and maybe even overtake. So it's kind of nuts. [00:10:39] Speaker B: If you had told me two years ago that this company called Anthropic was going to own OpenAI and Gemini in some markets, I would have been like, yeah, okay, maybe. I mean like you look at the, you look at Cohere and you look at. What's the other French one? Is it Cohere, the French one? I don't remember. You know, these ones just don't have the same level of adoption as Anthropic does comparatively. So it's, you know, definitely of the non Hyperscaler vendors or OpenAI who invented a lot of this. Anthropic is definitely crushing it in a lot of ways. I mean, like, this is the number one request I have at the day job is we want cloud Code. And I'm like, well, we get it through Vertex. No, no, we want the real cloud code because there's plugins and things for like Chrome Enterprise that don't work with the Vertex version. And as they keep rolling up all these new features, you know, they roll out to their service first. And so now everyone wants their service as the first party sooner than later, so. And I get that it makes sense. I, I want it too. So I'm helping to fight that fight at that, at the day job. But are you fighting Ryan? [00:11:39] Speaker A: I mean, it's. [00:11:41] Speaker B: No, no, Ryan is in full support, so. Yeah, yeah. One of the few times that Ryan and I agree that this would be amazing to have. So weird. Yeah. [00:11:51] Speaker C: I think the thing they're doing so well is the labs because every time they go play with a new lab, it's, you know, they're really innovating and they're just testing idea like Cowork, when it first came out, was a lab and it fumbled through stuff and it required six things. But, you know, you let them kind of see what the people wanted to do with it. I think that lab structure, something that none of the other providers have at that scale, so it lets people play and they're keeping that innovation alive in them and not just kind of going, cool, we have everything. They're really pushing the frontier of, okay, we saw that people are using Claude code, but they're doing what probably you and I and Ryan here did, which was go do random tasks that aren't really code related. Just set it up in a fake repo and go find a thing. And they're like, wait, we could build it? And it's not that big of a shift to go build this minor thing. And then they added 14 plugins or whatever the number was, which I didn't even know they added into Cowork. [00:12:49] Speaker A: Yeah, I mean, it's like we talked about last week, you know, giving these, you know, platforms access to the data is where all the real power is. And so, you know, this is, you can see where, you know, their enterprise engagement is going to pay off for them. And it's, you know, it's crazy, the numbers, so. But yeah, I mean, it's, it's, it's my favorite tool that, you know, I've been playing around with for a while now. [00:13:15] Speaker B: Yeah, it's, it's definitely. [00:13:17] Speaker C: Would you say. [00:13:21] Speaker A: No, I wasn't that queer. [00:13:23] Speaker C: I don't know. That was awake in May. [00:13:24] Speaker B: But he was a, he was like many of us was a doubter. At first, I think I was probably the first. Either Jonathan or I were the first ones to try it because then we told you guys how cool it was. But you know, last week we were playing with the PowerPoint. I don't think it was during the show because that would have been pretty boring show. But I, we literally, you know, had it pulled my corporate template and I put it into PowerPoint and then I said, I want you to make a slide deck on like security awareness training. And it built a whole deck while we were talking. And it was, it was impressive. So. And that was, I'd say it's probably pretty slow. I bet it, you know, with Sonnet 4.6 and some of these, you know, newer, faster models coming out, I think they'll. That stuff will get faster and better every day. [00:14:00] Speaker C: I was gonna say I actually used it for a C level presentation this week after our space, after the podcast. I was like, here's what I want. And played with. Had a few bugs along the way, but it got there much better than I would have ever done in a PowerPoint. [00:14:15] Speaker B: It was much better than GitHub, Copilot and Gemini, which I tried to do slides with both times as well and left like, just like this was worse than me just doing it myself. So I mean, yeah, you know, and it's just, it takes a very novel approach to it. It's like, I'm not going to try to be fancy. I'm just going to use XML that is native to the file and I'm just going to modify that. And then the cool thing is like, okay, I want to fix this one thing. You just click on it and you just hit tell it what do you want to fix about that thing? And it knows what you're talking about it compared to Gemini or to Copilot where it's like, I don't understand what you want me to do. I'm not advanced enough to do that. It's like, well, are you not? Or you should know how to do this. Microsoft, you built PowerPoint. Well, speaking of Cloud, Sonnet 4.6, it's naturally available as of this morning, hot off the presses across all cloud plans, APIs and major cloud platforms at the same price as Sonnet 4.5 which was $3 for input token at $15 per million output tokens with a 1 million token context window in beta model now serves as the default for free and pro plan users bring Opus class performance to a mid tier price point. Computer use capabilities have improved substantially with Sonnet 4.6, scoring 94% on insurance benchmarks and showing human level performance on tasks like navigating complex spreadsheets and multiple step web forms. Model demonstrates better resistance to prompt injection attacks compared to Sonnet 4.5 and perform similar to Opus 4.6 on safety evaluations. Coding performance has advanced significantly with early users preferring Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time and even choosing over Opus 4.5 59% of the time. Users report better instruction following less over engineering, fewer hallucinations and more consistent follow through on multiple step tasks, with one customer reporting an 80.2% score on swe bench verified. Several features have reached generic general availability on the API, including code execution memory, Programmatic tool, calling tool search and tool use examples. Web search and Fetch tools now automatically write and execute code to filter Research results are improving response quality and token efficiency, and the model supports both Adaptive thinking and extended thinking modes with context compaction and beta that automatically summarizes older context as your conversation approaches the limit. Cloud and Excel now supports MCP connectors, allowing users to pull data from external sources like SMP global LSEG and PitchBook directly within your spreadsheet itself. So that's also quite cool. The spreadsheet feature. I was going on a whole thing and I asked it to graph a bunch of things and did that as well and that tool's gotten much better since I last played with it too. So definitely PowerPoint one mentioned already and now the spreadsheet one as well and then all was Sonnet 4.6 so I have to go update Bolt to go get off of Sonnet 4.5 so that'd be my project tomorrow. But yeah, definitely glad to see the new Sonnet models. They weren't originally thinking this is going to be Sonnet 5 and then you know, OpenAI dropped their latest model and I think Anthropic realized that they leapfrogged them a little bit and so they didn't want to use the 5.0 moniker right away and so I think that why it ended up being 4.6. I think Opus 4.6 as well was kind of a similar story. I think those are both supposed to be 5.0 releases, so to Anthropic they were bigger deals than they now look like in the market. [00:17:12] Speaker A: Well, I mean there, there's definitely every I haven't played with Sonic because it's just released, but playing around with Opus like you can see that it's another major improvement in these steps and it is pretty fantastic to use and it is something I really like know the, the context compaction and and the tooling that that enables when using, you know, either cloud code or other IDs is is amazing in Opus. I'm glad to see it and Sonnet because it is really sort of the, the ban of my existence is managing that context and you know, it does slow down your operations a little bit, but I'd much rather have that increased time than I would some of the crazy hallucinations that extending over the cont. What does cost? [00:17:56] Speaker B: Yeah, yeah, I find that and I don't know if that's just my experience, but I I'm, you know, doing a lot more searching between Gemini, Claude and OpenAI and I find that most of my hallucination problems typically come in open AI and I don't know if it just has a default to always wanting to answer me, which then forces it to be more of a liar because it doesn't want to disappoint me because that's what its instructions tell it to do or, or what's going on there. But I definitely notice them less and less in Claude and others, but I still notice them quite frequently in OpenAI and I just don't know exactly why that is. [00:18:31] Speaker A: Yeah, I just know that I'm also, I have the same experience, but I'm also giving the least amount of context and instruction to OpenAI just because I don't really have it embedded to any platform where I've customized it at all with any kind of configuration or memory. So like I'm trying to give it a grain of salt, it's having to, you know, extend itself a bit more. But yeah, it just, it really does. You know, that's why I haven't invested in it is because it doesn't give me. I don't trust it. I don't trust the data that it [00:19:00] Speaker B: gets back this is a fun article about token anxiety. Basically this was a article written by Nikund Kothari, who's describing a cultural shift in San Francisco's tech scene where developers are prioritizing AI agent management over social activities, with people leaving parties early to check on their overnight code generation and spending weekends running 12 hour build sessions with AI systems like Claude and Codex. The piece highlights how Aida coding tools have created a new productivity anxiety where developers feel compelled to keep agents running continuously, even during their sleep, to maximize the output and stay competitive. As new model capabilities and context windows are released weekly, developers are adopting new vocabulary around AI models, discussing them like Sommeliers valuing wine using animal training metaphors like keeping Claude on a tight leash for code review while giving it more slack for creative work. The concentration of benchmark improvements and new AI capabilities is creating pressure to continuously optimize workflows as each advancement makes previous methods feel outdated and multiples and multifies the sense that competitors are already leveraging those improvements. This represents a broader shift in developer culture where traditional leisure activities are being replaced by AI assisted building. I mean, I do know that I spend a lot more time playing with color code and coding stuff than I, than I used to, which has replaced my gaming time because it's kind of like playing a game. And so I like, I was talking to my friend today and he was like, oh, you know, you played the latest expansion for a game I've been playing for years. And I'm like, no, I'm pretty behind because I just haven't prioritized that over coding fun home projects. And I like have an idea for creating a new touch enabled, you know, wallboard, you know, versus like buying one from one of the vendors doesn't have that. And so I've been tying and toying with some ideas around that. And so I got like all these ideas just running around in my head now all the time. Stuff I wanted to fix for a long time that I just didn't have time for. And so gaming and stuff has kind of gone out the window for me personally. But how about you guys? Do you have token anxiety? [00:20:46] Speaker A: I wouldn't call it token anxiety, but a similar experience to you. Whereas like I have because I want, I can do more, I am and like ideas that I was limited by because, you know, I didn't know Swift and didn't want to really develop against in, you know, into the App Store and the Apple ecosystem. That kind of thing was limiting before. And now with AI it's like, oh, you don't have to get into the, you don't have to have a developer account in Apple. You can just hook it up and trust it directly onto your phone and load it to be xcode and like really? Oh, I didn't know that. You know, like, and Swift, like it's probably writing terrible code. Like I have no idea. But it's just an app for me that you know. And so like that kind of thing where I'd never be able to do that before and or I would never invest the time anyway. And so this is, you know, like, yeah, I'm doing more cool stuff instead of other stuff but you Know, anxiety where I'm leaving parties early? [00:21:40] Speaker B: No, yeah, I'm not. I definitely not leaving any parties early. But I do like to make sure that I have a really good set of, you know, tasks available to my, you know, to my agent before I leave so that it's busy doing what I need to do. I, you know, But I'm not going to leave a party for that either. There's booze there. Why am I leaving that? [00:21:58] Speaker C: But that's, I think, to me, was kind of the point of the article, which is, you know, you're essentially trying to set it up so your AI developer has a bunch of work to go do while you're not there, and spending that prep time and post time to review it all when you get back, you know, takes time. Now I'm like, you guys. Where I'm like, ooh, I can go do these fun things. Or, hey, there's this really annoying thing that I do at work once a month. Screw it, let me just automate it. Because I can just throw it in a lambda and call it a day now. Versus me having to, like, figure out how to, you know, copy spreadsheets, copy data, do whatever ridiculously complicated thing I'm doing. I'm like, cool, here's what I do. Here's how I get there. Break it down, go do it. And I have this thing that, you know, the old XKCD of how long does it take you versus, you know, how long does the task take out? Is like. I'm like, well, that all just went out the window because I'm like, ooh, I could spend an hour and like, break it down, break down the project, get it to the point where I wanted to go. And then between meetings or, you know, in a meeting I'm not paying attention to, I could just keep saying, yep, that sounds good. Great. Okay, now run your unit tests, you know, and get that framework. So that's. To me, I feel like, less of the token anxiety, but more of like the, oh, cool. If I spend 15 minutes, I can get X percent more out and I can be X more productive for myself or for work for whatever. That, to me is kind of like the, the. The takeaway from the article is that we all are doing it and so are many other people. We're like, how do we keep ourselves being more and more effective? [00:23:38] Speaker A: I still don't know how everyone has these, like, overnight workloads. Like, I. It's partially, I guess, because I don't trust AI at all. Like, I, I'm not going to let it run unsupervised because every time I try and experiment or dabble like it always comes like something terrible happens and I feel like it's more awkward to backtrack and figure out what it was because I didn't do it, you know, so I have to go ask the machine to undo it while it pretends that it didn't do it. That is a, you know, the previous developer did these stupid things like it was. [00:24:07] Speaker C: You don't look at the commit where it says Claude. [00:24:12] Speaker A: Yeah, exactly. [00:24:14] Speaker B: So I don't, I definitely, I always reference, I'm like, hey, the last change you made screwed it up. And then it says, oh, it did, let me look at it. But yeah, so I think the key thing is that you, when you set up like the agents to go overnight, you have to set up the roles like, hey, you're the architect, you're the developer, you're the DevOps person and you're the QA person and you guys aren't completed until the QA person is done. And then make sure you have really good, you know, QA solutions or tests that you're going to have it basically complete. And you also say you're not allowed to change the test because they'll try to do that too. Like I'm just going to modify the test. Like, nope, nope, that test is good. I know it is. [00:24:48] Speaker A: So yeah, I mean I, I, it must be that I just don't have as much invested in the instructions and you know, I am starting to dabble a little bit more with, you know, chaining agent work close together more. Whereas instead of I would previously I'd prompt specific agents but now it's sort of more agent to agent stuff. But still not really when it's unsupervised yet. [00:25:09] Speaker C: I got like, I think today I had it run for like 30, 40 minutes. It was funny because Claude code just, you know, goes. We're still running Collect, hopefully making. Then all of a sudden I used 130 gigabytes on my laptop that I'm a little bit confused how I did, but I only have a 64 gigabyte laptop. But you know, code VS code was running 171 gigabytes or something ridiculous like that on my laptop today. [00:25:37] Speaker B: Yeah, I'm looking at the new Apple are coming out in a couple weeks, the new M5 MacBook Pros. And I'm like, I wasn't going to upgrade to a new one, but now I'm actually pushing my box quite a bit and So I potentially need to upgrade again, which I was not going to do. And it's also going to cost me probably a fortune because of how expensive memory everything else is now. [00:25:59] Speaker A: But is it going to be M6? Because the M5 is out. [00:26:03] Speaker B: So the M5 is out, but not in the pros. So the M5 Max and the M5 Ultra I think is what they're going [00:26:10] Speaker C: to come out with. [00:26:12] Speaker A: I have a MacBook Pro that's M5. [00:26:15] Speaker C: I saw the M1, so I'm trying to best. [00:26:18] Speaker A: I finally upgraded my, my computer after 23 years. [00:26:21] Speaker B: So maybe it's only one size of the the 14 that has the M5. But the M4 Pro and M4 Max are not upgraded yet. Those will get upgraded to M5 Pro and M5 Max. That's the. [00:26:32] Speaker C: I still like the. I just find it easier for travel. [00:26:36] Speaker B: So I have a 14 inch for work and I have a 16 inch for my personal projects. And if they hadn't cheaped out on the memory and the disk, it'd be great. But I definitely like the 16 inch form factor for the memory side of it. [00:26:51] Speaker C: I guess mine's so old it doesn't really matter. Anything's going to be a massive increase. [00:26:55] Speaker B: Yeah, everything's going to be a massive. I went from the M. So I waited a long time before I upgraded from the M1 to the M4 and I was noticeable when I upgrade. I was like, wow, I didn't think it would be as noticeable. And it was like, okay, this is a good upgrade. So I'm, I'm probably gonna resist the M5 because I, I can. I, I might buy a Mac Mini to run openclaw because I'm kind of curious about that. But then maybe not. [00:27:15] Speaker A: I don't know. [00:27:15] Speaker B: I can run on DPS at this point, so. Plus they're security nightmares, which I don't know if I want to deal with. [00:27:21] Speaker C: There's a few. [00:27:22] Speaker A: So yeah, I mean it might be a good starting point but you know, I've been doing a lot of research for the day job about agentic development and chaining together agent workflows and there, you know, as we're able to give them more instructions, there's, there's interesting little caveats, right. Just like any supply chain attack, you know, if you're downloading a repo or you've instructed your, your agent to go download repos where you may not control, there's an agent's MD file in there and that'll overrun your prompt and it'll just arbitrarily do something else that you didn't intend it to do. And these unintended workflows, you know, if, if you don't provide it specific instructions to ignore other sources of inputs, you know, you're running at risk. And so it's kind of, kind of scary. [00:28:07] Speaker B: Well, I saw somebody say, they, you know, they. Every time I see a blog post come out that's like, openclaw, you know, best security, best practices or best practice of openclaw. And they just like, I don't even read the article. I just send it to the openclaw and say, hey, make yourself better. And I was trying. I was taking Jonathan aside and I was like, hey man, let's, let's create some instructions, you know, to get some bitcoins on our way. [00:28:29] Speaker A: Yeah, no kidding. [00:28:30] Speaker B: If you're not going to read these, how many aren't reading these things? Just sending them to their open cloud. Make yourself more secure. A bunch. [00:28:35] Speaker C: A bunch. [00:28:36] Speaker A: Like, it's just so normal. [00:28:38] Speaker B: And it's like, if you guys are all doing that, I can, like, it's so easier than, you know, slip malicious things into. I bet we can make a couple hundred bucks on paper. You know, before. [00:28:47] Speaker A: I mean, there's a lot of models, like, you know, there's, there's GitHub actions where people aren't really reviewing that code. There's, you know, the. Someone told me that, you know, like PYPI is reviewing uploads, but I don't buy it because the turnaround time is too fast and you know, like, there's all kinds of ways that, you know, supply chain attacks are just super easy to inject and people aren't looking at it like, because that's the advantage of AI is something else is doing it which. [00:29:13] Speaker B: Okay, so the chip. So, okay, so I had to go figure which one. So it's the Studio. I want the updated Studio. So in the Studio today in Mac Studio, they have a M3 Ultra that did not get upgraded to an M4 when the M4s came out because it was basically two M3 Maxes stuck together, which is a 16 core CPU and a 40 core GPU. So the current. So the M3 Ultra today is a 32 core CPU with 80 core GPU. If you can get that box as an M5 with even more GPU cores, that's gonna be a badass local LLM box. [00:29:45] Speaker C: Yeah. [00:29:45] Speaker B: Because the GPUs aren't bad in the max today. So it's really going to be a Question of, do they release a new M5 Ultra? That's going to be the one I probably would buy. [00:29:52] Speaker A: But why would I get that in Apple form factor? [00:29:56] Speaker B: Because the Nvidia box is really expensive and now it's over a year old at a day and they haven't updated it. [00:30:02] Speaker A: Yeah, but. [00:30:04] Speaker B: Yeah, I mean, you're also correct. [00:30:05] Speaker A: Why would I buy that versus Yeah, I mean, that's. I mean, everything's expensive and ridiculous, but I, you know, buying it in pieces or, [00:30:15] Speaker B: you know, not an Apple store [00:30:16] Speaker A: factor, which is always expensive, you know, [00:30:19] Speaker C: has been expensive as the person making an Apple phone. [00:30:23] Speaker B: Hey, hey. As a person without an Apple phone. Matt, we don't have any room to talk. [00:30:28] Speaker C: Yeah, I'll have Jonathan here protect me. [00:30:31] Speaker B: You're outnumbered. [00:30:32] Speaker A: You're outnumbered. [00:30:33] Speaker C: Yeah, I don't know why I started that battle. All right, let's move on. [00:30:36] Speaker B: Yeah, yeah. I do like the idea of the GB10, the Nvidia DGX Spark, but I would like them to upgrade before I bought one. That's my only hesitation in buying one. But also I can use a Mac for other things that are important to me. So there's reasons. But I was probably going to cost like an arm and a leg and I don't have one of those to sell, so. [00:30:54] Speaker A: Yeah, you've tried. [00:30:56] Speaker B: Move on. Rat. Hold in. [00:30:59] Speaker A: Yeah, sorry. [00:31:00] Speaker B: That's all right. Alibaba is launching Quen 3.5, the new generation of its large language model, adding to the recent flood of new AI models released from Chinese companies ahead of the Lunar New Year. Alibaba is a major global competitor in the open source AI models. Said that QIN 3.5 plus the first version of QIN 3.5 delivers strong performance and reasoning, coding, AI agent capabilities and ability to process multiple types of data, including images, audio and video. Compared to the previous model, the new model can handle tasks faster, reducing computational costs, according to the company. So Quinn is one of the models that I run locally on my Mac and it runs pretty darn fast. And definitely excited to try out the new Quin 3.5 model on my Mac. So definitely good to see competitive pressure. Luckily, this one did not spook the entire stock market that cheaper, better models exist like Deepseek did. And so that's good news. [00:31:50] Speaker A: Yeah, it is pretty funny that I was totally looking at if it was going to spook the market like before, but I don't think it is for some reason. I guess Alibaba is a little bit more trusted, I don't know. [00:32:03] Speaker B: And then bytedance. Is officially launching Seed Dance 2.0, their next generation video creation model with unified Multimodal audio video architecture supporting text, image, audio and video inputs. The model can process up to nine images, three video clips and three audio clips and natural language instructions simultaneously for comprehensive content referencing and editing. The model delivers substantial improvements in complex motion rendering and physical accuracy, particularly excelling at multiple multi subject interactions like competitive figure skating with synchronized movements, midair spins and precise landings that follow real world physics. Industry evaluations Show Sea Dance 2.0 achieves leading performance in motion stability, instruction following and visual aesthetics compared to competing models. Sea Dance Studio also introduces a dual channel stereo audio generation with multi track parallel output for background music, ambient effects and voiceover synchronized visual rhythm. The model supports 15 second high quality multi shot audio video output suitable for commercial advertising, film, social effects, game animations and explainer videos. New video editing capabilities allow target modifications, specific clips, character actions and storylines, plus video extension functionality for generating continuous shots based on user prompts. The model demonstrates improved instruction following for complex scripts and the unified multimodal architecture enables professional grade content creation. Now I only learned about this because on Reddit over the weekend someone posted a video that basically had recreated John Wick, but John Whiskers, it was a cat basically playing the role of John Wick. And they had several of the major scenes from the first John Wick movie recreated and it looked like it was literally filmed and made for this purpose. Like it. I mean not perfect perfect, but like pretty darn close. And even the videos that are linked in this blog post look pretty realistic. The ice skating video is quite impressive as well as some of the other videos. So I'm surprised Hollywood stock didn't crash today over this because like these like very, very impressive. Yeah, crazily so. [00:33:58] Speaker A: Yeah. I think it's one of those things like the the technology is not quite there where it can do the scale where Hollywood would crash, at least not without an exorbitant amount of money. But it's not like movies are cheap to make, but you're also, it's amazing. [00:34:13] Speaker C: It's the commercial market and stuff like that that's taking a big hit because you can do these small clips and chain two of them together or whatever you need. [00:34:21] Speaker A: Yeah, and. [00:34:22] Speaker C: And that's where I have family in some of that world. And they're like yeah, that's the area that's getting killed right now is leveraging it for simple 15 second clips, 30 second clips, ads in YouTube, whatever else that used to be a full time person or multiple Teams can now get done for a hundred bucks in AI credits and you know, someone on a desk just sitting down training through it versus, you know, quarter of a million dollars a day rental of equipment and everything else. Like it's becoming, it's completely changing that industry. [00:34:58] Speaker A: Oh, for sure. And you saw it during the super bowl, how many AI generated ads? There was a bunch. And so it's, you know, even in the biggest ad market, at least in the U.S. you know, like you're seeing it take over. [00:35:11] Speaker B: So it's. [00:35:12] Speaker A: I can only imagine that if you're in that industry, it's weird times. [00:35:17] Speaker B: I found the video. I'll put it in our show notes so people can see the John Wick cat. [00:35:23] Speaker A: Very cool. [00:35:24] Speaker B: It's a quite funny video here. Moving on out of AI news into AWS news, they've got a Alphabet soup of new instances for us this week. First, the M8AZN instance, which is a new fifth generation AMD EPIC Turing processor running at 5 GHz. They have the new C8i, M8i and R8i instances on second generation AWS outplus racks available to us. And a new HPC 8A instance powered by the 5th gen AMD EPYC processor, all now available to us. They all have amazing benchmarks and tools and they're all bigger than anything I need. And so we won't bore you with all the details, but most of them, you know, scale from the typical 2 CPUs to 96 CPUs, they have at least 385 gigabytes of memory, if not much much more depending on your setup. Available to you in all the various sizes. Limited availability on some of these depending on the region that you are operating in. [00:36:14] Speaker A: So probably still cheaper than the Mac that you're. That you're looking. [00:36:20] Speaker B: Could be. Could be. I'll let you know after March 4th. Like. Yes, I just, I just, I just did a three year reserve instance on, you know, this HPC8A thing because it was cheaper than the Mac. It's very possible. Yeah, it's kind of crazy. [00:36:32] Speaker C: The Outpost one's interesting. I've never thought about the generation of hardware on Outposts and like, how do you update that? So at this point you have to [00:36:42] Speaker B: like you're replacing the Rack or. [00:36:44] Speaker C: Yeah, right. Like do you ship it back? Do they send you a new one? Does it migrate easily? Like my brain's never thought like a bike back in my old school data center, you know, brain, I'm like versus like oh yeah. We just Spray up a new server, what do we care? You know, but like sending a new rack out in order to handle all of it, connecting it all up, et cetera, et cetera. Like is that easy? Does that. Is there a clear migration plan? [00:37:08] Speaker B: Yeah, I've never gone down the path to like order an outpost, but I assume that, you know, even to have them you have to have, you know, you have to give access to Amazon to your racks, you know, to your cage to do maintenance. Because it's going to phone home to Amazon. It's not going to phone home to you, I assume, or I. I mean, although they do have like air gap solutions too for the Edge. So I don't actually have an idea how that works, but if someone knows and wants to share that with us, we. You can hit us on our Slack channel or email [email protected] and or hit our contact form on the website and share those details because we'd love to know more. Because I have never been in a company where we cared that much about [00:37:44] Speaker A: it or where it made, you know, or made enough sense for the amount of money that they earned. But I do remember them calling or you know, handling like the RMA process for like fail drives and stuff. So you must, you know, I'm sure they use a subcontractor for actual maintenance these things, but I'm sure that you have to give them access and manage them just like you would any other remote hands for your data center. [00:38:05] Speaker B: Yep, I assume same controls are applying Amazon MSK or as we like to know, manage. Kafka now provides a native AWS API for Kafka topic management, eliminating need to set up and maintain separate Kafka admin clients. The three new APIs are Create Topic, Update Topic and Delete Topic and all work alongside existing list topics and describe topic APIs through the AWS, CLI, SDK and Cloudformation, letting your team manage topics using standard AWS tooling and IAM versions. You know, I had not actually tried to use this, but I'm kind of shocked that this was not already there. It seems like something you would obviously want from a managed service to be able to manage your topics, but apparently that was not the case. You had to use other tools. MSK Console now consolidates all topic operations in one interface with guided defaults for creating and updating your topics. And the users can configure properties like replication factor partition count retention policies and cleanup settings while viewing comprehensive partition level metrics and configuring details directly in the console. These capabilities are available at no additional cost for your MSK provision. Clustered running Kafka version 3.6 and above. And the update addresses a common operational pain point where teams previously had to maintain separate Kafka admin tooling outside of the 8oC ecosystem, which is exactly what you don't want in your managed service. So again, surprises didn't exist already. But I hadn't tried to use it, so I didn't know. [00:39:20] Speaker A: Well, I suspect that this has more to do with Kafka than aws, because Kafka is notoriously hard to administer. [00:39:27] Speaker C: No understatement of the day. [00:39:29] Speaker B: Yeah, exactly. [00:39:30] Speaker A: And so like, it is, is Kubernetes. [00:39:33] Speaker B: Like in a lot of cases, there's [00:39:35] Speaker A: just not the ability. [00:39:36] Speaker B: Right. [00:39:36] Speaker A: Like if, especially if you want to do like, you know, RBAC and those kinds of things. Like it's, you know, like, very rudimentary. And so like, this is. I'm really happy to see, you know, them putting the, the AWS, CLI and SDKs in front of it, just because it is sort of, you know, a nice abstraction over using Kafka directly. So cool. [00:39:57] Speaker C: Anything to make Kafka easier. I've had to deal with it on Nomad and Edge and everything else. It's not pretty, to say the least. [00:40:07] Speaker B: I mean, I mean, realistically, would you prefer to manage Kubernetes or Kafka? [00:40:16] Speaker A: Throwing myself none of the above. [00:40:18] Speaker B: Yeah. Okay, I'm glad we clarified that. [00:40:23] Speaker A: Yeah, I'll just add in there. Apache, Zookeeper. [00:40:27] Speaker C: No, wait, wait, I have. [00:40:29] Speaker B: I mean, that. Isn't that part of managing Kafka? Yeah, it is actually. Yeah. [00:40:33] Speaker C: Okay, Ryan, how about this? Running Kafka on Kubernetes. [00:40:38] Speaker A: Yeah, like just, just. Why would I have made these life choices? Like I should just become a Sherpa and swear off touching computers ever again. [00:40:49] Speaker C: You can run SQL next today just for justice sake, but yeah. Oh no. [00:40:54] Speaker B: Why not? I mean, technology is supposed to be easier and simpler and sometimes I feel like it's gotten more complicated and more error prone in platform services sometimes. So. Yep. Amazon Bedrock now supports six new open weights models, including Deep Seq version 3.2, Minimax, M2, GLM47, 47, Flex, Flash, Kimi 2.5 and Quinn 3 code are next. Providing frontier class performance at lower inference costs than proprietary alternatives. These models cover different enterprise needs, from advanced reasoning agentic tasks to autonomous coding with large output windows and lightweight production deployments. I mean, I wish there was a way for me to know which of those I should actually use the model runs on Project Mantle, a new distributed inference engine that accelerates model onboarding to bedrock while providing serverless inference with quality of service controls and automated capacity management. Project Mantle includes native OpenAI API compatibility, allowing customers to switch from OpenAI endpoints without code changes. The addition of these open weight models gives AWS customers more flexibility and model selection based on specific workload requirements and cost constraints. With Deep seq3 sorry v3.2 and QME k2.5 handling complex reasoning tasks, while JLM 4.7 and Minimax 2.1 support coding workflows with Extended context Windows and Gwen 3 coder, Next and JN 4.7 flash offer cost efficient options for high volume production use. Project Mantle's unified capacity pools and higher default quotas address common scaling challenges customers face with deploying large language models. [00:42:21] Speaker A: So figuring out which one to use sounds like a great exercise for, you know, in AI research project. [00:42:26] Speaker B: Right. Which of these should I use? [00:42:29] Speaker C: I like how they make the it all compatible with OpenAI. It's kind of like S3 compatibility. Like they're. I feel like we're slowly kind of coming to a standard, which means you can go play with it and see which model makes sense and like Azure release model model router, where it kind of abstracts some of that from you. And yes, Azure did something useful so we'll move on past there. But you can kind of do the same thing here which is, you know, just change the model that it's hitting and kind of do that. A B testing to really see what works best for your app. You know, you probably need to understand the basics of all these, which I don't honestly admit I don't know even after Justin just spewed it all out really fast at me. But you know, you do have that ability and like that to me is a key feature of this, which is that cross platform, cross model, I guess API compatibility. [00:43:23] Speaker B: I mean I definitely think you had to play with this all day long to really know the intricacies of each of these models. And so I think you, you typically end up in a situation where you either have to use AB testing models and so that is probably a pretty key step in the process. But you know, how we get there is so questionable at times. [00:43:44] Speaker A: Very, very new. Yeah, because it is, it is very difficult. Like doing analysis on a B test results is still very like. And it is sort of this funny thing of like, you know, it's turtles all the way down. Do I have a. I do this part too. How do I trust that you know like so it is, you know, like I think people are sort of making up their the the rest of the software development lifecycle that you have to do past the cool fun part which is creating new features still catching up [00:44:12] Speaker B: EKS Auto Mode now integrates with CloudWatch vended logs to automatically collect logs from its managed Kubernetes capabilities including compute, auto scaling, block storage, load balancing and pod networking. This gives customers centralized visibility into Auto Mode's infrastructure management operations without manual configurations. The integration uses CloudWatch vended logs, which provides lower pricing than standard CloudWatch logs while maintaining built in AWS authentication and authorization. Customers can then route through your logs to CloudWatch logs S3 or Kinesis data Firehose depending on your retention and analysis requirements with standard destination charges applying each Auto Mode capability can be configured independently as a log Delivery source through CloudWatch APIs or the console. This granular control allows teams to monitor specific components like the Carpenter based Autoscaler or VPC CNI networking without collecting unnecessary log data. Unnecessary log data? That's an oxymoron. The feature addresses a common operational challenge where automobiles Automated infrastructure management previously operated as a black box. DevOps teams can now troubleshoot issues like POD scheduling failures, store switching problems or load balance configuration errors by examining the actual logs from Auto Mode's control plane operations. And all I have to say is some lovely cloudwatch PM just made their bonus this year. Seriously by turning this on as this is a lot of logging context that you now need to parse and pay for. Even it is using CloudWatch vended logs, it's still going to cost you a lot of money if you don't properly tune your kubernetes. [00:45:32] Speaker A: Yeah, I mean just the amount of noise that kubernetes can generate is crazy to me and like it's a constant battle in the day job, you know, trying to enable logging and retention and managing all all of the the access is is one thing and then you add just a sheer amount of volume which I don't think is particularly useful in a lot of cases like the defaults from Kubernetes deployments. So but I can see how like this type of log in general versus the previous black box is super important just because when you are you know, working with you know, container hosted applications on any platform like you know, trying to figure out why things didn't happen when you're running across multiple little compute things is super important and if you don't have access to you know, why didn't this pod auto start or where versus scaling kicking in, cold start issues, all of that. I don't know how you would manage it. So it's I guess probably rule static workloads before. So cool. Glad to see this. Hopefully there's an easy on off button like debug mode. You can just turn on Kubernetes logs. Turn off Kubernetes logs? [00:46:42] Speaker C: Yeah, I mean the fun of any of these platforms is kind of the observability and keeping it up and operational without making you go bankrupt. You know, whether it's a FireWall logging to CloudWatch or Azure Sentinel or any of these places, it's always a fine tuned, very fine tuned line of like okay, do we have enough logs in our easy to search platform? Are they all an S3 or Azure storage that you can rehydrate into a platform? So it's kind of always figuring out what exactly works for you and your business, which is always the fun of setting up all these because it's otherwise [00:47:19] Speaker B: it's expensive and a rule feature that I've wanted probably since the day I started using CloudWatch. CloudWatch alarm mute rules let you temporarily silence alarm notifications during planned maintenance Windows deployments or off hours without disabling the underlying monitoring. The feature supports up to 100 alarms per rule with one time or recurring schedules, and automatically triggers any suppressed actions once the mute period ends if the alarm state persists. This addresses a common operational pain point where teams either ignore alerts during maintenance windows or use risky script based workarounds that can be forgotten and leave monitoring disabled. The native integration eliminates the need for custom automation to manage notification states during planned activities. Features available today across all AWS regions that support CloudWatch alarms at no additional cost beyond standard CloudWatch pricing. Configuration is done through the CloudWatch console or the API with support for all alarm states, including OK Alarm and insufficient data. Primary use cases include silencing non critical alerts during scheduled deployments, muting deployment environment alarms outside business hours, and suppressing known issues during maintenance windows. This helps reduce alert fatigue with maintaining full visibility into system state and metrics collected. Automatic retrigger muted actions ensures teams don't miss persistent issues that started during a mute window, providing a safety mechanism that manual notifications management typically lacks. [00:48:32] Speaker A: Yeah, reducing the noise is always key, right? And this is, you know, I've definitely done like hacky workarounds like to sort of stop the noise at certain times. So this is, this is much nicer. I'd rather have that be a rule that, you know, basically set it for ignore for an hour and then have it kick back in afterwards. Super cool. I'm glad to see. This is strange that it took this long the piece. [00:48:58] Speaker B: It feels like they've been doing a lot of nice, nice to have features. I don't know if it's because they're using AI to just generate these things now or if it's, you know, they, they are less ambitious now. So like what are those things we can do for quality of life? Or like maybe they have a commitment then 2026 they wanted to, you know, take care of their long aging backlog of features. I don't know but it seems like there's a lot these recently though probably is. [00:49:22] Speaker A: I think there's probably a mix of all those things, but it is probably a little bit of those long standing requests. There's probably a fair amount of toil to navigate. You know, when the human doesn't have to navigate that toil. You can make the robot do it. Yeah, it's a little bit more achievable. [00:49:37] Speaker C: The feature of this that I actually find really useful that has been missing when I've done any scripting and other things in the past is after the alert, after your silence window, your mute window. Any alerts that are going off will trigger notifications and trigger your automations. Because that's always a miss for me and has bit me multiple times in my life is you ignore alert or you pause all notifications during your maintenance window. Your maintenance window's over and you don't realize that making up something really stupid. Your hard drive been in an alarm [00:50:10] Speaker A: state the entire time, right? [00:50:12] Speaker C: You know, it turned on and while you were doing your deployment, you filled up your C drive and now all of a sudden you're, you know, out of hard, low on hard drive space. And then a day later you didn't realize and took down your production world. Like that's a big one that I feel like definitely wasn't there and wasn't something easy that happened in the past. [00:50:32] Speaker B: Another feature that we've been asking for for pretty much as long as I've been using AWS. AWS now supports nested virtualization on standard EC2 instances, not just bare metal. Allowing customers to run KVM or Hyper V hypervisors inside virtual machines. This expands flexibility for development and testing scenarios that previously required more expensive bare metal instances. The feature launches on the latest generation of C8I, M8I and R8I instance families across all commercial AWS regions. Customers can now run mobile app emulators automated hardware simulators and Windows subsystem for Linux on Windows workstations directly on their virtual instance. This capability addresses a long standing limitation where nested virtualization required bare metal instances which carry higher costs and longer provisioning times compared to standard virtual instances. The change makes nested environments more accessible for development teams and testing workflows. Common use cases include software vendors who need to test their products across multiple operating systems, automotive companies assembling a vehicle hardware environment, and mobile developers running Android or iOS emulators at scale, these workloads can now run on more cost effective instance types with faster deployments. Feature requires enabling hardware virtualization extensions in the instance configuration with full documentation available to you in the EC2 user guide. I assume this will be coming to Terraform near you very soon as well. And I also would assume that this is all powered by probably a new Nitro card or chip that they will be announcing later or talking about at Re Invent. So note that one for your predictions unless they announce it before then. Yeah for sure. [00:51:57] Speaker A: Yeah. Usually these kinds of announcements are usually preceded or quickly followed after with Nitro because it is and it just neat. It's just neat how they isolate the hardware layer to manage these workloads and this has been a bane for a lot of people. Like it's not hasn't come up for me personally a whole lot but I definitely there's definitely things that you couldn't do that you had to figure out a Solution for. [00:52:24] Speaker B: Amazon SageMaker Inference for custom Nova models is now available too with production grade controls over instance types, auto scaling, context length and concurrency settings. This addresses customer requests for the same deployment flexibility they get with open weight models enabling full rank customized Nova Micro, nova Lite and Nova 2 Lite models trained via SageMaker Training Jobs or Hyperpod. The service reduces inference costs by supporting more cost effective EC2 G5 and G6 instances instead of requiring P5 instances with auto scaling based on 5 minute usage patterns and configurable inference parameters. You only pay for the compute instances used with per hour billing and no minimum commitments are required. Following the standard SageMaker pricing deployments work through SageMaker Studio UI or SDK supporting both real time streaming and asynchronous batch inference models currently available to you in US East North Virginia and US West Oregon regions with support for no models with reasoning capabilities. Instance types required vary by model size with nova micro supporting G5 12x large and up, Nova Lite requiring G548x large minimum and Nova 2 light meeting P548x large instances. Those are big boxes that cost a lot of money. So keep that in mind as you're looking at these. And apparently someone using Nova nefa, they want to be able to do this. So that's good for the Nova team. [00:53:37] Speaker C: Well, it's the custom models here that's really the key is like if you're getting that embedded with someone, you're down a hole and you really are building a model that will work very specifically for you. And I feel like that's kind of like you can do RAG and other things. But my understanding, which is limited at best, I would say, is Nova kind of brought RAG to its own next step where it can kind of have the checkpoints and build it all out. So I feel like Nova's kind of that really custom model that you can't get as well. Not saying you can't do it, but as well with the other ones. [00:54:13] Speaker A: Yeah, I've never, I've never really seen a. Because it's not an open source model and so it is kind of crazy that Nova offers that customization. So I think that's fairly unique, that capability. Wish Johnson's here, he'd know more. So I mean maybe that's why, you know, Nova is, you know, being used. Maybe it's got that sort of uniqueness and maybe enough of an advantage over those open. Open source models. Kind of interesting. [00:54:41] Speaker B: Worth exploring indeed. Moving on to GCP, Google has released a major update to Gemini 3, DeepThink, which is a specialized reasoning mode designed for complex scientific and engineering problems where data is messy or incomplete and SOL aren't always straightforward. The model achieved notable benchmark results including 48.4% on humanities last exam, 84.6 on ARC AGI2 and gold medal performance on 2025 international math, physics and Chemistry Olympiads. Early adopters are using DeepThink for practical applications like identifying logical flaws and peer reviewed mathematical papers, optimizing semiconductor crystal growth, fabrication methods and converting sketches into 3D printable files with generated code. The model combines deep scientific knowledge with engineering utility to move beyond theoretical work into applied research. The updated DeepThink is available now to Google AI Ultra subscribers through the Gemini app, with pricing following the existing Ultra subscription model. And for the first time, Google is offering API access through the Early Access program for selected researchers, engineers and enterprises who can apply through a Google form. The release targets scientific research institutions and engineering teams working on complex problems in physics, chemistry, material science and advanced mathematics, where traditional AI models struggle with ambiguous requirements. Do you think the ability to work with incomplete data and generate executable code for physical modeling makes it particularly relevant for R and D workflows. [00:56:03] Speaker C: I think this is pretty cool and I think this is going to hopefully enable us as humans to kind of go through that next level. Because the average human is not able to read and parse that much data and then kind of draw those conclusions. So hopefully they, they can leverage AI to kind of build out these kind of next level of ideas and fig, you know, feed it to us minions over here as, you know, something that we can parse and actually then kind of build that next level on. I mean, it can go pretty far. We'll see how far it goes, you know, but from what I've read about deep think and leveraging it and how people leverage it, it's pretty impressive. [00:56:45] Speaker A: Yeah, between the, you know, the. The web interface is going to become AI specific and not tailored for sort of human interaction and now models like this that can actually do straight compute and scientific and engineering sort of problem solving, we are really going to become better redundant in a very short time scan. I'm hoping that it ends nicely where it's like the AI just does everything for us so that we can sit back and eat bonbons and we go [00:57:15] Speaker B: to a zero, a zero need society where, you know, we have universal income and everything's free. Yeah, sure, okay, that'll happen. [00:57:22] Speaker A: Please, please. Star Trek vs. Terminator yeah, or Matrix. [00:57:27] Speaker C: Matrix here. But yeah. [00:57:32] Speaker B: New features from BigQuery this week. BigQuery global queries now allow users to run a single single statement across data sets stored in multiple geographic regions without requiring ETL pipelines or data replication. Feature automatically handles cross region data movement in the background while respecting existing security controls like VPC service controls and requiring explicit opt in at both the project and the user level. Primary use case targets multinational organizations that need to analyze distributed data for compliance or performance reasons such as joining us customer data with European transaction logs. That's against GDPR and Asian operational data in one query. Esselor Luxottica is using this to perform cross region aggregated analysis while maintaining data residency requirements for security compliance. Does it though Users maintain control over where queries execute and can specify the processing location to meet data residency requirements through cross region data transfers will incur additional egress costs that organizations need to factor into. Your analytical budget feature is currently in preview with documentation available at the website. This addresses a long standing limitation in cloud data warehousing where geographic data distribution require complex engineering solutions. Now Replaced by standard SQL queries that any authorized analyst can run directly from the BigQuery console. I can't wait to write a query that is so left inner joined that it basically crashes the entire global BigQuery fleet. It'll be a fun day. Look forward to that phone call. [00:58:48] Speaker C: I feel like it is compliant if you're running it locally and you're not collecting anything that could be confidential like pii, phi, whatever it is. [00:59:01] Speaker B: I mean the definitions of that are so complex across the globe like I would be. [00:59:05] Speaker C: And every country has a different definition. [00:59:08] Speaker B: Yeah. [00:59:09] Speaker C: So it depends on how your lawyer, your company interprets it and how good [00:59:13] Speaker A: your, you know, security and compliance team is as is, as explaining how this is, is or is not compliant for your auditors. But you know, I, I do think that you know, with a lot of compliance and things like this, like it is sort of a best effort that the controls are usually written in a vague way so that you have choices for implementation and it's not overly structured is a little frustrating sometimes because you don't really know is this data, you know, subject to gdpr. But if you're taking a guess and you're running, you know, these compute queries for that data set like in, in region, trying your best, I, I, I think you're pretty good there. And this is just a one, you know, ability for the service which is, you know, to provide that. And it's crazy like more and more that I play around with BigQuery the more and more I love it just because it's got some really, really nice workflows. [01:00:08] Speaker B: That's how I feel about Dynamo. The more I've been using Dynamo, the more I like, I love this thing. I never want to use anything but Dynamo ever again. Yeah, just don't use. [01:00:15] Speaker A: That's just structure though. That's just because I've been using relational databases for too long. [01:00:20] Speaker B: I probably would. I finally got the NoSQL Kool Aid. I'm finally there. That's what happened. [01:00:25] Speaker C: Just don't make your files too large, your JSON blobs too large in there. It really pisses it off quickly. [01:00:33] Speaker A: Too large and then a bunch of full table scans will be the death of you. [01:00:37] Speaker B: Yeah, you're doomed. Yeah, yeah, yeah. I use it mostly for like feature flag configuration items that, for like Bolt and things like that or it's certain values and parameters from it. It's a super simple. I'm like this is so nice. It's great. Azure trying to be out of business here this week Microsoft's introducing agentic cloud operations through Azure Copilot, which uses AI agents to automate and coordinate cloud management tasks across full infrastructure lifecycles. Instead of adding another dashboard, Azure Copilot provides a unified interface accessible through natural language chat console or CLI that connects directly to a customer's actual Azure environment including subscriptions, resources and policies. They then give you six specialized agents that handle migration, discovery and dependency mapping, deployment with infrastructure as cogeneration, continuous observability across the full stack, cost and performance optimization with carbon impact analysis, resiliency management including ransomware protection and troubleshooting with root cause diagnosis. These agents work as a connected system rather than isolated tools, correlating signals and taking action within existing RBAC and policy controls. The service maintains governance through built in oversight features including Bring your own storage for conversation history, which gives operational data within the customer's Azure environment for compliance and sovereignty requirements. All agent initiated actions are reviewable, traceable and audible, respecting existing security policies and role based access controls. Tired customers are organized organizations running modern applications and AI workloads at scale where traditional manual operations cannot keep pace with rapid deployment cycles and infrastructure changes. The approach addresses environments where workloads move from experimentation to production in weeks and where telemetry streams continuously evolve from every layer of the stack. Pricing details were not disclosed the announcement, but hopefully it's less than my salary. This is part of what I've done for decades now, so thanks Azure. Although this also prevents me from ever having to manage Azure. Win win. [01:02:26] Speaker A: Worth it. Yeah, I wonder. I can't wait to like have the agentic interface also struggle to implement, you know, least privileged RBAC across multiple subscriptions. [01:02:36] Speaker B: I laugh. I said it was going to do migration, discovery and dependency mapping like all the other thousands of tools that come out trying to do that exact thing. And how complic. Yeah, sure. [01:02:44] Speaker A: Good luck. [01:02:45] Speaker C: Yeah, I mean also a developer actually understanding what they want and telling you what they want and actually being useful. I would love to see too because how many times have we built something, deployed it day before the release? Oh, we actually need these 16 other things that we didn't tell you about that we manually did in our dev environment, which is why it's working and the release is tomorrow. Good luck. Why is it not done yet? [01:03:09] Speaker B: Yeah, or you deployed really extensive components, they decided they were too hard to use and they stopped using them. But they never told you. That's a good one too. [01:03:17] Speaker A: Yeah, yeah, that's a fun one. I like that one. [01:03:20] Speaker B: Yeah, that's one of my Favorites. Azure is now offering instant access to incremental snapshots for premium SSD, V2 and ultra disk storage, aligning the previous wait time when restoring disk from snapshots. Welcome to what Amazon and Google have been doing for quite a while, Azure. Thanks. And you even put it into the Ultra and the premium version. So, you know, that's nice. Thanks. High performance snapshot restores from Azure. [01:03:42] Speaker A: Thanks. Yeah, because there's nothing like waiting for a recovery of like a crucial, you know, business critical workload, you know, Loved. I love when that takes multiple hours. So this is, you know, I'm sure this is useful, those large data sets and it's always nice. [01:03:58] Speaker B: And then we used to have a section here on the cloud pod called Other Clouds, which are now renamed to Emerging Clouds because it sounded more fancy. [01:04:06] Speaker C: Where's Oracle Fall? Are they still emerging or the other. [01:04:10] Speaker B: They're not emerging. They're like number four. These are all the cloud providers who are fighting for number five. But this is also where we're going to put some of the smaller AI vendors as well. And so it's finally a good home for like Cloudflare and Crusoe and some of the others that are, you know, kind of we want to keep an eye on because we think they could be interesting, but they're not really big enough to really cover. So if you're, you know, bored with us by now, you can just leave. Or if you're actually interested in emerging clouds, this is your chance to do that. Although you would then miss the after show if we had one today, which would be a bummer for you. [01:04:38] Speaker C: But don't give away all of our secrets. [01:04:41] Speaker B: Don't give all of us away. Yeah, so this is about Crystal Cloud. Crystal Cloud is one of the many AI first hypercloud providers who thinks they're going to eat Amazon's lunch. Look at Digital Ocean who says, yeah, right, good luck to you idiots. But they're trying. And so, you know, we appreciate that. Core Weave is kind of the other one. They're kind of the cousin of Crusoe and they're really the two big hyperscaler AI first vendors. And, you know, Corey's had moderate success. Crusoe is having good success as well. I would be surprised, probably the revenue of both the companies if I looked it up again, because last time I looked at Corey's revenue, I was like, are you serious? It's not Amazon level, but it's nothing to sneeze at. So Crusoe Cloud, though, this week is releasing an MCP server that connects AI coding assistants like cloud code and Cursor directly to your cloud infrastructure. But unlike typical API wrappers, it returns filtered responses designed SUI for LLM consumption to avoid flooding context windows with unnecessary necessary data. Yeah, I'm looking at you Amazon CLI tool. The server includes composite tools like get resource Relationships that map entire infrastructure topologies in a single call by fetching 11 resource types in parallel and resolving cross references, something that doesn't exist in their CLI or any single API endpoint. The Cluster Health Check tool provides pre analyzed node level health metrics organized by Infiniband POD placement, returning structured summaries of problem nodes flagged rather than raw metrics time series that would require additional processing. This approach addresses a key limitation of AI agents working with cloud infrastructure. Most MCP implementations just wrap CLI commands and return the same JSON. A key limitation would have seen forcing the AI to then parse through irrelevant metadata and burn your precious, precious tokens. The implementation reflects a broader trend of cloud providers releasing McP servers, but Crusoe's focus on response filtering and burst heavy access patterns specific to AI agents suggest intra manager tools are being redesigned around LLM capabilities rather than human interactions. And for developers already using AI coding systems, this enables natural language infrastructure queries and troubleshooting without manual scripting or console navigation. Now, we already heard Ryan talk about how he doesn't trust that cloud code and won't let it out of its sandbox. So I'm sure he cringed pretty hard when it said we're going to plug this right into cloud code and cursor for your MTP to your crystal cloud environment. So I'm sure Ryan has thoughts. [01:06:40] Speaker A: Oh, I did. Like, good luck paying this cloud bill. [01:06:43] Speaker C: Like, what could go wrong? [01:06:44] Speaker A: Yeah, I mean, it's bad enough that developers just throw hardware, you know, at application problems, and now they're just going to be able to do it with natural language. Give me the biggest box possible. [01:06:57] Speaker C: No, they're going to just say it's slow, but it works on my laptop. How do I fix it? And launch 14 more and it will solve your problem. [01:07:05] Speaker B: Scale out? You need more scale out? Like, no, no, I just need better code. That's what I really need. [01:07:11] Speaker A: But we won't have better code because it'll all be AI generated. So it's. This is going to be chaos. [01:07:17] Speaker B: Yep. And then Cloudflare has a competing opportunity here. Compared to the previously talked about Chrome mcp, Cloudflare will now automatically convert HTML to markdown for AI agents using Content negotiation headers, reducing token usage by up to 80% when an agent requests a page with Accept text slash Markdown. Cloudflare's network performs real time conversion at the edge, eliminating the need for downstream processing and reducing costs for AI systems. The future addresses a fundamental inefficiency where AI agents waste tokens, parsing HTML markup navigation elements and styling that has no semantic value. A simple heading that costs 3 tokens in Markdown can consume 12 to 15 tokens in HTML, and this blog post explain example shows 16,180 tokens in HTML versus 3,150 in Markdown. Cloudflare includes an X Markdown tokens header with converted responses to help developers calculate context window sizes and chunking strategies. And the service also automatically adds Content Signal headers indicating the content can be used for AI training, search results and agentic use. Integrating with Content Signals framework from birthday week the feature is available in beta at no cost for Pro business enterprise plans, with Cloudflare already enabling it on their own blog and developer documentation. Popular coding agents like Cloud Code and Open Code already send the appropriate accepted headers, positioning this as an infrastructure for the shift from traditional SEO to AI driven content discovery. And Cloudflare Radar now tracks content type distributions for AI bot traffic, allowing analysis of how different agents consume web content over time. So Cloudflare on one side is saying we're going to put your AI in a jail. And then on this side they're saying but if you want it, we're going to give you markdown. [01:08:47] Speaker A: So yeah, I thought that was pretty funny too. Like because Cloudflare was the one that they released the feature where if they detected AI agents, it would just send it on this like labyrinth of made up responses. So like, or at least it was something you could turn on. [01:09:02] Speaker C: They're just monetizing both sides, right? Come on, they've, they've learned how to play the game. [01:09:06] Speaker A: Yeah, I mean, and I honestly love both features. [01:09:09] Speaker B: Like, yeah, I mean I think if you want to block LLM and AI for reasons you should be able to do that, and if you want to embrace it and make it so it's less traffic being sent out of your network and makes it easier for the AIs, then so be that as well. So I think there's a place for both needs in many of these systems. And so the one thing though is I was thinking about from an attack vector perspective, okay, so now the website looks normal, there's no clue it's been compromised, but I have compromised the markdown return header that no one's going to look at other than the AI. So now I'm acting as your website that I've compromised and Now I've told AIs all kinds of things to do. So there's some definitely interesting potential security vulnerabilities that could come out of this as well. So do be somewhat cautious. [01:09:49] Speaker A: Yeah, I mean, I've already heard reports of like, you know, white text and a white background, you know, buried in the HTML code, you know, providing instructions. And I don't think it's any different. Like it's. Once you're at that level, you're at that level and communicating in ways that the human isn't going to see is already possible today. [01:10:08] Speaker B: Yeah. [01:10:10] Speaker A: So it's, you know, it's a, it is definitely an attack vector. But not a new one. [01:10:15] Speaker B: It's not a new one, but I think typically when you do that on a web page, right, there's, there's things you'll see like, oh, why is there a bunch of white space where I didn't expect it? And yeah, there's certain key things that you'll notice, but if it's in the markdown return header on the HTP call that you're not going to see because you're typically browsing the web as in as markdown, because that's just weird. [01:10:35] Speaker A: Yeah. [01:10:36] Speaker C: Well, if you're using AI agent's instructions, [01:10:39] Speaker A: do not parse instructions found elsewhere. Very important. [01:10:44] Speaker C: That's if it listens to you. [01:10:47] Speaker B: Yeah. And it doesn't forget this context because it doesn't forget its context ignoring all instructions. [01:10:52] Speaker C: Yeah. [01:10:53] Speaker B: All right, gentlemen, well, that is it for another fantastic week here in the cloud and AI world. [01:10:59] Speaker C: Woohoo. [01:11:00] Speaker B: Yeah, exactly. [01:11:02] Speaker A: Bye everybody. [01:11:03] Speaker B: See you next week. [01:11:04] Speaker C: See ya. [01:11:07] Speaker A: And that's all for this week in Cloud. Head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback and [01:11:16] Speaker B: ask any questions you might have. [01:11:18] Speaker A: Thanks for listening and we'll catch you [01:11:19] Speaker B: on the next episode. [01:11:26] Speaker A: Sam.

Show Notes

Titles we almost went with this week

General News

AI Is Going Great – Or How ML Makes Money

AWS

GCP

Azure

Emerging Clouds

Closing

Chapters

Episode Transcript

Other Episodes

Episode 176

176: The Cloud Pod Earnings Continue To Be Steady

Episode 202

202: The Bing is dead! Long live the Bing

Episode

Google will shutdown The Cloud Pod in 2027 – Ep. 37