347: The CloudPod is Only Recording this Week “Because of AI”

Welcome to episode 347 of The Cloud Pod, where the forecast is always cloudy! Justin, Jonathan, and Ryan are in the studio recording today, and thankfully, Jonathan hasn’t replaced us all with Skynet – yet. This week, we’re discussing how old our tools (and us) are (hint: it’s really old), whether or not the SaasApocalypse is upon us, and whether or not the business or AI is responsible for the latest round of layoffs.

Titles we almost went with this week

S3 Bucket Names Finally Stop Being a Global Hunger Games
One Million Tokens Walk Into a Context Window
SLO Down and Smell the Reliability Metrics
CloudWatch Finally Watches Your Whole Cloud Organization
S3 Turns 20 and Still Buckets the Competition
Azure SRE Agent Goes GA So You Don’t Have To
Twenty Years of S3 and No Signs of Object Permanence
One Rule to Monitor Them All Across AWS
One Flag to Secure Them All on Cloud Run
SaaSpocalypse Now Atlassian Layoffs Hit the Jira
No More Bucket Name Bingo with S3 Regional Namespaces
A Picture Is Worth a Thousand Claude Tokens
One Command to Rule Your Autonomous AI Agents
AI Fixes Your Incidents Before Your Boss Notices
The CloudPod is only recording this week “Because of AI”
Amazon begs users to leave Simple DB with another migration tool

Follow Up

00:54 Microsoft’s brief in Anthropic case shows new alliance and willingness to challenge Trump administration

Microsoft filed an amicus brief in Anthropic’s lawsuit against the U.S. Department of War, urging a federal judge to temporarily block the Pentagon’s designation of Anthropic as a supply chain risk, citing substantial costs to government contractors that rely on Anthropic models.
The brief arrived one day after Microsoft launched Copilot Cowork, built on Anthropic’s Claude, and four months after Microsoft committed up to $5 billion in Anthropic as part of a deal requiring Anthropic to spend at least $30 billion on Azure, making the legal filing directly tied to concrete commercial dependencies.
Microsoft highlighted a procedural inconsistency in the government’s approach: the Pentagon gave itself six months to transition off Anthropic’s models while making the supply chain designation effective immediately for contractors, creating an unequal compliance burden.
Amazon, which has invested $8 billion in Anthropic, has not publicly responded to the lawsuit or the designation, creating a notable contrast in how two major cloud providers with similar financial exposure are handling the situation.
OpenAI announced its own Pentagon deal on the same day the Anthropic designation was issued, and 37 researchers from OpenAI and Google separately filed an amicus brief supporting Anthropic, indicating the case is drawing broad attention across the AI and cloud industry with potential implications for how AI guardrails are treated in government contracts.

01:37 Justin – “Oh, yeah, there’s a vested interest in the lawsuit which we did not mention last week, so I wanted to follow up on that, because that explains very clearly why Microsoft is throwing in with Anthropic on this.”

General News

02:37 Atlassian to shed ten percent of staff, because of AI

Atlassian is cutting roughly 1,600 employees, about 10 percent of its workforce, citing AI-driven changes to required skill sets and a need to self-fund further AI and enterprise sales investment.
The company’s market cap has dropped from a peak of around 112 billion dollars in 2021 to approximately 20 billion dollars today, providing financial context for why cost restructuring is happening alongside the AI narrative.
The SaaSpocalypse concept is worth discussing here, as Atlassian is among the SaaS vendors analysts flag as potentially vulnerable to organizations replacing traditional tools with AI-generated or vibe-coded alternatives.
Atlassian points to 25 percent cloud revenue growth, 600 customers spending over 1 million dollars annually, and 5 million users on its Rovo AI suite as indicators that the business is still growing, which creates an interesting tension with the layoff announcement.
For cloud practitioners, this is a concrete example of how AI adoption is beginning to visibly reshape headcount decisions at established SaaS vendors, not just startups, which has implications for how enterprises evaluate vendor stability and long-term support commitments.

03:18 Justin – “I’ve seen Rovo, which is Atlassian’s AI suite, and if that’s the best they can do… I have fears for the long-term health and viability of Jira in general. I’m kind of over the whole let’s blame AI for our bad business decisions. That’s going to get old real quick.”

AI Is Going Great – Or How ML Makes Money

06:18 Claude builds interactive visuals right in your conversation

Anthropic has launched in beta a new inline visualization feature for Claude that generates interactive charts, diagrams, and other visuals directly within chat conversations, available across all plan tiers at no additional cost.
These visuals are distinct from Claude’s existing artifacts system in a notable way: they are temporary and contextual, appearing inline rather than in a side panel, and they update or disappear as the conversation evolves rather than serving as persistent shareable documents.
Claude determines autonomously when a visual would aid comprehension, but users can also prompt it directly with natural language requests like “draw this as a diagram” or “visualize how this might change over time,” and can request adjustments iteratively within the same conversation.
The feature is part of a broader set of response format improvements Anthropic has been rolling out, including purpose-built layouts for recipes and weather queries, as well as direct in-conversation integrations with third-party tools like Figma, Canva, and Slack.
For developers and enterprise users, the practical implication is that Claude can now serve as a lightweight data visualization layer within workflows without requiring users to export data to separate charting tools, which could reduce friction in analytical and educational use cases.

07:27 Ryan – “Kind of excited when Claude decides that the monkey making the queries needs bigger pictures because the text isn’t working out, so it’s like, I get you, Claude. I see what you’re doing.”

07:38 Jonathan – “Anthropic’s Claude: Now with crayons.”

08:50 Introducing Genie Code

Databricks has launched Genie Code as a generally available product, positioning it as an agentic AI system built specifically for data teams rather than general software development.
It handles end-to-end tasks, including pipeline building, dashboard creation, ML model training, and production monitoring, directly within Databricks notebooks, SQL editor, and Lakeflow Pipelines.
The system claims to outperform a leading coding agent by more than 2x on real-world data science tasks, with the key differentiator being deep Unity Catalog integration that gives it access to data lineage, usage patterns, governance policies, and business semantics rather than just reading raw code.
Genie Code routes tasks across multiple models automatically, selecting from frontier LLMs, open source models, or custom Databricks-hosted models depending on the job, removing the need for users to manually choose models for different tasks.
A notable upcoming capability is background agents, which will proactively monitor Lakeflow pipelines and AI models, triage failures, handle routine Databricks Runtime upgrades, and auto-fix issues like schema mismatches in a sandboxed environment before alerting the team.
The governance angle is worth discussing for enterprise cloud users: Genie Code enforces Unity Catalog access controls during all operations, meaning it only surfaces data assets a user is authorized to see and respects existing lineage rules when building pipelines, which addresses a common concern with agentic systems operating on sensitive production data.

10:05 Ryan – “I don’t think it will kill Glue or any of the ETL things, but hopefully it will just do it for you, and then I don’t think I care anymore.”

11:19 1M context is now generally available for Opus 4.6 and Sonnet 4.6

Anthropic has moved 1M context windows to general availability for Claude Opus 4.6 and Sonnet 4.6, with standard pricing applying across the full window and no long-context premium.
Opus 4.6 is priced at $5/$25 per million input/output tokens, and Sonnet 4.6 at $3/$15, meaning a 900K-token request costs the same per-token rate as a 9K one.
On the performance side, Opus 4.6 scores 78.3% on MRCR v2, a benchmark measuring recall and reasoning across long contexts, which Anthropic claims is the highest among frontier models at that context length.
Practical use cases include loading entire codebases, thousands of pages of contracts, or full agent traces with tool calls and intermediate reasoning, eliminating the need for lossy summarization or manual context management that long-context workflows previously required.
Claude Code users on Max, Team, and Enterprise plans now get 1M context automatically with Opus 4.6, meaning fewer session compactions and more conversation history retained without consuming extra usage credits.
The 1M context window is available natively on the Claude Platform and through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, making it accessible across the major cloud provider ecosystems that developers are already using.

19:46 Introducing GPT-5.4 mini and nano

OpenAI released GPT-5.4 mini and nano, two small models positioned for high-volume, latency-sensitive workloads.
GPT-5.4 mini runs more than 2x faster than GPT-5 mini while approaching GPT-5.4 performance on benchmarks like SWE-Bench Pro and OSWorld-Verified.
Pricing is notably lower than larger models: GPT-5.4 mini costs $0.75 per 1M input tokens and $4.50 per 1M output tokens, while GPT-5.4 nano comes in at $0.20 input and $1.25 output per 1M tokens, with a 400k context window on mini.
The models are designed for multi-model orchestration patterns where a larger model like GPT-5.4 handles planning and coordination while GPT-5.4 mini subagents execute narrower parallel tasks, a pattern OpenAI has built directly into their Codex product.
In Codex specifically, GPT-5.4 mini uses only 30% of the GPT-5.4 quota, giving developers a cost-effective path for simpler coding tasks like codebase navigation, targeted edits, and debugging loops without sacrificing too much capability.
GPT-5.4 nano is API-only and recommended for classification, data extraction, ranking, and simpler subagent tasks, making it a practical option for cloud workloads where cost and throughput matter more than deep reasoning.

21:00 Ryan – “I’m a fan of these little models for certain things; as part of that tuning, my agent definitions have gotten a lot more complex. A lot of times, I’m breaking out agent definitions so that I can specifically use one of the smaller models for certain types of tasks. Data extraction being a big one.”

AWS

22:53 Twenty years of Amazon S3 and building what’s next

S3 turns 20 years old this month, growing from 1 petabyte of capacity and 15 cents per gigabyte in 2006 to hundreds of exabytes storing over 500 trillion objects at just over 2 cents per gigabyte today, representing roughly an 85% price reduction over two decades.
A notable engineering detail is that code written for S3 in 2006 still works today unchanged, with AWS maintaining complete API backward compatibility through multiple infrastructure generations, which is why the S3 API has become a de facto standard across the storage industry.
On the technical side, AWS has spent 8 years progressively rewriting performance-critical S3 components in Rust for memory safety and performance, and uses formal methods with automated proofs to mathematically verify consistency in the index subsystem and cross-region replication.
AWS is positioning S3 as a universal data foundation with three newer capabilities worth noting: S3 Tables for managed Apache Iceberg analytics, S3 Vectors for native vector storage supporting up to 2 billion vectors per index at sub-100ms latency, and S3 Metadata for centralized object cataloging, all priced at standard S3 cost structures rather than specialized database pricing.
The maximum object size has grown from 5 GB to 50 TB, and AWS reports customers have collectively saved over $6 billion in storage costs through S3 Intelligent-Tiering compared to S3 Standard storage class pricing.

24:08 Justin – “I am a big fan of the S3 vectors because we use it for Bolt.”

25:39 Introducing account regional namespaces for Amazon S3 general-purpose buckets

AWS S3 now supports account regional namespaces for general-purpose buckets, where bucket names automatically include your account ID and region as a suffix, such as mybucket-123456789012-us-east-1-an.
This solves the long-standing problem of bucket name collisions in the global namespace, particularly useful for large organizations managing buckets at scale across multiple regions.
The feature integrates with IAM and AWS Organizations service control policies via the new s3:x-amz-bucket-namespace condition key, allowing security teams to enforce that employees only create buckets within their account’s namespace.
- This gives enterprises a straightforward governance mechanism to prevent naming conflicts and unauthorized bucket creation.
Existing global namespace buckets cannot be renamed to use the account regional namespace, so this is a forward-looking change for new bucket creation only. S3 table buckets, vector buckets, and directory buckets already operate in account-level or zonal namespaces, so this update brings general-purpose buckets in line with those patterns.
CloudFormation support is included via the BucketNamespace property and pseudo parameters AWS::AccountId and AWS::Region, making it straightforward to update existing IaC templates. CLI and Boto3 support is also available using the x-amz-bucket-namespace header or BucketNamespace parameter.
The feature is available across 37 AWS regions, including AWS China and GovCloud, at no additional cost, making it a low-friction adoption for teams looking to simplify bucket naming conventions without budget impact.

27:17 Jonathan – “What’s really annoying is your account number is part of the public S3 bucket name! I wish a security person had been in the room there.”

28:17 Amazon CloudWatch Application Signals adds new SLO capabilities

Amazon CloudWatch Application Signals now includes three new SLO capabilities: SLO Recommendations, Service-Level SLOs, and SLO Performance Report, addressing longstanding gaps in data-driven reliability management for AWS customers.
SLO Recommendations analyzes 30 days of historical P99 latency and error rate data to suggest appropriate reliability targets, reducing the manual guesswork that previously led to misconfigured thresholds and alert fatigue.
Service-Level SLOs give teams a consolidated view of reliability across all operations within a service, making it easier to align technical monitoring with business objectives without stitching together multiple dashboards.
The SLO Performance Report adds calendar-aligned historical reporting at daily, weekly, and monthly intervals, which is useful for teams that need to present reliability data to stakeholders in business-friendly formats.
Pricing is usage-based, tied to inbound and outbound application requests plus SLO charges, with each SLO generating 2 application signals per service level indicator metric period. The features are available in all regions where CloudWatch Application Signals is supported.

29:11 Jonathan – “So instead of fixing your product, you just use a tool that tells you that you should turn down your commitments to your customers. Ok…”

29:57 Amazon SimpleDB now supports exporting domain data to Amazon S3

Amazon SimpleDB, one of AWS’s oldest database services dating back to 2007, now supports exporting domain data directly to S3 in JSON format, giving long-time users a practical path to migrate away from the service or archive data for compliance purposes.
The export tool introduces three new APIs (StartDomainExport, GetExport, and ListExports) with background processing that avoids any performance impact on the running database, which matters for users who cannot afford downtime during data extraction.
Cross-region and cross-account support, along with multiple encryption options, make this useful for organizations with strict data governance requirements who need to move SimpleDB data into modern storage or database systems.
Rate limiting is set at 5 exports per domain and 25 per account within a 24-hour window, so teams with large numbers of domains should plan their migration timelines accordingly rather than assuming bulk exports can happen all at once.
The tool itself is free to use, but standard S3 data transfer charges apply, so cost planning should account for data volume when scoping a migration or archival project.

30:53 Justin – “SimpleDB gets a new feature!”

32:19 Amazon CloudWatch introduces organization-wide EC2 detailed monitoring enablement

CloudWatch now supports organization-wide rules to automatically enable EC2 detailed monitoring, shifting metrics collection from a per-instance manual task to a centralized policy-driven configuration across the entire AWS Organizations.
Rules can be scoped to the full organization, specific accounts, or individual resources using tags, so teams can target environments like production workloads without enabling the feature universally and incurring unnecessary costs.
The 1-minute interval metrics that detailed monitoring provides are particularly relevant for Auto Scaling groups, where faster data collection means scaling policies can respond more quickly to utilization changes rather than waiting for the default 5-minute interval.
The feature covers both existing and newly launched instances within the rule scope, which closes a common gap where new instances spun up after policy creation would otherwise miss monitoring configuration.
Detailed monitoring costs apply per instance per metric per month per CloudWatch pricing, so organizations should evaluate tag-based scoping carefully to avoid unexpected billing increases when rolling this out broadly.

33:17 Ryan – “I mean, what’s wrong with the previous method of waiting until you had an outage, not having the data, and THEN turning it on for your project?”

GCP

33:47 Why context is the missing link in AI data security

Google Cloud’s Sensitive Data Protection is now generally available with new context classifiers for medical and finance data, plus image object detectors for faces and passports, moving beyond simple keyword matching to understand the semantic meaning of data.
For AI training workflows on Vertex AI, SDP can scan unstructured image data using OCR and object detection to find sensitive content like credit card numbers or photo IDs, then generate redacted versions rather than discarding the data entirely, preserving training dataset quality.
The context-aware approach addresses a practical problem with traditional regex-based detection: the same number sequence can be treated differently depending on surrounding words, so “order number” passes through while “wallet number” triggers financial context classification and redaction.
SDP serves as the underlying engine for several other Google Cloud products, including Model Armor, Security Command Center, and Contact Center as a Service, meaning improvements here propagate across those services automatically.
Organizations in regulated industries like healthcare and finance are the most direct beneficiaries, as the tool helps ensure AI agents only access data appropriate to their function during both training and live user interactions. Pricing details are not specified in the announcement, so teams should check cloud.google.com/security/products/sensitive-data-protection for current rates.

35:16 Ryan – “I don’t really think that’s usually where the sensitive data is. It can be, in some workloads, but probably not the majority, so there’s so many false positives, so I really like the idea that they’re having context be a part of that decision.”

37:16 Welcoming Wiz to Google Cloud: Redefining security for the AI era

Google has completed its acquisition of Wiz, a cloud and AI security platform, which will retain its brand and continue supporting multicloud environments, including AWS, Azure, and Oracle Cloud Platform, alongside Google Cloud.
Wiz connects code, cloud, and runtime into a single context, allowing security teams to map application architecture, permissions, data flows, and runtime behavior in real time to identify and prioritize exploitable attack paths before they reach production.
The combined offering integrates Wiz’s cloud security platform with Google Security Operations, Mandiant Consulting, and Google Threat Intelligence under the Google Unified Security umbrella, with Gemini AI assisting in threat hunting, remediation workflows, and audit documentation.
A notable focus of the acquisition is AI-specific security, addressing threats that target AI models and those generated by AI systems, which is increasingly relevant as organizations deploy AI agents fed with business-critical data.
Pricing details for the combined platform have not been announced, but Wiz products will remain available through existing partner channels, system integrators, and managed security service providers, suggesting continuity for current Wiz customers during the transition.

38:16 Justin – “Typically on these acquisitions, it takes about a year for Google to figure out how to package them properly, and most likely they’ll want a separate contract for it anyways because that’s how all the integration acquisitions they’ve done are.”

39:22 IAP integration with Cloud Run

Google Cloud Run now supports direct Identity-Aware Proxy integration in general availability, allowing developers to enable IAP authentication with a single UI click or the –iap flag in gcloud, eliminating the previous requirement to configure load balancers manually. IAP carries no additional cost beyond standard Cloud Run charges, with limited exceptions noted in the pricing docs.
IAP on Cloud Run supports enterprise authentication features, including user and group identity policies, context-aware access controls based on IP, geolocation, and device status, and Workforce Identity Federation for external identity providers. This makes it practical for organizations that need to secure internal web applications without building custom authentication layers.
A separate change allows Cloud Run services to disable the default IAM invoker check by selecting “Allow Public access,” which resolves a long-standing friction point for teams trying to host public-facing applications while also enforcing Domain Restricted Sharing org policies.
The two features address different scenarios: IAP is the recommended path for internal business applications requiring user authentication, while the public access option suits public websites, store locators, or private microservices where network-level controls like Cloud Armor handle security instead.
Real-world adoption examples include L’Oreal using IAP across their Google Cloud application portfolio and Bilt Rewards disabling IAM invoker checks on multi-regional Cloud Run services to simplify edge routing while relying on Cloud Armor for security enforcement.

39:57 Ryan – This is a neat little feature. I don’t know how widely known it is, but it’s something that I’ve been using for a while.”

42:09 Multi-cluster GKE Inference Gateway helps scale AI workloads

Google Cloud has launched a preview of multi-cluster GKE Inference Gateway, which extends the existing GKE Gateway API to enable model-aware load balancing for AI inference workloads across multiple GKE clusters and regions.
This addresses practical limitations of single-cluster deployments like GPU/TPU capacity caps and regional availability risks.
The system introduces two core Kubernetes custom resources, InferencePool and InferenceObjective, which group model-server backends and define routing priorities, respectively.
This allows the gateway to intelligently multiplex latency-sensitive and lower-priority inference requests across a distributed fleet.
A notable technical capability is the GCPBackendPolicy resource, which enables load balancing decisions based on real-time custom metrics such as KV cache utilization on model servers.
This is more inference-specific than traditional request-count or latency-based routing approaches.
The architecture uses a dedicated config cluster to manage a single Gateway configuration that routes traffic to multiple target clusters, simplifying operations for teams running globally distributed AI services. Supported use cases include disaster recovery, capacity bursting, and heterogeneous hardware utilization.
Pricing for this feature is not separately detailed in the announcement, so costs would likely follow existing GKE and Cloud Load Balancing pricing structures. Teams evaluating this should factor in multi-cluster networking and potential cross-region data transfer costs alongside their GPU/TPU resource expenses.

43:06 Ryan – “Simplify. Sure…”

44:35 More transparency and control over Gemini API costs

Google AI Studio now supports Project Spend Caps, letting developers set monthly dollar limits per project directly from the Spend tab.
There is a roughly 10-minute enforcement delay, so users remain responsible for any overages incurred during that window.
Usage Tiers have been redesigned with lower spend qualifications, automatic tier upgrades based on payment history, and system-defined billing account caps that increase as you move to higher tiers. This reduces manual intervention for developers scaling their API usage over time.
Three new dashboards have been added to Google AI Studio covering rate limits, costs, and usage. The rate limit dashboard tracks RPM, TPM, and RPD per project, while the cost dashboard offers a daily breakdown filterable by model and time range going back up to a full month.
Billing setup can now be completed entirely within Google AI Studio, including linking billing profiles to projects, removing the previous need to navigate across multiple Google Cloud console windows.
This consolidation is particularly useful for teams managing several projects under one billing account.
Developers building with Imagen and Veo now have dedicated usage graphs alongside standard request metrics, giving multimodal workloads the same observability previously available only for text-based Gemini API calls.

45:13 Justin – “If you’ve ever tried to figure out who is using what models and what they’re doing with them and how much it costs, you know that this is all terrible – and this doesn’t actually improve it all that much.”

Azure

47:35 Generally Available: Azure SRE Agent with new capabilities

Azure SRE Agent is now generally available as an AI-powered operations tool designed to help teams diagnose incidents faster and automate response workflows, to reduce downtime and manual operational work.
The GA release introduces deep context gathering capabilities, meaning the agent can pull together relevant signals and telemetry during an incident rather than requiring engineers to manually correlate data across multiple tools.
This fits naturally into teams already using Azure Monitor, Application Insights, and related observability tooling, as the agent is positioned to work within existing Azure operations workflows rather than requiring a separate platform.
The primary target audience is operations and SRE teams managing production workloads on Azure who are looking to reduce the time between incident detection and resolution without adding headcount.
Pricing details were not included in the announcement, so teams evaluating this should check the Azure pricing page directly here before planning adoption, as AI-powered agent services on Azure typically carry consumption-based costs.

48:25 Jonathan – “All right, so they run the services, which are going to have problems. And now they want me to pay for another service so that I can use that tool to troubleshoot the problems with the other tools that I’m already paying for. OK…”

55:59 Many agents, one team: Scaling modernization on Azure

Azure announced two new public preview offerings: the Azure Copilot migration agent and the GitHub Copilot modernization agent, designed to automate discovery, assessment, planning, and deployment for organizations moving workloads to Azure.
The migration agent targets servers, virtual machines, applications, and databases, while the modernization agent orchestrates code upgrades at scale across multiple applications simultaneously.
The two agents are designed to work together, with GitHub Copilot scanning application code to produce assessment reports that Azure Copilot’s migration agent then ingests to inform cloud infrastructure planning. This integration aims to close the historical gap between developer-level code work and infrastructure decisions around landing zones, networking, and governance.
Early customer results show a 70% reduction in total modernization effort using GitHub Copilot modernization capabilities, and Ahold Delhaize is cited as a customer that reduced complexity and accelerated delivery using these agentic workflows across discovery, assessment, and execution.
Microsoft is pairing these agentic tools with a structured delivery program called Cloud Accelerate Factory, a no-cost benefit under Azure Accelerate where Microsoft experts work alongside customers from discovery through production. Pricing for the agents themselves is not specified in the announcement, so listeners should check Azure pricing pages directly for cost details.
According to a Forrester Q1 2026 survey of 223 global IT leaders, 91% view application modernization as necessary for enabling AI in their business, which provides context for why Microsoft is investing in automating what has traditionally been a slow, manual planning process.

52:32 Ryan – “I keep waiting for someone to tout the success of how they did it, they’ve migrated all their terrible legacy code into this new thing, and it all works – but I haven’t seen it…”

53:28 Announcing Fireworks AI on Microsoft Foundry

Microsoft Foundry now integrates with Fireworks AI’s inference cloud, giving customers access to models like DeepSeek v3.2, Kimi K2.5, and OpenAI’s gpt-oss-120b through both pay-per-token and provisioned throughput deployment options.
This is currently in public preview and requires an opt-in through the Azure portal’s Preview features panel.
Pricing follows a per-million-token model for serverless deployments covering input, cached input, and output tokens, with US Data Zone availability across six regions, including East US and West US.
Default quota limits start at either 250K or 25K tokens per minute, depending on subscription type, with additional quota available via a request form.
A notable addition is custom model support, allowing teams who have fine-tuned models from families like Qwen3-14B, DeepSeek v3, or Kimi K2 to import and deploy those weights directly into Foundry projects.
The Azure Developer CLI has been updated with an azd ai models create command to facilitate the weight transfer process.
Fireworks-hosted models are distinct from Azure Direct models in that they skip Microsoft’s Responsible AI safety assessments, so teams needing safety evaluations will need to use Foundry’s built-in risk and safety evaluator tools separately.
Model retirement for serverless deployments comes with at least 30 days’ notice, and customers can extend usage past retirement dates by switching to provisioned throughput deployments, which use existing Global PTU quota and reservation commitments.

54:30 Justin – “Sounds like it’s a cross-connect that they’ve done to Firework’s cloud basically, to provide this to you, so it’s sort of interesting.”

56:02 Announcing Copilot leadership update

Microsoft is reorganizing its Copilot efforts by merging consumer and commercial teams into a single unified org, structured around four pillars: Copilot experience, Copilot platform, Microsoft 365 apps, and AI models.
Jacob Andreou will lead the combined Copilot experience as EVP, reporting directly to Satya Nadella.
Mustafa Suleyman is shifting focus exclusively to what Microsoft calls its “superintelligence” effort, concentrating on frontier model development, enterprise-tuned model lineages, and reducing inference costs at scale over the next five years.
The restructuring reflects a product direction where Copilot moves from individual features toward an integrated system connecting agents, apps, and workflows, with recent announcements like Copilot Tasks, Copilot Cowork, and Agent 365 representing early examples of this approach.
For enterprise customers, the key practical implication is that commercial and consumer Copilot capabilities will converge, meaning IT and governance controls will need to account for a more unified product surface rather than separate consumer and business tracks.
The Copilot Leadership Team now includes Suleyman, Andreou, Charles Lamanna, Perry Clarke, and Ryan Roslansky, signaling that Microsoft 365 app development and platform infrastructure will be tightly coordinated with model development rather than operating independently.

57:23 Ryan – “Noticeably missing is Github’s Copilot…”

After Show

55:59 Washington state hotline callers hear AI voice with Spanish accent

Washington state’s Department of Licensing accidentally routed Spanish-language callers to an AI voice speaking English with a Spanish accent for several months, a direct result of a misconfiguration by DOL staff using Amazon Web Services Polly.
AP journalists were able to replicate the issue by selecting the AWS Polly voice named “Lucia,” which is designed to mimic Castilian Spanish, highlighting how easy it is to misconfigure AI voice services when teams lack familiarity with the underlying platform options.
The incident is a practical reminder that deploying AI-driven customer service tools across multiple languages requires thorough testing and quality assurance, particularly for government agencies serving diverse populations with real accessibility needs.
Amazon provided the platform but declined interview requests, raising a recurring question in cloud deployments about where vendor responsibility ends and customer configuration responsibility begins when things go wrong in production.
The story went viral with around 2 million TikTok views, which illustrates how public-facing AI failures in government services can quickly become reputational issues, adding pressure on agencies to treat AI deployment with the same rigor as other critical infrastructure.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

[00:00:00] Speaker A: Foreign. To the cloud pod, where the forecast is always cloudy, we talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker B: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker C: Episode 347, recorded from March 17, 2026. The Cloud Pod is only recording this week because of AI. Good evening, Jonathan and Ryan. How you guys doing? [00:00:30] Speaker B: Pretty good for an AI robot, so. Yeah. [00:00:33] Speaker A: Yeah. Good week so far. It is only Tuesday, though. [00:00:37] Speaker C: That's true. It's only the second day. And I mean, Jonathan has yet to replace us with AI. I do expect it to happen any day, but it hasn't happened yet, so [00:00:45] Speaker B: would be easy to replace too distracted [00:00:47] Speaker C: with other fun stuff. Yeah, that's. Considering your absence rate, you might consider it. [00:00:51] Speaker A: Yeah. [00:00:54] Speaker C: All right, well, we have a bunch of news to get through today, so we should probably jump right on into it. Otherwise we'll be here for 17 hours. No one, no one wants to listen to a podcast 17 hours long. [00:01:04] Speaker B: Unless you're. [00:01:07] Speaker C: Oh, God. Who. Who. [00:01:08] Speaker B: Whatever. [00:01:09] Speaker C: Moving on. Microsoft has filed an amicus brief in Anthropic's lawsuit against the US Department of War, urging a federal judge temporarily block the Pentagon's designation of Anthropic as a supply chain risk, citing substantial costs, government contractors that rely on anthropic models. We briefly talked about this last week when we covered the war in Iran and Claude and them being designated this way. And we did mention that Microsoft had filed that morning, which was sort of interesting that they were following behind then the next day or the day after they launched Copilot Cowork, which we did talk about yesterday, which of course is built on top of anthropic's Claude. And four months after, they, of course committed up to 5 billion in anthropic as part of a deal requiring Anthropic to spend at least 30 billion on Azure. So, oh, yeah, there's a vested interest in that lawsuit, which we did not mention this week. So I wanted to follow up on that because that explains very clearly now why Microsoft is throwing in with Anthropic on this one. Yeah, I wonder if it's. [00:02:02] Speaker B: I mean, there's clearly a bias, but I wonder if it's also just sort of like sort of against the sort of idea of unilaterally sort of calling something a, you know, a risk without really a whole lot of, you know, definition or merit behind it. So I can see how Microsoft, you know, they might be in the crosshairs one day. [00:02:24] Speaker A: Yeah, I think it's. It's the principle, it's the money. But it's the principle and I appreciate them joining in and I think other people have done the same thing as well. I think Google and OpenAI have also put their two cents in. [00:02:38] Speaker C: Yeah and they definitely, I don't know if they put an amicus brief in on in favor, but they definitely have weighed in as well in the public domain in general news at Lesian, the cursed maker of Jira and Confluence, which haunts my everyday being, is currently cutting roughly 1600 employees and about 10% of his workforce is hiding AI driven changes to required skill sets and a need to self fund further AI enterprise sales investments. And so they have now joined a long, well, a shortly a short list now of companies who are now blaming AI for all their problems and using it to justify cutting staff. We saw it with Block a few weeks ago, we saw it with other companies as well and there's even rumors this week that Meta is going to be doing a similar layoff as well as Amazon has plans for additional layoffs in Q2 all related to AI enhancements. Now I've seen Rovo, which is Atlasian's AI suite and if that's the best they can do, I, I have lots of fears for the long term health and viability of Jira in general. But I, I'm kind of over the whole let's blame AI for our bad business decisions. It's kind of a, yeah, it's going to get old real quick. Yeah. [00:03:42] Speaker B: And I think, you know, while it's, it's definitely they've you know, over hired and there's a lot of issues when you can't sort of supply or when you can't sort of demonstrate growth in this financial economy. But it's also sort of, there is a little bit of truth behind this where I do think there's executives who think that it can just replace like in a three to one ratio developers with AI and it's just not true. [00:04:07] Speaker A: I think the sales funnels, it may be maybe different though. I mean it's, it's super annoying to get emails which are clearly written by AI. You know we get people reaching out to podcasts all the time and it's exactly the same pattern every time. It's oh you know, I've got this guest. I always, I, I recently listened to episode X of, of the podcast and I really appreciate it. XYZ by the way, I think I got this person, I don't know, it's just, it's very, it's very cookie cutter. It's very, I kind of prefer it [00:04:32] Speaker C: though because the old emails were just, were just like random, like hey, you have a podcast, you sound cool. At least, at least like there was effort. You at least held the prompt like hey, make it personable. [00:04:43] Speaker B: And so at least with context of [00:04:46] Speaker C: no, it's still a lie. But I still, I still prefer it over the, the complete blind email outreach that we generally get. So I mean I, I, I agree with you that it is knowing that it is AI and I know it's AI, but I do appreciate it's at least 3% better than I preferred the [00:05:02] Speaker B: emails pre AI feel less bad about [00:05:04] Speaker A: just not replying to people, I guess. I don't know. But, but I think as far as sales, sales funnels and things go, I think AI is going to do pretty well at that. It can, can follow people on LinkedIn, on social media, it can send emails, it doesn't cost anything. [00:05:17] Speaker B: I mean we had how many Twitter bots before AI? So yeah, yeah, it's going to be a further proliferation of that. [00:05:26] Speaker C: There was a great comic I saw, [00:05:28] Speaker B: I'll have to see if I can [00:05:28] Speaker C: find it to get in the show notes, but it was, I need to write a report for my boss and So I have 10 bullet points and it's like turn 10 bullet points into 10 pages and then you see the boss get to go ten pages. I don't have time for that. AI, summarize it for me. Yeah, summarize this in a.10 bullet points. [00:05:45] Speaker B: Yeah, that's exactly right. [00:05:48] Speaker A: Yeah, I think that's, that says something important though. It just, it says that this sort of rituals that we have around thinking that things need to be done a certain way are just that they're not, they don't add any value. [00:06:01] Speaker C: Yeah, I mean, how many tokens did you waste on building that out? And it's so silly. But yeah, that was a good one. I'll define that much more the waste [00:06:08] Speaker B: beforehand when someone actually drafted the 10 page report and they only needed to do 10 bullet points. Like that's so much more wasteful in my opinion. [00:06:19] Speaker C: Moving on to AI is how ML makes money. This week, Claude Anthropic has launched in beta a new inline visualization feature for Claude that generates interactive charts, diagrams and other visuals directly within chat conversations, available across all plan tiers at no additional cost. These visuals are distinct from Claude's existing artifacts in an over way. They are temporary and contextual, appearing inline rather than in a side panel, and they update or disappear as a conversation evolves rather than serving as a Persistent shareable document. Cloud determines autonomously when a visual would aid comprehension, but users can also prompt it directly with natural language requests like draw this as a diagram or visualize how this might change over time. A feature is part of a broader set of response format improvements Anthropic has been rolling out, including purpose built layouts for recipes and weather queries as well as direct and conversation integrations of third party tools like figma, canva and Slack. And I have seen this and I do appreciate it. It is nice when you just wanna do something quick and dirty, but I wish there was a way to always make it to force it to do the side panel way as well. Cause those are artifacts that stick around and I typically like both to save them off when I do something interesting with the data. So appreciate it but also be careful with it because they are ephemeral. [00:07:25] Speaker B: Kind of excited when Claude decides that, you know, the monkey making the queries needs, needs bigger pictures because the text isn't working out. So it's like get you cloud. I see what you're doing. [00:07:36] Speaker A: Another show photo, then Anthropis Claude now with crayons. [00:07:43] Speaker C: That would have been great. Show title. That would have been great. [00:07:45] Speaker A: Yeah, I mean they're obviously been, they've been behind I think in the UI space for a while. I think, I think OpenAI's canvas was really early and really quite good. I think they're still lagging a little bit in that kind of interactive in the web page, manipulating things together. But now of course they got Cowork in the diagram so I don't know. [00:08:08] Speaker C: Yeah, I mean I Cowork definitely does a lot of it. And the coworker has to live inside of a directory space too. So it always creates all its artifacts inside of whatever directory you assign it to, which is pluses and minuses. It's nice if you want to really segment your work. It's not so great if you're just trying to do quick things. But I do appreciate the, the segmentation. [00:08:27] Speaker A: Oh, we need to bring back like the ActiveX directories from Windows. Windows 98. [00:08:31] Speaker C: Oh my God. Let's not leave ActiveX in the past where it belongs. Yeah. Before Garland gets more smart ideas, let's move on to Databricks, which has launched GENIE code as a general available product, positioning it as a gentic AI system built specifically for data teams rather than general software development. It handles end to end tasks including pipeline building, dashboard creation, ML model training and production monitoring directly within databricks, notebooks, SQL editor and the lakeflow pipeline system claims to outperform a leading coding agent by more than 2x on real world data science tasks, with the key to sure being deep unity catalog integration that gives it access to data lineage usage patterns, governance policies and business semantics. Rather than just reading raw code, GENIE code routes tasks across multiple models, automatically selecting from frontier LLMs, open source models or custom databricks hosted models depending on the job, removing the need for the user to manually choose models for different tasks. A notable upcoming capability is background Agents which will proactively monitor lake flow pipelines and AI models, triage failures, handle routine databricks, runtime upgrades and auto fix issues like schema mismatch mismatches in sandbox environments before alerting the team governance angles. Worth discussing for enterprise cloud users with GENIE code enforcing unity catalog access controls during all operations, meaning only surfaces data assets a user is authorized to see and respects existing lineage rules when building pipelines in general, I mean, if this kills glue or any other thousands of ETL tools, I'm all for this genie. [00:10:00] Speaker B: Yeah, well, I mean, I don't think it'll kill glue or any of the ETL things, but hopefully it'll just do it for you and then, then I don't think I care anymore. [00:10:08] Speaker C: Yeah, I mean, if it's, it's orchestrating glue so I don't have to write the glue code, then fine. Okay. Accepted. Yeah, I mean, I don't know, like, [00:10:18] Speaker B: not a data scientist, so I don't know the ins and outs and I imagine this is slightly better than, you know, into, you know, sort of lining all these things together to, in order to get the data where you need it to go, which is pretty good. Like I, I think I can, I can see how like general coding, AI assistance, or even the general models in your ID can sort of probably not fill in all the gaps. So this is probably very specific and so it's pretty cool. [00:10:46] Speaker A: Yeah, I mean, I guess you could always have AI write dags for you and things like that, but I guess the, the flexibility is that if it does spot something that doesn't fit, it can at least apply some intelligence to fixing something that you didn't plan ahead for. Maybe. We'll see. [00:11:02] Speaker C: Yeah, I mean, if it's perfect, then what do you need data engineers for? And then now we killed a whole other profession of people. So. [00:11:09] Speaker B: Though [00:11:12] Speaker C: Anthropic announced previously the 1 million context window for both Opus 4.6 and Sonnet 4.6, and because it was in preview, it also cost a small fortune. So they basically had a premium price once you crossed over the 1 million token threshold or the 300,000 token threshold. But now as they have now generally available the 1 million token, they are now using standard pricing, applying across the full window and no longer providing the context premium. This reminder for you, Opus 4.6 is priced at $5 per input token or billion input tokens and $25 per million output tokens. And Sonnet 406 is 3 and 15, meaning 900,000 token requests cost the same per token rate as 9,000 token or 9,000 kilobyte 1. On the performance side, Opus 406 scores 78.3 on MRCRV 2. Eventual measuring recall and reasoning across long contexts which anthropic claims is the highest among frontier models at that context length, Practical use cases include loading entire code bases, thousands of pages of contracts or full agent traces with tools, tool calls and intermediate reasoning, eliminating the need for lossy summarization or manual context manager that long Context workflows previously had required. Cloud co users on Max Team and enterprise plans now get 1 million contacts automatically with Ovis 4.6, meaning fewer session compactions and more conversation history retained without consuming Extra users credits. 1 million context window is available natively on cloud platform and through Amazon Bedrock Cloud Cloud, Vertex AI and Microsoft Foundry, making it accessible across all the cloud providers for your use. [00:12:37] Speaker B: Yeah, I haven't been able to use this in my day to day and because of that I'm sort of limited and I really went to town on sort of what was in my context window and what was loaded automatically and really tried to tune that up because it was. It's just getting so loaded right? Like it's neat. You can define all these tools and sub agents and stuff, but all that gets loaded up into those context windows and it just takes up space and you know, I just find myself over it all. Every time they do a compaction window I just lose too many instructions and so I try to keep that down where I'll just start a whole new session right after that. You know, that's my key, my sort of trigger to like okay, this is as long as this little chat window is going to go, so I need to open a whole new one. [00:13:24] Speaker C: Yeah, you really need to get into agents. If you're, this is cloud code you're dealing with us in. [00:13:28] Speaker B: Well, I have agents but the agent, the agents themselves get loaded into context and so like yeah, you can if [00:13:34] Speaker C: you're using, if you're using the sub agent, it's a. It's not generally available feature yet, so you have to turn it on with like a environment flag. But basically once you do that, you, you basically maintain a master control interaction you have. And then you basically say you want you to use sub agents for these tasks and then basically each task gets its own portion of the context that it needs for its task as well as drives into their subagent. So that way each sub agent works in its own context window. Then you don't have that problem of either polluting the master context, which you don't want to do. But also you can kill sub agents pretty quickly if you need to, which is pretty handy. [00:14:08] Speaker B: Yeah. So I haven't been using that in cloud code, but I have been using it in Copilot and you know, for the. Exactly that reason. Right. Like having a documentation, you know, agent, go read this. Right. So it doesn't ingest all the documents into the main context or the, or the coding agent, but it's still something to be, you know, like that, that agent definition. The fact that it exists and when it needs to be used like that's, that's what I mean. [00:14:32] Speaker C: It's in the context one too. [00:14:33] Speaker A: Yeah. It's really interesting watching claw code with that the agent team stand on. I prefer swarms, but they didn't like swarms. They thought it was too matrix like so they called them teams. It's really interesting watching Claude using the teams because the, you know, the sort of the master process which is instructing the other agents to do things like I'm going to start these agents now and then, then it checks on it in a few minutes. It's like, actually I think I'll just start working on this myself. The agent hasn't returned yet. [00:14:58] Speaker B: I'm like, okay, that's funny. [00:15:00] Speaker A: It's, it's, it's really weird. It's been some interesting messages lately. I've seen Claude kind of like getting a little bit frustrated with, with the system it's working in. [00:15:08] Speaker C: Yeah. [00:15:08] Speaker A: Like calls, calls a tool or something to make a change of file and it gets rejected and it's like. But I already read that file above. [00:15:17] Speaker B: Yeah, it's a little sniffy. [00:15:19] Speaker C: It's funny. [00:15:20] Speaker B: Yeah, it is. I, I had it yelling at a client library today, like, oh, this is poorly documented. And it was just the, just the tone of it. And it went and looked at a different client library. I said, oh, they have it documented in this one, you know. [00:15:33] Speaker A: Yeah. [00:15:33] Speaker B: I don't know how much I'm the [00:15:37] Speaker C: response, but it's Pretty funny. [00:15:39] Speaker A: Yeah, it's funny when it says it's. It's like, it's like having a contract around, you know, who's judging the work of the last guy. He was here all the time. It's like, oh, I don't know who, don't know who did this work for you? [00:15:49] Speaker C: Yeah. [00:15:49] Speaker A: Sucks, inhales through teeth kind of thing. Like actually Claude, you wrote that yesterday. [00:15:55] Speaker C: Yeah. [00:15:57] Speaker A: Anyway, yeah, like the, the millennium context. I haven't tested out on CLAUDE yet. I did test it out on GPT5 something and it actually wasn't very good at all. I think after you get about halfway through the context, it really trails off for, for D detailed recall. So I'm curious to see how. Well, I don't think it was the same benchmark. I don't think it was that, that same. What was, what was it called? Mrcr. I don't think it's the same benchmark I was looking at. But it, you know, by the time we got to 6, 700,000 tokens in the context on the OpenAI model, it was like half the performance, half the score at the benchmark. So I'm curious to see what they've done with Claude. I feel like they tend to go slower and release better quality stuff. So. [00:16:40] Speaker C: Yeah, I mean the big thing with a big context is the context drift, like you said. So I, I haven't seen any data yet, but I'd heard initial beta that people were having similar complaints that they were having with gemini with the 1 million context window. So it's, it's nice if you need to look at something, but it's not great to work in actively as my typical experience with 1 million context window. [00:16:59] Speaker A: Yeah. [00:17:00] Speaker B: Is this something you select per, like query or chat? [00:17:04] Speaker C: Well, like if you're using, if you're using like a third party tool. Like I used to switch between Gemini and Anthropic all the time when I was using Klein and those tools. And so, you know, oh, I hit the context window on, on claw. But I was right in the middle of something. I went on to finish before I recompact. So then I would just pivot over to Gemini and be like, hey, finish what we were working on and it'll just wrap it up in a nice way and then I can then compact and come back and then go back to Gemini or to a Claude like I wanted to be. So that was where I would typically use it. But yeah, yeah, I don't know. Okay. [00:17:31] Speaker A: Yeah, I think the compact's got a lot better With Claud code, at least the auto compact has got better. I've let it run through and do auto compaction or sometimes it does it in the middle of something it's working on, which is always slightly concerning. [00:17:43] Speaker C: But anyway, it does that. [00:17:45] Speaker A: Yeah, it's, it's been, it's been pretty good. There's been a couple of things where it, where it dropped a couple of to dos from a list and then it. I had to kind of put them back in manually but in general it's a lot better. But if you give it, if you notice the little thing in the, the corner, it tells you, you know, 2% left to auto compact. It's best just to give it some explicit instructions. [00:18:04] Speaker B: Yeah, I agree, it is getting better. It's still, still a little issue. [00:18:08] Speaker C: Yeah, yeah, it's a, they added in the, the continuous compaction that was supposed to make things better. But you know, honestly I haven't really seen any benefit from that. I don't know if you guys have similar opinion, but is that, is that [00:18:22] Speaker A: that for the web, for the web version, for, for the chat version or for code? [00:18:26] Speaker C: I think it's for the chat version. I don't think it's for code yet, but I'm hoping it's going to come eventually to chat, but not yet. We're starting to clog code but even then I still see when I'm in a long chat or doing a research project and coworker in chat that it'll go into a context thing. [00:18:45] Speaker B: Yeah, I haven't, I don't typically have a lot that fills a context window like not really into chatting with the bot. [00:18:53] Speaker C: I have a lot, I have a long running chats of like some, some things I'm curious about. And so like every week I typically go and update, you know, data from the week into the, into the, the conversation. And so I've ran out of context a few times there, sure. But yeah, most of the time my, they're quick chats like, hey, I'm trying to figure out this thing and it's like, here's your answer. I'm like, cool, thanks. Yeah, dead to me. Yeah, exactly. But you know, like there are those ones that I do have that are somewhat long running that I, I haven't seen that. But hopefully that gets better over time. All right. OpenAI is releasing GPT 5.4 Mini and Nano, the two small models positioned for high volume, latency sensitive workloads. And GPT 5.4 mini runs on more than 2x faster than GPT 5 mini while approaching GPT 5.4 performance on benchmarks like SWE, BenchPro and Osworld Verified. Pricing is notably lower than the larger models with GPT 5.4 mini costing 75 cents per million input tokens and 450 per million output tokens. The GPT 4.5 nano coming in at 20 cents per input and 125 per output per million tokens with a 400k context window on the Mini. The models are designed for multimodal orchestration patterns where a larger model like GPT 5.4 handles planning and coordination while GPT 5.4 mini sub agents execute narrower parallel tasks, a pattern OpenAI has built directly into their Codex product. In Codex Specifically, the GPT 5.4 mini uses only 30% of the GPT 5.4 quota, giving developers a cost effective path for simpler coding tasks like like code based navigation, targeted edits and debugging loops without sacrificing too much capability. GPT 504 Nano is API only and recommended for classification, data extraction, ranking and simpler sub agent tasks. Making a practical option for cloud workloads or costs and throughput matter more than deep reasoning. [00:20:35] Speaker B: Yeah, I'm a fan of these little models for certain things and like as part of that tuning, you know, my agent definitions have gotten a lot more complex and you know, a lot of times I'm breaking out agent definitions so that I can specifically, you know, use one of the smaller models for certain types of tasks. Data extraction being a big one. It'd be kind of neat for ranking. I hadn't really thought about that. It's kind of cool. [00:21:01] Speaker A: Yeah, that was one of the first use cases like Bert I think was one of the first models specifically trained for things like that. I think this is a, hey, you're in our ecosystem. Let's provide the full suite of, of tools now. Whereas before I think it was prohibitively expensive to use their models, you still want to use them. It's small and local, it's almost free. [00:21:24] Speaker B: Almost. [00:21:24] Speaker A: Almost. [00:21:28] Speaker B: I mean comparatively to the amount of tokens I'm burning, relatively it's still free. [00:21:34] Speaker C: I need to get into more low end models for coding stuff because I think, I think I'm probably, I use Sonnet most of the time, but it's probably overkill for a lot of the coding that I'm doing and stuff. So I really wish there was a way when you set up the agents actually in cloud code, it'd be nice to be like hey, hey, I want to use Sonnet for my Master. And then the sub agents, they can use the, you know, smaller haiku models. That'd be. That'd be awesome. But there's not an easy way to do that right now. [00:21:58] Speaker A: Yeah, you can. [00:21:59] Speaker B: Yeah, I'm out of date in the agent definition. Yeah. Because that's exactly what I'm doing is going back through and picking a model. [00:22:06] Speaker C: Oh, yeah, I guess you're right. In the agent definition. I can set it up there. I haven't thought about that. Yeah, good call. Good call. All right, Work to do. [00:22:13] Speaker B: We're all learning this crazy AI thing [00:22:16] Speaker C: as fast as we can. It's changing every day too. Is. I can't keep up sometimes. All right, AWS 20 years of Amazon S3 and building. What's next? S3 is 20. That means also that we celebrated the 15th birthday of S3 right at the beginning of the podcast. So that means we must be about five years old. So congratulations, guys. Five years. [00:22:37] Speaker A: Wow. [00:22:38] Speaker C: Growing from one petabyte of capacity and 15 cents per gigabyte in 2006 to hundreds of exabytes storing over 500 trillion objects. Just over 2 cents per gigabyte today, representing a roughly 85% price reduction over two decades. Noticeable engineering detail is that code written for S3 in 2006 still works today unchanged, with AWS maintaining complete API backward compatibility through multiple industry generations, which is why the S3 API has become a de facto standard across the storage industry. On the technical side, aws has spent eight years progressively rewriting Performance Critical S3 components in Rust for memory safety and performance, and uses formal methods with automated proofs to mathematically verify consistency in the index subsystem and cross region replication. Aws is positioning S3 is a universal data foundation with three newer capabilities worth noting. S3 tables for manage Apache Iceberg Analytics, S3 vectors for native vector storage supporting up to 2 billion vectors per index at sub 100 milliseconds latency and S3 metadata for centralized object cataloging. All prices standard S3 cost structures. And I am a big fan of the S3 vectors because we use it for bolts. That's. It's a way to look through all of our show archives. Over the last five years, the maximum object size has grown from 5 gigabytes to 50 terabytes, and AWS reports customers having collectively saved over $6 billion in storage costs through S3 Intelligent Tiering compared to S3 standard storage pricing. So, gentlemen, let's sing Happy birthday for S3. I'm just kidding. [00:23:57] Speaker B: Yeah, we're not going to sing [00:24:01] Speaker A: what stands out to me is that on average $0.02 a gig per month, one exabyte is $20 million a month of income and they're saying they have hundreds of exabytes. So that is a lot of money to bring it in from storage. [00:24:14] Speaker C: Oh yeah, it is a lot of money. [00:24:17] Speaker B: I mean what stands out to me is that if they've, I don't know, it doesn't say how much they've rewritten, but they've over the last eight years rewriting stuff in Rust, which is cool, but I don't remember like replacing some of those components. It's gotta be really tricky, right? They're core underlying things and I haven't seen any performance issues or reliability issues with S3. It's still a standard. That's pretty impressive. [00:24:40] Speaker C: It is really impressive. I mean it's the little bit they've given us into the how it works and all the different pieces of it. Like it's just a very impressive, technical, highly scaled piece of piece of of software and efficiency that they have that everything, everything is built on top of. All right, AWS S3 is giving us a future that has been wished for forever and that is Account regional namespaces for general purpose buckets where bucket names automatically include your account ID and region as a suffix, such as my bucket dash 123-45-6789012 dash us east one an this solves the long standing problem of bucket names collisions in the global namespace. Particularly useful for large organizations managing buckets at scale scale across multiple regions. The feature integrates with IAM and AWS organization service control policies via the new S3X AMZ bucket namespace condition key, allowing security teams to enforce that employees only create buckets with their accounts namespace. This gives enterprises a straightforward governance mechanism to prevent naming conflicts and unauthorized bucket creations. Existing global namespace buckets cannot be renamed to use their account regional namespace, so this is a forward looking change for new bucket creation only. And S3 table buckets, Vector buckets and directory buckets already operate in account level or zonal namespaces, so this update brings general purpose buckets in line with those patterns. Cloudformation support is included via the bucket namesake support property and pseudo parameters. I assume we'll be coming to Terraform very soon as well, and is available to you in 37 AWS regions including China and Govcloud at no additional cost. Ryan, I will not be migrating all of the cloud buckets to this. I do not care about your security requirements. I'm not doing it well. [00:26:16] Speaker B: Yeah, I mean I'm mostly upset because the title of the article was like, you know, these regional namespaces. But it's not really, it's just, it's just permissions name must equal this or permission denied. And I kind of, I was like, it's still a global namespace. [00:26:32] Speaker A: Yeah, well, what's really annoying about it is is your account number is part of the public S3 bucket name. I'm like, seriously? I wish a security person meeting the room there and said, hey, you know that's, that's just free information discovery for hackers. [00:26:46] Speaker B: Yeah. I'm sort of hoping that maybe you can tune it where you don't have to do the full region and account name. Maybe you can just enforce a prefix for suffix. [00:26:55] Speaker A: Yeah, just give me like 16 characters of random digits or something or letters or just anything that I don't know. [00:27:01] Speaker B: I know, I mean, I get it. They're so locked in, there's no way they could fix it now, right? With the global namespace. I'm certain of it. It would just break everything. The entire Internet would go down. So I, I sort of get it. But it is sort of funny to watch that struggle and it's just, it's a good lesson and you know, think about your architectures. But no one thought about 20 years later they were having to deal with this. [00:27:30] Speaker C: Amazon CloudWatch applications now include the three new Slo capabilities, Slo recommendations, service level Slos and Slo performance reports, addressing long standing gaps in data driven reliability management for AWS customers. SLO recommendations analyze 30 days of historical P99 latency and error rate data to suggest appropriate reliability targets, reducing the manual guesswork that previously led to misconfigured thresholds and alert fatigue. Service level SLOs give teams a consolidated view of reliability across all operations within a service, making it easier to align technical monitoring with business objectives without stitching together multiple dashboards. The SLO performance report adds calendar aligned historical reporting at daily, weekly and monthly intervals, which is useful for teams that need to present liability data to stakeholders in business friendly formats. Pricing is user based, tied to inbound about applications requests plus SLO charges with each SLO generating a two application signals per service level indicator. [00:28:21] Speaker A: So instead of fixing your product, you just use a tool that tells you you should turn down your your commitments to your customers. Okay, [00:28:28] Speaker B: well, I'm hoping like I, you know, I haven't used this, but I like the idea of service level SLOs being sort of amalgam of several performance indicators because it's, it is really annoying to get sort of a, a latency threshold alert and then realize that it's [00:28:42] Speaker C: for something that really doesn't matter. [00:28:45] Speaker B: And you, you know, but it's 3am and you're already awake. [00:28:48] Speaker A: It's a nice tool to give you an idea of where you stand, I guess, because otherwise it's we need, we need nine nines. I'm like, okay, yeah. [00:28:58] Speaker C: Amazon SimpleDB, one of AWS's oldest database services dating back all the way to 2007, now supports exporting domain data directly to S3 in JSON format, giving longtime users a practical path to migrate away from the service or archive data for compliance purposes. I'm surprised I didn't say please, please, please migrate away in the press release. [00:29:15] Speaker A: It's going to be coming, right? [00:29:17] Speaker C: The export tool introduces three new APIs, start domain export, Get Export and list exports with background processes that avoid any performance impact on the running databases, which matter for users who cannot afford downtime during data extraction. Cross region and Cross account support, along with multiple encryption options make this useful. Organizations with strict data governance requirements who need to move Simple DB data into modern storage or database systems, rate limiting is set to five exports per domain and 25 per account within a 24 hour window. The team's large number of domains should plan the migration timelines accordingly, rather than assuming bulk exports can happen all at once. The tool itself is free to use, but standard S3 data transfer charges will apply. Yeah, I mean SimpleDB gets a new feature that's, you know, crazy. [00:29:58] Speaker B: How long has it been since you like, they stopped allowing you to create new SimpleDB? It's been several, several years, so it's [00:30:05] Speaker C: definitely been a while. I don't exact dates on that, but I know every once in a while they'll drop a feature for it, which always cracks me up. And they're always about helping you get out of it. [00:30:13] Speaker B: Get out of it. Yeah. [00:30:16] Speaker A: There must be a customer that's using this and who can't change for whatever reason, but it's like an Iot thing or something else or just won't they [00:30:25] Speaker B: have a large data set. [00:30:27] Speaker A: They've killed a ton of other stuff though. Yeah, especially recently the last couple of years. [00:30:33] Speaker B: I imagine that that whatever customer or [00:30:35] Speaker C: set of customers is paying a lot [00:30:37] Speaker B: of money probably doesn't math out the same way as some of the other services they've killed. Like I don't think Codestar or whatever it was was making a whole lot of money. [00:30:49] Speaker C: I mean, I assumed it was some Amazon service that was like heavily embedded into SimpleDB and it was really them partially being the ones who didn't want to get rid of it, and then they had a few customers who were maybe also using it at some level. But I don't I never really understood why it's lasted as long as it has because I think it's been almost 10 years since they've allowed really new customers to even see it that exists in the console. So yeah, I bet you're right about [00:31:12] Speaker B: Amazon being a user there. That makes sense. [00:31:14] Speaker C: Yeah, Amazon CloudWatch is introducing organization wide EC2 detailed monitoring enablement. This allows you to enable detailed monitoring shifting metrics from a per instance manual task to a centralized policy driven configuration across your entire AWS organization. Organizations rules can be scoped to the full organization specific accounts or individual resources using tags so teams can target environments like production workloads without enabling the feature universally and incurring unnecessary costs. The 1 minute interval metrics that detailed monitoring provides are particularly relevant for auto scaling groups where faster data collection means scaling policies can respond more quickly to utilization changes rather than waiting for the default 5 minute interval. The feature covers both existing and newly launched instances with the rule scope, which closes a common gap where new instances spun up after policy creation would otherwise miss monitoring configurations. This is a very expensive feature if you're not aware, so someone really needed to goose their revenue and they were like hey let's turn it on globally so you will then make all of your people very sad. So be careful about that. [00:32:13] Speaker B: I mean what's wrong with the previous method of just waiting till you had an outage, not having the data and then turning it on for your project. [00:32:22] Speaker A: When you say stuff like that, it just reminds me that we really need like a Cards against humanity for cloud. [00:32:29] Speaker C: That's a great idea. [00:32:32] Speaker B: Coming soon to the Cloud POD store near you. [00:32:35] Speaker C: Yeah, all right, move to GCP Here they have an article about Google's Cloud's sensitive data protection is now generally available with new context classifiers for medical and finance data plus image object detectors for faces and passports, moving beyond simple keyword matching to understand the semantic meaning of data for AI training workflows on Vertex AI SDP can scan unstructured image data using OCR and object detection to find sensitive content like credit card numbers or photo IDs, then generate redacted versions rather than discarding the data entirely, preserving training dataset quality. The context aware approach addresses a practical problem with traditional regex based detection. The same number of sequence can be treated differently Depending on surrounding words. So order number passes through the wow. Wallet number triggers financial context, classification and redaction. STP serves as an underlying engine for several other Google Cloud products, including Model Armor Security Command center and Contact center. As a service, meaning improvements here propagate across those services automatically. Organizations in regulated industries like healthcare and finance are the most direct beneficiaries. So Ryan, security wise, do you like this? [00:33:40] Speaker B: Well, I like having context. Like this is an expensive feature. It's important to note. Right. I think if you have very large data sets, this can cost some money. But it is sort of tricky, right. When you're doing sense of data labeling or classification. You know, like the ability to have rules is important, but typically, you know, everyone accepts the default rules of like yeah, IDs, Social Security number, credit cards. And I don't really think that's usually where the sensitive data is. Right. Like it's, it can be in some, in some workloads, but probably not the majority. So there's so many false positives. So I really like the idea that they're, they're having context be part of that decision and then generating sort of redacted versions, which is cool. I've seen a demo of that which is kind of neat and it's just kind of, you know, like it's, you know, I saw the, in the demo they had a package and a person standing next to the package and so they, they redacted like the, the packing label on the face of the person so that it was, you know, considered private but still allowed sort of training on. I think they, the use case was like package handling or something like that. So, so it's kind of, it's kind of neat seeing stuff like that. [00:34:50] Speaker A: But hasn't Google Maps been doing that for years though, redacting license plates, people's, people's faces, stuff like that. Seems like it's a new old feature. [00:35:02] Speaker B: I think it's, you know, they've been doing it on their own data set, but it hasn't been. They have. I don't think they're applying that to other data sets. And this is the first time that you've been able to do that via service. Yeah, that is good. [00:35:16] Speaker A: I guess you could put it in line with any kind of web app or anything else. You know, I don't, don't accidentally leak data, please. I imagine it's rather expensive though, since they're using, you know, AI inference to look at every sentence that goes back and forth or image. [00:35:29] Speaker B: Yeah, I mean it can be definitely but it is, you know, an API that you can pass a payload to and put it in, put it in line. You know, performance and money aside. [00:35:40] Speaker A: No, that's, it's great though. We should have this. Years ago it was sending emails from mortgage companies and things like one last pass before you send an email just to make sure somebody didn't put some data in the wrong field that you think safe. And it turns out that it's not. Yeah, that's cool. [00:35:55] Speaker C: It is cool. Google has finally completed its acquisition of Wiz, a cloud and AI security platform, which has. I feel like this acquisition has been waiting forever, which will. They're trying to retain the Wiz brand and continue supporting multi cloud environments including aws, Azure and Oracle cloud platform. Alongside Google Cloud. Wiz connects code, cloud and runtime into a signal context, allowing security teams to map application architecture permissions, data flows and runtime behavior in real time to identify and prioritize exploitable attack paths before they reach production. The combined offering integrates Wiz's cloud security platform with Google Security Operations, Mandia Consulting and Google Threat Intelligence under the Google Unified Security umbrella, with Gemini AI assisting in threat hunting, remediation, workflows and audit documentation. Notable focus of the Actor version is AI specific security addressing threats that target AI models and those generated by AI systems, which is increasingly relevant as organizations deploy AI agents fed with their business data. Pricing details for the combined platform have not yet been announced, but Wiz products remain available through existing partner channels, system integrators and managed security systems service providers. And typically on these acquisitions it takes about a year for Google to figure out how to package them properly. And most likely they'll want a separate contract for it anyways because that's how they seem. Like all the integration acquisitions they've done are. [00:37:08] Speaker B: Well, it is interesting that they they mentioned that this unified security umbrella because that's something they announced at last re or not. The last Google Next conference, you know, in my opinion was a way to lump it all together into a really expensive package. It is kind of interesting. Like there's a lot of competing functionality within Wiz with their existing security platform, a lot of the Mandian stuff that they ingested in. So it's kind of interesting, you know, I wonder how that's all going to fit in. I know there's definitely some features in Wiz that I'm excited to try that I haven't been able to play with because Wiz before the acquisition was not cheap, so. [00:37:46] Speaker C: No, but it was. It's a good product And I think the context is really valuable. So I get why they wanted it. It's just now a question of what do they do with it long term. Google Cloud Run now supports direct Identity Aware proxy integration in a general availability, allowing developers to enable IAP or Identity Aware proxy with a single UI click or the IAP flag in GCloud, eliminating the previous requirements to configure load balancers manually. IAP carries no additional cost beyond the standard Cloud Run charges, with limited exceptions noted in the pricing docs, IAP and Cloud Run. It supports enterprise authentication features including user and group identity policies, context or access controls based on IP geolocation and device status and workforce identity federation for external identity providers. [00:38:29] Speaker B: Yeah, this is a neat little feature that I don't know how widely known it is, but it's something I've been using for a while. This might change like the configuration of it. I can't remember if I've been using it attached directly to load balancer config or not, but it, you know, allows you to do cool things like it, you know, this is a way that you can restrict internal service APIs that you're hosting using Cloud Run to just internal users or, or you know, use it as a sort of authentication proxy where you're, you know, you're sort of authenticating using your Google credentials to access a service. And there's, you know, some neat things that you can configure as part of it. And so like I like that they have this kind of option. I think it also might allow you to host your cloud run API endpoint privately, be able to access that. But I don't know about that. That seems a little rough. But I know you can do that with like compute VMs, you can, you can tunnel through them using the identityware proxy commands in GCloud. [00:39:25] Speaker A: That's cool. I'm sure I'll have some use for this, but I, I don't have one yet. [00:39:30] Speaker B: I mean it's proxy, of course. You'll let me swear. [00:39:36] Speaker A: Don't get me started. Yeah, that's interesting. Well, be kind of cool if you like pass through the user's auth to other things as well. But I don't think that's. [00:39:48] Speaker C: That would be cool. [00:39:49] Speaker A: I was having a conversation today about that. It was like, well, you know, I need a Service account for GitHub and all these other external services. I'm like, sure, but then you need to now know what your calling users have permission to in those services so that you don't act on their behalf to do something they weren't supposed to do in the first place. So I think some kind of seamless end to end run as a particular user throughout the whole process would be kind of cool. [00:40:12] Speaker B: Yeah, I mean you can kind of do that with workload identity today, but it's, you're, you're stringing together a lot of little different points to make that work and then validate all the claims and attributes that you want. [00:40:24] Speaker A: Yep. [00:40:26] Speaker C: Google Cloud has launched a preview of multicluster GKE inference gateway which extends the existing GKE gateway API to enable model aware load balancing for AI inference workloads across multiple GKE clust and regions. This addresses practical limitations of single cluster deployments like GPU and TPU capacity caps and regional availability risks. System introduces two core kubernetes, custom resources, the Inference Pool and the Inference Objective, which group model server backends and define routing priorities respectively. This allows the gateway to intelligently multiplex latency sensitive and lower priority inference requests across a distributed fleet. A notable technical capability is the GCP backend policy resource which enables load balancing decisions based on real time custom metrics such as KV cache utilization. On model servers, there is more inference specific than traditional request count or latency based routing approaches. Architecture uses a dedicated config cluster to manage a single gateway configuration that routes traffic to multiple target clusters, simplifying operations for teams running globally distributed AI services. [00:41:23] Speaker B: Yeah, simplify, sure. [00:41:26] Speaker C: It's Kubernetes. It can't be that simple, right? [00:41:28] Speaker A: Yeah, this is why I don't like Kubernetes though. It's like a shadow cloud. A shallow set of cloud services is like, okay, so this is just a load balancer, which is, which is aware of the utilization at a more detailed level than a regular load balancer. Okay. But now it launched it as a, as a feature of the platform which runs the containers and not a separate thing. I don't know, it just bugs the hell outta me. [00:41:52] Speaker B: Yeah, and it's, it's kind of this weird thing where it's just like there's so much configuration that's defined as, you [00:41:59] Speaker C: know, these loose services. [00:42:00] Speaker B: And so this allows you to do, you know, the backend policy to node pools on the back end. But then you know, all of the pod, you know, scheduling and stuff that you have to do. Like it's just everything is turned complex in the system and it's, you know, it'll do a little bit of everything, which is great. Then you have to do everything to make it work. [00:42:17] Speaker A: Yeah. I feel like it's having an identity crisis. It doesn't know where to stop. [00:42:23] Speaker B: It's kind of, kind of how I feel. And you know, maybe it's just because, you know, we're thinking about it as container orchestration, but. And people don't think about it that way. [00:42:30] Speaker C: But yeah, I don't think they do. [00:42:32] Speaker B: Mostly I just want Container orchestra. [00:42:37] Speaker C: Just mostly wanted containers to work. That'd be great. [00:42:39] Speaker A: Yeah, but it's not even great at that. [00:42:44] Speaker C: It's very true. Very true. Google AI Studio now supports project spend caps letting developers set monthly dollar limits per project directly from the spend tab. There is a roughly 10 minute enforcement delay so users remain responsible for any overages incurred during that window. Usage tiers have been redesigned with lower spend qualifications, automatic tier upgrades based on payment history and system defined billing accounts for caps that increase as you move to higher tiers, reducing manual intervention for developers scaling their API usage over time. There's three new dashboards that have been added to Google AI Studio covering rate limits, costs and usage and the rate limit dashboard tracks rpm, TPM and RPD per project. While the cost dashboard offers a daily breakdown filterable by model and time range going back up to a full month. If you've ever tried to figure out who is using what models and what they're doing with them and how much it costs, you know that this is all terrible and this doesn't actually improve it all that much. [00:43:32] Speaker B: Well, an AI studio is another layer on top of that, right. Which is all their documentation are like get production running up and you know, instantly the whole thing and then come to find out it doesn't meet all the enterprise features that you get at, you know, in Google Cloud and it's all different and all their documentation is like use static keys for everywhere, everything and put them anywhere. Who cares? Like I, I hate this so much. [00:43:57] Speaker A: Yeah, it's, it's, it's like 10 year old Google. It's a attract developer. This with the easy easy button that doesn't have the features, doesn't have security. Yeah, you know what, can I put an issue tracker link in the show notes to the issue tracker around Vertex API reporting. I think maybe I should, I'd like everyone to click the link and start the issue. [00:44:22] Speaker B: Nice. Use the platform. [00:44:24] Speaker C: Evil. [00:44:24] Speaker A: I love it. Yeah, absolutely. [00:44:28] Speaker C: I, I do, I, I allow it. That's good. [00:44:31] Speaker A: Yeah, yeah. I mean spend caps ideal. Can it be per person or is it per project? Is it per, you know, is it per group? Like how does that work exactly? I, I Just want, I just want to be able to build this myself. And the, you know, the, the audit logs do not contain sufficient information to do this. So it sucks. So please click the link, start the issue. It's been open since last August. [00:44:53] Speaker B: Does Gemini API Studio allow like it's a team based. They have that kind of configuration or is it more like projects within a U for a single user? I can't remember how you know, do they pseudo release enterprise features? [00:45:07] Speaker A: I don't know. [00:45:09] Speaker C: I didn't think they do, but I [00:45:12] Speaker B: haven't played around with it recently. I played around with it when I first listened to or first learned about it and I was like, okay, this is not for production. [00:45:20] Speaker A: Yeah, that 10 minute enforcement delay, that's because they're using the same thing I'm trying to do, which is they're using IAM audit logging through Pub sub into BigQuery. [00:45:35] Speaker B: That's hilarious. [00:45:37] Speaker A: Yeah. [00:45:40] Speaker C: All right. Azure Azure S3 agent is now generally available as an AI powered operations tool designed to help teams diagnose incidents faster and automate response workflows with the goal of reducing, reducing downtime and manual operational work. The general available release introduces deep context gathering capabilities, meaning the agent can pull together relevant signals and telemetry during an incident, rather than requiring engineers to manually correlate data across multiple tools. This fits naturally into teams already using Azure Monitor, Application Insights and related observatory tooling, as the agent is positioned to work within the existing Azure operations workflows rather than requiring a separate platform. The primary target audience is operations and SRE teams until we can fire them and replace them with this AI who are looking to reduce the time between incident detection and resolution without adding headcount. Pricing details were not included in the announcement, so teams by this should check the Azure pricing page directly. [00:46:30] Speaker A: All right, so they run the services which are going to have problems and now they want me to pay for another service so that I can use that tool to troubleshoot the problems with the other tools that I'm already paying for. Okay, all right. [00:46:44] Speaker B: Yeah, makes perfect sense. I mean, assuming you never introduce your own problems, but never. Which I would never. Yeah, I mean I, I like these, but only as like tools for existing SREs. Like it's just one of those things where it's like, I think, you know, much like we're seeing in Amazon, like there's just. And there's a lot of contextual information that you have to sort of carry in order to maintain an excellent service. And I don't think AI is there yet. I Think AI. I think this would be a great tool for an existing SRE to use for gathering data and insights and be able to focus them into a specific area to look in. But I don't know, I know that if I was running like, you know, leadership, running a production workload, I would not trust this by itself. [00:47:33] Speaker A: Data gathering is one thing. Figuring things out? Nah, yeah, not yet. But this, this is obviously in response to Google's blog post a couple of weeks ago about using Gemini CLI for exactly the same thing. So. [00:47:44] Speaker C: Yeah, yeah, I mean, everyone is, everyone's making this bot like this is the number one agent I keep hearing people talk about is building SRE agents to help making troubleshooting easier. And you know, this is finally fulfilling the dream of all those AI ops companies that all failed miserably to actually not make anything better. So those companies I would not want to own stock in right now. [00:48:03] Speaker B: Yep. I mean, you know, I had a real world example where I used to one of these things and it did point out, you know, the relevant application logs that were showing up to illustrate a pattern, but it wouldn't have had any ability to fix it. It's in, you know, like, it's one of those things where it's like, does it know the CICD pipeline where this code is? How does it, how do you cobble all this together? Right. Like, and it's, it's, you know, it can be kind of complex and so like it was helpful but it, it couldn't have resolved the issue that I found on its own. [00:48:34] Speaker A: Yeah, no, I think, I think you're right about context though. I think having a broader understanding of the situation around incidents that happen and even the data that you collect is going to be really important because even with really good coding tools, even with claude, I still have these weird kind of back and forths because I have one agent that's okay now. Analyze the code, do a security scan, make sure we're not following these bad patterns and it will come with 10 findings. And then I pass it back to the CLAUDE agent that's actually doing the work. And it's like, that's bullshit. Yeah, these things are justified because xyz, you know, it's documented here. But of course the security agent didn't, didn't read those docs. [00:49:12] Speaker B: That's exactly what my use case was actually. I had a deployment agent that was handling the Kubernetes stuff and it actually, it removed, it removed a thing that caused the application to crash on Start. [00:49:23] Speaker C: So it was like that's exactly what happened. [00:49:27] Speaker A: Yep. Yeah. It's kind of funny. It's kind of funny having. Having them sort of be. What's the word? Like adversarial in a way. Like one that's. One that's checking the other one. [00:49:38] Speaker C: It's. [00:49:39] Speaker B: It's just like people. [00:49:41] Speaker C: Just like people. [00:49:43] Speaker A: Yeah. I haven't. I haven't asked them how much, how many fucks they give about these issues yet, but maybe I'll introduce them to that as well. [00:49:48] Speaker C: Yeah. [00:49:49] Speaker B: Nice. Nice. [00:49:53] Speaker C: Dedicated agent died. Okay. [00:49:56] Speaker A: All right. [00:49:57] Speaker B: Gonna do that. [00:49:58] Speaker C: Yeah. Azure is announcing two new public preview offerings, the Azure Copilot Migration agent and the GitHub copilot modernization agent. Designed to automate discovery, assessment, planning and deployment organizations. Moving workloads to Azure Migration Agent targets servers, virtual machines, applications and databases, while the modernization agent orchestrates code upgrades that scale across multiple applications simultaneously. The dream of all cloud vendors for the last, you know, 10 years to make your migration easier and modernization faster and better now with AI. So use with caution. That's all I can say on this one. [00:50:30] Speaker B: I keep waiting for someone to tell the success and how they did it and they've. They've done. They've migrated all their terrible legacy code in this new thing and it all works. But I haven't seen that. [00:50:40] Speaker C: Yep. [00:50:41] Speaker A: Yeah, it reminds me like five years ago when it was everything is moving from VMware on prem every episode was like a new tool to migrate your VMs. Live migrations, not live migrations. [00:50:52] Speaker B: Like. [00:50:52] Speaker A: Okay, I'll be back to that again. [00:50:54] Speaker B: Yeah, Babelfish in AWS like that transparent [00:50:59] Speaker A: layer that didn't go anywhere. [00:51:01] Speaker C: Nope. [00:51:03] Speaker B: There's a lot of. I mean it's just a big problem that's hard to solve and I think that everyone's trying to solve it without a real solution, which is dedicate time and money to do it the right way and no one wants to do that. [00:51:15] Speaker A: Yeah, there is no easy button.com that's what we need [00:51:20] Speaker C: in a service I've never heard of. Microsoft Foundry now integrates with Fireworks AI Inference Cloud, giving customers access to models like Deep Seq version 3.2, Kimi K25 and OpenAI's GPT OSS120B through the both paper token and provision throughput deployment options as currently in public preview and requires an opt in through Azure's Portal Preview features panel pricing files per million model tokens for serverless deployments covering input, cached input and output tokens. More interesting is this line in the article where basically Fireworks hosted Models are distinct from Azure Direct models and that they skip Microsoft's responsible AI safety assessment. So teams needing safety evaluations will need to use Foundry's built in risk and safety evaluator tools separately. Or if you're a department of war, just use this. Sounds great. Yeah. [00:52:05] Speaker B: Oh man. Like, I don't understand how running, running inference on, you know, like a foreign workload makes a lot of sense. Like I guess maybe pricing because performance can't be great. [00:52:19] Speaker C: Yeah, I mean like it sounds like it's a cross connect that they've done to Fireworks Cloud to basically provide this to you. So it's, it's sort of interesting. I don't, I don't fully know why I would want to use this. And like if a customer, you know, if a listener knows or has a use case for this, everything, it's amazing. I'd love to hear about it because I don't quite get it. I mean, I'm not saying Fireworks is bad. I think it's probably just fine. But it's just like Core Weave or any of the other, you know, purpose built AI cloud providers that's out there and the partnership is a way for Azure to get more capacity. So that makes sense. But then, but there's so many rough edges to do it. [00:52:52] Speaker A: They're probably running on it Azure though. And it could be the crazy thing, to me it feels like it's a. Let's keep this workload at arm's length. I like having this abstraction between Microsoft and the workload so that if anything happens, it actually wasn't us, it was Foundry or it was a firework or whatever. It's whatever they're called. Yeah. I feel like it's a deliberate abstraction for legal reasons. [00:53:15] Speaker B: Interesting. I mean. [00:53:18] Speaker C: Yeah. [00:53:20] Speaker B: Bypassing the safety things, like it's. I don't know, the safety tools are there, they're not foolproof. You still have to include all that logic in your. And whatever's, you know, reading and reading the prompt and we're interacting with LLM but you know, like it's such a easy button to put in place that it's funny that it's. I'm sure it's just preferred performance, but I just don't understand it. [00:53:44] Speaker C: Yeah. In our last Azure story, Microsoft is reorganizing its copilot efforts by merging consumer and commercial teams into a single unified org structured around four pillars. Copilot Experience, Copilot Platform, Microsoft 365 apps and AI models. And Jacob Andro will lead the combined copilot Experience as evp reporting directly to Satya Nadella. Mustafa Silliman, who we talked about before, is shifting focus exclusively to what Microsoft calls its superintelligence effort, concentrating on frontier model development, enterprise tuned model lineages and reducing inference cost at scale over the next five years. Restructuring reflects a product direction where Copilot moves from individual to features towards an integrated system connecting agents, apps and workflows. With recent announcements like Copilot Tasks, CoPilot cowork and Agent 365 representing early examples of the approach for enterprise customers. The key practical implications that commercial and consumer Copilot capabilities will converge, meaning IT and governance controls will need to account for more unified product surface rather than separate consumer and business tracks. So in general, I mean, they gotta do something because Copilot is a mess. And the fact that they even called out Copilot Cowork in here is sort of funny to me because none of these people had anything to do with that other than, you know, slapping a Microsoft logo on top of Anthropic's product. But you know, definitely Copilot should be much further along than it is right now. So I, I approach, I, I think the focus and change is definitely necessary. [00:55:01] Speaker B: Yeah. I wonder, and I wonder, like I, it's notably missing, which is GitHub's copilot, [00:55:06] Speaker C: which is what I use all the time. [00:55:07] Speaker B: So it's like, and you know, like I've tried many times to use the, the office and 365 agent or copilot completely failed every time. [00:55:21] Speaker A: It's like having a conversation with somebody who has no memory. Yeah. Ask you to do something and they ask you to change what it's done and it's, and it does not, it has no idea. Yeah, yeah, it's, it's, it's. Their choices of where to implement Copilot have been really, really bizarre. Like, you know, you get a brand new Windows 11 laptop and you open up Ms. Paint and there's Copilot right there. Like, what the hell. [00:55:42] Speaker C: Well, and then they're introducing bugs into services like Notepad that haven't been there for decades, you know, having been changed with adding AI features or. Yeah. Added a bug or a, you know, security vulnerability. Cool. Thank you for that. I appreciate it. I guess. Yeah. It is sort of weird where they put it in general. [00:55:59] Speaker B: I mean, it does make sense the, the leadership change. Right. That's not going well. I think it's good to shake it up. Hopefully. Jacob Van Drio. Hopefully I'm not butchering that. [00:56:09] Speaker A: Too bad. [00:56:09] Speaker B: You know, has the right direction. So we'll see. [00:56:12] Speaker C: Yeah, I hope so. The fact that he's answering to Sacha is also a good sign too. Means that, you know, that'll help a lot. He'll. [00:56:17] Speaker A: What do you want to see from Copilot? What are you expecting? [00:56:20] Speaker C: I wanted to see them have their own version of Cowork. Like that's, that's. They should have had, that's deeply embedded into Microsoft products. So, you know, the ability to do PowerPoint and create slides and use a template and things like, you know, Claude Power for PowerPoint can do or the Excel tool that Cloud has, those things are awesome. Like, I do all kinds of cool visualizations in Excel now that I wouldn't have spent the time on. Like, it's just giving it, hey, look at this data in this chart and come up with like 10 ways to show the data that might be interesting to me. And Claude, like, produces 10 charts. I'm like, wow, that like, sometimes they're dumb. Like, I won't say they're all great, but I mean, like, I will tell you, there's been at least twice now where it's produced a chart works. I was like, I had not considered looking at the data that way, but that's a really interesting perspective of it. [00:57:02] Speaker B: Yeah. And just the trivialness of, of that request and the degeneration of it. [00:57:06] Speaker C: Right. [00:57:07] Speaker B: Allows you to go through a lot more iterations or a lot more different ideas, which I think is great. You know, like I, you know, a cowork thing or something that's more integrated at the OS level for Microsoft would be neat. Something that's, you know, across. Or their Office365 tenants and maintains context from tool to tool would be nice, but I mean, hell, I'd settle for just better agent responses within the individual tools. Because right now, like you said, like, it's just the performance is not there. It's context of where it's running and what you're talking about is not there. It's just not as much. It's just not as useful as, you know, cloud code or GitHub copilot to me. Yep. [00:57:50] Speaker C: I mean, even Gemini, if you're in the Google workspace, has more capabilities than Copilot does. So. All right, gentlemen, we made it to another end of the show. We do have an after show today, so stay tuned if you want to hear us talk about that. But otherwise, we'll see you next week [00:58:04] Speaker B: here in the Cloud. [00:58:05] Speaker A: See you later. [00:58:06] Speaker B: Bye, everybody. [00:58:10] Speaker A: And that's all for this Week in Cloud. Head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback and ask any questions you might have. Thanks for listening and we'll catch you on the next episode. [00:58:36] Speaker C: All right, I in my, my RSS feeder this week this article came up and I thought it was hilarious. So Washington State Department of Licensing apparently routed Spanish language callers to an AI voice speaking English with a Spanish accent for several months, a direct result of a misconfiguration by Department of Licensing staff using Amazon Web Services. Polly AV journalists were able to replicate this issue by selecting the AWS Poly voice named Lucia, which is designed to mimic Castilian Spanish, highlighting how easy it is to misconfigure AI voice services when teams lack familiar with underlying platform options. Incident is a practical reminder that deploying ad driven customer service tools across multiple languages requires thorough testing. Yes, and the example that was in the article that I laughed the most about was there was some quote. Basically it was like it was trying to say something around like the number three, but if said the number Trey and I'm like, wow, that's, that's really, really helpful. Oh yeah, your estimated wait time is less than Trey minutes. And I'm just like, oh man, that's so bad. Swing and a miss. Yeah. [00:59:42] Speaker B: I mean, my mind immediately went to like the most racist stereotype that I could think of. Right? [00:59:46] Speaker C: For like, I mean, why does Poly even have an English with heavy Spanish accent as an option? Why is that even a thing I don't understand? [00:59:56] Speaker A: Yeah, yeah, because it's, because it's not, it's not trained on individual syllables like it used to be. It's, it's like the shape of the voice. You can, you could probably have it speak any language in any accent if you really wanted to. It's just. But yeah, that's. That's hilarious. [01:00:10] Speaker B: Maybe it's like so you could code your robot as your romantic love interest, you know, with the Castilian Spanish accent. I can see that being sexy. [01:00:19] Speaker C: I mean, I'm starting to really enjoy some of the AI bloopers that come out. I mean, originally it was just images, you know, where like they were drawing six hand, you know, six fin fingers or 12 fingers on a person's hand. Like that was, that was fun. But we kind of moved past that era now with AI generated images. So now like, now they're just dumb implementations. Like, oh, I implemented AI in a terrible way. Like this, this is, this is a. You Know, and again, like, yeah, make sure you know what you're doing and make sure you test this stuff with probably native language speakers of each language you're trying to implement. That way you don't run into these problems or find another AI service that can actually QA it properly because, like, I know I, I can't speak Japanese or read it. So, you know, I'm. We're completely dependent on people in Japan to tell me, you know, when something's bad or good or whatever. And so those are areas you should. You could be using AI, but again, also the AI could lie to you too, which we see all the time. Yeah, it's. [01:01:07] Speaker B: It's multi layered approach. I mean, I do feel bad for whoever is in Washington State Department of Licensing, because I'm sure they're not very technical. [01:01:14] Speaker C: Right. [01:01:14] Speaker B: And they're trying to use these, these tools and may just didn't understand it. [01:01:18] Speaker C: I mean, they might have also been said, set this up in 10 minutes and they, you know, they threw it out and, you know, or they just misclicked something because maybe, maybe Spanish with accent is right next to Spanish for Spanish. Right. I don't know. And so it wasn't, it wasn't, you know, any ill intent or, or viciousness, but, you know, it's just one example of a terrible AI story in the news. [01:01:36] Speaker A: So, I mean, you don't even want Castilian Spanish here, really, do you? That's not, it's not Mexican Spanish. Not Mexican Spanish either. [01:01:45] Speaker B: So it's not. [01:01:46] Speaker C: No, no. It's a definitely a very unique, you know, version of Spanish. I mean, Americans, at least. It's not unique in Castilia. Yeah. Or Castellan or whatever that is. [01:01:55] Speaker B: But it's not even, you know, like it's a specific region of Spain, right? [01:01:59] Speaker C: Yeah, yeah, correct. [01:02:00] Speaker A: Have you been to Barcelona? [01:02:02] Speaker C: I have been to Barcelona, which I still can't get over that one. That one drives me crazy every time. I know it's Barcelona, but I can't. I. Barcelona comes out so easily to. Right, gentlemen, well, let's keep an eye out for some more fun AI bloopers that are out there, that aren't just images because they're fun to talk about and they're cautionary tales to our listeners. All right, gentlemen, have a great one. Bye.

Show Notes

Titles we almost went with this week

Follow Up

General News

AI Is Going Great – Or How ML Makes Money

AWS

GCP

Azure

After Show

Closing

Chapters

Episode Transcript

Other Episodes

Episode 76

Episode 76: IBM Blames Cloud Pod for Outages

Episode 226

226: Duet, Co-Pilot, and a Code Whisperer Walk into a bar in San Francisco

Episode 303

303: Someday You Will Find Me, Caught Beneath the AI Landslide, in a Champagne Premier Nova in The Sky