322: Did OpenAI and Microsoft Break Up? It’s Complicated…

Episode 322 September 24, 2025 01:23:24

Hosted By

Jonathan Baker, Justin Brodley, Matthew Kohn, Ryan Lucas

Show Notes

Welcome to episode 322 of The Cloud Pod, where the forecast is always cloudy! We have BIG NEWS – Jonathan is back! He’s joined in the studio by Justin and Ryan to bring you all the latest in cloud and AI news, including the ongoing Microsoft/OpenAI drama, saying goodbye to data transfer fees (in the EU), M4 power, and more. Let’s get started!

Titles we almost went with this week

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.

AI Is Going Great – Or How ML Makes Money 

01:33 Microsoft and OpenAI make a deal: Reading between the lines of their secretive new agreement – GeekWire

ALSO:

OpenAI and Microsoft sign preliminary deal to revise partnership terms – Ars Technica

02:59 Justin – “I’m not convinced that we can get to true AGI with the way that we’re building these models. I think there’s things that could lead us to breakthroughs that would get us to AGI, but the transformer model, and the way we do this, and predictive text, is not AGI. As good as you can be at predicting things, doesn’t mean you can have conscious thought.” 

07:45 Introducing Upgrades to Codex

10:14 Jonathan – “I think Codex is probably better at some classes of coding. I think it’s great at React; you want to build a UI, use Codex and use OpenAI stuff. You want to build a backend app written in C or Python or something else? I’d use Claude Code. There seem to be different focuses.”

13:24 How people are using ChatGPT

14:51 Jonathan – “I wish it was more detailed; like how many people are talking to it like it’s a person? How many people are doing nonsense (like on) Reddit?”

17:42 Introducing Stargate UK

18:19 Justin – “I mean, we already have a GPU shortage, so to now make a regionalized need for AI is going to further strain the GPU capacity issues, and so I should probably buy some Nvidia stuff.”

AWS

19:37 Announcing Amazon EC2 M4 and M4 Pro Mac instances | AWS News Blog

22:00 Accelerate serverless testing with LocalStack integration in VS Code IDE | AWS News Blog

23:05 Ryan – “It’s interesting; it’s one of those things where I’ve been able to deal with the complexity, so didn’t realize the size of the gap, but I can see how a developer, without infrastructure knowledge, might struggle a little bit.” 
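
For anyone who hasn’t tried the local loop, the idea is simply to point an AWS SDK client at LocalStack’s edge endpoint and call the emulated services exactly as you would in the cloud. Below is a minimal sketch using boto3, assuming a LocalStack container is already running on its default port 4566; the dummy credentials, table name, and queue name are placeholders, not anything from the announcement.

```python
# Minimal local integration test against LocalStack (assumed running on :4566).
import boto3

localstack = dict(
    endpoint_url="http://localhost:4566",   # LocalStack edge endpoint
    region_name="us-east-1",
    aws_access_key_id="test",               # LocalStack accepts dummy credentials
    aws_secret_access_key="test",
)

dynamodb = boto3.client("dynamodb", **localstack)
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)

sqs = boto3.client("sqs", **localstack)
queue_url = sqs.create_queue(QueueName="order-events")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody='{"id": "123"}')

print(dynamodb.list_tables()["TableNames"], queue_url)
```

The same code then runs unchanged against real AWS once you drop the endpoint override, which is the tiered workflow the VS Code integration is pushing.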

26:38 Amazon EC2 supports detailed performance stats on all NVMe local volumes

New EFA metrics for improved observability of AWS networking

27:37 Jonathan – “That’s cool, it’s great that it’s local and it’s not through CloudWatch at $0.50 a metric per however long.”
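
Since the new EFA metrics land as counters in sysfs rather than in CloudWatch, scraping them is just reading files. Here is a small illustrative Python sketch that walks the standard Linux RDMA counter directory and prints values in a Prometheus-style text format; the exact counter file names (retransmits, timeouts, and so on) depend on the EFA installer version, so treat the paths as an assumption to verify on your own instances.

```python
# Sketch: read EFA hardware counters from sysfs and emit Prometheus-style lines.
from pathlib import Path

def read_efa_counters():
    counters = {}
    # Standard RDMA sysfs layout: /sys/class/infiniband/<dev>/ports/<n>/hw_counters/<name>
    for counter in Path("/sys/class/infiniband").glob("*/ports/*/hw_counters/*"):
        device = counter.parts[4]  # e.g. rdmap0s6
        counters[f"{device}/{counter.name}"] = int(counter.read_text().strip())
    return counters

if __name__ == "__main__":
    for name, value in sorted(read_efa_counters().items()):
        print(f'efa_hw_counter{{name="{name}"}} {value}')
```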

28:19 Now generally available: Amazon EC2 R8gn instances

29:18 Jonathan – “That’s what you need for vLLM clustering across multiple machines. That’s fantastic.”
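
To make Jonathan’s point concrete: once a model is sharded across GPUs on more than one machine, every generated token drives collective communication over the instance network, which is exactly where 600 Gbps helps. The vLLM sketch below shows that kind of setup; the model name and parallelism degrees are illustrative, and running it multi-node also assumes a Ray cluster spanning the instances.

```python
# Rough sketch of serving a large model with vLLM sharded across GPUs/nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative model choice
    tensor_parallel_size=8,    # shard each layer across 8 GPUs (all-reduce per layer)
    pipeline_parallel_size=2,  # split layers into 2 stages, e.g. one per node
)

outputs = llm.generate(
    ["Summarize this week's cloud news."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```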

29:55 Introducing AWS CDK Refactor (Preview)

30:56 Ryan – “It’s interesting, I want to see – because how it works is key, right? Because in Terraform, you can do this, it’s just clunky and hard. And so I’m hoping that this is a little smoother. I don’t use CDK enough to really know how it structures.”
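
For context, the change CDK refactor is meant to make safe is a mundane one: renaming a construct changes its CloudFormation logical ID, which would normally replace the deployed resource. Below is a hedged Python CDK sketch with illustrative construct names; the `cdk refactor` invocation in the closing comment comes from the preview announcement and its exact flags may differ.

```python
# Sketch: a construct rename that CDK refactor should map instead of replacing.
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct

class StorageStack(Stack):
    def __init__(self, scope: Construct, stack_id: str, **kwargs) -> None:
        super().__init__(scope, stack_id, **kwargs)
        # Was: s3.Bucket(self, "Bucket") — changing the ID changes the logical ID,
        # which without a refactor mapping would create a new bucket and drop the old one.
        s3.Bucket(self, "AuditLogBucket", versioned=True)

app = App()
StorageStack(app, "storage")
app.synth()

# After the rename, something like `cdk refactor --dry-run` (preview) computes the
# logical-ID mapping so the existing bucket is kept rather than replaced.
```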

31:36 AWS launches CloudTrail MCP Server for enhanced security analysis

32:23 Ryan – “This is fantastic, just because it’s so tricky to sort of structure queries in whatever SQL language to get the data you want. And being able to phrase things in natural language has really made security operations just completely simpler.”
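
Under the hood, the natural-language layer still ends up issuing CloudTrail queries like the one below: a LookupEvents call against the 90-day management event history. This boto3 sketch shows roughly what a question such as “who opened a security group yesterday?” translates to; the event name and time window are illustrative.

```python
# Sketch: the kind of CloudTrail query a natural-language question resolves to.
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")
end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "AuthorizeSecurityGroupIngress"}
    ],
    StartTime=start,
    EndTime=end,
    MaxResults=50,
)

for event in events["Events"]:
    print(event["EventTime"], event.get("Username", "unknown"), event["EventName"])
```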

GCP

36:35 New for the U.K. and EU: No-cost, multicloud Data Transfer Essentials | Google Cloud Blog

41:13 Kubernetes 1.34 is available on GKE! | Google Open Source Blog

42:57 Jonathan- “I like to think of it as fixing a problem with JSON, rather than fixing a problem with YAML, because what it looks like is JSON, but now you can have comments – inline comments, like you could always do with YAML.”
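
If you have never hit the Norway problem Jonathan is alluding to, here is a quick PyYAML illustration of the YAML 1.1 behavior that KYAML’s stricter, JSON-flavored syntax (quoted string values, but with comments allowed) is designed to avoid.

```python
# The YAML 1.1 "Norway problem": unquoted no/on are resolved as booleans.
import yaml

print(yaml.safe_load("country: no"))    # {'country': False}  <- not the string "no"
print(yaml.safe_load("switch: on"))     # {'switch': True}
print(yaml.safe_load('country: "no"'))  # {'country': 'no'}   <- quoting fixes it

# KYAML output always quotes string values, so a value like "no" can't be
# silently re-typed, while # comments still work.
```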

45:22 AI Inference recipe using NVIDIA Dynamo with AI Hypercomputer | Google Cloud Blog

46:52 Jonathan – “It’s just like any app, any monolith, where different parts of the monolith get used at different rates, or have different resource requirements. Do you scale the entire monolith up and then have wasted CPU or RAM on some of them? Or do you break it up into different components and optimize for each particular task? And that’s all they’re doing. It’s a pretty good idea.”

47:56 Data Science Agent now supports BigQuery ML, DataFrames, and Spark | Google Cloud Blog

48:52 Ryan – “This kind of makes me wonder what the data science agent did before this announcement…”
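
As a concrete example of what the agent can now emit when a prompt mentions bigframes, the snippet below is a hedged sketch of BigQuery DataFrames code that keeps the computation inside BigQuery rather than in notebook memory; the project, table, and column names are placeholders.

```python
# Sketch: BigQuery DataFrames (bigframes) keeps the work in BigQuery until to_pandas().
import bigframes.pandas as bpd

bpd.options.bigquery.project = "my-project"      # placeholder project ID
df = bpd.read_gbq("my-project.sales.orders")     # lazily backed by a BigQuery table

monthly_revenue = df.groupby("order_month")["revenue"].sum()

# Only this call pulls results down to local pandas memory.
print(monthly_revenue.to_pandas().head())
```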

50:18 Introducing DNS Armor to mitigate domain name system risks | Google Cloud Blog

51:16 Ryan – “This is cool. This is one of the harder problems to solve in security: there are so many services where you have to populate DNS entries to route traffic to them, and then those records can basically be abandoned over time and bit rot. They can then be snatched up by someone else and abused; this will help you detect that scenario.”
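
The failure mode Ryan describes is the classic dangling-record takeover: a CNAME that still points at a deprovisioned target someone else can later claim. The dnspython sketch below only approximates the check (a managed detector like DNS Armor watches for much more than this); the hostname is a placeholder.

```python
# Sketch: flag a CNAME whose target no longer resolves (possible takeover risk).
import dns.resolver

def dangling_cname(name: str) -> bool:
    """Return True if `name` has a CNAME whose target no longer resolves."""
    try:
        target = dns.resolver.resolve(name, "CNAME")[0].target.to_text()
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return False          # no CNAME at all
    try:
        dns.resolver.resolve(target, "A")
        return False          # target still resolves, looks healthy
    except dns.resolver.NXDOMAIN:
        return True           # CNAME points into the void

print(dangling_cname("legacy-app.example.com"))
```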

53:13 Announcing Agent Payments Protocol (AP2) | Google Cloud Blog

54:26 Jonathan – “This may be the path to the micro payments thing that people have been trying to get off the ground for years. You run a blog or something, and something like this could actually get you the half cent per view that would cover the cost of the server or something.”

55:56 C4A Axion processors for AlloyDB now GA | Google Cloud Blog

58:04 OpenTelemetry now in Google Cloud Observability | Google Cloud Blog

1:00:41 Our new Waltham Cross data center is part of our two-year, £5 billion investment to help power the UK’s AI economy.

1:01:31 Justin – “The DeepMind AI research is the most obvious reason why they did this.” 

1:02:22 Announcing the new Practical Guide to Data Science on Google Cloud | Google Cloud Blog

1:04:29 Google releases VaultGemma, its first privacy-preserving LLM – Ars Technica

1:05:36 Justin – “You want to train a model based off of sensitive data, and then you want to offer the output of that model through a chatbot or whatever it is publicly. And it’s terrifying, as a security professional, because you don’t know what data is going to be spit out, and you can’t predict it, and it’s very hard to analyze within the model what’s in there… And so if solutions like this, where you can sort of have mathematical guarantees – or at least something you can point at, that would go a long way in making those workloads a reality, which is fantastic.”
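
For the curious, the “mathematical guarantee” behind a differentially private model like VaultGemma is the standard (ε, δ) differential privacy bound, shown below: changing any single training example can only shift the probability of any model outcome by a bounded amount.

```latex
\Pr\big[M(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[M(D') \in S\big] + \delta
```

Here D and D' are training sets differing in one record, M is the training procedure, and S is any set of possible outputs; smaller ε means a single record can leak less into the trained model.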

Azure

1:08:20 Generally Available: Azure Cosmos DB for MongoDB (vCore) encryption with customer-managed key 

1:09:31 Ryan – “I do like these models, but I do think it should be used sparingly – because I don’t think there’s a whole lot of advantage of bringing your own key… because you can revoke the key and then Azure can’t edit your data, and it feels like an unwarranted layer of protection.” 

1:14:57 Introducing Logic Apps MCP servers (Public Preview) | Microsoft Community Hub

1:16:04 Ryan – “For me, the real value in this is that central catalog. The minute MCP was out there, people were standing up their own MCP servers and building their own agents, and then it was duplicative, and so you’ve got every team basically running their own server doing the exact same thing. And now you get the efficiency of centralizing that through a catalog. Also, you don’t have to redo all the work that’s involved with that. There’s efficiency there as well.”

1:17:13 Accelerating AI and databases with Azure Container Storage, now 7 times faster and open source | Microsoft Azure Blog

1:19:17 Microsoft leads shift beyond data unification to organization, delivering next-gen AI readiness with new Microsoft Fabric capabilities

1:20:35 Justin – “The Fabric stuff is interesting because it’s basically just a ton of stuff, like Power BI and the Data Lake and stuff, shoved into one unified platform, which is nice, and it makes it easier to do data processing. So I don’t expect it to be a major cost increase for customers who are already using Fabric.”

Oracle

1:21:40 Oracle’s stock makes biggest single-day gain in 26 years on huge cloud revenue projections – SiliconANGLE

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod


Episode Transcript

[00:00:00] Speaker A: Foreign. [00:00:06] Speaker B: Welcome to the Cloud pod where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:14] Speaker C: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:18] Speaker A: Episode 322 for the week of September 16, 2025. Did OpenAI and Microsoft break up? Well, it's complicated. Good evening, Ryan and Jonathan. How are you guys doing? [00:00:30] Speaker C: Hello. [00:00:31] Speaker B: Doing good. [00:00:32] Speaker A: Well, good. Well, we are knee deep in a lot of news this week, so we're going to jump right into it today and we're going to start out with Microsoft and OpenAI. After months of will they or won't they, they finally signed a non binding memorandum of understanding that will restructure their partnership. OpenAI's nonprofit entity receiving an equity stake exceeding 100 billion in the new Public Benefit Corporation where Microsoft will play a major role. The deal addresses the AGI clause that previously allowed OpenAI to unilaterally dissolve the partnership upon achieving artificial General Intelligence intelligence, which had been a significant risk for Microsoft's multi billion dollar investment. Both companies are diversifying their partnerships. Microsoft is now using anthropic technology for some Office 365 AI features, while OpenAI has signed a 300 billion computing contract with Oracle over five years. Microsoft's exclusivity on OpenAI cloud workloads has been replaced with the right of first refusal, enabling OpenAI to participate in the 500 billion Stargate AI project. And the restructuring allows OpenAI to raise capital for its mission while ensuring the nonprofit's resources grow proportionately. With plans to use funds for community impact, including a recent launch of a $50 million grant program. [00:01:44] Speaker C: Nutso. [00:01:46] Speaker A: I mean, we knew it was gonna have to come to some conclusion and this one seems the most relevant. [00:01:51] Speaker C: I mean, it's a, seems like the most compromise, but it's, there's a couple of weird ones in there that, you know, freak me out. Like the AGI clause just scares me because I, I'm sort of dreading when we actually reach AGI. Like, I want to see it, I'm curious, but I want, I mean, I. [00:02:07] Speaker A: Think I talked about it before. I'm not convinced that we can get to true AGI with the way that we're building these models. I think there's things that could lead us to breakthroughs that would get us to AGI, but I like the transformer model and the way we do this. And predictive tax is not a, is not AGI. So, you know, as good as you can be at predicting things, doesn't mean that you can, you know, have conscious thought and so I don't, I just. [00:02:33] Speaker C: Don'T, I don't know that everyone's developing AGI towards in the same way. [00:02:38] Speaker A: I mean I don't know what they're developing any. They've all been very quiet about what they mean by AGI, but I just, just seeing the differences between, you know, OpenAI 4 and 5 and Klaud, you know, their different versions of Sonnet, like the, the reality is, is that the exponential improvement to the AI is slowing. And so, you know, is this really the platform that's going to get them to AGI? I don't know. Now there are some interesting use cases where they've said, you know, AI talking to other AI has started creating its own languages and other things. And like, I don't know if I buy all that. 
Like it's just, you know, there's weird edge cases but then when they try to reproduce those edge cases they don't ever pan out. [00:03:14] Speaker B: Yeah, there's a whole lot of scam mongering and fake videos and things on YouTube and that AI is using their invented language. I mean it's just, it's basically how modems work. It's frequency shift king. They were instructed to use that if, if you know, in the circumstances, if they were talking to another AI and it's, it's not particularly more optimal. But yeah, it's all just fake. [00:03:34] Speaker C: Well and the being instructed to do a thing is key there. Right, because that's what everyone freaks out is that they don't think it was instruct and, and so it's like it did this by itself and it did. [00:03:44] Speaker A: Well and as you get into more and more obscure, you know, sources of training data, you're going to get into like dead languages and you're going to get into all kinds of things that, you know, if it can determine pattern based recognition inside of those things or there's enough documentation on them, you know, it could speak Latin probably really well where most Americans would not be able to pass that. [00:04:02] Speaker C: I mean, I'm not sure I fully can speak English. [00:04:07] Speaker A: So I can just, I can just pretty much, you know, remember the lorem ipsum part of what we use in all of the qa. [00:04:14] Speaker C: That's what I know too. [00:04:15] Speaker A: That's pretty much the limitation of my, my knowledge of Latin. And of course the ones that we, you know, plaster all of our money and military sluggish. I know those ones. That's about it. Well, glad to see that OpenAI and Microsoft are, are moving forward. I still suspect that this is also Microsoft's desire to, I think they want to create their own LLM and being in a tight partnership makes that harder to do. And this allows them now to provide, you know, access to multiple models as well as develop something more commercial for them as a foundational model, which I think they, you know, they've shown in their open source models. They have definitely an interest in doing something. And just now I think they can actually do something commercially too. [00:04:57] Speaker C: Yeah, I mean, I think Microsoft is kind of letting OpenAI off the hook because all it's enabling all these announcements that OpenAI is already made. So it's sort of. [00:05:08] Speaker A: But. [00:05:08] Speaker C: And they're just let. They're getting some, you know, room to maneuver out of it, which is great, but it's still sort of with that, with their original investment, it's kind of a nutso move. [00:05:20] Speaker B: I mean, it might just be the least worst option for them because if OpenAI clearly needs money, they're burning through money like crazy. If OpenAI were to fail because they didn't have investment, then Microsoft would lose their original investment. So it's like if they don't go down this path, there's a chance that OpenAI disappears and they've lost their billions of dollars anyway. So it's. [00:05:40] Speaker A: Yeah, well, and also, you know, you're. Microsoft's building a product for a single customer, OpenAI, which is not ideal for hyperscalers. A lot of risk in them. And so if they fail for other reasons, that's also a risk not only to their investment, but also to their cloud business. 
And so, you know, if you were telling Azure you had to go invest another $500 billion in data centers dedicated to a single customer, I'm not sure that's super exciting for them either. So now the risk gets distributed out to other companies like Oracle and other investors who want to get in on the action, but, you know, want to take an outsized amount of risk that Azure didn't want to take. [00:06:14] Speaker B: Kind of raises the question as to why they would have even agreed to this AGI clause in the first place. [00:06:20] Speaker A: Probably because they thought it was pie in the sky. Back before this, when Sam Altman's in your office saying, hey, you can do this thing, but if we get AGI, the deal's off the table. You know, they're like, yeah, yeah, whatever, man. Science fiction talk here. So now the reality changed. And so, you know, if it's possible, then, you know, maybe we'll see it hopefully in a couple years or not. [00:06:45] Speaker C: I think the lawyers just caught it finally that they didn't understand what AGI was. [00:06:48] Speaker A: That was the first time. That could be too. They were like, that's. That must be some weird, you know. [00:06:52] Speaker C: Chip or some AI acronym. [00:06:55] Speaker A: Yeah, well, on the heels of this, OpenAI has announced a bunch of things. So first of all, ChatGPT-5 launched a few weeks ago, months ago. I don't know. Time is a weird construct in my memory, but basically one of the things people were complaining about in the new version was that it wasn't really great at coding tasks. And so they have actually now released a ChatGPT code model and upgraded Codex to better translate natural language into code. With improvements in handling complex programming tasks, edge cases and expanded multi-language support, this enhances developer productivity in cloud native applications where rapid prototyping and automation are essential. Architecture changes and training data updates enable more accurate code generation, which could reduce development time for cloud infrastructure. Enhanced Codex capabilities directly benefit cloud developers by automating repetitive coding tasks like writing boilerplate code for cloud service integrations, database queries and deployment configurations. The improved edge case handling makes Codex more reliable for production use cases, potentially enabling automated code generation for cloud monitoring scripts, data pipeline creation and infrastructure-as-code templates. These upgrades position Codex as a practical tool for accelerating your cloud application development. And I, I'm sort of offended by the fact that you think my production environment is an edge case. [00:08:09] Speaker C: Whereas writing code for a production environment. Edge case, right. [00:08:12] Speaker A: Yeah, like how do I interpret that? My production's not that impressive. [00:08:17] Speaker C: I mean, I do, I find it kind of funny, because AI is going to. Wait, is going to be how we actually get to true DevOps, basically by developers just shipping it to AI to write. [00:08:27] Speaker A: Well, it's, it's funny because, you know, we, we've recently changed the way we do DevOps in the day job, and you know, we have developers asking, like, well, how do I write this Terraform code? I'm like, ask ChatGPT. It is true. [00:08:39] Speaker C: I say that so many times. Have you tried the, the AI? [00:08:43] Speaker A: Yeah, because like what you're trying to do is not rocket science most of the time.
Like when you get into more complicated use cases and we talk about terraform modules and other things. So you're just trying to spin up a server and that's not hard to do. You just need a little bit of boilerplate. [00:08:56] Speaker B: Have you used Codex? [00:08:57] Speaker A: So I use it for the first time this week and I will tell you I thought it was not awesome. It's much slower than Claude code, which is my go to the code it generated was good though I will say like the amount of errors that occurred in the code was much less. So I. The quality was better but the speed is not there. That's my complaint. [00:09:17] Speaker C: I might be able to take the speed hit if the quality is better. [00:09:20] Speaker B: Yeah, exactly. And the thing, I mean I think it's. I think Codex is probably better at some classes of coding. Like I think it's great at react. You want to build a UI, use codecs and use OpenAI stuff. You want to build a backend app written in C or Python or something else. I'd use Claude code that seems to be different focuses. [00:09:42] Speaker A: My use case was PHP front end code that I was trying to debug JavaScript with. I know, yeah, so I said too. And Claude definitely has been doing a better job actually to be fair, Gemini also does a really good job at JavaScript in general. [00:09:55] Speaker C: I think they all do. I think there's much more training data. That's my guess anyway. [00:09:59] Speaker A: Well, the problem is it's not hard to get the JavaScript code right. It's hard getting the JavaScript to work with PHP. That's the complicated part. [00:10:08] Speaker C: Is that why I could never get PHP things to work? It was so bad. I don't even know how, how bad I got. Like I was in a hole and I didn't know how to get out with PHP every time. [00:10:18] Speaker A: Let me tell you, I have learned a lot about PHP in the last two months and I, I do think it may be the worst language ever written. [00:10:27] Speaker B: How is this so popular? [00:10:28] Speaker A: I like, I do honestly do not. I mean like it's, it's popular because it's, it's sort of like Ruby and Python where it's, it's kind of a jack of all trade. It can do anything but like the controller model and then the interactions between the controllers and like part of it's complicated by AIs, you know, trying to follow best practices on code that was clearly not written with best practices. But yeah, I've come to understand it much better and also hate it much more than I ever did before. [00:10:57] Speaker C: Well, that's, that's how you measure it. That's the better metric. How much do you hate a thing is how much you're familiar with it. I think it's very fair. [00:11:02] Speaker A: I mean I don't hate Ruby and I'm pretty damn familiar with Ruby code, so. [00:11:05] Speaker C: Yeah, but we've discussed this many times. That's a mental defect that I will never Quite understand. [00:11:10] Speaker A: That's fair. [00:11:11] Speaker B: I mean PHP got public because it's not Perl and so, you know, in context it's way better than the alternative. [00:11:16] Speaker A: Way better than Perl. So. [00:11:18] Speaker C: Right about that. [00:11:20] Speaker A: Yeah. I also like, I've been doing a lot more Python coding too and I'm actually really starting to enjoy Python. I see why all the Ruby people jump ship. I'm like, oh, this is, this makes sense. Okay, yeah, I get it. [00:11:32] Speaker B: It's just easy to read. I mean it's easy, but it's easy to read. 
[00:11:36] Speaker C: It's really hampering my abilities to, to adopt more, you know, more compiled languages. Like it's just much easier. [00:11:44] Speaker A: I have doing a little bit more, I've been dabbling more and Go. I have not done as much with it yet as I'd like to because I don't, I didn't. My two fun projects on the side were both done in Python. But I, you know, if I had done one of those in Go, which is what I should have done, then I maybe would be learning Go as well at the same time. But right now I'm just, you know, my Python's getting much, much better and I'm going to move that into using it more for ML data science and you know, future proof my career a little bit. So yeah, that's the whole. All the cool people are doing journal, you know, journals. [00:12:15] Speaker B: Yeah, I mean I kind of think the only future of career is fruit picking at this point. [00:12:20] Speaker A: I think being a really good prompt writer is going to be the future of your career at this point. [00:12:25] Speaker C: I'm banking on that because I'm too lazy to do fruit picking. Although I do agree, I think that's going to be the jobs that are left. [00:12:30] Speaker A: Yeah, uh, yeah. Well, OpenAI also has released a report how people are using ChatGPT. Anthropic actually came out with one slightly after that, but I did not get it into the show notes for this week so we'll cover it next week but focus on OpenAI's interpretation day. They're saying that ChatGPT usage patterns across diverse professional domains with significant adoption in software development, content creation, education and business operations demonstrate the technology is broad applicability beyond initial expectations. The data shows developers are using ChatGPT for CO generation, debugging and documentation tasks while educators leverage it for lesson planning and personalized learning experiences indicating practical integration into existing cloud based workflows. Business users report productivity gains for automated report generation, data analysis and customer service apps and usage patterns highlight the need for Cloud providers to optimize infrastructure for conversational AI workloads, including considerations for API rate limits, response latency and cost management for high volume applications. The findings underscore growing demand for AI powered tools in cloud environments. The most interesting thing was kind of the percentages when you get into the report in the full working paper. Basically engineering and using is only like 13% which I was kind of shocked at how low programming use cases were at the moment. [00:13:45] Speaker B: I wish it was a bit more detailed. Obviously you got to protect privacy, but I wish there was more detail. How many people are talking to it like it's a person. How many people are doing nonsense? You just got to look at Reddit and just see AI slop everywhere. All the science things, people are using AI to generate research papers, they're just complete nonsense. Yeah, there needs to be sort of away from that. [00:14:13] Speaker A: I will tell you that if you go and look at the actual report they wrote, they do talk a lot about privacy and how they generated this report and the mathematics behind it, which is nice. So it definitely is a well sourced study. 
But yeah, getting information, 19.3% of queries going to the thing 13.1% interpreting the meaning of information for others 12.8% documenting, recording information, providing, providing consultation advice to others 9.2% thinking creatively 9.1% and making decisions and solving problems 8.5% and then everything falls down pretty quickly after that. [00:14:52] Speaker C: Generating content for collaboration with others is just me putting it in, saying make me sound less like a joke. Right. [00:14:59] Speaker A: I, I also use it for like, I get weird things sometimes. I'm like, I don't know what that means and I put in there like, tell me what this could possibly mean in this context. And it goes, well, it could be this, this or that. And like, well that actually is helpful. So it's a long way to say. [00:15:13] Speaker C: You don't understand my emails. [00:15:16] Speaker B: You just have to hope that if there's an acronym it doesn't try and explain entirely the wrong thing and then he sends an email. [00:15:23] Speaker A: But you have an example we'll talk about later. [00:15:26] Speaker C: Oh, this is going to be good. [00:15:30] Speaker A: Yeah, so it's good to see this kind of data again. I'm kind of surprised in some ways and not surprised in others. But definitely we know that there's lots of companies been built on trying to make more conversational human interactions for, you know, one on one conversations. People are using it for therapy, although it's highly not recommended and you know, all kinds of other use cases that maybe were not expected. [00:15:52] Speaker B: Yeah, I, I kind of like to know a bit more information about the way people are using it. Like most people using it. Like, like a tool. I like do this, do this, do this. Or are they addressing it as though it were, you know, a Persona in itself, as though they're talking to a person? Are they treating it like a person that's collaborating or are they treating like a tool that performs work for them? And I wonder if one of those actually provides better outcomes than the other. [00:16:19] Speaker C: That would be interesting. But I also treat new college grads like tools, so it's sort of. How do you tell? [00:16:28] Speaker B: It's a mute bun. [00:16:33] Speaker A: All right. OpenAI's Stargate UK appears to be a regional deployment or infrastructure expansion focused on the UK market. This development suggests OpenAI is building dedicated cloud infrastructure in the UK, which could provide enable faster API response times for European customers and address GDPR compliance needs for AI. UK specific deployment may include region locked models or features tailored to British, English and UK specific use cases, similar to how cloud providers offer region specific services. This could mean for businesses the ability to keep AI processing and data within the UK borders, addressing regulatory requirements for financial services, healthcare and cloud government sectors that require data localization. I mean, we already have a GPU shortage, so to now make a regionalized, you know, need for AI is going to further strain the ape, you know, the GPU capacity issues. And so I should probably buy some. [00:17:20] Speaker B: Nvidia stock maybe, maybe not. Because I think China's going to come back raring to go with some really good competitive hardware for a really good price. I, I don't think Nvidia is a good long term investment, personally. 
[00:17:38] Speaker C: Yeah, I mean you just have to ride that wave and make sure to cash out when it's. [00:17:41] Speaker A: Yeah, you have to cash out at the appropriate time. [00:17:43] Speaker C: Yeah, because that's, it's a one horse race for now, but yeah, it's going to the market by event. [00:17:48] Speaker A: AWS is building inference and that, you know, Google is building their own, Azure is building their own. So yeah, I definitely think Nvidia's dominance in the space is not, not long for the world, but in the short term you could probably make some Money. They're saying OpenAI will export offtake of up to 8,000 GPUs in Q1 2026, potential to scale to 31,000 GPUs over time. So yeah, that's a lot of, a. [00:18:10] Speaker B: Lot of GPUs should start trading in GPU futures. [00:18:15] Speaker A: Is that a thing? Can you do that? [00:18:16] Speaker B: It should be. [00:18:20] Speaker C: Safer than Bitcoin. [00:18:24] Speaker A: All right, well, moving on to AWS. This week they're launching M4 and M4 Pro Mac instances built on of course the Apple M4 Mac mini hardware offering up to 20% better build performance with M2 instances with 24 gigs of unified memory for standard infor and 48 gigs for M4 Pro variants. Each instance includes 2 terabytes of local SSD storage for improved caching. Ooh, Amazon had the money, huh, to buy the two terabytes because that's like a thousand dollar upgrade and build performance through the storage as ephemeral and tied to instances Lifecycle rather than the dedicated host. The instances integrate with AWS services like Code Build, CodePipeline and Secrets Manager, which is clearly where you want this because you're trying to build your Mac apps and pricing follows the standard EC2 Mac model with a 24 hour minimum allocation period on dedicated hosts available through on demand and savings plan in US east and US west regions initially beyond iOS Mac OS development, the 16 core neural engine makes these instances suitable for ML inference workloads, expanding their use case beyond traditional Apple platform development. Yeah, I don't. Okay, sure. Not sure that's where I'm going to be looking at my inference models. [00:19:23] Speaker C: Yeah, I mean it's. I read that and I was a little confused. Are they building inference into apps and that's what it's needed for, but it's a little strange. [00:19:29] Speaker A: Yeah, I mean like the, the inference chips that are in the Infor are actually really impressive and they do a lot of really cool stuff but Apple can't leverage them effectively at the moment because they're terrible at AI so far. So. [00:19:43] Speaker C: But if you're writing your own Mac. [00:19:44] Speaker A: App, if you're writing your own AI training or something that makes sense or you know, maybe you need that inference for building your Mac app or building it. I don't know. I don't know what your use case is but if you know, if you are like that's my use case, let us know at the pod@the cloudpod.net I'd love to hear about it. [00:20:02] Speaker B: There's been really good small models released lately. Google have released a small Gemma model. Who else released a small model? I can't remember now, but yeah, I mean these are designed for on device like either on cell phone or small small app server. Having something like that available be really cool for running apps that rely on AI for either the user interface or doing some kind of data processing in the background. 
But then most PCs are going to have either AMD or an Nvidia GPU anyway, which probably still beats these neural engines. [00:20:37] Speaker A: AWS Toolkit for VS code now integrates with Local Stack, enabling developers to test serverless applications locally without switching between tools or managing complex configurations. The integration allows direct connection to Local Stack endpoints for emulating services like Lambda SQS, EventBridge and DynamoDB. This addresses a key gap in service development workflows where AWS SAM CLI handles unit testing well, but developers needed better solutions for local integration testing of multi service architectures. Previously, LocalStack required standalone management and manual endpoint configuration. The integration provides a tier testing approach with LocalStack for early development without IAM VPC complexity, then transition to cloud based testing with remote debugging when needed. Developers can deploy stacks locally using familiar SAM deploy commands with a Local Stack profile and available in AWS toolkit 3.74 across all commercial AWS regions. The Local Stack free tier covers core services with no additional AWS costs and paid Local Stack tiers offer expanded service coverage for teams needing more broader emulation capabilities. The future continues. AWS has pushed to make VS code the primary serverless environment, building on recent console to ID integration and remote debugging capabilities. Launched in July of this year. [00:21:42] Speaker C: It's interesting because it's one of those things where like I've been able to deal with the complexity so I didn't realize the the size of the gap, but I can see how a developer without infrastructure knowledge, you know, directly with those AWS services might struggle a little bit. So it's kind of interesting. It's like I didn't know I needed this, but maybe I need this. I'll play around and see if if it actually makes my life easier or it just drives me insane because it doesn't do exactly what I want. [00:22:07] Speaker B: I mean, and I've been gone for a few months but I feel I'm going to read a press release like this that actually I've been gone for like seven years because this problem has existed since the beginning of Lambda and then SAM came along a few years later. But the fact that it's taken this long to get this working properly is just. It doesn't reflect well. Thanks. But it doesn't reflect well. [00:22:34] Speaker A: It definitely feels like Amazon has taken a pretty heavy investment into Amazon, into serverless development workflows where they kind of given up the ghost to was it serverless.com or you know, basically they were letting them kind of define the market and it seems like Amazon's now realizing that was a mistake and coming back around on this, I actually didn't realize that local Stack had paid tiers. So I'm just. Neither did I. I'm just not looking. I'm like. And I'm actually kind of intrigued because for $39 per license monthly build annually you can get additional things like you get 55 emulated services versus 30, although they don't detail the list of that. I Wish they did 300 monthly CI credits and local state persistence as well as Stack Insights, Cloud based state persistence, Cloud Sandbox Preview, IAM Policy enforcement, which is one that I actually think is cool. Basic extensions and T level analytics. And then if you go to the $89 per user month, there's a bunch of other things you get as well. 110 emulated services versus 55. So that's fascinating. 
[00:23:32] Speaker C: Yeah, today I learned. I did not know. [00:23:35] Speaker A: It's not that. I mean I don't know if I would get it personally, but I'm kind of intrigued by it. Yeah. Oh, accuracy. [00:23:41] Speaker C: No, it definitely seems like a business. [00:23:42] Speaker A: Here's the list of services you get in addition to so free and basic for analytics, Amazon MQ gets you into base. Oh, you can do email service, API v2, Elastic Container Registry and ECS is in base. Elasticache, RDS and RDS data API code commit code, building code connections locally IoT. No one cares about IoT cloud control application on scaling EC2 auto scaling. Okay, there's definitely not things that I would necessarily need for my home project, but like the things that for enterprise. Oh, ELB is one you get in the base. [00:24:26] Speaker C: I definitely have run into the RDS one and I just assumed it wasn't supported and there wasn't any option. So that's. [00:24:32] Speaker A: Oh my God. They have Cognito identity pools and Cognito user pools available in the base. Okay, that might be worth it right there if you have to deal with that terrible mess. [00:24:42] Speaker C: I don't know. [00:24:43] Speaker A: I mean I don't want to deal with. I don't. My answer is okta. But you know, anything. But Also the Textract, SageMaker and Bedrock are available in the ultimate. [00:24:53] Speaker C: You still have a code against it. All it's doing is mocking your service. [00:24:58] Speaker A: But at least if I can mock it, then I can debug it locally versus right now. Trying to deal with Cognito is terrible because it's so slow to modify. Oh, that's fascinating. I've learned new things today. Amazon EC2 is now providing detailed performance stats on all NVME local volumes and new EFA metrics for improved observability of the networking components. The 11 detailed performance metrics, for instance, store NVMe volumes are provided at 1/2 granularity including IOPS, throughput Q length latency histograms broken down by IO size, matching the monitoring capabilities previously only available for the EBS volumes themselves. The feature addresses a significant monitoring gaffer workflows using local NVME black box storage based on Nitro based instances, enabling teams troubleshoot performance issues and optimize I O patterns without additional tooling or cost. In the elastic fabric adapter, you're getting five new metrics to help diagnose network performance issues in AI, ML and HPC workloads by tracking retransmitted packets, timeout events, and unresponsive connections. Metrics are stored as counters in the Sys file system and can be integrated with Prometheus and Grafana for monitoring dashboards and alerting addressing the object we got for high performance networking workloads available only on the Nitro V4 and later instances with the EFA installer. [00:26:11] Speaker B: That's cool, and it's great that it's local and it's not through CloudWatch at 50 cents a metric per however long. However long. [00:26:18] Speaker C: Yeah, well, I assume you're shipping it. I mean you can get it locally and ship it to Prometheus or something. I assume you could also get it. [00:26:25] Speaker A: Yeah, I'm assuming you can, but that's, that's for the EFA specifically. If you're doing efa, you're typically talking about HPC workloads to begin with. 
So yeah, you know, local is probably better for a lot of things, but the NVMe stats I think are going to CloudWatch, and that makes sense because it's baked into the Nitro hardware. Yeah, that's nice. Always good to see new monitoring metrics for observability. AWS is launching the R8gn instance powered by the Graviton 4 processor, delivering 30% better compute performance than Graviton 3 and featuring up to 600 gigabits per second of network bandwidth, the highest among network optimized EC2 instances. These memory optimized instances scale up to 48xlarge with 1,536 gigabytes of RAM and 60 gigabits per second of EBS bandwidth. The R8gn instances also support those Elastic Fabric Adapters we just talked about on larger sizes, enabling lower latency for tightly coupled HPC clusters and distributed computing workloads. These are available to you currently in US East and US West, Oregon and Virginia, with the metal size restricted to Northern Virginia, suggesting a phased rollout approach for the new instance family. The combination of the Graviton 4 processors and 6th generation Nitro cards positions R8gn as AWS's premium offering for customers needing both high memory capacity and extreme network performance in a single instance type. [00:27:39] Speaker B: 600 gigabits a second for a single instance. [00:27:43] Speaker A: Wow. A lot of throughput, isn't it? [00:27:46] Speaker B: That is insane. That's what you need for vLLM clustering across multiple machines. Yeah, that's fantastic. [00:27:55] Speaker A: You know, loading a lot of data from S3 into training instances. All kinds of use cases where you want that to be super fast. [00:28:03] Speaker B: Yeah. Or even running the S3 service. [00:28:07] Speaker A: Yeah, that too. I mean, I think we're finally. I think they've covered the C, M and R instances now. So now they're ready to announce Graviton 5 at re:Invent because everyone now supports the Graviton 4. Finally, AWS CDK is introducing a new CDK refactor command in preview that enables safe infrastructure reorganization by preserving deployed resource state when renaming constructs or moving resources between your stacks. This addresses a long-standing pain point where code restructuring could accidentally trigger resource replacement and potential downtime. I mean, God, if I could get this in Terraform. The feature leverages AWS CloudFormation's refactor capabilities with automated mapping computation to maintain logical ID consistency during architectural changes. And this allows teams to break down monolithic stacks, implement inheritance patterns or upgrade to higher level constructs without complex migration procedures. Real world impact includes enabling continuous infrastructure-as-code evolution for production environments without service disruption, and teams can now confidently refactor their CDK apps to improve maintainability and adopt best practices. The feature is available in all AWS regions where CDK is supported, with no additional cost beyond standard CloudFormation usage. This matters for AWS customers managing complex infrastructure-as-code deployments who previously had to choose between maintaining technical debt or risking production stability during refactoring operations. [00:29:24] Speaker C: It's interesting because I want to see. Yeah, because how it works is key. Right? Because in Terraform you can do this. It's just clunky and it's just so painful. [00:29:32] Speaker A: Yeah. [00:29:35] Speaker C: And so, you know, I'm hoping that this is a little smoother.
I don't know. I don't use CDK enough to really know how it structures. [00:29:43] Speaker A: It makes me want to play with CDK more, but maybe not that much. But I do sort of want to play with it. [00:29:50] Speaker C: It's like if you're starting out, it might be a thing, but like, I'm so like, it would be such a lift for my brain to start with cdk. [00:29:59] Speaker A: Well, Amazon's giving you something cool this week. Ryan with a new McP server for CloudTrail that enables AI agents to analyze security events and user activities through natural language queries instead of traditional API calls. The MCP server provides access to 90 day management event histories via look of Events API and up to 10 years of data through the CloudTrail lake using Trino SQL queries. The open source integration available at GitHub.com allows organizations to leverage existing AI assistance for security analysis without building custom API integrations. Service available in all regions supporting CloudTrail lookup events API or CloudTrail Lake with cost based on standard CloudTrail pricing for event lookups and Lake queries. Key use cases might include automated security incident investigation, compliance auditing through conversational interfaces, and simplified access to CloudTrail data for Teams without deep AWS API knowledge. Which would be your SOC team, most likely. Yes. [00:30:49] Speaker C: I mean, this is fantastic. Just because it's so tricky to sort of structure queries in whatever SQL language to get the data you want, and being able to phrase things in natural language has really made security operations just completely simpler. That's not good words, but it's just one of those things where it's like the query languages of all the major tools have become so complex and trying to adapt normalized data structures across all of these different sources of security logging has been nightmarish. And so like, the more we can get MCP for that, where it's like, did this person log in after this? What? What was their next transaction after logging in and doing these, you know, or, or even lumping it together and to detect anomalous behavior and doing that via natural language just makes everything easier from a security perspective. [00:31:48] Speaker B: Yeah, the advantage isn't that it's an MCP service because you could have called any API from MCP service that just can make a web call. I guess the advantage is the fact that the MCP server provides this translation between natural language and the cluster of the language which is used to describe the queries in the first place. [00:32:08] Speaker C: And stuff like this might take over like traditional sim, you know, because it's like if you can, you know, define sort of your chat interface bot and your security interface bot and they can talk to mcps on the back end, like all of a lot of these sims that the big Value is the normalizing of the data set and the sort of DSL that they use to query it. So it's like this is, you know, a splunk, not killer, but, you know, working in that direction where it'll sort of change the way that those tools work. [00:32:39] Speaker B: Yeah, I'm looking forward to. When we get. I mean, we can define agents now, which really just like projects include, you know, you have system instructions and things like that. But I. What I'm really looking forward to is like perpetual agents where you can. You just start kind of like a lambda. You know, you can run on a schedule or do something else. 
But I want a perpetual agent, which I give it a task and every half an hour or on demand or if it receives a web call or something, it performs that task for me. And so you could. You can think about security agents sitting, watching for specific patterns in logs and then triggering other workflows to trigger further investigation or whatever you want to do. Really. [00:33:19] Speaker C: Yeah. And you're starting to see it integrated in some products like that exact model. Trying to remember if this, what I know is a public announcement or not, whether I could talk about. I can't remember. Yeah, it's definitely something that's. Something that's continuously performing analysis. And it's pretty great because it's. Yeah, it's exactly what you want. [00:33:44] Speaker B: A mild tangent, somewhat related because it's kind of skewers related. I've got a ubiquitous camera out front on the drive and now I've got my decent GPU that can do vision analysis. And I have it taking a snapshot every 10 seconds and comparing the sequence of frames. And it writes a narrative about what happened on my driveway. It'll be like a person will pass with a dog and then a bird flew past and then a red car drove past. Somebody's walking up the driveway. But. But you can literally build a. Build a story based on the pictures that it sees every 10 seconds. But you could apply that to anything. [00:34:23] Speaker C: Oh, no, exactly. That's what I was just thinking. Like applying that to, you know, security data. Right. Like it's. What if it just read to you the actions, you know, and including any kind of anything that was anomalous or weird. Right. Like that's reversing it. Right. And so your security analysis, your analysts won't be searching for these things and sort of stringing these things together. It'll be taking that input and probably processing it through, you know, whatever soar case management they have. [00:34:52] Speaker B: Yeah, it's kind of fun. [00:34:53] Speaker C: Yeah. That's cool. [00:34:58] Speaker B: There are a lot of cloud cost management tools out there, but only Archera provides cloud commitment insurance. It sounds fancy, but it's really simple. Archera gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? Our Chair A will click the link in the Show Notes to check them out on the AWS Marketplace. [00:35:37] Speaker A: All right, moving on to Google Cloud, they're launching their Data Transfer Essentials, a no cost service for EU and UK customers to transfer data between Google Cloud and other cloud providers for multi cloud workloads. The service addresses EU Data act requirements for cloud interoperability, while Google chooses not to pass costs to customers despite the act allowing them to do so. Data Transfer Essentials targets organizations running parallel workloads across multiple clouds, enabling them to process data without incurring Google Cloud egress fees. Customers must opt in and configure their multi cloud traffic, which will appear as 0 charge line items on bills while non qualifying traffic continues at standard network service tier rates. 
This positions Google Cloud ahead of competitors on multi cloud data transfer costs as AWS and Azure still charge significant egress fees for cross cloud transfers and the service built on Google's previously previous moves like wavering exit fees entirely and launching BigQuery Omni for multi cloud data warehousing. Key use cases include distributed analytics workloads, multi region disaster recovery setups and organizations using best of breed services across different clouds. Financial services and healthcare companies with strict data residency requirements could benefit from cost free data movement between the clouds. Service requires manual configuration through Google's guide to designate qualifying multi cloud traffic, adding operational overhead compared to standard networking. [00:36:50] Speaker C: It's funny, I focus in on like kind of like an obscure offering of this where it's their the single line item for the zero billing is the part that I love the most on this so that you can still sort of have that visibility at the common metric of money layer of your service so you can understand what is you know, a cross cloud or outbound egress cost. You know, in the in the realm of things because it's very difficult to sort of make your cost comparisons and forecasting with some of those things. [00:37:17] Speaker A: That's pretty cool. [00:37:18] Speaker B: It's like going to Safeway though and getting the bill at the end and it says you saved $56 on this transaction. Well, I didn't really. Well I guess, I mean you kind. [00:37:28] Speaker A: Of do but if you didn't have a Safeway Club card you would have paid more money. [00:37:32] Speaker B: Yeah, yeah. [00:37:34] Speaker A: You know I don't really understand like I'm looking at the documentation right now because I'm just curious. So create data transfer essential resources. So you know basically GCloud, CLI, yada yada. You know you basically can. There's a ported services list you can get with a location you want to use but then like creating the configuration like you is it literally just you create a configuration name and then add the location and then the services you want to add from the list of. [00:38:04] Speaker C: Known like that's I see it as like an opt in. [00:38:07] Speaker A: It's totally an opt in but like it's interesting. It's not very complicated but it's just a nuisance that I assume Terraform supports us or will support it soon or. [00:38:15] Speaker C: Will soon because there usually is a bit of a delay. But I really, I think it's, it's so that you know, I mean what's. Customers will see it in their bill. [00:38:24] Speaker A: And fixing it allowing like I mean if I'm in, if I'm in GCP Europe and I happen to have this use case even if like why would I just put all my traffic into this and never pay egress? What's the. How do they prevent that? [00:38:38] Speaker B: It's specifically for traffic within your own organization though. [00:38:40] Speaker C: It's. [00:38:40] Speaker B: It's specifically intra organization apps and not. You can't use it for sending data to your customers or getting data from your customers. [00:38:47] Speaker A: How do they know it's a customer? How they know it's my IP on AWS versus someone other customers? [00:38:52] Speaker B: I mean I presumably. Well the docs say that they, they have to approve the ASN and I don't know I imagine they're not doing this lightly or particularly willingly. [00:39:02] Speaker C: You're definitely not willing. 
[00:39:04] Speaker B: That's why it stays on the bill. [00:39:08] Speaker A: Well I mean like they're the, the recognized ASNs that they're supporting are pretty broad. Alibaba, Amazon, Amazon AAS, DigitalOcean, IBM, Microsoft Corp, Oracle OVH Cloud or I mean like the, but not even to the level like it's a very. [00:39:23] Speaker C: It's probably just a line item in the bills and services. So if you do use this for a whole bunch of customer traffic then you're just breaking the contract rules, I guess. Yeah, I don't know. [00:39:32] Speaker A: But yeah it also enforce. [00:39:34] Speaker C: It also seems like once you set that up, you wouldn't really have a choice if it's right. Like if you're setting up the. [00:39:40] Speaker A: Yeah, how do, how do I designate this traffic from Cloud SQL is for a customer versus for me going to aws? Like I don't, I don't think you could. [00:39:47] Speaker C: Right. If it's by ASN and the advertiser. [00:39:48] Speaker A: I mean it's probably just one of those like we'll make it complicated so it's not something you can just make easy. But then we'll, we'll make it. You know, the reality is they're making EU data people happy and it's, it's. [00:40:01] Speaker C: Yeah, it's for when you call them support going why is my bill so large? And then support can be like, oh, because you needed to set up this essential content like data transfer. [00:40:13] Speaker A: Well, Kubernetes 1.34 is now available. It brings dynamic resource allocation DRA to ga, finally giving production ready support for better gpu, TPU and specialized hardware management, which is a critical feature for your AIML workloads. Apparently it also addresses K YAML, which addresses the infamous NORI bug, so infamous that none of us knew what it was other than Jonathan and YAML's white spacing nightmares. By enforcing structured parsing rules while remaining compatible with existing parsers just by set kubectlk YAML equals true to avoid those frustrating debugging sessions from stray spaces, POD level resource limits are now in beta, simplifying multi container resource management by letting you set a total resource budget for the entire POD instead of juggling individual container limits with POD level settings taking precedence when both are defined. Several stability improvements landed, including order namespace deletion for security, preventing network policy removal for the pod, for example streaming list responses, reduced API server memory pressure in large clusters, and resilient watch cache initiation to prevent thundering herd scenarios. GKE's Rapid Channel delivered this release just five days after the open source release, showcasing Google's commitment to keeping their managed Kubernetes service current with upstream developments. Take that Amazon. [00:41:23] Speaker C: Exactly. That's definitely a dagger there. I'm very fascinated by the K YAML because I have a true hatred of YAML configuration and I'm using GCP a lot more at the day job and so like I'm being forced to use it and I'm mad about it and it's just, it's one of those things like transferring or working on YAML files that Are for, you know, moving on different sources is just still just awful. Like the space. And so like, if this fixes that, I would be pretty excited. [00:41:56] Speaker B: You know, I like to think of it as, as fixing a problem with JSON rather than fixing a problem with YAML. Because what it, what it looks like is JSON. 
But now you can have comments, inline comments, like you could always do with YAML. [00:42:07] Speaker C: That's fascinating. Oh, I might love this. You're right. [00:42:14] Speaker B: So you still need the braces. You need quotes around values, but not keys, and lists you have to put in square brackets like in JSON. So it's very, very JSON-like, but unlike JSON you can still use comments. [00:42:27] Speaker C: That doesn't bother me. I know I'm one of the few people who are like, no, I can write JSON, as long as I have an IDE that can do the linting. [00:42:35] Speaker B: As long as somebody does the indents for me. [00:42:37] Speaker A: Exactly. [00:42:37] Speaker B: Everything. [00:42:38] Speaker A: Yeah, yeah, yeah. Oh, for those of you who are not familiar with the Norway bug, Jonathan, do you want to explain it like you did to Ryan and I earlier? [00:42:48] Speaker B: Yeah, the Norway bug. The country code for Norway is "no". And if you don't put that in quotes in a YAML document, it's parsed as a boolean false. Same applies to Ontario: "on" is parsed as a boolean true instead of the string "on", so it causes confusion. [00:43:09] Speaker A: Indeed. I did not know about this, but I'm surprised this hasn't been a problem a long time ago. [00:43:14] Speaker C: Oh, it probably has. [00:43:16] Speaker B: It probably has. Nothing against Norway, but how many people are putting Norway's country code in their YAML documents? [00:43:23] Speaker A: Apparently not enough, until recently, for this to become a big deal. I mean, how many other languages do you think have this problem? There's got to be other places where "on" or "no" is a problem in code languages. [00:43:38] Speaker B: I mean, most of them use true and false. I think it was a weird exception in YAML. I'm not even sure why they use on and off and yes and no. [00:43:48] Speaker C: I didn't even know you could do that. Like, I've always used true/false because, yeah, it'll support true/false, so I just use that. [00:43:53] Speaker B: Yeah, yeah, that's a weird choice. [00:43:57] Speaker A: That does not burn them. So there you go. [00:44:00] Speaker B: Well, I guess you could equally say, well, what if you wanted the string "true", but then you have to put it in quotes, so it. [00:44:06] Speaker A: What if your name is True? Then what do you do? That's true. [00:44:09] Speaker C: Change your name and, like, remove your parents from your life. [00:44:15] Speaker A: Yeah, exactly.
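For anyone who wants to see the Norway bug in action, here's a minimal sketch using Python's PyYAML, which implements the YAML 1.1 boolean rules Jonathan describes (the library choice is just for illustration):

```python
# Requires: pip install pyyaml
import yaml  # PyYAML resolves YAML 1.1 booleans: yes/no/on/off, case-insensitive

doc = """
country: NO          # Norway's country code, unquoted
feature: on          # reads like a string, parses as a boolean
country_quoted: "NO" # quoting preserves the string
"""

print(yaml.safe_load(doc))
# {'country': False, 'feature': True, 'country_quoted': 'NO'}
```

KYAML sidesteps this class of surprise by always quoting string values, which is why it reads like JSON with comments.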
Google Cloud is introducing a new recipe for disaggregated AI inference using NVIDIA Dynamo on AI Hypercomputer, which physically separates the prefill (prompt processing) and decode (token generation) phases of LLM inference across different GPU pools to improve performance and reduce your cost. The solution leverages A3 Ultra instances with NVIDIA H200 GPUs, orchestrated by GKE, with NVIDIA Dynamo acting as an inference server that intelligently routes workloads between specialized GPU pools, one optimized for compute-heavy prefill tasks and the other for memory-bound decode operations. This architecture addresses a fundamental inefficiency in traditional GPU serving, where both inference phases compete for the same resources, causing bottlenecks when a long prefill operation blocks rapid token generation, leading to poor GPU utilization and higher cost. The recipe supports popular inference engines including vLLM, SGLang and TensorRT-LLM, with initial configurations available for single-node (4 GPU prefill, 4 GPU decode) and multi-node deployments for models like Llama 3.3-70B Instruct, available to you on GitHub. While AWS and Azure offer various inference optimization techniques, Google's approach of physically disaggregating inference phases with dedicated GPU pools and intelligent routing is a distinct architectural approach to solving the compute-versus-memory-bandwidth challenge in LLM serving. Since Google has done it and they haven't, they'll probably copy it. So thanks, Google. [00:45:38] Speaker C: I just want to say at this point that I'm glad Jonathan's back from special assignment, just because I don't know what you just said in the slightest. [00:45:48] Speaker B: It's just like any app, any monolith where you need to scale different parts, where different parts of the monolith get used at different rates or have different resource requirements. Do you scale the entire monolith up and then have wasted CPU or RAM on some of them? Or do you break it up into different components and optimize for each particular task? That's all they're doing. It's a pretty good idea. It's not going to make a bit of difference for people at home with their Nvidia graphics cards and things, but it will make a difference for providers who are batching requests, running hundreds of inferences at a time. [00:46:21] Speaker C: Thank you. [00:46:23] Speaker B: You're welcome. I'm glad to be back. [00:46:27] Speaker A: I'm glad that I at least understood enough of it that I understood what it was doing. So I feel like I've learned a lot while Jonathan's been on special assignment. [00:46:35] Speaker C: That's pretty cool. [00:46:35] Speaker A: Yeah, like I understood. I was like, oh, I got all this. I know what vLLM is, I know what SGLang is now, I know what TensorRT-LLM is. I know all those things because I've had to learn it over the last six months. All right, well, Google's Data Science Agent now generates code for BigQuery ML, BigQuery DataFrames and Apache Spark, enabling users to scale data processing and ML workflows directly on BigQuery infrastructure or distributed Spark clusters, simply by including keywords like BQML, BigFrames or PySpark in prompts. Now I can tell you on this side, I don't understand any of that. The agent introduces @-mentions for BigQuery table discovery within the current project and automatic metadata retrieval, allowing users to reference tables directly in prompts without manual navigation, though cross-project searches still require the traditional plus-button interface. The key limitation is that the agent currently generates only Spark 4.0 code, which may require organizations on earlier Spark versions to upgrade or avoid using the agent for PySpark workflows until backward compatibility is added. The feature targets data scientists and analysts working with large-scale datasets that exceed single-machine memory limits, with practical applications in forecasting, customer segmentation and predictive modeling, using serverless infrastructure to minimize operational overhead. [00:47:44] Speaker C: This kind of makes me wonder what the Data Science Agent did before this announcement. [00:47:48] Speaker A: Right. [00:47:49] Speaker C: Like, I don't understand the value, but I mean, I do think that this is cool. Like, I've come to appreciate, you know, the agentic workflows where you're specifying tooling or language directly and it has an understanding of that and can be much more effective. You don't have to sort of teach it about, you know, how the service or how your table is structured with instruction. So it is kind of great. I love that. [00:48:16] Speaker B: Literally treating your data in tables as an agent in its own right, like with the @ sign, just reminds me of, you know, getting somebody's attention and just invoking the agent which can pull data from that table into the conversation. That's. [00:48:29] Speaker A: Yeah. So it looks like before, the DSA was working inside of a Colab runtime, and so now, instead of being basically kept in the walled garden, it's able to write ML and query code directly. That's what's changed here. So before, it was able to do this and create, you know, a plan and execute a plan, but inside of a Colab runtime, which may be an abstraction layer that you did not necessarily want, or could slow things down. That's what it used to do and this is what it does now. Okay, so it's just getting more capabilities to be outside of the walled garden because they trust it more, so they trust the Skynet too.
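To make the BigQuery DataFrames piece concrete, here's a rough sketch of the kind of code you'd expect the agent to generate when a prompt mentions bigframes; the project ID is a placeholder and the exact code the agent emits will differ:

```python
# Requires: pip install bigframes  (and a GCP project with BigQuery enabled)
import bigframes.pandas as bpd

bpd.options.bigquery.project = "my-project"  # placeholder project ID

# Load a table as a DataFrame; computation is pushed down to BigQuery,
# so the data never has to fit in a single machine's memory.
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

# Pandas-style transformations compile to SQL and run server-side.
summary = df.groupby("species")["body_mass_g"].mean()

print(summary.to_pandas())  # only the small aggregated result comes back locally
```

The BQML and PySpark keywords work the same way; only the engine the generated code targets changes.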
Google Cloud is launching DNS Armor in preview, partnering with Infoblox to provide DNS-based threat detection that catches malicious domains 68 days earlier than traditional security tools by analyzing over 70 billion DNS events daily. The service detects command-and-control server connections, DNS tunneling for data exfiltration, and malware distribution sites, using both feed-based detection for known threats and machine learning algorithms for emerging attack patterns. DNS Armor operates as a fully managed service requiring no VMs, integrates with Cloud Logging and Security Command Center, and can be enabled at the project level across VPCs with no performance impact on Cloud DNS. This positions it to be competitive against AWS's Route 53 Resolver DNS Firewall and Azure's DNS Private Resolver, offering similar DNS security capabilities, but with Infoblox's threat intelligence that adds 4 million new threat indicators monthly. Enterprise customers running workloads in GCP gain an additional security layer that addresses the fact that 93% of malware uses DNS for command and control, making this particularly valuable for financial services, healthcare and other regulated industries. [00:50:07] Speaker C: This is cool, because this is one of the harder problems to solve in security, just that, you know, there's so many services where you have to populate DNS entries to route traffic to them, and then they can basically be abandoned over time and bit rot, and so then they can be snatched up by someone else and abused. This will help you detect that scenario. Like, you know, the domains that look very much like your, you know, your professional DNS URL. It probably will detect that. And it's, you know, it's definitely something where you need lots of public sources to detect, and something to do it for you, because no one has time to fix DNS. Cool. [00:50:51] Speaker B: Yeah. It's kind of funny that nobody's looked at this before. I mean, it's always we've looked at page content, we've looked at URL names and done pattern matching on those things. The fact that DNS has been used for, I don't know, 20 years to exfiltrate data is kind of funny.
No one's thought to actually analyze DNS queries and use that for threat modeling. [00:51:13] Speaker A: I mean, I don't know that we had the compute infrastructure to do it. I mean the amount of data you're talking about in DNS query, you know, DNS logging at that point would have been billions of transactions per day. [00:51:23] Speaker C: I tried to do, I tried to do something like that and it, it didn't work out because the dataset was too large. [00:51:28] Speaker B: Yeah, I guess the difference is in the past DNS was very distributed and you know, people didn't pay for custom DNS providers. But you know, Infoblox now is in a position where, where companies are choosing them deliberately for features like this. And so they're going to generate data which will be useful for everyone. [00:51:45] Speaker C: Yep. [00:51:45] Speaker A: Yeah, I think it's a combination of that. And then also, you know, the ability to have things like Trino and Pyspark and all these other tools that can handle massive amounts of data quickly and efficiently versus a SQL query which would have failed. Yes, yes it will. Google's Announcing Agent payments protocol or AP2, an open protocol for secure AI agent led payments that works with A2A and model context protocol addressing critical gaps in authorization, authenticity and accountability when AI agents make purchases on behalf of the user. The protocol uses cryptographically signed mandates as tamper proof digital contracts. They create verifiable audit trails for both real time purchases. Human present and delegated tasks. Human not present solving the trust problem when agents transact autonomously. AV2 supports multiple payment types including credit cards, stablecoins and cryptocurrencies with a 2A x402 extension already providing production ready crypto payment capabilities. In collaboration with Coinbase and and Ethereum Foundation. Over 60 major organizations are participating including American Express, MasterCard, PayPal, Salesforce and ServiceNow. Positioning this as an industry wide initiative rather than a Google only solution, protocol enables new commerce models like automated price monitoring and purchasing personalized merchant offers through agent to agent communication and coordinated multi vendor transactions within budget constraints. What could we go wrong? [00:53:04] Speaker C: Yeah, I read this at first I'm like who's willing to trust this? And then it, you know, goes into the, all the, the cryptocurrency and that and I'm like oh no now it makes more say. [00:53:15] Speaker B: I mean maybe the path to the kind of like the micropayments thing that people have been trying to get off the ground for years. Yeah, you, you run a blog or something and something like this could actually get you the half cent per view that you know would cover the cost of the server or something. [00:53:33] Speaker C: I wouldn't trust it. I wouldn't trust an AR work. [00:53:36] Speaker B: I used Claude, I used Claude yesterday to order groceries from Safeway. It's pretty cool. I told, I gave it a recipe. I'm like, I want to make beef stroganoff. I don't have anything in the pantry right now. Find me, find me suitable ingredients. On Safeway. The only, the only problem I had is I had to pick manually, like delivery or pick up first. The first time I did it, it didn't add anything to the cart. But second time, after I told it I wanted it wanted to pick it up, it went back again, added everything to the cart. 
It's slow, very slow, because it puts these artificial delays in, like wait three seconds, wait four seconds between clicks, to avoid the bot detection. And then Safeway's website still pops up and says, are you a bot? Click here. Are you human? Even so, I had to manually click the thing a couple of times. But I can see why companies like that wouldn't want people to use good tools to find the cheapest groceries around. [00:54:28] Speaker C: I don't think that's why they do it, because the Safeway site has got so much... performance limitations are probably part of it too. [00:54:35] Speaker A: Yeah, yeah. [00:54:38] Speaker B: But that was kind of cool. [00:54:42] Speaker A: AlloyDB on C4A Axion processors delivers up to 45% better price performance than N series VMs for transactional workloads, achieving 3 million transactions per minute, with the new one vCPU option cutting entry costs by 50% for development environments. Google's custom Arm-based Axion processors outperform Amazon's Graviton4 offerings by 2x in throughput and 3x in price performance for Postgres workloads. Hey, give me actual numbers. The addition of a 1 vCPU, 8 GB memory configuration addresses developer needs for cost-effective sandbox environments, though it lacks uptime SLAs even in HA configurations. Production workloads can scale up to 72 vCPUs, with a new 48 vCPU intermediate option. The C4A instances are priced identically to the equivalent VMs while delivering superior performance, making migration a straightforward cost optimization opportunity for existing AlloyDB customers without price penalties. This has limited regional availability in select Google Cloud regions, which may impact your adoption timing. The GA status signals production readiness for customers already testing the preview, who cited both performance gains and cost reductions, and I assume we'll see it roll out as they scale up production. I mean, I love the one vCPU, 8 gig box. That's great, because that's what I need for my dev boxes. Yeah, exactly. I mean, I could even get away with 1 vCPU and 4 gigs of memory, but I'll take 8. Sure. [00:56:00] Speaker C: Yeah. I really like AlloyDB, but I don't have a huge workload for it, right. So it's. [00:56:05] Speaker A: I mean, same reason I love Aurora, but I don't really have a workload for it. Although I do. We are running The Cloud Pod on an Aurora instance now, but we got off EFS finally to fix our performance problems. That was a good choice. Like, the website works again. [00:56:22] Speaker C: Nice. I didn't realize that was performance. That makes sense. [00:56:25] Speaker A: I mean, it was EFS, man. It is just a problem for containers when you're trying to run anything at scale for web, and we were seeing a bunch of load from a bunch of bot traffic that was causing some problems. But we're in much better shape now. I fixed it. Nice. All right, Google Cloud Trace now supports the OpenTelemetry Protocol, about time, for trace data ingestion via telemetry.googleapis.com, enabling vendor-agnostic telemetry pipelines that eliminate the need for Google-specific exporters and preserve the OTel data model during transmission. The new OTLP endpoint significantly increases storage limits: attribute keys expand from 128 to 512 bytes, values from 256 bytes to 64 kilobytes, span names from 128 to 1024 bytes, and attributes per span from 32 to 1024, addressing previous limitations for high-volume trace data users. I mean, I assume that was all important to somebody.
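For the curious, here's a minimal sketch of pointing a standard OpenTelemetry SDK at the new endpoint instead of a Google-specific exporter; the authentication is simplified and the exact endpoint path is an assumption based on the announcement:

```python
# Requires: pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Plain OTLP/HTTP exporter aimed at Google's ingestion endpoint. In practice you
# would attach real GCP credentials; the bearer token here is a placeholder.
exporter = OTLPSpanExporter(
    endpoint="https://telemetry.googleapis.com/v1/traces",
    headers={"Authorization": "Bearer <access-token>"},
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("demo-span"):
    pass  # spans leave the process in OTLP form, no vendor-specific exporter needed
```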
I don't know how Cloudtrace's internal storage now natively uses the OpenTelemetry data model and leverages OTEL semantic conventions like service name and span status in the Trace Explorer ui, improving the user experience for filtering and analyzing traces. Google positions this as the first step of a broader strategy to support OTLP across all telemetry types, traces, metrics and logs, with future plans for server side processing, flexible routing and unified telemetry management across environments, which I'm just glad to see they're finally getting into OTL and that they are actually going to make it across all their products because it's been a gap for a while. [00:57:51] Speaker C: Definitely. And it's, you know, it's definitely kept me from using something like Google Trace because I'm not going to instrument something specifically. [00:57:58] Speaker A: Well, I'd rather. I'd much rather use an OTEL open agent than I can point the data to whatever service I want to be Google's or be it someone else's. [00:58:05] Speaker C: I just didn't want to learn a new thing. I'm totally. [00:58:07] Speaker A: There's that too. [00:58:08] Speaker C: Yeah. [00:58:09] Speaker A: Like. [00:58:09] Speaker B: Well the other thing is by standardizing structure like this, it makes it a lot easier to do data analysis on the content. [00:58:16] Speaker C: Definitely. [00:58:19] Speaker B: Or train an ML model. Everything comes down to AI training at this point. Amazon thing and they're collecting metrics from NVMe. I'm like, yeah, that's nice. But why? Because they're probably going to try and predictive failure rates and things like this. They just want data. Data could even do capacity management if. [00:58:40] Speaker C: You'Re like Amazon positioning for MV. Yeah, I digress. [00:58:47] Speaker A: Google is investing 5 billion over two years in UK infrastructure, including a new data center in Waltham Cross Hertfordshire, which hopefully Jonathan tells where that's at to support growing demands for AI services like Google Cloud, search and Maps. The investment encompasses capital expenditure, R and D and engineering resources with projections to support 8,250 jobs annually in the UK while strengthening the country's AI economy. Google partnered with Shell to manage its UK Carbon Free energy portfolio and deploy battery technology that stores surplus clean energy and feeds it back to the grid during peak demand. The expansion positions Google to compete more effectively with AWS and Azure in the UK market by providing local infrastructure for AI workloads and reducing latency for UK customers. The data center will support Google's DeepMind AI research and science and healthcare offering UK enterprises and researchers improved access to Google AI capabilities and cloud services. And the DeepMind AI research is the most obvious reason why they did this. [00:59:39] Speaker C: Yeah, it's really tricky to do GDPR compliant things when you're trying to ship a whole lot of data and have AI do analysis. Right. [00:59:48] Speaker A: Like it's. [00:59:49] Speaker C: So this, this allows the best of, well, not best of both worlds but like it allows you to do that basically. So it's pretty awesome. [00:59:58] Speaker B: Well, DeepMinds demise Hassabis lives in the UK, so you know, this may be on his wish list for a local. [01:00:05] Speaker A: Yeah, that's why when I, when I first read this and I was like, oh, that's interesting, you know, I guess, you know, everyone's trying to get data centers built everywhere. 
Then they mentioned DeepMind, I was like, oh yeah, it makes perfect sense. Yeah. For those of you who are trying to learn more about data science, Google's releasing a new ebook called A Practical Guide to Data Science Google Cloud that demonstrates how to use BigQuery, Vertex AI and serverless Spark together for modern data science workflows. The guide emphasizes unified workflows through Collab Enterprise Notebooks that blend SQL, Python and Spark code in one place with AI assisted features that generate multi step plans and code from high level goals. Google's approach allows data scientists to manage structured and unstructured data in one foundation using familiar SQL syntax to process documents or analyze images directly through BigQuery itself. The ebook includes real world use cases like retail demand forecasting and agricultural risk assessment, with each example linking to executable notebooks for immediate hands on practice, which is nice. I always appreciate these very simple guides to the universe basically. [01:01:06] Speaker C: So I don't learn from reading like I've learned that I, you know, like I've, I've read a lot about the individual technologies and, and these, and so these little sort of practical guides that'll give me like especially the agricultural forecasting like it's, it's effectively setting up your environment and then you know, you're answering real, real questions or answering questions about your data and walkthroughs like that are really, for me that makes it really efficient to learn a thing because then I understand and I can take that like I don't have an agricultural workload personally, but I have a bunch of other workloads where I'd rather do that and I can, I can build off of that. And then for Google it's obviously a huge sales marketing thing for their services, which makes sense. [01:01:54] Speaker A: I mean, I'm kind of disappointed that you weren't using this to help optimize your yield of corn for your distillery. [01:02:02] Speaker C: Well, I don't have the distillery yet. Once I have the distillery, that's exactly what I'm going to do and no one will hear from me again because you'll be drunk. Yeah, yeah, I won't be doing my forecast. I'll be drinking the profits. [01:02:17] Speaker B: Oh, he'll set his house on fire. 102. [01:02:19] Speaker A: Yeah. [01:02:22] Speaker C: Not mutually exclusive. [01:02:26] Speaker A: Google Research has developed Vault Gemma, the first large language model implementing differential privacy techniques that prevent the model from memorizing and potentially exposing sensitive train data. By introducing calibrated noise during training, the research establishes new scaling laws for private LLMs, demonstrating that increased privacy more noise requires either higher compute budgets measured in flops or larger data budgets measured in tokens to maintain model performance. This addresses a critical challenge as tech companies increasingly rely on potentially sensitive user data for training. With the noise to batch ratio serving as the key parameter for balancing privacy protection against model accuracy. For cloud providers and enterprises, technology enables deployment of LMS that can train on proprietary or regulated data without risk of exposing information through model outputs, opening new use cases in healthcare finance to other privacy sensitive domains. 
And the approach provides a mathematical framework for developers to calculate the optimal trade-offs between privacy guarantees, computational costs and model performance when building privacy-preserving AI systems. And I'm hoping Jonathan understands that, because I don't know if I did. [01:03:24] Speaker C: So I know this one as well, because it's something that's come up a lot in my day job. Like, you know, you want to train a model based off of sensitive data and then you want to offer the output of that model through a chatbot or whatever it is, publicly. And it's terrifying as a security professional, because you don't know what data is going to be spit out, you can't predict it, and it's very hard to analyze within the model what's in there. And so this is definitely something where I've had data teams, you know, involve me in the design, and I'm like, I don't think you should train on that data, because of the risk. And from a security professional point of view, they're like, well, what is the risk? And I'm like, it's a risk. And, you know, it turns it into this really challenging "how do we move forward from here" problem. And so solutions like this, where you can sort of have mathematical guarantees, or at least something you can point at, would go a long way in making those workloads a reality, which is fantastic. Now Jonathan, you can tell me all the things I got wrong. [01:04:31] Speaker B: No, I'm really unclear on exactly how this works. I think fundamentally it's a way of extracting the relationships between things without recording, like, verbatim the actual data which was used for training. So you add noise to preserve enough of the context to build the graph, build the vectors, so that they relate to each other in the right way, but without actually ever being able to get that back out again in a way which can reconstruct the original data. [01:05:06] Speaker C: Which is crazy to me. Like, introducing noise just sounds like it would break the whole thing, but pretty awesome. [01:05:13] Speaker B: I mean, it makes it very computationally expensive. [01:05:17] Speaker C: Yes, that is the downside. Right? Like, and it's. [01:05:22] Speaker A: I mean, is this also a way that large LLMs could hide the fact they're using, you know, books and things to actually train the model? It would make it harder to determine where the training came from. [01:05:38] Speaker C: Yeah, it absolutely would do that. Right, because it's less likely to spit out verbatim something that it's trained on, for public content. [01:05:49] Speaker A: No doubt. Interesting. [01:05:52] Speaker C: But at least it's got that huge, you know, cost penalty, which would. [01:05:56] Speaker A: Probably make that not... I mean, it depends on how much those lawsuits get settled for. [01:06:01] Speaker C: Yeah, well, that's true too.
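To put a little flesh on "calibrated noise during training": the classic recipe for differentially private training (DP-SGD) clips each example's gradient and adds Gaussian noise before the weight update, which is the family of techniques this kind of work builds on; whether VaultGemma's exact setup matches this toy sketch is an assumption. A rough illustration:

```python
# Toy sketch of a differentially private gradient step (DP-SGD style).
# Illustrative only; not VaultGemma's actual training code.
import numpy as np

rng = np.random.default_rng(0)
clip_norm = 1.0          # per-example gradient clipping bound
noise_multiplier = 1.2   # the "calibrated noise": more noise, more privacy, less accuracy

def private_gradient(per_example_grads: np.ndarray) -> np.ndarray:
    # 1. Clip each example's gradient so no single record can dominate the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # 2. Sum the clipped gradients, then add Gaussian noise scaled to the clip bound.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1]
    )

    # 3. Average over the batch; a bigger batch dilutes the same amount of noise.
    return noisy_sum / len(per_example_grads)

batch = rng.normal(size=(32, 4))   # pretend per-example gradients
print(private_gradient(batch))     # the noisy update the optimizer would apply
```

That batch-size effect is the "noise-to-batch ratio" from the story: you buy accuracy back with more compute (bigger batches, more steps) or more data.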
[01:06:05] Speaker A: All right, moving on to our friends in Redmond with Azure. Azure Cosmos DB for MongoDB vCore now supports customer-managed keys, or CMK, in addition to default service-managed encryption, providing enterprises with full control over their encryption keys through Azure Key Vault integration. The dual-layer encryption approach aligns Azure with similar capabilities from AWS and MongoDB Atlas, addressing compliance requirements for regulated industries like healthcare and finance that mandate customer-controlled encryption. The feature enables key rotation, revocation and audit logging through Azure Key Vault, though customers should note potential performance impacts and additional Key Vault costs beyond standard Cosmos DB pricing. Organizations can implement bring-your-own-key scenarios for multi-cloud deployments or maintain encryption key consistency across hybrid environments, particularly useful for migrations from on-prem MongoDB. The vCore deployment model already differentiates itself from Cosmos DB's RU-based pricing by offering predictable compute-based costs, and CMK support strengthens its appeal for traditional MongoDB workloads requiring familiar operational patterns. Woo. I wowed you guys with CMK. [01:07:16] Speaker C: I mean, I do like these models. I do think it should be used sparingly, right? Because I don't think there's a whole lot of advantage to bringing your own key unless you are required to by, you know, some sort of regulatory compliance. Because effectively you're saying, you know, you can revoke the key and then Azure can't get at your data, and it feels like an unwarranted layer of protection. Like, I don't know, I have a relationship with this business, I have, you know, use cases. But maybe I'm not cynical enough for a security engineer. [01:07:50] Speaker B: Yeah, it'd be interesting to look at, statistically, of all the people who've lost keys, how many of those keys were ones they managed themselves versus ones that the cloud providers managed. And I'm going to hazard a guess that it was the ones you managed yourselves. So the requirement to use customer-managed keys for things is probably starting to do people a disservice rather than be an actual improvement in posture. [01:08:17] Speaker A: It feels like fake security in some ways. Because, okay, I'm going to create this key that I manage, I'm going to give it to my SaaS vendor, and I'm going to ask them, hopefully through a UI, to apply that key to the data. And now the data is encrypted, but for them to be able to do their function, they have to use the key that I gave them. So if I revoke the key from my HSM, there is no mechanism built into the encryption that I'm aware of that will have that key come back and validate that the key is still valid in my HSM. Correct? [01:08:48] Speaker C: So that's one layer I've always been a little bit confused on, because I've assumed that it's there, right? That you have the ability to revoke a key within the key store. [01:08:58] Speaker A: But I mean, that means my HSM is available to the Internet some way. [01:09:03] Speaker C: Well, I think you would store the key somewhere, but then, when you're revoking the key, I think you'd do that directly in the service. But I've never done it. That's another reason why I'm like, I don't know if CMK is really the answer that everyone's looking for, because I've never heard of a use case where someone's revoked a key so that the data was inaccessible by a third party. [01:09:28] Speaker B: I mean, if they've already got a copy of the key, which they would have to continue to use, and they. [01:09:32] Speaker C: And they have to have a key, because they're encrypting it. Right.
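The back-and-forth here is basically about envelope encryption, which is the pattern customer-managed key features are generally built on: the service encrypts data with its own data key, and that data key is wrapped by the key you control in Key Vault, so revoking your key blocks future unwrap calls rather than re-encrypting anything already decrypted. A rough sketch of the pattern (illustrative, not Azure's implementation):

```python
# Requires: pip install cryptography
# Envelope encryption sketch: a service-side data key wrapped by a customer-held key.
from cryptography.fernet import Fernet

customer_key = Fernet.generate_key()       # lives in the customer's Key Vault / HSM
kek = Fernet(customer_key)                 # the "key encryption key" the customer controls

data_key = Fernet.generate_key()           # generated and used by the service
wrapped_data_key = kek.encrypt(data_key)   # only the wrapped form is stored at rest

ciphertext = Fernet(data_key).encrypt(b"customer document")

# Normal operation: the service asks Key Vault to unwrap the data key, then decrypts.
unwrapped_key = kek.decrypt(wrapped_data_key)
print(Fernet(unwrapped_key).decrypt(ciphertext))

# If the customer revokes or deletes their key, the unwrap step starts failing, so the
# service can no longer recover data keys it hasn't already cached in memory.
```

Which is also why the hosts' skepticism lands: revocation is only as strong as the vendor's promise not to hold on to unwrapped data keys.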
[01:09:35] Speaker B: I mean, I can see how it may be beneficial with like, public key encryption where you can give them half the key and you can say, okay, you can encrypt data and put it here and I can read it later. But you can't do it, you can't decrypt it again. That's kind of a different issue. [01:09:47] Speaker A: I mean, in a CMAC deployment, right, you could, I mean, me as the customer who gave you the key, I can go in and then say the key is invalid or destroy the key and remove it from your system. And then now that data is unaccessible from you as the vendor, so that's the, that's potentially the control you're getting. [01:10:04] Speaker C: But you have the key, so you. [01:10:05] Speaker A: Could decrypt that data in theory, but also you're relying on the SaaS vendor to actually properly delete the key for that control to work. Like there's a lot of guarantees you're assuming are working in that model. [01:10:16] Speaker C: Yeah. [01:10:16] Speaker B: You're relying on the fact that the key is being checked with the HSM for every operation or every application. [01:10:21] Speaker A: Well, no, it can't be. There's no way you're doing that. The operational overhead and lency of that would be just really terrible. It's really, it's really about. I gave you the key so I can go in and I can revoke the key in some way or disable the key, you know, disable part of it. But I cannot see it being in the real time runtime. It just putting my HSM on the Internet, which I don't want to do for lots of reasons. [01:10:50] Speaker B: Well, isn't bring your own key more about the secure generation of the key in the first place? Potentially you generate the key and you put it in your hsm. It's not extractable. And so you use the key that's embedded in the HSM to either sign other keys that you make or do something else, but you can't get the key out again. Whereas if there is some level of trust which you're gonna have to put in the vendor that if they generate a key for you that nobody else ever had it, it's a chain of custody thing. Nobody else ever had access to that. To that key. [01:11:21] Speaker C: Well, and regulatory compliance has specific. In some cases, like has specific, you must encrypt using these. Right. And so if you know it's a. There's security and there's compliance and compliance is the security you can prove. And so like a lot of these things, you can take an attestation by your cloud provider and hopefully they've covered that. But in this case, you know, you could, if you're generating your own, you can, you can say it's generated this way with this policy and it's. And you can provide that whole, like you said, chain of custody. [01:11:54] Speaker A: Yeah, it just like there's definitely some, some interesting use cases on it. And again, like, you know, customer deletes the key and then deletes them in your system. And now that data's. You have data loss. But there's something. I can't, I can't save you in that scenario too. [01:12:07] Speaker C: Yeah, but they have to delete it twice instead of just accidentally once. [01:12:10] Speaker A: Right. [01:12:11] Speaker C: Because there's rolling, rolling back a terraform stack that deletes the key from the key store. And if it's your own key, you can hopefully get it back from your hsm. [01:12:20] Speaker A: Yep. I mean I guess, I guess that's the risk that you're. If you, if you're the. 
If you own the key and then the vendor does something bad with the key, then you are protected, because the data is still accessible to you. I guess, in a weird, roundabout way. All right, let's move on from this because it's hurting my brain. [01:12:37] Speaker C: Yeah, I'm sure our listeners love this conversation. [01:12:39] Speaker A: Azure Logic Apps now support Model Context Protocol servers in public preview, allowing developers to transform Logic Apps connectors into reusable MCP tools for building AI agents, with two deployment options: registering connectors through Azure API Center, or enabling existing Logic Apps as remote MCP servers. The API Center integration provides automated workflow creation and Easy Auth configuration in minutes, while registering MCP servers in a centralized enterprise catalog for discovery and management across organizations. This positions Azure against AWS's agent-building capabilities by leveraging Logic Apps' extensive connector ecosystem, with over 1,000 connectors available as prebuilt tools for AI agents, reducing development overhead compared to building custom integrations from scratch. Primary customers include enterprises building AI agents that need to integrate with multiple systems; the MCP approach allows modular composition of capabilities like data access, messaging and workflow orchestration without extensive custom coding. Implementation requires the Logic Apps Standard tier, with consumption-based pricing starting at some fraction of a penny per action, plus Microsoft Entra app registration for authentication and HTTP request/response triggers with proper schema descriptions for tool discovery. [01:13:46] Speaker C: Yeah, for me, the real value in this is that central catalog. Just because the minute MCP was out there, people were standing up their own MCP servers and then building their own agents, and it was duplicative, right? And so you've got every team basically running their own server doing the exact same thing, and now you get the efficiency of centralizing that through a catalog. Also, you know, you don't have to redo all the work that's involved with that; there's efficiency there as well. So I do like that. I'm a little confused on the rest, you know, because I don't use Azure day to day. Like, we had to look up what a Logic App is. I'm sure it's a capability that's nice to have within those tools. I think it exists in the other clouds, maybe in loosely other forms, but this is more of an enterprise approach. [01:14:38] Speaker A: Yep. And we did confirm Logic Apps is basically their workflow service. We don't have Matt here to explain to us why this is great, so we're gonna move on. Azure Container Storage v2 delivers 7x higher IOPS and 4x lower latency for Kubernetes workloads using local NVMe drives, with PostgreSQL showing 60% better transaction throughput. The service is now completely free with no per-gigabyte fees, making it cost competitive against AWS EBS and Google Persistent Disk, which charge for management overhead. Microsoft open-sourced the entire platform on GitHub as the Azure local CSI driver, allowing deployment on any Kubernetes cluster beyond AKS; this positions Azure as more open than competitors. The new architecture reduces CPU consumption to less than 12.5% of node resources, down from up to 50% previously, while delivering better performance.
This efficiency gain directly translates to cost savings since customers can run more loads on the same infrastructure. Integration with CATO or Kubernetes AI toolchain operator enables 5x faster AI model loading for inference workloads on GPU enabled VMs with local NVMe. This targets the growing market of organizations running LLMs and AI workloads on Kubernetes. And single node deployment supports remove the previous three node minimum requirement, making it practical for edge computing development environments and cost conscious deployments. And this flexibility addresses the key limitation compared to traditional SAN based storage solutions. [01:15:58] Speaker B: I mean, I wouldn't say it addresses limitation compared to SAN based storage solutions if you can deploy single node. I mean, SAN has redundancy, SAN has lots of other things. A single node doesn't get you that. Mm. [01:16:12] Speaker A: It just depends on what you, you know, what you need. [01:16:15] Speaker C: I, I mean, I just think previously customers in the Azure platform had to go outside of Azure to make this happen. Right. So there wasn't a solution, or at least an easy solution where they could, you know, quickly mount this and, and run. Run whatever performance they need on the data. So I, you know, like, I think that that's the real advantage. [01:16:35] Speaker B: Yeah, what, what's crazy is that the storage layer could have consumed up to 50% of the CPU on a node previously. [01:16:41] Speaker A: Yeah, that's crazy. [01:16:42] Speaker B: That's quite, it's quite the overhead. [01:16:44] Speaker C: Yeah, just a bit. [01:16:46] Speaker A: Microsoft Fabric is introducing graph and Maps capabilities to help organizations structure data for AI agents. Moving beyond simple data unification to create contextualized relationship aware data foundations that AI systems can reason over effectively. The new Graph and fabric feature uses LinkedIn's graph design principles to visualize and query relationships across enterprise data like customers, partners and supply chains. While Maps and Fabrics adds geospatial analytics for location based decision making. OneLake Fabrics. Unified data Lake now supports mirroring from Oracle and Google BigQuery plus new shortcuts to Azure Blob storage, allowing organizations to access all their data regardless of location while maintaining governance through new security controls. Microsoft is integrating fabric with Azure's AI Foundry to create a complete data to AI pipeline where fabric provides a structured data foundation. And AI Foundry enables developers to build and scale AI apps using familiar tools like GitHub and Visual Studio. The platform targets enterprises ready to move from AI experimentation to production deployments. With over 50,000 fabric certifications already achieved by users preparing for these new AI data ready capabilities. [01:17:46] Speaker C: Sounds expensive. I mean anything on fabric I think is expensive. I wonder, I mean, I wonder if. What if this is a. I don't think it would be a cost increase anywhere. It's just a sort of functionality increase. [01:18:01] Speaker A: Yeah, I mean the fabric stuff is interesting because it's basically just a ton of stuff like Power BI and the Data Lake and stuff shoved into one one unified platform, which is nice and it makes it easier to do data processes. 
So I don't expect it to be a major cost increase for customers who are already using Fabric, but if you haven't used it, then yeah, there's definitely potentially some risk to you. [01:18:25] Speaker C: Well, I mean, I think it's one of those things where you might choose Fabric over something like, you know, Databricks or Snowflake for the added functionality, because both of those tools are adding this functionality and sort of easy buttons, effectively, to query your data, and so, you know, you have to kind of keep up with the Joneses a bit. [01:18:46] Speaker B: Yeah, it's like the single pane of glass for access to your data. [01:18:50] Speaker A: Our final story for the night: Oracle stock has made its single biggest one-day gain in 26 years on huge cloud revenue projections. It jumped 36% after announcing projected cloud infrastructure revenue of 144 billion by fiscal 2030, with remaining performance obligations hitting 455 billion, a 359% year-over-year increase driven by four multibillion-dollar contracts signed this quarter. Apparently Oracle's projected 18 billion in OCI revenue for the current fiscal year still trails AWS's 112 billion and Azure's much larger figure, but their aggressive growth trajectory suggests they're positioning to become a legitimate third hyperscaler option, particularly for enterprises already invested in Oracle databases. The upcoming Oracle AI Database service, launching in October, will allow customers to run LLMs from OpenAI, Anthropic and others directly against Oracle database data, a differentiator from Amazon and Azure, who lack native database integration at that level. Oracle's partnership strategy with AWS, Microsoft and Google to provide the data center infrastructure creates an unusual dynamic where competitor growth actually benefits Oracle, while their 4.5 gigawatt data center expansion with OpenAI shows they're securing critical AI capacity. The market's enthusiasm appears driven more by Oracle's confidence in predicting five-year revenue forecasts, which is unusual in cloud infrastructure, than by actual Q1 results, which missed both earnings and revenue expectations. So yeah, I read all this stuff about Oracle stock and all this future potential, and I was just like, this is just BS, guys. Basically, Oracle has committed a ton of capex to buy a lot of GPUs that they are then going to sell to customers, which is exactly what AWS, Azure and GCP do. They're not doing anything special here; there's nothing unique. And you're making a huge bet that Oracle customers would actually like to access their data with AI directly, which I'm not sure is the case in all scenarios, as well as expecting that all of OpenAI's investments pan out and work out in Oracle's favor, which I'm not entirely convinced of either. So I'm not sure this is the moment to invest heavily into Oracle stock based on this earnings announcement, especially considering they couldn't even hit their own revenue and earnings targets for this quarter, let alone place reliance on a five-year forecast. That is just ridiculous in a market that is completely undetermined and changing every day. Yeah, great. [01:21:12] Speaker C: And the fact that the Street bought it up is the part that, you know, confuses me the most. But, you know, that's just everything to do with the stock market, I suppose. I don't understand it. [01:21:21] Speaker B: It's like a massive pump and dump scheme.
[01:21:23] Speaker C: It really is. Like, I just don't get how this is how our economy runs, but it totally does. [01:21:31] Speaker B: Yeah, I mean let's say, let's say GPUs last maybe the last three years, maybe last five years, I don't know. Running 100% utilization inference workloads on a GPU, I'd imagine they will wear out at some fairly accelerated rate. But the fact that there's so much technology being built now to replace Nvidia Cuda for inference workloads, I wouldn't have put my eggs in this basket and spent billions on GPUs which may well be obsolete in two years. [01:22:03] Speaker C: Well and I think that part of it is new gpu. You know, deployment and development is a huge number of these factors too. But a single competitor in the market completely throws these numbers into the bin and so it's just like, wow. [01:22:16] Speaker B: Yeah. [01:22:17] Speaker A: Okay, so yeah, if you're invested in Oracle stock, maybe a good time to sell because I'm not sure about long term viability of this. [01:22:25] Speaker C: Well, don't take financial advice from us. [01:22:26] Speaker A: Yeah, don't do that either. I mean, don't listen to me, but. [01:22:29] Speaker C: Easy there. [01:22:30] Speaker A: Whoa. Yeah, sorry. That's what I would do if I were you. But you know, you do you. And I'm again, not a. Not a. Not a finance person or understanding of this. All right, gentlemen, well, it's been a fantastic week here in the Cloud. Thanks for joining us once again. [01:22:46] Speaker C: Bye, everybody. [01:22:47] Speaker B: See you later. And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback and ask any questions you might have. Thanks for listening and we'll catch you on the next episode.
