[00:00:07] Speaker A: Welcome to the cloud pod where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure.
[00:00:14] Speaker B: We are your hosts, Justin, Jonathan, Ryan and Matthew.
[00:00:18] Speaker C: Episode 349 recorded for March 31, 2026. Gmail finally lets you ditch xx dragonslayer2004xx as your username. Good evening, Jonathan, how are you doing?
[00:00:30] Speaker A: I'm good, Justin, how are you?
[00:00:32] Speaker C: Good. You know, it's been a busy week already, as they seem to always be these days in 2026, every week is a disaster, you know, and it's such a disaster that Ryan and Matt were unable to join us tonight. But I did go find a guest for us this week and so I want to have a chance to introduce David Garraway. Dave, how are you doing?
[00:00:50] Speaker B: Doing well, thank you. How are you?
[00:00:52] Speaker C: Great. You want to tell the listeners a little bit about yourself real quick, but we have you on for a very particular reason this week.
[00:00:58] Speaker B: Yeah. So, Dave Garraway, about 25 plus years in technology. I was actually a mechanic before I was in technology.
Kind of interesting story there, but we won't get into it. Been building data centers, got into the cloud, everything else worked for the customer side, worked with the system integrator side. And I've been really focused on building AI data centers and the large megawatt data centers that everyone's talking about today.
Awesome, thanks for having me.
[00:01:24] Speaker C: Yeah, we're happy to have you here, and you have a lot of good, valuable input since you were feet on the ground at GTC, which we're excited to talk about here in a couple minutes. But let's run through a couple follow up items that you might have heard from a prior episode. First up, Anthropic has won their preliminary injunction blocking the Department of War's blacklisting ruling. The court found the designation was First Amendment retaliation rather than a legitimate national security action, and that officials lacked the authority to blacklist Anthropic without considering less restrictive alternatives or providing evidence of an urgent security risk. Instead, the designation was triggered by Anthropic's hostile manner through the press. The practical business impact was already substantial before the ruling, with three trade deals canceled and other potential partners delaying negotiations, representing potentially billions in lost contracts over five years. I don't know. I've been working with Anthropic on a commercial deal on something, and getting their sales team to give you any time to do anything is impossible. So I'm not sure it hurt them so much, but, you know, it definitely was not a great situation, declaring a U.S. entity a supply chain risk like that.
So I'm not surprised this, this ended up this way in court. I do expect this is not the end of it though, and we'll continue to see it go right up to the Supreme Court.
[00:02:35] Speaker A: Yeah, very likely. I mean, I guess they can still, you know, choose to not do business with Anthropic, but avoiding this label is probably a good thing. I'm guessing Anthropic are just super busy with all the people coming to them for deals right now, because it seems to me that Anthropic is getting all the business customers and OpenAI is getting the personal customers.
[00:02:56] Speaker C: Yeah, OpenAI definitely seems to be feeling like they're losing out on too much enterprise business. So they've been pivoting and focusing on some of their things. We're not going to talk about it too much, but they did kill Sora, which was their AI-generated video social media network to compete with TikTok or YouTube, this last week, and they've been heavily focused on trying to get more enterprise deals, I think basically because of the Anthropic threat.
[00:03:21] Speaker A: Yeah, I think we'll talk about it for just another 30 seconds.
I think Anthropic were very smart in going after businesses from the beginning instead of personal customers, because if you think about the type of data they've got access to right now for training, it's all stuff that's either public domain, or it's books, or it's journals and things which are easy to get. But what they don't have is proprietary information from businesses, and I would imagine that they're going to want to try to train domain-specific models by partnering with companies in certain sectors, and that may give them a head start on some very specific AI use cases that OpenAI will miss out on.
Yeah, so good for them.
[00:04:09] Speaker C: Time will tell.
[00:04:10] Speaker A: I'm team Claude for sure.
[00:04:12] Speaker B: Yeah, definitely.
[00:04:14] Speaker C: Well, this one might be interesting to Dave a little bit too, because I don't know if you've followed some of the drama, but basically two weeks ago someone wrote a Substack post basically claiming that Delve faked evidence, used their own accredited auditors that were shell companies out of India, basically fabricated a whole bunch of things, and sold and stole software from an open source community that was also a Y Combinator company, called SimSpace I believe.
And so they initially had a five point article we talked about last week where they tried to basically say, no, we didn't. And of course that wasn't good enough, and so now they've come back and have responded to the allegations, basically denying claims of fake evidence and clarifying that independent AICPA-accredited auditors, not Delve, issue the SOC 2 reports and ISO 27001 certifications. The company published a formal rebuttal and is now rolling out operational changes to address customer concerns. To support customers facing questions from their own clients and procurement teams, Delve is now offering complimentary re-audits through independent auditors, complimentary gray box penetration tests, and formal engagement letters from auditors, all at no cost to the customer. On the transparency side, Delve is moving auditor communications directly into customer Slack channels or shared email threads so customers have full visibility into the audit process rather than relying on Delve as the intermediary. Delve is also adding clear disclosures to templates and forms to explicitly identify them as guidance tools aligned to industry standards, addressing a core point of confusion raised in the controversy. And for cloud practitioners, this situation highlights the importance of understanding the distinction between compliance automation platforms and your auditor in general.
The big issue I have here is if you didn't do anything wrong, why did it justify you making all these changes?
[00:05:52] Speaker B: Yeah,
[00:05:55] Speaker A: that's fair, but extra scrutiny on the things that they aren't doing quite right, that could be better is probably why they'll argue they made changes even if they weren't related to making up evidence. But you know, it's just, it just sucks. Now I've got to make up my own evidence.
[00:06:10] Speaker C: Yeah, again, I think the reality is, and we talked about this last week, that SOC 2 audits are very heavily templatized. That's how these companies make them work. They do need to be edited and reviewed and approved and the right things need to be done, but they can all start as a template; a template's not the problem. It's what appears to be the automation and then the rubber stamping by these, these auditors. And then of course the Substack came out with another update, basically part two of their thing. And they've actually released a couple additional articles we'll talk about next week probably. But basically it was a whistleblower from inside of Delve who provided apparently internal screenshots and recordings from a bunch of internal meetings, including conversations suggesting the company's auditing partner, I Corp, may not conduct thorough evidence reviews before issuing their SOC 2s.
So it's just getting worse. I don't know that Delve actually survives this, and if they don't get called out by the AICPA at some point, I suspect that everyone's gonna start questioning SOC 2 in a big way, how real these things are. But definitely not great. And we continue to see new evidence coming out that Delve is completely smoke and mirrors, at least if you believe the Substack.
[00:07:20] Speaker A: Yeah, why, why wouldn't you? I guess they, the fallout's gonna hit the customers that use Delve for their certifications as well.
[00:07:26] Speaker C: Yeah, well, I mean, every time there's a breach now I'm going to go check out the website and say, hey, were you SOC 2'd by Delve? Because that's my first question, because we already saw one that was definitely a, you know, a Delve SOC 2 place. So it's going to get worse before it gets better. But definitely it's bad for enterprise compliance, enterprise purchasing, who now have to basically negate a bunch of SOC 2 evidence that's potentially fraudulent or now do their own auditing assessment, which is going to increase costs for everybody. So it's not great. That's why I said I think the AICPA has to step in at some point and issue a statement or some type of, you know, basically, this isn't okay. All right, well, let's get into Jensen and hit up GTC. So for those of you who are unaware, Nvidia has their annual user conference, the GTC conference, it's in San Francisco this last year. I don't know if it moves around, but basically Jensen Huang got on stage, he talked about a bunch of cool new Nvidia things, including the new Vera Rubin platform.
They introduced their own version of OpenClaw called Nemo Claw that's supposed to be more secure, which, shocker, it's not, so be careful. And then there are six new domain-specific open model families including Nemotron, BioNemo, Cosmos and Earth2, all available by process. And then a new DSX digital twin platform using Omniverse to simulate thermal, electrical and network conditions before a data center is physically built, with Nvidia estimating roughly a factor of two in recoverable efficiency. And then of course some of the other companies that we follow in this space: DigitalOcean, for one, basically announced that their Richmond data center will be built with the Nvidia HGX B300 servers. But I mean, that's just a small surface level of what was announced at GTC. But Dave, you were on the ground, so tell us what it was like there.
Is this a conference that cloud practitioners should check out in the future, do you think? And what did you take away from this thing?
[00:09:16] Speaker B: So first takeaway is that definitely they need to do this somewhere else because they did it in San Jose and it was just a nightmare. They need to have a bigger, they need to be able to pack more people. I mean, when you look at this thing seven, eight years ago compared to where it is now, it was like a couple hundred people that were gamers, you know, back then, whereas now it was, it's, it was the largest conference I've been to in a long time.
What I will say though is that being, you know, in technology, that is a great place to go kind of put your finger on the pulse of where things are. The expo floor was interesting, and the amount of overflow that they had, like, you walk the whole floor thinking you're done, and all of a sudden it goes into other rooms where they put overflow in. And there was a lot of innovation. And it was crazy to see the amount of innovation from the hardware space just in the past two years and where they're going with the amount of, you know, condensed GPU in a rack. It was very impressive seeing all the different companies out there that are developing on top of it. But, you know, the more important thing is, how do you have a platform that can actually run a lot of these things? You got like the OpenClaw stuff that they have, and they've got these little devices, you know, basically small, little DGXs, that you could run these things on, little personal computers.
But the big stuff that was impressive for me was looking at, you know, they've got the NVL72s, obviously their flagship rack of servers, a dense 72 GPUs in a rack, and coming out with the new Vera Rubin with, you know, the NVL144s. And just the amount of, you know, you're looking at 120,000 watts of power in a rack to power these things up with NVLink. It was just pretty impressive to see kind of how far they've come along in the past couple years to run these things.
That was quite impressive to see.
So, you know, one of the things is the density and the liquid cooling. So I went down this journey of, okay, we gotta build a, say, 20 megawatt data center. What's that going to require to power these things up? And with that, you know, you can run anywhere from like 80 to 100 racks of NVL72s in a 20 megawatt generator facility. But how do you cool that, right? So anyways, I went down that whole journey of back into data centers at scale, right, and how do you do all this thing. And it's pretty crazy to see all the CDU-type manufacturers out there for cooling, your power distribution, all that. It was pretty impressive to see.
So all in all I would definitely say that it was worth it, just from the high level, the infrastructure level.
[00:11:55] Speaker A: Did they give anything away? Any freebies?
[00:11:58] Speaker B: I wish.
[00:11:59] Speaker C: I'll take a Nvidia gpu. Yeah. Then please, I'll take that. Well, is that a giveaway? Is that a giveaway item I can get?
[00:12:06] Speaker B: Yeah. The one thing I don't like about that is that there's such a big gap. So I'm building out right now some kind of test, you know, a test GPU rig right now with four RTX 6000s. But the RTXs are really more kind of geared towards, you know, training, inferencing, but also a lot of video. Right. If you want to get into any of the other, you know, the bigger, the Blackwell chips, anything like that, you go from like 8, $9,000 a card to like 60, $70,000 a card that has to run in a server platform, everything else. So it would be nice for the people that want to try to learn on this stuff to have something kind of in the middle, which they just don't have.
You're paying for the big boy stuff and I don't know about you, but my fiance doesn't like having servers running in the garage anymore.
Sounds like a jet engine starting up, running all the time.
[00:13:00] Speaker C: Not now. They generate so much heat and definitely don't like to have them in my garage.
[00:13:04] Speaker B: Yeah, luckily I put a 16 kilowatt solar so I can power those things up now and it doesn't cost me a fortune. But yeah, the amount of heat that comes out of that garage is, is a killer.
But yeah, it was impressive to see where they've come with all that. And, you know, if you ever want to get down deeper into, like, how they're doing the NVLink and the speeds and feeds they're doing on that. And what really impresses me is the ecosystem, right.
If you're going to do large scale training or inference, the inferencing is kind of separate, but for large scale training, like, you know, the only two real storage vendors out there that we've really worked with a lot is just going to be like a WEKA and a VAST, because they scale out horizontally on the front end as well as on the back end. So you can get real high speeds and feeds to go ahead and feed these really hungry models.
Now you're looking at 72 GPUs in a rack. That thing's going to crunch through a lot of data. How do you feed that, right? And the storage arrays, and then how do you connect to them? InfiniBand or RoCE. RoCE seems to be the way to go. That's what I've used a lot in the past.
[00:14:10] Speaker A: I feel so detached from all that stuff now. I've been working in the cloud for 10 or more years. It's like, well, oh, hardware actually still exists. It's not just a request for terraform.
[00:14:22] Speaker B: Yeah, it's, it's, you know, I've been disconnected from a lot because my whole background is in data centers building, you know, building data centers. Large, large ones building, running. But I've been disconnected for about three years. Just work, you know, living in the cloud and, and that's why going to GTC was really eye opener, to see kind of where things are.
And then for me I kind of look at what's the landscape like, right. You got the Neo clouds out there. You know what I mainly see is I see the Neo clouds and people are really focusing their business and giving all their attention on the people with all the money. So you're Fortune 50, Fortune 100 companies or giving all that to the actual OpenAI's and the Anthropics.
But there's this huge middle, man, there's like 330,000 businesses that are like 10 million to a billion dollars in the country.
Where do people go for that? So it was interesting to try to see where in that ecosystem that they're going. And I see a big gap right there, which I think eventually you will start being filled with some other solutions.
It's not like a normal company is going to want to go spend $100 million a year for a, you know, air-gapped, sovereign AI infrastructure.
[00:15:38] Speaker A: Yeah. And the crazy thing right now, the economics is that I'm not even sure if real money is, is even changing hands. It's like, well, we'll give you 10, you know, $10 billion worth of this stuff if, if you give us $10 billion worth of your, your future profits on whatever you're going to build with that gear.
[00:15:51] Speaker B: Yeah, yeah, I'm going to invest in you. As long as you take that money and put it back and invest it in me. Right.
[00:15:58] Speaker C: A lot of left hand, right hand operations going on in that stuff. But I mean, it's also problematic too, because we just saw RAM prices are starting to fall because, you know, OpenAI is starting to sound like they're not going to be able to fully commit to their 40% of Micron and 40% of Crucial's memory fab capacity.
And so, you know, we've already seen some things like that, so it's questionable.
You know, I know at this GTC conference they were saying a trillion dollars in demand is what Nvidia is saying is out there in the market. And how real is that in reality? Or is there just a lot of hype and a lot of bubble happening?
[00:16:33] Speaker B: I think what, you know, to that point, you know, OpenAI they also pulled out of a data center build they're going to build in Austin. Huge facility out there. They just kind of abandoned.
So I think what, you know, that kind of touches on exactly what I was, what I was discussing is, is that I think if you look at the total demand, it's there.
But how do you tap into it? These people can't afford a hundred million dollars worth of this. So it's more like they want these big customers, and there's very few of those customers, and the ones that are needing that are already doing that. And then they're also finding that, hey, we went in there needing 5 racks of NVL72s, but now that we start using it, we only need 2, right? So they scale back.
So that's kind of that section in there in the middle, which is the biggest one. I think if you look at it, you know, it's not five people that are going to be a couple trillion dollars worth of, you know, spend, but it's going to be 300,000 companies that, if you're able to service them, can fill that kind of demand.
That's right.
[00:17:35] Speaker A: Do you think they have any real incentive, Nvidia I'm speaking about, to service the greater demand of all those businesses? Or do you think all they care about is the money they make on every unit they sell?
[00:17:50] Speaker B: I think they're kind of still living in the newness of this and catering to these large companies.
Whereas like I say that's going to die off. And you know, as we're starting to see and as Justin, as you said, is that, you know, you're starting to see the demand start to go down again. People are trained, they're realizing that they don't need this much and you're going to start seeing all a real big surplus in available gpu. I talked to one guy, he's, he's got a friend that just built a $300 million data center because their team told him they need training. And he said it's idle. They're like, we, we don't know what to do with it. Now we're going to be a neobot, like you know, so it was one, one other thing that, that I found was kind of cool is, is that I always look for how do you give people small or give them their own models with these large scale, you know, infrastructure. We met with the CTO of VCluster and I don't know if you're familiar with them, but they basically give you a way to kind of virtualize Kubernetes environments within your Kubernetes environments so you don't have to manage all the security permissions and also the resources and working with them. And actually they have a pretty good solution that we're working with them on where you can actually go ahead and do bare metal install of like a vcluster on these AVL 72s and then basically scale out and carve out different instances based on resources and give those out to people. So now you can have a resource that you give out to a team within, within Kubernetes. You can run your slurm, whatever else in there and then you've got enough GPU to actually run a big model in there and start doing your training.
So it gives people a way to, you know, avoid one big wasted unit, right. Think of it like the old VMware days, where one big server has got a job and is 90% idle; you'd be able to start using some of these resources.
That's some of the stuff that we're starting, we're starting to look at and look into.
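For anyone trying to picture what "carving out resources per team" looks like in practice, here's a minimal sketch using the official Kubernetes Python client: a pod that requests a slice of a rack's GPUs through the standard nvidia.com/gpu resource exposed by NVIDIA's device plugin. The namespace, image tag, and GPU count are placeholders, not anything Dave described specifically; a vCluster-style setup would layer virtual control planes on top of this, but the per-team resource request is the same basic idea.

```python
# Sketch: give a team a pod with a fixed GPU allocation via the standard
# nvidia.com/gpu extended resource. Names and image are placeholders.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="team-a-train", namespace="team-a"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="nvcr.io/nvidia/pytorch:24.08-py3",   # placeholder image tag
            command=["python", "train.py"],             # placeholder entrypoint
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "2"},          # the team's GPU slice
            ),
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```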
[00:19:48] Speaker C: I mean it's been a little bit since I've been to a data center. Not that I don't have opportunity to go to them. I just choose not to typically. But you know, the kind of, the kind of power that's now involved in these data centers and you know, what's, what's really happening on the two big areas that I know was a big problem was power and cooling.
Uh, so, you know, in my last big data center era, it was all about three-phase power being a big limiting factor. It sounds like most data centers have retrofitted there. But cooling, of course, was the next big thing after three-phase power. You know, what are these guys doing to cool these big massive GPUs?
[00:20:20] Speaker B: Yeah, so on the cooling side you've got liquid cooling, and with the liquid cooling, basically you've got these CDUs. Think of them like they're almost like a rack that's a big pump that's just going to be doing a closed loop liquid system.
And the liquid, it's almost like an ethylene glycol, but it's a glycol derivative in water that's going to be the coolant. And they actually run that. So from this unit they'll plumb that into the rack. The rack's got all the plumbing that goes alongside, and it goes into each server. And then inside the server, if you look at some of these blades on, like, these NVL72s, they actually have liquid going all the way to the actual chips. You know, each blade is going to have two GPUs and one CPU. So they're literally running the liquid all the way to there.
But now, how do you dissipate that heat? Because when you're looking at 120 kilowatts, 120,000 watts, you just do basic math and turn that into BTUs and heat, right? To dissipate all that, we did this exercise for 20 megawatts of the traditional chilled water cooling that we had in a data center, like, you know, five, ten years ago. That chilled water system would burn 8.3 million gallons a month in water. Just in water, for 20 megawatts.
[00:21:40] Speaker C: Right?
[00:21:40] Speaker B: And you're talking about data centers that are doing like hundred megawatts. So obviously that's not sustainable. So what they're doing in the, and the ones we're looking at are they got dry coolers. Think of them as radiators that just stick on the roof. And as long as you have ambient air and enough fans to run over them, just like a car, you can cool the temperatures.
But that poses a challenge in itself. So for 20 megawatts, you're looking at about 12,000 square feet of roof space, which, okay, 12,000 square feet. But when it's full with all the liquid, 200,000 pounds.
So now you've got a roof with 200,000 pounds sitting on the top of it, right? Just in the cooling part of it.
So that's typically what people are doing. And there's some other stuff they've got where it's almost like a swamp cooler that they're doing to try to cool this. So they're using water and swamp coolers. But the most efficient ones that we're seeing are the dry coolers on the roof.
The interesting thing that we found on this was the coolant, the water going into the CPU. You don't want to put ice cold water on a CPU, right? So you want the CPU and GPU to run at about 113 degrees. So at that point you're not cooling the water so much that you need to have so much cooling on it. You just need to get it back down to about 113 degrees and then run it through there. So you keep everything at the same temp.
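As a sanity check on the numbers in this exchange, here's the back-of-envelope arithmetic in Python, using only the figures quoted above (roughly 120 kW per rack and a 20 megawatt facility). The constants are standard unit conversions; the results are illustrative, not a design.

```python
# Rough heat-rejection math using the figures quoted in the conversation.
RACK_KW = 120                  # ~120,000 watts per NVL72-class rack
FACILITY_MW = 20               # the 20 MW facility discussed above
KW_TO_BTU_HR = 3412.14         # 1 kW of heat is about 3,412 BTU/hr
TON_BTU_HR = 12_000            # 1 ton of refrigeration = 12,000 BTU/hr

# Pure IT-load rack count; real counts (the 80-100 mentioned above) are lower
# once cooling and power distribution overhead are included.
max_racks = FACILITY_MW * 1000 / RACK_KW

heat_btu_hr = FACILITY_MW * 1000 * KW_TO_BTU_HR
cooling_tons = heat_btu_hr / TON_BTU_HR

print(f"racks at pure IT load: {max_racks:.0f}")
print(f"heat to reject: {heat_btu_hr:,.0f} BTU/hr (~{cooling_tons:,.0f} tons)")
```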
[00:23:09] Speaker A: That's interesting. I always thought it was interesting to learn about, you know, there's a data center in Finland, I think, Finland or Iceland, where they literally use the excess heat from the data center to heat hot water for nearby homes.
Yeah, it seems so wasteful to, you know, to have a 20 megawatt data center plus the extra power that you need to, to cool it and circulate, circulate the water and the cost of water when, when you could actually use that for something useful.
[00:23:35] Speaker B: Well, you could do, you could do heat exchangers. Like that's a, that's a good idea. Like if you were actually to take, you know, the problem is you don't want these so close to a residential that the heat would actually make it to, to somewhere like that. Right.
But if there were, if there were other needs that you can use that hot water for, like you say it's just, it's just exchanging heat at that point.
[00:23:54] Speaker C: There's a bit of a pushback on data centers across the country, you know, but the, what I always say is just bad neighborhood behavior.
Like, you know, there was an article I think I saw about in Virginia, some couple who bought a farmhouse and then they built a data center next to it. And like the light pollution is so bad and then the constant hum of the fans and all of that and it's like, yeah, those are things that can be solved. So, so it's interesting there's quite a bit of backlash against data centers right now in general. So to build all these data centers for AI there's also a problem of where can you build them and how do you get through the regulations and environmental concerns and then how do you just be a good neighbor? And producing a ton of warm water into the, into the drainage system is probably not a great method either. So hopefully they, you know, some of these problems get fixed too.
[00:24:39] Speaker B: Well, I was thinking, if you were running that 8 million gallons of water a month to dissipate, you know, 20 megawatts, I figure you'd have like a storm, a rain cloud above your data center. If you're dissipating that much water, you're building your own microclimate system above you. But one of the other things, you know, talking about power, that's obviously the challenge. I mean, here in Southern California, we'll be lucky to get, you know, a kilowatt. So what we're finding a lot of people are doing is natural gas from fracking. So putting them in areas where they were actually doing fracking and there's still gas seeping out of the earth. The government makes you burn all that gas. I was talking to someone that represents Exxon and they're like, well, tell you what, we'll cap all that gas, we'll throw generators on there and we'll give you power. I'm like, how much power? He's like, 50 megawatts. I'm like, wait a sec, what?
So apparently there's enough gas being burned that's just seeping out of the ground after the fact. And I'm like, how sustainable is this? She's like, we can guarantee it for like five or ten years.
So they're getting creative now and trying to find different ways to do it. So a lot of the, I'll say the outskirts, the areas like, even like southeast Alaska and places up there, you know, they're just tapping into those old or those natural gas type of resources that would usually just get wasted. They're just tapping into them.
That's one way that, that they're kind of overcoming this.
[00:26:07] Speaker C: Well, very interesting, very rapidly changing part of our world and technology. And we're curious to see how things continue to evolve. And the never ending question of are we in a bubble or not?
We'll see. But definitely a lot of exciting things happening. We're seeing more on the consumption side where you use inference for things like coding and different use cases. So the infrastructure side is just as interesting and exciting. You know, sort of forgotten by Some cloud people at this point. So yeah, definitely curious to have you back and talk about this again in the future.
[00:26:42] Speaker B: Yeah, one last point on the good neighbor thing, talking about a bubble. They are talking about a bubble, the people I've been talking to. But one of the neoclouds I was talking to, they're like, when we get to a point where we don't have the need, we're going to start powering up the neighborhoods for free. So we're just going to start giving out power for free. So hopefully that good neighbor goodwill extends out there.
[00:27:05] Speaker C: That definitely makes some people happy.
Well, based on Dave's goatee and the gray in mine and Jonathan's, we were all probably around when Gmail launched, and, at least for me anyways, I was, you know, a teenager or slightly past a teenager, 20-something or other, and, you know, those are the days when we did really dumb usernames like Dragon Slayer 573. And so, basically in honor of it turning 22 years old, Google has finally taken the occasion to allow US-based users to change their Gmail username without creating an entirely new account. I mean, I'm sure I have legacy accounts scattered across Google that I, you know, made, and then I'm like, I need a professional, you know, Justin-type email address somewhere, and so, you know, I'm sure I would love this capability. But now apparently you can change your username once every 12 months per account.
They've not explained how this will work as far as, you know, spam mitigation measures or preventing abuse. But this will be rolling out gradually in the US, so you'll be able to get rid of your dragonslayer456 and now become just boring Jonathan.
[00:28:11] Speaker A: Well, I mean good luck finding a name that's, that's not 20 characters long at this point.
[00:28:15] Speaker C: Yeah, exactly. There's that side of it too.
So people all remember when Hey was coming out, there was like, I'll get a Hey email account right away, get to the next thing, and then it didn't really happen. But we all remember doing that, from Hotmail to Google to, you know, Prodigy. Like, there were all kinds of different ones. AOL, early AOL was popular back in the day.
[00:28:33] Speaker B: Earthlink.
[00:28:35] Speaker A: Oh yeah, earthlink.
[00:28:36] Speaker C: That's one I haven't heard in a long time.
[00:28:40] Speaker A: That's a SoCal one, isn't it? EarthLink.
[00:28:42] Speaker B: Yeah, yeah, yeah, they're right here but
[00:28:45] Speaker C: where I'm at, they still exist. I just went to their website. I didn't know crazy DSL and fiber though. So yeah, not quite the same as what I remember back in the day.
Well, thank God for that. The other big news this week has been supply chain attacks. Bad, bad week for supply chain attacks. We talked a little bit last week about the first one, but now Team PCP announced on March 19 that they compromised Trivy. Trivy is a widely used open source vulnerability scanner from Aqua Security, and they injected credential-stealing malware into 75 GitHub Action tags, Docker images and CI/CD pipelines, turning the security tool itself into the attack vector. The malware collected SSH keys, cloud credentials, Kubernetes secrets and environment files from infected systems. The attackers then used those stolen credentials to pivot into LiteLLM, which is the other one that got attacked, a Python framework for AI model API management, pushing two malicious versions to PyPI that executed automatically on Python process startup. This also led to the Axios JavaScript library getting compromised this week, which is a library on npm, pushing malicious versions that include a remote access trojan targeting Windows, macOS and Linux users. Axios receives over 100 million weekly downloads, making the potential exposure substantial. The attack window was approximately three hours before being detected and stopped. The security firm Aikido advises anyone who downloaded Axios during that period to treat their system as compromised. So supply chain attacks, made famous by SolarWinds and Log4j, are now getting worse. Probably powered a bit by AI, I imagine, which makes some of this a little bit easier to do, as well as the fact that we are in a war with a country that has a lot of cybersecurity resources. And so these are things that they may have known about or been sitting on for a while that are just getting exploited in the wild now in wartime, as well as by AI. So supply chain is going to be the hot topic of the security space. You know, it was already a hot topic. It's going to get even hotter from here on out.
[00:30:39] Speaker A: I just can't believe how much trust, blind trust, dumb trust, whatever you want to call it, is involved in an awful lot of open source projects. I mean, the entirety of PyPI. I've got a module on PyPI; I could commit some bad code to my repo in 15 minutes. If someone installs my package, it's going to run. You know, I'm not aware of a great deal of security checks that happen automatically on the back end there. But that entire ecosystem is built on trust.
It's not good at all.
Yeah, I think it's got to go away.
[00:31:12] Speaker C: We all had the shot across the bow, you know, back in the left-pad days, when the guy who created the left-pad npm package pulled it and all these websites broke, and you're like, why is everyone using a package to pad their strings on the left? This made no sense. And even back then, when that happened, I was like, man, this could turn into something really bad, like, you know, things that you could inject into these things. And Node in particular has just gotten worse and worse and worse at this, where, you know, I always get nervous when people are like, hey, it's a big Node.js app and it uses a bunch of npm dependencies. It's like, yeah, there's a ton of ways I can get attacked.
[00:31:45] Speaker B: So you think all the vibe coding.
Yeah, the vibe coding contributes to all this?
[00:31:51] Speaker C: Yes, 100%.
So, yeah. Your security team is probably yelling at you if you have PyPI or Trivy or LiteLLM or any of these other tools in your ecosystem. And getting your own copies of those upstream libraries is, I think, probably your first step to making a more secure future for yourself.
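For listeners wondering what "get your own copies and stop trusting upstream blindly" can look like in practice, here's a minimal sketch: a local check that compares installed Python packages against an internally reviewed allowlist. The package names and versions below are placeholders, not real advisories. In a real pipeline you'd pair something like this with pinned, hash-verified requirements (pip's --require-hashes) or a scanner such as pip-audit.

```python
# Minimal sketch: flag any installed package whose version isn't on an
# internally vetted allowlist. Allowlist contents here are placeholders.
from importlib.metadata import distributions

ALLOWED = {
    "requests": {"2.32.3"},   # hypothetical approved pin
    "litellm": set(),         # example: nothing approved yet
}

for dist in distributions():
    name = (dist.metadata["Name"] or "").lower()
    if name in ALLOWED and dist.version not in ALLOWED[name]:
        print(f"BLOCK: {name}=={dist.version} is not an approved version")
```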
[00:32:09] Speaker A: Yeah, I mean, I think, I think people are going to need the tools to do the scanning locally.
At that point, you're not going to be able to trust the vendor. You know, pip or uv or something will have to have some kind of scanning built in.
So it's sort of the progress of AI in speeding up attacks that is potentially also going to drive the adoption of AI locally to help avert them.
[00:32:33] Speaker C: I think you're going to have to. Both sides are going to arm up the same way because that's the only way you can fight this.
[00:32:37] Speaker B: So, yeah, AI to fight AI,
[00:32:42] Speaker C: only way to do it. And unfortunately, the robots had to beat the robots. That's how it works.
Well, AI is how ML makes money. Or this week maybe not make money.
Anthropic accidentally shipped Claude Code npm version 2.1.88 with exposed source map files, revealing nearly 2,000 TypeScript files and over 512,000 lines of code for the CLI tool. Anthropic confirmed it was a packaging error caused by human error, not a security breach, and stated no customer data or credentials were exposed. The leaked code has already been archived, posted to a public GitHub repo and forked tens of thousands of times, including a rewrite in Python and a very active Rust rewrite happening as we speak. This gives competitors and developers a detailed look at how Anthropic built its tooling. And this will now live on as probably OpenClaude or some clawed Claude Code out there for you to take advantage of, which might be nice too, because you'll be able to use it with your own models. If you've ever experimented with, like, Ollama and using it with Claude Code, you can see there's a tremendous amount of power and availability, and now you can see how these things are done. I suspect OpenAI and others will also incorporate some of the features. And even OpenAI rolled out a plugin for Claude Code before this happened, where they were using Codex inside of Claude Code to basically check Anthropic's work for you if you wanted to. So there's already some dirty things happening, and now they have the source code so they can do all kinds of cool new things. So look forward to seeing how this evolves here shortly over the next couple weeks. But there's nothing they can do. Once that code is out, it is out in the world.
[00:34:12] Speaker A: Yeah. The fastest growing GitHub repo in terms of stars ever was the clone of, of, of that repo. A hundred thousand stars in 24 hours, which is quite impressive. I think it may have been taken down by now. I think there's a DMCA takedown.
[00:34:26] Speaker C: I mean, it's gonna live forever on the Internet.
[00:34:28] Speaker A: Oh, for sure. Yeah. But I mean, I guess the question is did, did you really need the, the unobfuscated source code anyway?
[00:34:36] Speaker C: No.
[00:34:37] Speaker A: You've got AI tools. You can, you can, you can literally point Claude at it and say, hey, how does this work? I know because I did it a year ago.
[00:34:48] Speaker C: Yeah, the repo is still up actually.
[00:34:50] Speaker A: Oh, it is. Okay.
[00:34:51] Speaker C: Yeah.
Again, I, I don't know. It's open source code. What they put into the repo and then you put, you know, there's no, no takebacks in the Internet, I guess. I don't know. Takebacks. These are not going to happen. That'd be a good title.
All right. Well, unfortunately, I mean, it sounds like they're taking it in stride. The, the public side of Anthropic has seemed to be. Okay. Maybe people internally are not so happy.
Definitely.
You have to know what you're doing with the AI. You can't just trust it blindly. And so, yeah, including map files. To have it happen to them of all people, is kind of a little bit of a nice chef's kiss to this whole story. I Was like, yeah, it was going to happen to somebody eventually. It just happened to happen to the company who made it possible. Yeah.
[00:35:37] Speaker A: The irony is, of course, that they've already admitted that Claude is writing an enormous amount of their code for them. And I know there was, I can't remember the name of the man who started the project internally, but if they're already using AI to generate the code, you know, the law as it stands says it's not copyrightable anyway. It wasn't generated by a person, so they don't have a leg to stand on.
[00:35:59] Speaker C: Yeah, the copyright implications are still a bit untested, but potentially, yes, you're right. Not. Not copyrightable, not trademarkable. So we'll see.
All right, moving on to AWS. Amazon is still apparently bent on making the console modern in 2026, as they're introducing more user experience customizations. Earlier they gave you the ability to basically color it, so you can now color code your console. So, you know, if you're in production, like, I made my production red and my dev blue, so I knew, like, oh, if I'm in the red account, then I'm not in the right place. It's not very well implemented though, I'll tell you, because I was excited about this feature, then I turned it on, and I was like, oh, that's it? That's the blue? Like, it's not very noticeable.
[00:36:42] Speaker B: Anticlimactic.
[00:36:43] Speaker C: Yeah. But the nice thing is that these new enhancements are now giving you, you know, the ability to turn off regions, which is a huge advantage, and then also the ability to push these out via CloudFormation, which means you can also do it through Terraform. So you can now set these colors in a more programmatic way and they're available to you. And I really just wanted to talk about it because I finally turned it on and I was not impressed.
So. But I appreciate the effort, but there's still a lot of work to do in the console to make it better and more distinguishable between different services.
[00:37:13] Speaker A: I used Chrome profiles because pre the Amazon console actually letting you log into multiple accounts, I kind of got in the habit of doing that and I never switched back.
[00:37:23] Speaker C: Yeah, there are some cool rules they put into this now, though. Not only can you turn the regions on and off via CloudFormation or Terraform, but you also get rules. So if you have a standard account naming structure, like all of our prod accounts have dash-prod and all of our dev accounts have dash-dev, you can now set the rules for the colors so that you don't have to specify it per account. So there were definitely some quality of life enhancements in this announcement too, but it definitely needs some work.
[00:37:48] Speaker A: Yeah nice
[00:37:51] Speaker C: AWS Lambda is now going to support up to 32 gigs of memory and 16 vCPUs, tripling the previous limit of 10 gigabytes and roughly 6 vCPUs, which opens the door for workloads like media transcoding, large scale data processing and scientific simulations run serverlessly. And if I were to be making a re:Invent prediction, I would say GPUs soon enough. A notable addition here is a configurable memory-to-vCPU ratio of 2:1, 4:1 or 8:1, giving developers actual control over resource balance rather than the fixed proportional scaling that standard Lambda has always used. And Lambda managed instances run functions on managed EC2 with built-in routing, load balancing and auto scaling, so customers get specialized compute configurations, including latest generation processors and high bandwidth networking. Pricing is not fully out for the Lambda managed instances, but we'll keep an eye on that as it continues to roll out soon.
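As a rough idea of what configuring one of these larger functions could look like, here's a boto3 sketch. MemorySize and Timeout are real parameters today (with a lower memory ceiling than the 32 GB described here); the memory-to-vCPU ratio is shown only as a hypothetical commented-out knob, since the announcement doesn't name the actual field.

```python
# Sketch only: bump an existing function toward the newly announced limits.
import boto3

lam = boto3.client("lambda")
lam.update_function_configuration(
    FunctionName="video-transcoder",   # placeholder function name
    MemorySize=32768,                  # 32 GB, per the announced ceiling
    Timeout=900,
    # VcpuRatio="4:1",                 # hypothetical: AWS hasn't published the field name
)
print("Update requested")
```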
[00:38:39] Speaker A: Lambda's already pretty cheap to begin with though. I wonder quite how much they they could charge for managing the control plane and you still paying for the compute.
Not a lot I would think. Maybe maybe they charge per host or per a small fixed fee per invocation or something. It's going to be interesting.
[00:38:55] Speaker C: I mean, how do you segment the GPU enough? That's what I was trying to think about. I mean, there's lots of container use cases where, you know, you can share GPUs across containers, but then you think about the segmentation they do in Firecracker and some of the other things. To really do segmentation at that level into the GPU would be interesting. Or is it going to potentially have additional security vulnerability angles that I haven't thought of yet?
[00:39:20] Speaker A: I think Dave mentioned something about that earlier. There's a VLLM and some other technology which lets you partition hardware basically into. Into you know, eight or 16 virtual GPUs that you can control the size of.
[00:39:32] Speaker B: Yeah, Nvidia's got the MiG.
[00:39:34] Speaker C: I don't think Firecracker though supports that yet. So that's the, the key thing is when does that add up in Firecracker?
[00:39:39] Speaker B: Yeah, cause that's the way that they're doing it right now, Kubernetes and Nvidia's MIG, and you go ahead and slice that thing up. The problem is you give up one of the slices just to the MIG itself. So it makes it kind of inefficient, but at least it gives you that segmentation that you need.
[00:39:56] Speaker C: I mean it's just like being early days of VMware and hypervisor, right. Everyone, you know you were taking that 20% hit in CPU.
[00:40:02] Speaker B: Yep.
[00:40:03] Speaker C: And then you know, over time people will figure out how to optimize that and make it into smaller and smaller compute units. And then, you know, now we have hard dedicated ASIC chips that do it. So I assume the same thing will happen with GPUs over time as well.
AWS is launching two generally available frontier agents, a security agent for autonomous penetration testing and an AWS DevOps agent for incident resolution and SRE tasks. These differ from typical AI assistants in that they operate independently for hours or days without constant human direction to complete complex multi-step workflows. The AWS Security agent ingests source code, architecture diagrams and documentation to identify attack chains that traditional scanners miss, compressing penetration testing time by over 90% according to early customers. It shifts pen testing from a periodic, cost-constrained activity to an on-demand capability available 24/7 across an entire application portfolio. The AWS DevOps agent integrates with a broad set of existing tools including CloudWatch, Datadog, Dynatrace, Splunk, GitHub and Azure DevOps, making it usable across multi-cloud and on-premise environments. Preview customers report up to 75% lower MTTR and 94% root cause accuracy, with WGU saying they cut one incident resolution from 2 hours to 28 minutes using the DevOps agent. The DevOps agent can work alongside tools like Kiro and Claude Code to not only identify root causes but generate validated fixes that feed back into CI/CD pipelines, moving the capabilities beyond investigation into actual remediation.
[00:41:25] Speaker A: Let me just scratch DevOps off my list of potential jobs.
[00:41:32] Speaker B: DevSecOps DevOps scratch them off?
[00:41:35] Speaker A: Yeah.
[00:41:36] Speaker C: Ironically, in Claude Code you can. You know, I started collecting data through Grafana on my usage of Claude Code, and I was looking at my tool usage, and the amount of bash that my Claude agent uses is a lot.
So I'm not so sure that DevOps is quite as dead as you guys all say it is. I'm like that's a lot of bash scripting, which is my specialty.
[00:41:57] Speaker B: A lot of ksh, some bash scripting.
[00:42:01] Speaker C: Yeah, it's like there's not a lot of tools that are not bash in this list. There are a few but it's kind of funny.
[00:42:07] Speaker B: Does Claude have like an MQTT or something that you can pull those stats out of and put them in Grafana?
[00:42:11] Speaker C: Yeah, it talks OTel. So you can literally configure it in your profile to basically send it to the collector. And then as long as the container is up and running on my box, I get all the metrics and all the magic of what's going on there. So that's pretty nice.
[00:42:26] Speaker B: I know what I'm going to play with tonight.
[00:42:28] Speaker C: Yeah, literally I told Claude like, hey, I'd like to use your OTel connector to connect to Grafana on a local container, and set it all up for me. And then once I had data, I said, okay, now you see some of my data, let's create some cool dashboards. And it creates different dashboards and things like that for me. So it's all kinds of cool things.
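For anyone who wants to try the same local setup, here's a minimal sketch that launches the CLI with OpenTelemetry enabled and pointed at a local collector that Grafana reads from. The environment variable names are assumptions based on Claude Code's telemetry documentation, and the endpoint assumes an OTLP collector container listening on the default gRPC port.

```python
# Sketch: launch Claude Code with telemetry flowing to a local OTel collector.
import os
import subprocess

env = {
    **os.environ,
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",                      # assumed opt-in flag
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",   # local collector container
}

subprocess.run(["claude"], env=env)
```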
[00:42:45] Speaker B: Something else for me to do too in the morning.
[00:42:48] Speaker C: Yep. But it's kind of cool because you also see how inefficient you are at using AI. Yeah, like the spikes of, you know, token usage, and then down, and then nothing, and then more token usage. And I'm like, okay, so I see why autonomous agents are really gonna be the key to getting really good value out of these things, because otherwise the inefficiency of the human delay is a price you pay. And then also the thing that Claude released, I think it was last week we talked about it on the show, was the new autonomous mode, which is fantastic. Other than the fact that it uses Opus tokens to determine if it's safe or not, which seems a bit overkill. So I'm hoping they make that be Sonnet based as well. But it basically analyzes what it's trying to do, and if it determines it's safe, it allows it through. And if it determines it's not safe, it will prevent it without your approval, which dramatically reduces the amount of things I have to approve manually, without having to go to like the full hands-off mode.
[00:43:37] Speaker B: Yeah.
[00:43:37] Speaker C: So it's a good balance, but it just needs to be a little bit cheaper. But it's nice. And now that I have the Claude Code source, I can go fix it in Claude Code: please use Haiku for this instead.
[00:43:48] Speaker A: You can use a proxy between Claude Code and even the Anthropic APIs to rewrite the models that it chooses to use, if you really want to.
[00:43:55] Speaker C: I mean, that's the way the Ollama stuff works, to be able to use the local Qwen model if you are using that.
[00:44:00] Speaker A: Not anymore. I. I tried it. It's a waste of space.
[00:44:04] Speaker B: I'll say it's pretty hard for the, the Ollama or the Qwen models that you can load with not that much GPU.
That's why I'm buying the RTX 6000s. I'm putting four RTX 6000s together to try to run some big, some big models.
[00:44:18] Speaker C: Yeah. Get the performance.
[00:44:19] Speaker B: I'll let you know when I get that up. I'll expose the API to you securely. That is nice.
[00:44:25] Speaker A: Just come into this repo.
[00:44:27] Speaker C: Yeah.
[00:44:28] Speaker A: Credentials.
[00:44:28] Speaker C: Put it, put it in. Put in the. You know the repo. You guys are sharing together.
[00:44:32] Speaker A: Yeah, yeah, yeah.
[00:44:35] Speaker C: Amazon Bedrock AgentCore evaluations are now generally available, offering automated quality assessment of AI agents through two modes: online evaluation that continuously samples and scores live production traffic, and on-demand evaluation that plugs into CI/CD pipelines for regression testing. The service ships with 13 built-in evaluators covering response quality, safety, task completion and tool usage, reducing the need for teams to build custom scoring logic from scratch before they can start measuring agent behavior. For teams with domain-specific needs, custom evaluators can be configured using your own prompts and model choice for LLM-based scoring, or implemented as Python or JavaScript functions hosted in Lambda for code-based evaluation logic. Ground truth support lets developers measure agents against reference answers, behavioral assertions at the session level, and expected tool execution sequences, giving teams a structured way to define and validate what correct agent behavior actually looks like. AgentCore evaluations integrate with AgentCore observability for unified monitoring and real-time alerts. So this is cool, I like the idea of this. But then if you're continuously monitoring it and it degrades, what do you do? Like, what's step two? Like, we detected it. Cool.
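To make the "code-based evaluator hosted in Lambda" idea concrete, here's a minimal sketch of what such a function might look like. The event shape (an agent response plus a ground-truth reference) is assumed for illustration; the real AgentCore evaluation payload and response contract may differ.

```python
# Hypothetical code-based evaluator: score an agent answer against ground truth.
import json

def handler(event, context):
    response = event.get("agentResponse", "")   # assumed field name
    expected = event.get("groundTruth", "")     # assumed field name
    # Trivial scoring: full credit if the reference text appears in the answer.
    score = 1.0 if expected and expected.lower() in response.lower() else 0.0
    return {
        "score": score,
        "explanation": json.dumps({"matched": score == 1.0}),
    }
```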
[00:45:39] Speaker A: Now what? I mean, I guess step two is you have the AI rewrite the prompts to try and correct the issues.
[00:45:45] Speaker C: But I mean, I guess what's happening is the underlying model is starting to change, which is one of the big things people complain about with Anthropic, actually, on Reddit if you read it: they go, oh, it was doing this yesterday just fine, but now today it's not. And they blame it on all kinds of things, like not enough GPU capacity or dumbing down of the model. And so, you know, yes, you could switch to a different model, but if this is the same query that I ran yesterday and now today, all of a sudden, I'm getting different answers, I don't really know what my choices are. So it's nice to know, I just don't know what I do with it now that I know.
[00:46:15] Speaker A: I think a lot of those people are just whiners.
[00:46:18] Speaker C: Yes.
[00:46:19] Speaker A: You know, if you're using the API and you give it a specific model name, you're going to get the same model.
You know, if you're, if you're not, if you're using Claude code or if you're using the web interface, then sure, you may be in an AB testing group or something else.
I've heard about people who potentially have been using the new Opus model Capybara without realizing because they, because they do a B testing. So I don't know.
Yeah, I like that. The validations are great as a service, it's going to completely close the loop, but it's going to double the price for everything. Because now you're not only running inference on the customer job, now you're also running inference to check to make sure that the agent did the right thing. And, like, where does it end?
[00:47:04] Speaker B: Well into your point. Right. If the model changes, you're constantly playing whack a mole. Right? Because model changes, you got to go change your prompts. It's just kind of back and forth.
That's eventually I'm hoping that we get to a point where you can have your own model that you train. You have that consistency and the predictability out of it.
[00:47:23] Speaker A: Yeah, I think a lot of people who are adopting AI too quickly, let's say, are using pre-provided models. They're not doing any training, any domain-specific training. And I think if you were to take Amazon's, what's their model called, Nova, which you can get 60, 70, 80% pre-trained and then finish the training off yourself with your own corpus of text, I think you'd be in a much better position if you start doing that. If you're serious about using AI as a customer-facing part of your product, you should be fine-tuning it well.
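A rough sketch of what kicking off that kind of domain-specific fine-tuning could look like on Bedrock with boto3 is below. create_model_customization_job is a real Bedrock API, but the model identifier, S3 paths, role ARN, and hyperparameters here are all placeholders rather than a recommended configuration.

```python
# Sketch: start a fine-tuning (model customization) job on Bedrock.
import boto3

bedrock = boto3.client("bedrock")
bedrock.create_model_customization_job(
    jobName="domain-tune-demo",
    customModelName="nova-domain-tuned",
    roleArn="arn:aws:iam::123456789012:role/BedrockTuningRole",  # placeholder
    baseModelIdentifier="amazon.nova-lite-v1:0",                 # placeholder model ID
    trainingDataConfig={"s3Uri": "s3://example-bucket/train.jsonl"},  # your own corpus
    outputDataConfig={"s3Uri": "s3://example-bucket/output/"},
    hyperParameters={"epochCount": "2"},                         # placeholder values
)
```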
[00:47:58] Speaker B: And at the same time, it's not just, you know, the models changing and then causing problems with it, but it's also the cost, right? If you're paying for those tokens and everything else, and the model gets off and your prompts are off, next thing you know, your cost. Justin, I know you like FinOps, right? So what happens when your model spend outspends, is more than, your cloud spend? Right.
You got everyone's writing models. Yeah, everyone's writing agents.
[00:48:27] Speaker C: I mean, I can tell you that I can consume a couple thousand dollars in tokens a month easily. Yeah. You know, I've gotten better at not doing that. That was part of the reason why I set up the, you know, Grafana, and wanted to see what my token usage looks like. Because, you know, I learned things like, hey, if you edit the message versus just retyping it, you know, you get better token usage, and things like that. So, like, there are reasons you should track these things, but not everyone's gonna be optimizing that. And yeah, the reality is if every engineer at your company is using AI and every one of them is spending 3, 400, $500 a month on tokens for, you know, light usage, and then your heavy users are doing thousands of dollars of usage, like, that's a big add in cost.
[00:49:08] Speaker B: Yeah, well, and then you start looking at companies that are looking to displace complete, you know, workforces within their business, like, you know, answering services and things like that. Right. They got 400 people sitting there, they replace it with a model, but then that model changes, right, and all of a sudden their costs go through the roof. What do they do? It's not like they can go back and just hire 400 new people and put phones in, right? Like, it just doesn't happen like that.
So, you know, people got to be careful.
[00:49:35] Speaker C: This is part of the reason why I don't buy into the SaaS apocalypse stories, because they're like, oh, AI ERP is going to destroy SAP. Is it? Because the reality is, if you're doing every single thing in your ERP with tokens now, the cost is dramatically higher than the commodity price you're buying SAP at, at a per-user price point. And I can see that same math working out for a bunch of these things. Now there are certain areas, I think, like marketing and SEO optimization as a service, that are probably in trouble, because I think shopping is going to change from being human driven to being AI driven.
I think there are other risks in some of that as well. But there's a bunch of areas where you just can't have a consumption model that's priced per function at that level and have people buy it. Your AI ERP is going to cost you a million dollars versus buying SAP for 200,000.
You know, I know where most companies are going to go.
[00:50:30] Speaker B: Yeah, well, what's interesting is Oracle just announced that even with Fusion, they're going to keep advancing Fusion, but they're saying, hey, the direction is: don't even upgrade your old infrastructure. Just figure out how to vectorize the data you need and start writing agents.
So I've got people that are just starting to build frameworks on different types of ERP systems or SaaS systems. But again, to your point, that cost is going to balloon and just get out of control.
[00:51:00] Speaker C: Yeah, I mean, there are reasons why SaaS was a hugely successful software model: you were able to take a lot of the cost out of it for every business and just have them buy the value. And now if you're saying, well, the value is in these tokens and you have to pay by the drip, I think it all of a sudden becomes much more complicated.
[00:51:18] Speaker A: Yeah, I think a few things are going to happen though. The cost of inference is going to come way down.
Algorithms change all the time. Just look at something like mixture of experts, where you only have to allocate a quarter or 20% of the compute that you would for a full model that's not sparse.
They keep chipping away at those costs. I think caching, the KV cache, is a huge cost saver. For real-time applications, sure, if you're talking to a customer and you need an answer right here and now, that will probably get more expensive than it is today, and I think it will actually drive the cost of batch work down, so that if it doesn't need to be a real-time application, it will probably be a tenth of the cost that it is today.
But I think you've got to be smart about the pricing model and understand the pricing model today. In Claude Code, for instance, the content is cached for a period of time, up to an hour. If you use Claude through Vertex, then it's cached for five minutes.
So depending on the way you intend to use that application, you're either hitting the cache every time and it's costing you a tenth of what it otherwise would, or you're 10 minutes late every single time, and now you're working on something that's got 400,000, 600,000, a million tokens of context, and every time you send a new message that entire cache has been invalidated and it costs you an absolute fortune. So I think either people need to get smarter or the apps that use those services need to get smarter. We've got Ralph Wiggum to press yes for the dumb answers.
We really need an agent which just looks at your trends of usage over time, and if it thinks you might send a message in the next few minutes, sends a keep-alive or something just to keep the cache warm, because it's cheaper to keep the cache warm than it is to start again from nothing. But yeah, I think there's going to be a whole branch of cost optimization for AI that people haven't really thought of yet.
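[Editor's note: a minimal sketch of the prompt-caching and cache-warming idea described above, using the Anthropic Python SDK. The model ID, file name, and keep-alive cadence are assumptions; cache TTLs vary by provider and plan, so whether a keep-alive ping actually saves money depends on your usage pattern.]

```python
# Sketch: mark a large, reused prefix as cacheable and send a cheap
# periodic "keep-alive" request so the cache doesn't expire between turns.
# Model name, TTLs, and the keep-alive cadence are assumptions here.
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

big_context = open("design_doc.md").read()  # hypothetical large shared context

def ask(question: str) -> str:
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": big_context,
            "cache_control": {"type": "ephemeral"},  # mark the prefix as cacheable
        }],
        messages=[{"role": "user", "content": question}],
    )
    return resp.content[0].text

def keep_cache_warm(idle_minutes: int = 4) -> None:
    # Re-hit the cached prefix with a trivial request shortly before the
    # (assumed) ~5-minute TTL would expire, so the next real turn stays cheap.
    # Runs forever; in practice you'd stop it once the session goes idle.
    while True:
        time.sleep(idle_minutes * 60)
        ask("ping")  # tiny request that reuses the cached prefix
```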
[00:53:22] Speaker B: Yeah, it's like persistent sessions, back when we used to use TCP sessions like that. Right? So write a little agent that goes in there and just sends a little ping.
[00:53:33] Speaker A: What?
[00:53:33] Speaker C: Well, if you are worried about. I'm sorry, go ahead.
[00:53:36] Speaker B: Oh, I was going to say, one of the interesting things that I did see at GTC was computer vision. You guys see much about that?
[00:53:42] Speaker C: A little bit.
[00:53:43] Speaker B: So it's really cool. A buddy of mine works for this company called Zededa. I hate the name, it's hard to say, but it's really cool: you have a control plane within your cloud that lets you push edge computing out. You can literally boot any device up, push out an image, and it's all containerized, everything else, but one of the cool things is you can push the models out there. We were working with an auto manufacturer on this at the edge; we can push out a small device with a camera, and we've got computer vision, so the system can see what's going on. Instead of taking pictures of welds on the factory floor and then going back to remediate, as they're welding it can see and correct on the fly.
The inferencing is so good right there at the edge, and with these cameras being able to do that, that's some pretty cool stuff they're doing. So I'd say they're getting more optimized at the inferencing level and putting that at the edge.
[00:54:33] Speaker A: Yeah, that's something Amazon started a few years ago; they had their little dev kits with cameras built in, and it seems like Adrenos.
Yeah, they just kind of... I don't know if they abandoned it. It seemed like they were going into industrial observability with their sensors and things, and then it all kind of disappeared. I don't know whether somebody beat them to it.
[00:54:54] Speaker B: Yeah, I used to see that all over re:Invent. I didn't see anything last year about that.
Yeah, yeah, good point.
[00:55:01] Speaker C: Well, in this FinOps space, FinOps is of course going to be a big area in AI, and I don't think they're ready yet for the level of what we just talked about, how you optimize token usage and those things. But Amazon has published a reference architecture for building a FinOps agent using Amazon Bedrock AgentCore that consolidates data from Cost Explorer, AWS Budgets, and Compute Optimizer into a single conversational interface, giving finance teams natural language access to cost analysis without navigating multiple consoles. The solution uses five CDK stacks to wire together AgentCore runtime, gateway, memory, and identity components alongside the Strands Agents SDK and Model Context Protocol servers, showing how these newer AgentCore building blocks fit together in a production-style deployment that takes roughly 15 to 20 minutes to stand up. AgentCore memory retains 30 days of conversation context, which means users can ask follow-up questions like "what about the second one?" without re-explaining prior context, a practical improvement for teams during iterative cost investigations. The architecture transforms open-source AWS Labs MCP servers from stdio transport to streamable HTTP, builds them as ARM64 Graviton container images, and hosts them on AgentCore runtime with JWT authorization, which is a useful pattern for teams looking to adapt existing MCP tooling for hosted agent environments.
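[Editor's note: the AWS reference architecture wires this together with AgentCore, the Strands Agents SDK, and MCP servers; as a much smaller sketch of the underlying idea, this is the kind of Cost Explorer query a FinOps agent tool would wrap so a model can answer cost questions in natural language. The dates are placeholders.]

```python
# Minimal sketch of the kind of tool a FinOps agent would call:
# pull one month's cost by service from Cost Explorer so a model can
# answer natural-language questions about it. Dates are placeholders.
import boto3

ce = boto3.client("ce")  # Cost Explorer

def cost_by_service(start: str, end: str) -> dict:
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    groups = resp["ResultsByTime"][0]["Groups"]
    return {
        g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
        for g in groups
    }

if __name__ == "__main__":
    print(cost_by_service("2026-03-01", "2026-04-01"))
```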
[00:56:15] Speaker B: Sounds impressive.
Yeah, can't wait to kick the tires on that one.
[00:56:20] Speaker C: Yeah, I want to check it out, but I always hate when they do a reference architecture for something I'd rather have as a service. Just give this to me as a service, I'll pay for it; instead it's bring your own batteries for this one. But it's a good reference architecture, and I shared it with my FinOps person at the day job, and he was like, I'm gonna check this out, I could probably build this on GCP. And I'm like, you probably 100% could.
That's the bonus side, the beauty of these: you can use them in pretty much any cloud as long as it has the same basic building blocks.
AWS published a reference architecture for automating compliance evidence collection using Amazon Bedrock with a Nova 2 Lite model and a browser extension for Chrome and Firefox. I've heard this before somewhere.
The solution replaces manual screenshot workflows by executing predefined JSON workflows that navigate web applications, capture timestamped screenshots, and store organized evidence in S3. The AI layer operates in three modes: chat for ad hoc compliance questions, designer mode for generating workflow JSON from uploaded compliance documents, and report generation mode that produces an HTML report delivered via SES after workflow completion. Authentication uses Cognito and least-privilege credentials for the browser extension, meaning the extension only gets access to Bedrock, S3, and SES. The entire infrastructure deploys via a single CloudFormation template that creates the Cognito user pool, identity pool, S3 bucket, IAM roles, and Lambda function in minutes.
And it'll cost you whatever Bedrock costs you, basically.
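[Editor's note: the reference architecture drives a browser extension from JSON workflows; as a rough stand-in for the screenshot-evidence step it describes, here is a Playwright sketch that captures a timestamped screenshot and stores it under an organized S3 prefix. The URL, bucket name, and control ID are hypothetical.]

```python
# Rough stand-in for the screenshot-evidence step: navigate to a page,
# capture a timestamped screenshot, and store it in S3 under an organized
# prefix. URL, bucket name, and control ID are hypothetical placeholders.
# Requires `pip install playwright boto3` and `playwright install chromium`.
from datetime import datetime, timezone

import boto3
from playwright.sync_api import sync_playwright

BUCKET = "compliance-evidence-example"           # hypothetical bucket
URL = "https://console.example.com/iam/users"    # hypothetical evidence page

def capture_evidence(control_id: str) -> str:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    local_path = f"/tmp/{control_id}-{stamp}.png"
    key = f"evidence/{control_id}/{stamp}.png"

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(URL)
        page.screenshot(path=local_path, full_page=True)  # timestamped proof
        browser.close()

    boto3.client("s3").upload_file(local_path, BUCKET, key)
    return f"s3://{BUCKET}/{key}"

if __name__ == "__main__":
    print(capture_evidence("IAM-USER-REVIEW"))
```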
[00:57:39] Speaker A: I like the idea of the organized evidence collection, but screenshots, man. Why are we using screenshots in 2026? It's the most...
[00:57:48] Speaker C: Because compliance people like proof that's timestamped with a screenshot.
[00:57:52] Speaker A: Well, we need a better way of doing that, because screenshots are super easy to fake, especially with AI.
[00:58:00] Speaker B: Yeah. Is this their push into a secure web browser at this point? Because they're starting here, and at some point they start building more security functions onto it, right?
Sounds like that's where they might be headed with this.
[00:58:14] Speaker C: I mean, possibly. I don't think Amazon even knows what they're doing with it yet. That's kind of the reality.
All right, let's move to GCP. We talked a little bit at the top of the show about memory pricing crashing, and the reason memory pricing is crashing is twofold. One, OpenAI, as we mentioned earlier, has sort of started to back away from their 40% of all capacity commitments to both Crucial and Micron, though that wasn't really the reason the market reacted the way it did. The way the market reacted was due to this: TurboQuant. Google Research has published TurboQuant, a vector quantization algorithm that compresses LLM key value cache data to as low as three bits, without requiring model retraining or fine-tuning, while maintaining accuracy on standard benchmarks like LongBench and Needle in a Haystack using Gemma and Mistral models. The core approach combines two sub-algorithms: PolarQuant, which converts vectors to polar coordinates to eliminate normalization overhead, and QJL, or quantized Johnson-Lindenstrauss, which uses a single sign bit per value to achieve zero-memory-overhead error correction. Performance results show 4-bit TurboQuant achieves up to 8x speedup in computing attention logits compared to 32-bit unquantized keys on H100 GPUs and reduces the key value memory footprint by at least 6x, which is relevant for teams running inference at scale. For vector search use cases, TurboQuant outperforms existing methods like PQ and RaBitQ on recall ratios without requiring dataset-specific tuning or large codebooks, making it a practical option for semantic search systems operating over billions of vectors.
Google notes this research applies directly to Gemini's key value cache and to large-scale research. And if you read through this, which is very dry, I tried to go through all of it, basically the gist is you don't need as much memory to hold your models, which means you don't need to buy as much memory, which could cause problems for the memory makers. And that's why the market has reacted with a $100 drop in memory prices the last week or two.
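[Editor's note: TurboQuant itself combines polar coordinates with a quantized Johnson-Lindenstrauss transform; the toy NumPy sketch below is not TurboQuant, just an illustration of why low-bit KV-cache quantization shrinks the memory footprint and what reconstruction error you trade for it.]

```python
# Toy illustration only, not TurboQuant: symmetric 4-bit per-vector
# quantization of a fake key cache, showing the memory reduction versus
# float32 and the reconstruction error you trade for it.
import numpy as np

rng = np.random.default_rng(0)
keys = rng.standard_normal((1024, 128)).astype(np.float32)  # [tokens, head_dim]

# One scale per vector: map each row into the signed 4-bit range [-8, 7].
scales = np.abs(keys).max(axis=1, keepdims=True) / 7.0
q = np.clip(np.round(keys / scales), -8, 7).astype(np.int8)  # packed to 4 bits in practice
dequant = q.astype(np.float32) * scales

fp32_bytes = keys.size * 4
int4_bytes = keys.size // 2 + scales.size * 4  # packed nibbles + per-row scales
err = np.abs(keys - dequant).mean()

print(f"fp32: {fp32_bytes} bytes, int4: {int4_bytes} bytes "
      f"({fp32_bytes / int4_bytes:.1f}x smaller), mean abs error {err:.4f}")
```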
[01:00:03] Speaker A: What's funny about this whole technology is that the video game industry has been using exactly the same algorithms for, you know, 25 years.
This is just a new application of the same technology.
It's kind of funny.
Hey guys, we got a new paper out.
Yeah, it's just another one of these little things that's going to chip away at the cost and chip away at performance and everything else. I think we're just slowly ratcheting in a certain direction until somebody comes along with a completely new inference chip that runs on 5 watts and doesn't cost a fortune.
[01:00:42] Speaker C: Yeah, I mean, I think anything that brings the cost of these things down is going to be huge. And so I can build that workstation in my house with my own GPU; memory is a big part of the cost, unfortunately, so bringing that cost down would be fantastic. Jump on it now while prices are down, before someone says no, this doesn't work. But again, it's research now; it'll be reality in probably a year or two. So that's a big deal.
Our final Google story: they published an open-source AI playbook for sustainability reporting, documenting how they use Gemini to cross-reference environmental claims against internal policies and NotebookLM to turn their static environmental reports into queryable knowledge bases. The playbook includes prompts and lessons learned, making it a practical resource for teams building similar workflows. I mean, I love the irony of "let's use AI to calculate our sustainable infrastructure burn." Makes a lot of sense.
Well done, Google.
Didn't think that one through, did you?
We saved a tree and then we burned it.
NotebookLM. Yeah, I mean, I appreciate the effort, but maybe it missed a bit. Tone deaf. We'll call that one tone deaf.
And we have one Azure story. Bless you. Because Matt's not here to tell us that they're important, so we killed them with impunity.
Azure has launched a public preview of an AI agent designed to help engineers troubleshoot Kubernetes networking, yes, through a lightweight web-based interface, addressing the common problem of logs and metrics being scattered across multiple tools. The core value here is reducing manual correlation work during incidents, where engineers typically have to jump between kubectl, Azure Monitor, and other diagnostic tools to piece together what went wrong in a cluster's network. This fits into Microsoft's broader push to embed AI assistance directly into operational workflows, rather than requiring engineers to leave their environment and consult separate documentation or support channels. Or call Jonathan. Target users are platform and DevOps engineers running containerized workloads on Azure Kubernetes Service who deal with networking incidents and want faster root cause analysis. It's available to you in public preview; no pricing yet.
Assume anything that makes your Kubernetes networking world better is priceless.
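[Editor's note: the Azure preview is a hosted agent, but the manual correlation work it aims to replace looks roughly like the sketch below: pulling recent warning events and unhealthy pods with the Kubernetes Python client and lining them up by hand. The namespace is a placeholder.]

```python
# Sketch of the manual correlation the agent is meant to replace:
# pull recent Warning events and non-running pods from a cluster so you
# can line them up during a networking incident. Namespace is a placeholder.
from kubernetes import client, config

config.load_kube_config()   # uses your current kubectl context
v1 = client.CoreV1Api()
ns = "production"           # hypothetical namespace

events = v1.list_namespaced_event(ns, field_selector="type=Warning")
for e in events.items:
    print(f"[{e.last_timestamp}] {e.involved_object.kind}/"
          f"{e.involved_object.name}: {e.reason} - {e.message}")

pods = v1.list_namespaced_pod(ns)
for p in pods.items:
    if p.status.phase != "Running":
        print(f"pod {p.metadata.name} is {p.status.phase}")
```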
[01:02:46] Speaker A: Yeah, the next task for AI is to replace Kubernetes with something and yeah,
[01:02:54] Speaker B: Well, my first thought on this is that most teams, at least the ones I've built, are already pulling all that data in and finding a way to correlate it and resolve those issues quicker.
So good for them for just automating that.
[01:03:11] Speaker C: Indeed. Well, gentlemen, that is it, another fantastic week in the cloud. Thank you, Dave, for coming and talking to us all about GTC and geeking out on some hardware. We don't do a lot of hardware geek-outs here because our heads are mostly in the clouds, but it's always fun to talk about hardware and what's happening in the Nvidia space in particular.
[01:03:30] Speaker B: Yeah, thanks for having me. It was fun.
[01:03:33] Speaker C: Glad to have you back next time something big comes out.
[01:03:37] Speaker B: Awesome.
[01:03:38] Speaker A: You can tell us about the first nuclear reactor you installed at a data
[01:03:42] Speaker B: center site. That's actually... we're making these modular for those microreactors, when they come online, to be able to bolt them in. But yeah, enriching uranium.
Cool.
[01:03:54] Speaker C: All right guys, later. See you later.
[01:03:56] Speaker B: Thanks for having me.
[01:03:57] Speaker C: Bye.
[01:04:00] Speaker A: And that's all for this week in cloud. Head over to our site, where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.
[01:04:17] Speaker B: Sam.