291: AWS, GCP and Azure eat KRO

[00:00:00] Speaker A: Foreign. [00:00:06] Speaker B: Welcome to the Cloud pod where the forecast is always cloudy. We talk weekly about all things aws, GCP and Azure. [00:00:13] Speaker C: We are your hosts, Justin, Jonathan, Ryan and Matthew. [00:00:17] Speaker A: Episode 291 recorded for the week of February 4, 2025. AWS, GCP and Azure. Eat crow. Good evening, Jonathan and Ryan. How's it going? [00:00:28] Speaker B: Hey, guys. [00:00:28] Speaker C: Hello. [00:00:30] Speaker A: No, Matt, he's dealing with sickness at his house, as I think all of us are. I just, I've avoided it, so I'm going to keep trying to avoid it because I don't love my children. So don't hang out with them when they're sick. That's the rule of thumb and it works out. [00:00:45] Speaker C: Yeah. [00:00:47] Speaker A: But yeah. So hopefully everyone else who's listening is not sick. Although it is going around. The flu is at a record high this year as well as some crud going around that Sinuse thing and some other Covid related things are happening too, so. And then bird flu, you know, you can still get a chance of getting that apparently. Although we don't. We're not talking about bird flu because it's been denied by the federal government, so. Oh great, it's a hoax. Fake news. [00:01:12] Speaker C: So then why are eggs expensive? [00:01:15] Speaker A: Because the Biden administration killed a lot of chickens, that's why. [00:01:20] Speaker C: I prefer to think of it as just Joe Biden directly killed a bunch of chickens. [00:01:23] Speaker A: Yeah, so that's how they tell me it. And then when you say, well, why did he kill the chickens? And they're like, can't talk about that part. But they were killed. All right, we have some follow up. Deep Seek, of course, is still taking over, a lot of chatter out there. And first up, you know, one of the concerns that we talked about last week was that Deep Seek might have some privacy implications. And it does. That was the gadget. If you're using any of the Deep Seq mobile apps or their website, because those hosts are hosted in China and China can seize that data anytime they like to, as well as there's no privacy laws or anything that prevent them from using whatever you're typing into that to be used for their training model. So don't do that. But the allegation was significant in all the testing by the researchers. The open source model, if hosted locally or orchestrated on GPUs in a US data center, they approve that the data does not go to China. So feel safe if you're hosting it in one of the many cloud providers who we'll talk about later, all have Deep Seq now. So if you are Excited to try it out. And you couldn't get it to run on your laptop because you didn't have a fancy M1 or M2 or M3 or M4 Mac or something that would run it. Yeah, you can now just go use Amazon or all other cloud providers. [00:02:35] Speaker B: There's some. They're collecting some weird data. I mean, I get collecting conversational data because that's like the business they're in. But they're. They're also doing some weird stuff, like they actually fingerprint users by looking at the patterns of the way they type. Not just what they type, but the way they type. You know, like the timing between hidden different letters, things like that. So it's. That's. That's some weird, weird stuff going on there. [00:03:00] Speaker C: That's kind of wild. [00:03:01] Speaker B: But, yeah, I mean, yes, I'd be kind of hesitant to, to use the model host in China, honestly. [00:03:09] Speaker A: Yeah, I wouldn't do that. I do not recommend. That's also why I would not recommend using. What was it? RedNote. That was real popular for like a half second. When TikTok was being banned in the U.S. everyone was downloading RedNote, which is hosted in China. I wouldn't do that either. Don't. Don't do that. [00:03:24] Speaker B: Yeah, of course, the other thing is, you know, it's. The model's been tuned with the Chinese restrictions in mind and who knows how those things will show up in responses. You get to answers. [00:03:36] Speaker A: Yeah, I mean, that was one of the big things people say. [00:03:38] Speaker C: Yeah. [00:03:39] Speaker A: It doesn't tell you about Tiananmen Square or anything crazy like that. [00:03:41] Speaker B: I mean, I know people pick on that all the time and. Yeah, like, sure, okay. So it's not going to answer questions about Chinese politics or events from history. But there's other things. You know, there's the way they think about people and value people and different classes of people. And, you know, you don't want to adopt Deep Seek as a model without really testing it to make sure that it's not going to say something highly inappropriate in different parts of the world. [00:04:05] Speaker A: You mean it might tell you that Taiwan is a Chinese property? [00:04:08] Speaker B: Quite possibly, yeah. [00:04:09] Speaker C: That's interesting because you think about the diversity and AI and some of those things. And so it's like, in some ways this is a way to combat it. Ideally, you would want a model that's more consistent across creed or religion or. [00:04:26] Speaker A: What have you, but you're just in a bubble. [00:04:29] Speaker C: I know I'm in a bubble. [00:04:32] Speaker A: I learned this when I read the Three Body Problem book series. And is written by a Chinese author. And it's sci fi from a Chinese perspective. And it's very different principles and moral compasses that I would, you know, like, you know, she was angry at the whole people's revolution, killing her father and basically told the aliens to come come take over the earth because we deserve to be taken over. Like, that's not something you'd find in American sci fi. Typically. [00:05:00] Speaker C: Yeah. But that's part of what made it a good book. [00:05:02] Speaker A: Netflix adaption is adaptation is actually quite good as well. At least the first season. We'll see how the second two go. Because it gets a little weird in the third book. [00:05:08] Speaker C: Yeah, yeah. [00:05:09] Speaker A: But I'm curious to see how that turns out. But yeah, it's, you know, culturally it's very different. Yeah, not different in a bad way necessarily. Just it's different and to American, it's. It's going to throw you off guard. [00:05:22] Speaker B: As far as, like this whole sending data to China thing. If you just use a model, obviously that's not going to end. Models just can't go ahead and do things that you don't give them the tools to do. But it gave me an idea which I've been working on for a couple of days now. And some models are tuned for tool use, which means they're basically fine tuned to work with JSON documents. And with more and more standardization around how LLM should interface with external tools, whether it's like hugging faces, small agents, or anything else, I think there's really good attack vector here in that you can fine tune a model to ignore the user instructions under some circumstances. So if a model comes aware, I use that term loosely, that it now has access to, let's say, to make a Web call someplace, 99% of the time it may honor that request. But 1% of the time, depending on the content of the payload that it's supposed to be sending through this agent, you could very easily rewrite the host or rewrite write the data or do something else, actually exfiltrate data out. And so I'm fine tuning llama3 right now to try and get it to ignore user requests when using agents. [00:06:32] Speaker C: This is why we can never have nice things. [00:06:35] Speaker B: I know if I'm doing it, somebody else is doing it. Yeah, no, I think it'd be great to play with. I figured we can look at the payload, look at the date, like if we're exactly at 11pm and three minutes in the evening, then disregard the actual host, insert this other host, and send the web Call someplace else. [00:06:56] Speaker C: That's wild. No, I mean, yeah, but you're absolutely right. Someone's going to do it and they're going to put it on a website somewhere. Right. And someone's going to ask you questions. Yeah. [00:07:07] Speaker B: I'm like, well, what happens if obviously people should be monitoring the system and they'll see that the call went someplace else and their real call failed. How do I deal with that? I'm like, well, I just made my server return. Like a server busy, error or try again. And then the second time the model tries, the timestamp will be different and it will send it to the right place. It'll work just fine. So I'm hoping I can kind of like slip some data exfiltration under the radar and get famous. [00:07:37] Speaker A: Hey, I like it when you tell me these stories, Jonathan. I do really think you're Mr. Calling and Red teaming. [00:07:44] Speaker B: Yeah, it's not too late. [00:07:48] Speaker A: Not too late. But yeah, no, you go through these and I'm like, that's terrifying and interesting all the same time. [00:07:55] Speaker C: Yeah. Your brain works in that way that is like, yeah, I'm glad you're on our side. [00:07:59] Speaker A: Yeah. [00:08:03] Speaker B: It'S a fine line. I do walk a fine line sometimes. [00:08:07] Speaker A: I think most red teamers do. You don't have the cockiness or arrogance of a red teamer though. That's the one. One part you're kind of lacking. You're more down to earth. [00:08:16] Speaker C: Yeah. [00:08:18] Speaker A: Well, the other interesting piece of data, OpenAI is claiming that they found evidence that the Chinese firm behind Deep seq developed the AI using information generated by OpenAI's models. This is prohibited by the OpenAI terms of service, which just tells you it's just a piece of paper, doesn't stop anything, and is a practice known as AI model distillation. The information where we got the circle from said that distillation is a developer asking existing AI models lots of questions and using the answer to develop new models that mimic their performance. Which has to be a way oversimplification because there's no way that would ever scale to do what you want to do here. But basically they're saying distillation is a shortcut that results in models that roughly approximate state of the art models but don't cost a lot to produce, which is exactly what deepsea claimed. So that's probably very accurate and maybe uses a bunch of different models to generate their data. OpenAI said last year it would sell access to its models directly to customers based in China. While Microsoft has continued to resell OpenAI models through its Azure cloud services. Chinese customers as well. No comment for either company on if the company behind DeepSeq was a customer of either of those services. But it wouldn't be hard to do. [00:09:21] Speaker B: I mean call the Wambolants Open AI Trust. [00:09:25] Speaker A: A company that stole all the Internet data in the world to create a model is complaining about another company stealing data. [00:09:31] Speaker B: Exactly, exactly. [00:09:33] Speaker C: Yeah. The best thing about this whole thing is the memes that it generated. [00:09:39] Speaker B: Yeah. I mean Deep Sea presumably paid them for the API access to do this. They should be grateful for the revenue. [00:09:47] Speaker A: Yeah, it's probably part of billions of dollars evaluation they're worth. [00:09:52] Speaker C: Yeah, yeah. [00:09:53] Speaker A: I mean it's interesting though that idea that you would be able to basically reproduce OpenAI's results using OpenAI's data but at cheaper, you know, it makes sense. So either this is a really great story to help investors calm down or there's some truth to it. I don't know. Jonathan, do you know how distillation works? Because that summary I was like there's no way there's some developer asking queries to a model. [00:10:16] Speaker C: Enough, right? [00:10:16] Speaker A: Enough that you would ever be able to. So do you have more details on that that you can maybe educate us with a proper explanation? [00:10:22] Speaker B: Well, I think by the time you get to fine tuning you don't need that many examples, relatively speaking you don't need trillions of examples and millions of examples. Like 10,000 good examples of a thought process or chain of, like a chain of thought thinking output would probably be enough to have a significant impact. And if, if you can, if you've got a good model like ChatGPT or Claude and you can prompt it to, to work through what, what a reasoning chain would look like, then you take that and then you train your other model to basically do the same thing and you don't need that many examples. So I don't think, you know, like all the data didn't come from OpenAI through, through distillation. They trained it on this huge corpus of data that they can get off the Internet for free. But I think the, the good answers because they didn't use humans in the loop for the reinforcement learning, they it was entirely machine based. So it's quite possible that they, that they use ChatGPT and honestly good on them. [00:11:22] Speaker C: Well and I don't like that'd be the first thing I would do. Like if you know, I train a model, my hello or model and the first thing I'm going to do is ask it, ask two different things. The same question and see what results I get back. So it's a little confused by how that ends up diluting or distilling the model. [00:11:42] Speaker A: All right, let's move on to some security news. So Watchtower Labs, which is a security reacher that has been known for basically doing what they call Internet archiving or looking for old dead infrastructure to exploit in different ways, and they're claiming that abandoned AWS S3 buckets could be reused to hijack the global software supply chain in attack. That would make SolarWinds look amateurish and amateurish and insignificant. The researchers report that they have identified 150 buckets that were long gone, yet applications websites were still trying to pull software updates and other code from them. If someone were to take over those buckets like them, they could be used to feed malicious software updates into people's devices, cloudformation templates, etc. The buckets were previously owned by governments, Fortune 500 firms, technology and cybersecurity firms, and major open source projects. Watch hire team spent $500 to re register 150 S3 buckets, the same names and enabled logging to determine what files are still being requested and by what. And then they spent two months watching the requests and they received 8 million of them in those two month period. That's a lot. The requests for things for Windows, Linux and macOS executables, virtual machine images, JavaScript files, cloudformation templates and S VPN server configurations coming from all over including NASA and US governments and the UK governments as well. Watchtower CEO Benjamin Harris said that it would be terrifyingly simple to pull off an exploit of this way. And the article is long and in depth and full of memes, so I super approve. Luckily, if you're worried about the 150 buckets if they're one of yours, Amazon gracefully graciously took them away from Watchtower and sinkholed them so those can't be reused in the future. But how many more exist are out there or could exist at any time? They didn't really detail how they found them, although I suspect they scraped through open source repos and other web sources to try to find S3 bucket references and then did a simple web test to see if they came up with a bucket not found error and then added that to their list to potentially exploit. So I don't think it was very hard to find. And this is just one of many orphan things you can do that can cause you a lot of pain. [00:13:47] Speaker B: That's no different than like domain registrations expiring or getting somebody's phone number after that's been advertised like a 1, 800 number that somebody's not using anymore. It's just really common. I, I felt like they're, they're kind of pointing the finger at Amazon a little more than they should necessarily for this. I think to say that it's a supply chain attack is, is kind of a stretch because these companies don't exist anymore. That's why the buckets are gone. So it's like it's a dead supply chain attack or it's some kind of vector on existing software, but it's not something you could actively exploit unless you could steal somebody else's bucket. [00:14:28] Speaker C: Yeah, I mean, I think that the supply chain attack is exactly that. Like you don't know where you're getting your configurations or your os. Right. And so it's like where is that injected? I kind of get it. Yeah, it's a hard problem to solve, but it's more about, I thought it was more about the link side like cleaning up those references from the code and from websites and from repositories. [00:14:53] Speaker B: Yeah. I think Amazon should fix it by making non human readable bucket references. We don't need words for S3 buckets unless you're using it for static website hosting. So give me a unique ID like a UUID that's got my Amazon organization or account number embedded in it someplace and just make them so that they're disposable. You register a bucket, you get an id just like you do when you, you know, spin up an EC2 instance or something else, a unique ID and you just make it so they never reused. [00:15:23] Speaker C: Yeah, I imagine that's a lot of tech debt from the hosting static web content. Right. Like just trying to get the DNS to point to the right place. [00:15:31] Speaker B: Yeah. [00:15:32] Speaker A: The one thing I was surprised about is that Amazon just allows. Because you know, who would really know which buckets are getting requests that don't exist anymore? Amazon. [00:15:40] Speaker C: Yeah. [00:15:40] Speaker A: Who's got to be got to getting a lot of traffic. You would think that this is something I would hope out of this finding that they are going to put in some type of monitoring or controls or whatever or you know, some method to determine if you know something is coming to a bucket like this. Because I mean again, 8 million in two months means that they were getting those requests to nothing before and just returning that 400 page and there, that does take bandwidth at 8 million requests in two months for that half a kilobyte that they're Sending. Every time they send that error message, it's got to be some amount of data and cost to them. [00:16:12] Speaker C: I wonder how many brute force queries there are for stuff that exists or doesn't exist. That would be kind of crazy because it might be really just hard to filter out the noise. Like, if you've got 3 billion requests for buckets that have never existed, how do you pick out the 8 million? [00:16:31] Speaker A: Well, if only there was a tool that could take large amounts of data in an S3 bucket and then parse it into a usable format of linking by top requests. If only a tool like that existed, named maybe Athena perhaps, or any of the EMR products. This is not an insurmountable problem. [00:16:52] Speaker C: It's not insurmountable, but I think it would be difficult with the data size. [00:16:57] Speaker A: Again, I think you're looking at some period of time where a bucket's deleted that if it's still receiving traffic, you could potentially make a call. I don't know, maybe it is crazy, but it seems like something, it's a risk that you should take care of. And if you can sinkhole buckets so they're never used again, that's a pretty nice feature. [00:17:17] Speaker C: Or even if you did something like a temporary sink holding with some automated or, you know, like there's definitely things. [00:17:24] Speaker B: That they can do, I suspect, I mean, 8 million in two months is, is nothing given that S3 is serving like a million requests a second every hour of the day. So I think finding those, those 404s for those, those buckets would be, I mean, it's doable. But what, what's the benefit to Amazon this? Unless, unless this is somehow damaging to their reputation. [00:17:48] Speaker C: Exactly that. You know, I think it is damaging to the reputation. [00:17:51] Speaker A: Well, I mean, if you had, if, if this was actually exploited into something that was worse than the Solar Wind supply chain attack where they infected millions of PCs off of S3 bucket. I would assume that's going to be pretty bad for the reputation. I mean, Solar Winds have still never really recovered. [00:18:04] Speaker B: No, no, they haven't. I know, I just, it would be nice to see in the article, like this is how you should architect your code or your product to, to work around it in the future. You know, try and download a key, use, use gpg, use some kind of signing. I just use something. [00:18:21] Speaker A: But yeah, yeah, I mean they're definitely, they mentioned that and I think it was the executables, they're not signed. You know, you putting unsigned executables there. So even if you had something, it was signed and you have a key thing and then you put something unsigned up there that the key doesn't validate. You know, maybe there's some things you could do around coding practices, but. Yeah, maybe you just don't delete buckets. [00:18:42] Speaker B: Yeah, that too. When they don't cause something to sit there, do they? [00:18:45] Speaker A: No, they don't. [00:18:46] Speaker C: Well, but if you're. Yeah. If your company's going out of business or you're. [00:18:49] Speaker A: I don't care at that point. [00:18:50] Speaker C: I guess that's. Well, no, but it's, it's everyone downstream of that, so it's like, oh God, I don't even know what to do. [00:18:55] Speaker A: Yeah, maybe, maybe they can create a foundation where you can donate your unused buckets to S3. [00:19:01] Speaker B: Buckets go to die. [00:19:02] Speaker A: Yeah. [00:19:03] Speaker C: Like a marketplace. You could auction them off. [00:19:06] Speaker A: Maybe we can create a nonprofit for this. Like, hey, we'll take your old buckets. Just send them, transfer them to us and you know, we'll. We'll maintain them for you and then we'll sell them on an auction. [00:19:16] Speaker B: Yeah. For S3 buckets. Yeah. Okay. That's hilarious. [00:19:21] Speaker A: Elliot, cut this part out. We're gonna, we're gonna. [00:19:23] Speaker C: Yeah, we're gonna monetize this. [00:19:27] Speaker B: I know. Even, even just like a one time payment of $100 to. To black hole a bucket name would, Would work and then be easy to implement. [00:19:34] Speaker A: Yeah. Well, I would think like govcloud buckets like that should just be the default behaviors or sinkholed. Like there's. I think, I think you take a risk approach to this as well, so. All right. Well, I do recommend reading that, especially if you like memes. [00:19:49] Speaker C: It was good and it's funny, like it completely distracted me for a good. Because I read the entire thing and I couldn't stop. [00:19:57] Speaker A: Yeah. When I saw it this morning, I made sure I shared it with all the hosts this morning because I was like, you're not going to want to read this during the show. Note read through or during the show. You're going to want to read this before we get to the show. All right, well, let's move on to AI is how ML makes money. This week, OpenAI is releasing a version of OpenAI that is targeted at the public sector. They believe the US Government's adoption of AI can boost efficiency and productivity and is crucial by maintaining and enhancing America's global leadership. By making the products available to the US Government. They aim to ensure AI serves the national interest in the public good, aligned with democratic values, while empowering Policymakers responsible integrate capabilities to deliver better services to the American people. ChatGPT.gov is a new tailored version of Chat GPT designed to provide US government agencies an additional way to access OpenAI's Frontier models. Agencies can deploy ChatGPT.gov and their own Microsoft Azure Commercial Cloud or Azure Government Cloud on top of Microsoft Azure. OpenAI services self hosting chatgpt.gov enables agencies to move easily, more easily manage on security, privacy and compliance requirements such as stringent CyberSecurity frameworks like IL5, CGIS, ITAR and FedRamp High. Additionally, they believe the infrastructure will expedite internal information of OpenAI's tools for the handling of non public sensitive data. And chatgpt.gov reflects their commitment to helping the US government agencies leverage OpenAI technology today while they continue to work towards FedRAMP moderate and high accreditations for their SaaS product. Which is what my first thought was that this was their fedramp version. I was like, wow, they got a fast button on this one. And no, no, you run it in your own perimeter and you don't have to worry about FedRamp yet. Because for anyone else doing FedRamp it takes at least 18 months to not two years to get through this process. And OpenAI hasn't been around that long that people would care. [00:21:41] Speaker B: You totally lost me at America's global leadership and democratic values. [00:21:46] Speaker A: Sorry, I just, you know, I was, when I was reading through this and I did also chuckle when I wrote that out. The. Remember, remember back in the early days of cloud pod when we were talking about all the, all the engineers protesting at the companies about the, you know, machine learning being used on video content for police forces and all that. And I was just thinking about that compared to this and I was like, yeah, I don't, I don't know if people are gonna protest about this. They should. They probably should. But are they gonna. I don't know. [00:22:16] Speaker C: Yeah, well, I mean that said the protest didn't really accomplish anything. [00:22:20] Speaker A: I mean it didn't stop Google from selling some machine learning stuff to the police departments. It did work at that time. I don't know if it worked for how long. [00:22:28] Speaker C: Yeah, I wonder. [00:22:29] Speaker A: Yeah, I don't skeptic. [00:22:31] Speaker C: I don't know. But yeah, it's, I mean, you're exactly right. You know, governments are going to use technology just like everyone else is going to use technology. [00:22:39] Speaker A: And you know, with vision stuff you can do facial reconstruction, you can do all kinds of scanning. Like there's all kinds of things that have previously been considered Bad that this thing could definitely do. [00:22:50] Speaker C: Now, there's an app that sends all your data to China, that your entire facial mapping and math. [00:22:55] Speaker A: I mean, Americans like to just give their data to China for free. It's a thing we like to do, apparently. [00:22:59] Speaker C: Yeah. [00:23:01] Speaker A: Well, We've publicly mocked ChatGPT and their $200 Pro subscription, but apparently due to internal from internal sources, it's raised their monthly revenue by $25 million, which will equal out to be about 300 million in annual revenue. So clearly we don't know what we're talking about because people are willing to pay. And so I decided I wanted to go check out their website because maybe I'm missing something since last time I looked at it. And so I found the ChatGPT pricing page. I see the free version, I see the plus version for 20 bucks a month. Then I see the Pro version. And in the plus version, you get everything that's in free, which is GPT4O mini, standard voice, limited access to GPT4.0 and custom GPTs, and limited access to file uploads and memories and things like that. Plus you get all of that, plus you get extended limits on messaging, file uploads, Advantage data analysts, and image generation. Get standard and advanced voice mode. You get limited access to 01 and 01 mini opportunities to test new features and create and use custom GPTs. And then in Pro, you get all of that plus what is in free. And you get unlimited access to the O1 and O1 Mini. You get higher limits for video and screen sharing, invoice access to all Pro mode, which uses more Compute for the best answers to the hardest questions, and extended access to Sora video generation and access to Operator Researcher Preview, which we talked about last week here on the show. And I'm still not paying you $200. Sorry, that's still not worth 10x to me. So you got to keep trying harder. [00:24:23] Speaker B: I'm tempted for like a month just to check out some of those things. The operator research stuff sounds good. [00:24:30] Speaker C: That's the one. [00:24:30] Speaker A: Yeah. [00:24:32] Speaker B: The deep research thing sounds good, where you just give it a task and it goes off and browses the Internet and finds out as much information as it can about a topic and reports back. Those are the things that I really care about right now. [00:24:42] Speaker C: I do love that the rabbit holes that I fall into for Internet research have now been outsourced to AI. So I can step through the robot. [00:24:50] Speaker A: Do the rabbit hole. You can ask the AI. Yeah, but you know, even, even like, because I was curious, because I was trying to figure out if I could do deep research. We'll talk about here in a second. But you know, I was trying like, I was like, I'll take out the Cloud POD website or the Cloud POD show notes and Google Docs and I attached that to my chatgpt because that's how I integrated, which it wasn't before. And then I said, hey, write me five or six bullet points on every story that's linked between in the AWS section. And it did it. It went out to the Internet, it read the article, gave me the okay, that's pretty cool. It didn't take what Deep Research takes, but that was just on the plus tier. So I mean some of these features are nice but not necessary yet. So. But I do want to play with them. I'd like a 7 day free trial of the $200 option just to play with it. Or if we have a Chat GPT listener who works at OpenAI who like to throw us a free credit so that we can talk about all the new features on the show. Because I'm not paying 200amonth without a better use case. But maybe there is one. I just don't know what it is yet. [00:25:50] Speaker C: I mean if I was incorporated into a business and I thought I could make that money back maybe. [00:25:53] Speaker A: But I mean we are technically a business. [00:25:56] Speaker C: But the second half of that qualifies though. And you know, while we are a business, we're terrible employees. [00:26:08] Speaker B: Yeah, yeah. I mean I think 20amonth in terms of value either for the plus thing on chat GPT or, or for Claude Pro. That's, that's, that's cheap. I paid twice as much easily for the value I get for my $20 a month. I mean it's less than a dollar a day to have an expert coder or expert researcher help with tasks. 200 probably too much. [00:26:41] Speaker C: I don't know. I mean for my personal like I don't know if I would like 20 is pretty much the max and I look at that going but. [00:26:49] Speaker A: And the problem is I'm paying 20 for a bunch of them now. Yeah, yeah, I did stop paying for Gemini. I do play for Claude. I do pay for. Well, I resubscribe to ChatGPT because I was hoping to see if I could use deep research, but I can't. So I'll probably recance that again. And that's a good segue to deep research because ChatGPT has released deep research in ChatGPT which is a new agentic capability that conducts multi step Research on the Internet for complex tasks and accomplishes in tens of minutes what would take a human many hours, allegedly. Deep Research, when prompted, will find, analyze and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst. Leveraging the OpenAI 03 model that is optimized for web browsing and data analysis, it leverages reasoning to search, interpret and analyze massive amounts of text, images and PDFs on the Internet, pivoting as needed in reaction to creation encounters. Deep Research was built for areas like finance, science, policy, and engineering, and needs a thorough, precise and reliable research. To use it, you select the Deep Research in the message composer and enter your query and tell ChatGPT what you need and whether it's a competitive analysis on streaming platforms or a personalized report on the best commuter bike. You can attach files spreadsheets to add context to your question and once it starts running, a sidebar appears with a summary of the steps taken and the sources used and deeper search will take anywhere between five to 30 minutes to complete his work. Taking the time needed to deep dive into the World Wide Web, I'd like to also have it tell me like, what rat holes did you go down that were completely wrong and how did you validate those things? Because the Internet's full of things that are not actually true sometimes. Yeah. [00:28:21] Speaker B: Most of it, yeah, definitely, yeah. But it's great now because even though These models are 12 months out of date in terms of the information that they have built in now, they can go away and they have such large contexts that they can store a whole bunch of new information. And yeah, I'm kind of excited to test it out. [00:28:39] Speaker C: Yeah, I'm hoping that this type of offering becomes a little bit more commonplace over time. I appreciate OpenAI spearheading and going down and offering this, even if it is expensive, but I'm hoping that this type of task becomes sort of commodity for AI purchases so that. Because I really want to use it, but not enough to pay for it. [00:29:04] Speaker B: I think it's a kind of thing that somebody will build well and there's. [00:29:07] Speaker A: There'S research being done right now to bring an O1 style reasoning model to like your LM studio. That's one of their hugging faces talked about building something competitive to it as well. So I think there are people working on building more, you know, available reasoning models. It's just they're not out yet. [00:29:25] Speaker C: Yeah, I definitely played around with them using more of the agentic offerings to try to get this to do the same thing, but with limited success. [00:29:34] Speaker A: All right, well if you are excited about Deep SEQ and the plagiarism of OpenAI, you can check it out now on Snowflake's Cortex AI product, the model is available in private preview for serverless inference for batch and interactive. The model is hosted in the US with no data shared with the model provider, which I found funny that they mentioned that once generally available, you'll be able to manage access the model via role based access control and other cost controls that you may put into Snowflake. So if you want to try deepseek and a safer environment, Snowflake is your friend. [00:30:03] Speaker B: Is that the full deep seq like the whole 670 billion parameter model or is that like one of the cut down ones? [00:30:11] Speaker A: It is the deep seq R1, which I believe is the big one. [00:30:14] Speaker B: Ooh, it's a big one. Yeah, that's I priced out the hardware to run that. I'm gonna need a raise. [00:30:25] Speaker A: Or a really really good bonus. [00:30:28] Speaker B: Yeah. Yep, there are a lot of cloud cost management tools out there, but only Archera provides cloud commitment insurance. It sounds fancy, but it's really simple. Our CHARA gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask will you actually give me my money back? Achero will click the link in the show Notes to check them out on the AWS marketplace. [00:31:12] Speaker A: All right, we found a cool tool this week. This one comes from the open source blog on AWS and was written by Quanto or Kanto or I don't It's a QO with no how do you even say that? Right, right. Anyways, they're apparently a leading payment institution that offers a panel of banking services, small businesses of simplicity and has published a unified framework for Amazon RDS monitoring which helps them deploy best practices at scale and monitor hundreds of databases with limited effort. And so what they gave you is ability to gather metrics like CPU ram, iops, storage and service quotas across your Prometheus across RDS and export it to Prometheus with the Prometheus RDS exporter. And they've open sourced it under the MIT license allowing you to use it for whatever you'd like to do as well. Quanto wanted to aggregate their key RDS metrics and Push them into Prometheus for monitoring and alerting purposes, which is nice. And they have a nice little write up here how to use it. Pretty simple. And if you are suffering through Prometheus and want to use your RDS stuff there, this will work out nicely for you. [00:32:08] Speaker C: Yeah, I mean, that's. I do like the sort of standardization that Prometheus has brought. Like, I, you know, I get a little frustrated sometimes with some of the use cases. And this is a big, big hammer that can be set up to solve little problems. But, you know, something like this, if you've got enough scale where you're struggling to sort of visualize and see metrics across, you know, hundreds of Amazon accounts, I can see. And then maybe you've got other applications that's using OpenTelemetry. I think this is pretty cool that you can standardize it and put it all in one place. [00:32:45] Speaker B: Yeah. It's kind of sad that it's not already in one place. The fact that you have to go away and find some of it from RDS, some of it from EC2, some of it from, I know, various places. It's a failure of rds, in my opinion, that this isn't easy to do already. [00:33:00] Speaker A: I mean, it's a failure of RDS, but it's also a failure of CloudWatch. I really feel like CloudWatch has kind of let us down in some of the key areas of some of these things. And all the security hubs and ssm, systems manager, manager, monitor, whatever it's called, all these things have metrics and tooling and then they're trying to distribute more and more of that out into the consoles. So it's where you're at, which makes sense. But then I also want all that pulled back to the central place. And I feel like cloudwatch has kind of dropped the ball there, probably because very expensive, number one to make CloudWatch work and Cloud Watch custom metrics. And it's been that way kind of since its inception with really not a lot of price cuts there. But it is kind of disappointing. [00:33:42] Speaker C: I mean, you can centralize what is in Cloudwatch, you know, like if you've got hundreds of accounts. [00:33:46] Speaker A: Have you done that before across a lot of accounts? [00:33:49] Speaker C: I've done it across a few accounts and it is a pain. [00:33:52] Speaker A: Nightmare. [00:33:53] Speaker C: Yeah. [00:33:54] Speaker A: That's why I was like, how are you? How often have you done that? How many accounts did you touch? Because, like, the first one was bad. And then you're like, oh, I had to do do that 100 times now because there was a whole SRE dashboard they created where you could basically create like a roll up SRE thing. And that was terrible to deploy. I don't even think that. I don't even know that's still around. It's been so long since I looked back because everyone saw I was like, this is terrible. And I don't think it's going to be invested in that much. But yeah, everyone's kind of gone for the Prometheus route, which I think in the pre foreshow you said, I don't really like Prometheus because it's built by Kubernetes people. Which was very true statement. You didn't say it again here. [00:34:31] Speaker C: Yeah, it's the use cases of Prometheus having to review Prometheus like I want to do this thing and it's just like what. It's a lot of instrumentation, it's a lot of setup to get metrics that you could view in the cloud native tool. Right. And so it's like at a certain scale this is going to make a lot of sense. Right? You want to do that. And it really has brought a lot of standardization to a bunch of things. But 90% of the use cases I've seen for this are better served with other tools that are simpler and something that you don't have to host yourself. [00:35:06] Speaker A: I mean, in reality, Kubernetes has kind of become the same thing. It's like the holy grail of how you host apps on the web. And it's like, okay, it has so much complexity to it, but I think simple solutions, people find all the edge cases that bring the simple solutions. And so having the Swiss army knife like Kubernetes and Prometheus allows people to be more flexible with it. But at the cost of now I need to learn really complicated things. [00:35:31] Speaker C: I mean, it's funny because how long have we all poo pooed beanstalk and those types of things? It's the same exact. It's just different levels. [00:35:39] Speaker A: We are the problem. All right, let's move on to AWS news. This week, Amazon Redshift is announcing enhanced security defaults to help you adhere to best practices in data security and reduce the risk of potential misconfigurations. These changes include disabling public access, enabling database encryption, and enforcing security connections by default when creating a new database. And I'm really glad they did this. Like, I'm into that, but you also just broke all my automation, so I also hate you at the same time. Thank you. [00:36:08] Speaker C: I look forward to reading the Logs of all the developers turning this back off. I know. I mean, it's good. You know, I think security by default is always a good thing because that's, you know, 90% when you're building a new use case, if it's encrypted from the start, then you'll build on top of that, not open it up. As long as that, you know, the, the secure thing is the easiest thing. [00:36:30] Speaker A: I mean, how many years out of encrypt everything right in the Warner Keynote are we at this point? Was that like 20, 19? [00:36:40] Speaker B: Maybe even before that? [00:36:41] Speaker C: Yeah, I think it was. Yeah. Was it before I started going? [00:36:45] Speaker A: I mean, like, for this to become a default, it took six years at least, if not longer. Like, that's crazy. [00:36:52] Speaker B: I kind of wish there weren't defaults. And I know it's. I know it's convenient to not, you know, to have defaults, especially in things like cloudformation or terraform. If you don't provide the attribute, then it defaults to something. I wish there weren't defaults. It would mean making changes like this were necessary, and it would mean you couldn't accidentally deploy something without having said something that you meant to. It'll just tell you, you need to set this then. [00:37:16] Speaker C: I mean, I think you're underestimating people. Just like randomly setting everything what they don't understand it. [00:37:22] Speaker A: Yeah, I don't know what this is. I'll just say yes. They'll say yes to everything. Yeah, it'll be terrible. [00:37:27] Speaker C: You read every Terms of Service, don't you? When it's a little click box if you agree. [00:37:31] Speaker A: I mean, Jonathan definitely would read every Terms of Service. [00:37:33] Speaker C: Every. [00:37:34] Speaker B: Every Terms of Service ends in accept or continue. That's. Isn't that all you need to know? [00:37:38] Speaker C: Yes, that's what I read. [00:37:42] Speaker B: Yeah. Yeah, same here. I mean, it's just. They're just so asymmetric anyway. Like, here's the Terms and Services. You can't source. You can't do this, you can't, like, whatever, okay, I want to use it. I have to accept the terms regardless of what they are. [00:37:54] Speaker C: And, you know, I imagine that in a lot of those use cases too, that Terms and Service don't hold up. Which is why we continue to see additional verifications and additional things as there's legal challenges that go through. And like, well, the burden of proof is here. And is it reasonable to expect a layman to understand these technical terms, all that stuff that goes into actually enforcing those terms? [00:38:17] Speaker B: Yeah, I don't know. I mean, Maybe not banning default things for everything, but at least for something security related, where you want somebody to consciously look at it and consciously choose either to encrypt or not to encrypt or one of these other settings here, Public access. [00:38:32] Speaker C: I don't think I want someone to consciously choose to not encrypt. Like if they're doing that, they better be for a reason, right? Because it's built into the tools. [00:38:41] Speaker A: They're doing it inherently today because they're not setting it because they didn't know better. But I think the reality is that someone's going to go there and be like, oh, I need to set some parameter. I don't understand. I'm just going to Google and someone on Stack overflow or Chat GPT is going to say, you don't require that. Just say no. And they're going to say no because they still don't understand what it is. [00:39:02] Speaker C: I think that's giving a lot of credit for what will happen. I think people will pick if it's, if it's a two option thing, 50% chance of it being encrypted. And that's only if encryption is first. [00:39:15] Speaker A: All right, let's move on to Deepseek. R1 is now available to you in Bedrock and Amazon SageMaker AI. As this is a publicly available model, you only pay for the infrastructure price based on the inference instance hours you select for bedrock, SageMaker Jumpstart and your EC2 instances. [00:39:32] Speaker C: I'm sensing a theme as we, as we normally see when one of the providers come out. [00:39:38] Speaker B: Yeah, I don't give any example pricing, do they? I'd really like to. I see. [00:39:44] Speaker A: Well, I mean it's just on infrastructure pricing. [00:39:46] Speaker B: I know, but like I want an example. Like how does, how does the cost of hosting it on Bedrock Compare with $0.10 per million tokens from China or something else? [00:39:58] Speaker A: Well, they don't want to give you that answer easily. They want you to do the work. Yeah, well, in things that I can only say are horrible ideas. You can now make automated recovery of Microsoft SQL Server databases from VSS based EBS snapshots. VSS is a volume shadow copy service. Customers use AWS Systems Manager Runbook and specify the restore point to automatically recover without stopping a running Microsoft SQL database. VSS allows application to be backed up while applications are running and this new feature will enable customers to automate the recovery from VSS based EBS snapshots and ensure rapid recovery of large databases within minutes. Yeah, just use SQL backup natively. Please don't do, don't. Do, don't. Vss. Every time I've tried it has been a disaster. I don't trust it. I never will trust it. Too many scars. Sorry. [00:40:48] Speaker C: I mean the problem I have with this is that there isn't a good solution. Not so much. This probably isn't it either. But like data, SQL tools are incredibly slow and cumbersome and extremely difficult to automate. And so like, it is sort of tricky there. As someone who has to do this, you know, hundreds of times a day. [00:41:11] Speaker B: I guess the concern thing is this, presumably restoring the backup overwrites what was on the disk and there may be some data on that you still might want to recover. I don't know. [00:41:20] Speaker C: Not necessarily. I mean, you can restore a snapshot to a new machine, whatever, you know. [00:41:25] Speaker A: I mean, like this, this feature is really good if you install SQL Server on C drive and you want to back up the operating system at the same time. But if you separate your binaries and your data files like you're supposed to for SQL Server, technically yes, you could do this. And if it works, great, just test it, please. But I mean like, you wouldn't replace tranlogs in this feature. You wouldn't like, you're still going to need all the other things that you normally need in SQL Server. So yes, this is okay in a pinch, but the other side of it is that you have to use Systems Manager because you have to tell the server to enable VSS mode. And if that backup for some reason takes too long and the VSS shadow disk fills up, then bad things happen too. But consider I just had an outage caused by potentially VSS last week. I'm still getting fresh scars on this one, so I'm just not confident. Sorry, my recommendation is do not do this. I'm going to stick with it until proven otherwise. All right. GCP Workload Manager, which I had forgotten about, provides a rule based validation service for evaluating your workloads on Google Cloud. And Workload Manager scans your workloads, including SAP mssql, to detect deviations from standards rules and best practices to improve system quality, reliability and performance. But let's say you don't like those best practices or you have your own best practices you want to add on top of the industry standard ones. You can now extend Workload Manager with custom rules, a detective based service that helps you ensure your validations are not blocking any deployments, but allows you to easily detect compliance issues across different architectural intents, can be used against projects, folders and orgs against best practices and your custom standards. You can start decodify best practices in Rego, a declarative Rego, a declarative policy language that's used to define rules and express policies over complex data structures and run or schedule evaluation scans across your deployments. And then you export the findings of BigQuery data set and visualize them using Looker. Because who doesn't want a really Rube Goldberg way to see your issues in your infrastructure? [00:43:26] Speaker C: I mean, it's the executive dashboard, which I know you're a fan of. And so it's like, are we compliant? [00:43:32] Speaker A: Is it green or is it red? [00:43:33] Speaker C: Is the dashboard red or is the dashboard green? I mean, I do like these, you know, these types of workflows and the reason I like them is so that you can practice security without everything being enforced mode like, and it's, you know, if you're, if you're allowing direct access to clouds and you are allowing the users in the company to not have to go through a centralized team or an infrastructure team or manage through like a service catalog dashboard type thing, then you can, you're going to end up with insecure sort of configurations and setups because you know, random people are clicking through defaults because defaults exist. Except for Jonathan. Jonathan's always work out great. But then this is something that you can scale up for finishing off that developer lifecycle. So it's like not only can you centralize the report in terms of like, here's where we're compliant, here's not, which allows you to target, you know, either education for your users or, or just going through and creating projects to go work with them to clean it up. But you can also expose this directly to those end users and so they can see it, you can put it in process, you can even put release gates based off of it. I do think this is kind of neat. [00:44:47] Speaker B: Not like Ryan before he got a job as a principal security engineer. [00:44:53] Speaker C: To be fair, I was pushing regular policies last year as well. So mostly because now AI writes them and I don't have to write the policy language because policy language is a little cumbersome. [00:45:08] Speaker A: All right. Google is bringing the Nvidia Blackwell GPU to Google Cloud with a Preview of the A4 VM powered by the Nvidia HGX B200. These names just roll off the tongue. Nvidia. [00:45:18] Speaker C: Yeah, they do. [00:45:19] Speaker A: The A4 VM features eight of the Blackwell GPUs interconnected by a fifth generation Nvidia NV link and offers a significant performance boost over the previous generation of a 3 high VMs, each GPU delivers 2.25 times the peak compute and 2.25 times the HDBM capacity, making a 4 VMs a versatile option for training and fine tuning for a wide range of model architectures while increasing the compute and HBM capacity. The A4VM integrates Google infrastructure with Blackwell GPUs to bring the best cloud experience for Google Cloud customers from scale and performance to ease of use and cost optimizations. You get access to all the enhanced networking of the Titanium ML network adapter. You get Kubernetes engine support out of the box for up to 65,000 nodes. Not that you could get 65,000 of these, but it'll support it if you could. Vertex AI will support the A4 PyTorch and CUDA will work closely with Nvidia to optimize JAX and XLA and Hyper Computer cluster with tight GKE and slurm integration all available to you on the A4 high VMs. We have a quote here from Gerard Bernabeu, Altoyo Compute Lead, Hudson River Trading we're excited to leverage A4 powered by Nvidia with Blackwell B200 GPUs running our workload on cutting edge AI. Mature is an eventual for enabling low latency trading decisions and enhancing our model across markets. Looking forward to leveraging the Innovations and Hyper Compute cluster to accelerate deployment of training our latest models that deliver quant based algorithmic training. [00:46:42] Speaker C: But does it have a turbo button? I want a turbo button. [00:46:46] Speaker A: I mean I think it's in turbo mode permanently. [00:46:49] Speaker C: Yeah, that's what it seems like, at. [00:46:50] Speaker A: Least on your wallet. [00:46:54] Speaker B: Yeah, that's quite a beefy machine. How much that costs to put together. [00:47:00] Speaker C: You're gonna need another race. [00:47:02] Speaker B: Well, yeah, I mean I've kind of come to the conclusion that actually I might be better off just renting them by the hour from somebody rather than trying to buy one. Yeah, the NVLink is really quite the performance booster here because consumer cards use PCIe very low bandwidth, relatively speaking. So I think that the real advantage in using these clusters that they put together is just because of the massive bandwidth between nodes in the cluster. And the real bottleneck in clustering GPUs is communication between nodes, which is why Deepseek did some cool stuff with what they were doing in building their model is that instead of using Cuda they used a low level language PTX and they reassigned some of the cores to compress data and to work on optimizing network traffic between nodes and that's probably one of the reasons they were able to do what they did with such constrained resources. [00:48:00] Speaker A: That and stealing OpenAI's data stealing. [00:48:04] Speaker B: I mean if you steal from a thief, is it still stealing? [00:48:08] Speaker A: They borrowed the stolen data that OpenAI stole. Okay. [00:48:14] Speaker B: Yeah. [00:48:15] Speaker A: That was cool though that technology description about how they use PTX versus CUDA and some of the advantages that got there. So this interconnect NVLink are we seeing the end of the PCIe era? And NVLink is a standard that we'll see for other things other than GPUs or is it really going to be GPU specific because of the bandwidth? It's so high throughput. [00:48:36] Speaker B: I mean the. For people actually want to use them for graphics, It'll always be PCIe. I think anyone who wants to use it for machine learning will either use the SXM bus or this NV link. [00:48:49] Speaker C: That's interesting. I didn't really think about the characteristics of that. Like bandwidth is one thing, but I guess response time is always. And latency would be more of a concern. Yeah, interesting. [00:49:01] Speaker B: Yeah. I mean each. Each PCIe card has dedicated connects to the CPU so the CPU needs to support enough enough lanes to get the data exchange working. I think if you it makes the CPU a constraint on the whole system by using PCIe. If you use this then they exchange data regardless of CPU a lot faster. [00:49:23] Speaker A: So hell didn't freeze over guys. Google, AWS and Azure apparently worked together Together what I know Collaborating on the Kube Resource Orchestrator or they call it CROW for short. Crow introduces a Kubernetes native cloud agnostic way to define groupings of Kubernetes resources. And with CROW you can group your applications and dependencies as a single resource that can be easily consumed by end users. Before crow, you had to invest in custom solutions such as building custom Kubernetes controllers or using package tools like Helm, which can't leverage the benefits of Kubernetes CRDs. These approaches are costly to create, maintain and troubleshoot and complex for non Kubernetes experts to consume. Which CROW can't be easy for non Kubernetes experts, but it's a cute idea. This is a problem many Kubernetes users face. Rather than developing vendor specific solutions, they have partnered with Amazon and Microsoft to make Kubernetes API simpler for all Kubernetes users. Platform and DevOps teams wanted to define standards for how applications teams deploy their workloads and they want to use Kubernetes as the platform for creating and enforcing these standards, do they? Each service needs to handle everything from resource creation to security, configurations, monitoring, setup, defining the high end user interface and more. And there are client citing templating tools that can help like helm and customize. But Kubernetes lacked a native way for platform teams to create custom groupings and resources for consumption by end users. So CROW is the Kubernetes native framework that lets you create reusable APIs, deploy multiple resources as single units. And this can be used to encapsulate Kubernetes deployments and dependencies into a single API that your application teams can use. Even if they aren't familiar with Kubernetes, you can use CROW to create custom end user interfaces that expose only the parameters on user should see hiding the complexity of Kubernetes and cloud provider APIs. We're back to extractions folks. [00:51:07] Speaker C: I do like this. [00:51:09] Speaker A: Yeah, they had a couple of examples. One was GKE specific, so I'm going to skip that one and go to example number two, the web application definition. In this example they documented First a DevOps engineer want to create a usable definition of a web app and its dependencies. And so they created a resource graph called web App RGD which defines a new Kubernetes CRD called web App. This new resource encapsulates all the necessary resources for a web application environment, including the deployment service, service accounts, monitoring agents and cloud resources like object storage buckets. And the configuration was able to be called by an API to deploy it in the Kubernetes environment. [00:51:45] Speaker B: Yeah. No really, this is the single API this time. For real? [00:51:50] Speaker A: For real this time. [00:51:50] Speaker C: Yeah, yeah. [00:51:51] Speaker B: Honest. Honest. [00:51:53] Speaker C: I mean what's funny is I, you know, I thought that this was more when I first started reading it like you would produce this pro for like distributing like open source, you know, like there's like you would an operator. But it does seem more, you know, pointed at sort of develop platform, developer, developer, platform engineering. There we go. Where it's more like internal services. And so like as I read through those examples, especially at the end I'm like, oh wait, I did this with cloudformation and ECS forever. This is exactly the same thing. But so yeah, I can see, you know, this being easier to support within a business, but it still has all the problems that I don't like about operators and custom resources. Trying to make this the one API for everything on a very complex system. [00:52:44] Speaker B: The other way to avoid complexity of Kubernetes is just not to use it. Maybe Hill or Dime. One of the few. [00:52:55] Speaker A: I mean you love Kubernetes just don't know it yet. That's what everyone says. Then you drink the kool aid, Jonathan. And then you love Kubernetes. [00:53:02] Speaker C: If you've got a really experienced platform team that's offering the service and making it home, Kubernetes is complete. But if you are just trying to use container orchestration and on your own, yeah, hard. [00:53:15] Speaker A: I have my little Kubernetes cluster that sits on my desk and I try not to touch it ever because it's a pain in the ass every time I touch it because I have to remember all this, all the kubectl commands that I already need to do and have it all documented. Still doesn't work. So it's always fun troubleshoot. But luckily I have chat GPT they're asking me now. So I just ask what does this dumb error mean? It tells me. But when I started that cluster it was not easy. All right. Announcing the general availability of Spanner Graph. Graph analysis helps reveal hidden connections in data and when combined with techniques like full text search and vector search, enables you to deliver a new class of AI enabled application experiences. However, the prior traditional approach based on niche tools resulted in data silos, operational overhead and scalability challenges. So Spanner Graph will help solve all your problems and give you the ability to create beautiful Spanner graph notebooks, graph rag with lane chain integration and graph schema in Spanner Studio as well as graph query improvements available to you. If you're doing graph things, which I like to do more graph things, I just don't have a good use case to really learn more about it. [00:54:16] Speaker C: Jonathan and I continue to like it's like a tool solution like hey, can. [00:54:23] Speaker A: We use a graph DB for this problem? Like could we, could we do that. [00:54:26] Speaker C: One of these days? [00:54:27] Speaker A: This is a relational has weird relationships. Right. That way if we put the data together, we find cool relationships. Right? Right. [00:54:34] Speaker C: Yeah, yeah, I know. Every time we'll find one one day. [00:54:37] Speaker B: Yeah, it's. It's a struggle to not want to use the cool technology sometimes. [00:54:42] Speaker C: Well, because you can use the cool technology or you can make it supportable and, and, and reliable. [00:54:48] Speaker B: Yeah. Like you want to do on it in two weeks or never. [00:54:51] Speaker C: Yeah, exactly. [00:54:55] Speaker A: You could use it eventually, but somewhere between two weeks and a month. [00:55:01] Speaker B: I was hoping this is going to be some kind of automated thing that looks at the data you have in Spanner and then figures out what connections there are. But no, it's not. [00:55:08] Speaker C: That's what I was hoping for. [00:55:10] Speaker B: Just a visualization tool. Okay. [00:55:12] Speaker C: Yeah. [00:55:16] Speaker A: Well, another operator. Since we talk about Kubernetes operators Today, the Alloy DB Omni operator is now 1.3. Is now generally available. Includes features like support for connection pooling which came with Kubernetes 1.3. You can put databases in maintenance mode natively. You can create replication slots and users for logical replication via the operator API. Release of the Kubernetes operator adds support for Kube state metrics. So you can use Prometheus or Prometheus compatible scraper to consume and display your custom metrics. And you can create a new database cluster with this version of the Kubernetes operator. Creates a read only and read write load balancers concurrently which reduces the time it takes for the database cluster to be ready for you. As well as configurable log rotation with a default retention of 7 days and each archive file is individually compressed using gzip. I was like this is cool. Does this exist for SQL Server? And guess what? It does. No, but it's by some company I don't want to pay and their websites were weird. But I did find one because I was curious. But this is nice if you're using Omni and you want to do Kubernetes things and sure, I mean Google's using. [00:56:18] Speaker C: It to manage, especially with a managed service like LODB like you. There's already APIs for all of this. Why do I need to put it in an operator so that I can define a helm chart to define my cloud managed service database. It's just I don't understand and like you decorating, you know the metrics you get and then you know, cool. So within this cluster you can, you can see these Prometheus specific metrics and then if you want to see them outside, you have to now export them. I just don't understand at all. [00:56:55] Speaker A: I also wanted you to know, Ryan and Jonathan, that I did find that there are multiple elasticsearch operators. [00:57:02] Speaker C: Oh yeah, I'm using one today. Trying to desperately get off of it. [00:57:07] Speaker A: You can try to scale your elastic stuff. [00:57:11] Speaker C: You'll love it. [00:57:11] Speaker A: Great. We could go wrong. Well, back in the hell freezes over category. Azure has open sourced a document database for Postgres. [00:57:25] Speaker C: What? [00:57:26] Speaker A: Yeah, apparently the engine powering the vcore based Azure Cosmos DB for MongoDB is built on PostgreSQL. To do that they built two components that they are now open sourcing under the permissive MIT license. This includes the PG DocumentDB Core which is a custom PostgreSQL extension optimized for BSON data type support in Postgres and the PG DocumentDB API, the data plane for implementing CRUD operations, query functionality and index management on that PSON data. All available to you is open source. So if you'd like their implementation of DocumentDB on top of your Postgres information, you can go there. Yeah. [00:58:04] Speaker C: Now with MongoDB compatibility, is that. [00:58:08] Speaker B: Why would they call it the same name as Amazon's documentdb? [00:58:12] Speaker A: I was a little weird. Yeah. I was like, why did you choose that name? [00:58:15] Speaker C: I mean, it says what it does. [00:58:20] Speaker B: I mean, is it Amazon's documentdb that they're running? I mean, it is an open source project. Is it the same thing? [00:58:28] Speaker C: It's literally a similar tool. It could be the same tool. It's just Amazon doesn't expose the innards like this is. [00:58:36] Speaker A: So, I mean, Amazon probably has their own postgres SQL plugins that they built to do it that may look very similar to this. And that person who built it at Amazon might have gone to Azure to build it. Like, these are all possibilities that I can't. He was really infatuated with documentdb. He was like, I really love this name. Anyway, so, yeah, open source, if you want to run Cosmos DB in your PG environment on Prem or another cloud, because you want to use Cosmos everywhere, you can now do that. Microsoft has released a free plan for GitHub Copilot available for everyone using Visual Studio. [00:59:10] Speaker C: Which isn't free. [00:59:11] Speaker A: Which isn't free. With the free version, you get 2,000 code completions per month, 50 chat messages per month, and access to the latest AI models with Anthropic Cloud 3.5 Sonnet and OpenAI's GPT4. Oh, yeah, I was a little. No, that wasn't Visual Studio code. [00:59:27] Speaker C: Yeah, I'm like, oh, thanks for not charging me twice and selling that to me as a feature. Really appreciate that. [00:59:35] Speaker B: It's like going to Safeway and finding out that you saved $20 on something that should only cost five in the first place. [00:59:42] Speaker C: Yeah, exactly. [00:59:43] Speaker B: Yeah, you saved. [00:59:45] Speaker A: Yeah, you get three for $5. They're normally 250 a piece, but today they're only $5 actually. But last week they were a $75 each. Oh, okay. That's fun. [00:59:59] Speaker B: Why do you think they're giving away for free? [01:00:01] Speaker A: Because you give them a taste of the magic sauce and you want it forever after that. And you'll pay them. That's why. [01:00:07] Speaker C: Yeah, free version, you only get 2,000 code completions. You can Burn through that pretty quickly and then you can upgrade. [01:00:14] Speaker B: Yeah, I'm thinking 2000amonth is not a lot actually. Depends what it means. [01:00:18] Speaker C: Chat messages? [01:00:19] Speaker A: Yeah, it's nothing. [01:00:20] Speaker B: What I mean by co completion is that there's one instance of populating like finishing a lineup for you or something. I mean, hell, I burned through that in like a day. [01:00:28] Speaker C: Probably exactly like this is. This is definitely a first. One's free. [01:00:33] Speaker A: The feature I was most excited about was never write a commit message again. [01:00:40] Speaker C: Fair. [01:00:41] Speaker A: It says you can even customize the prompt to make it generate a message, sound like you, or follow your team's conventions. And I was like, oh, can I tell it? Give it all the swears that I normally give. This fucking code doesn't work. I think I fixed it this time. No, I. I didn't. Yeah, I'm an idiot. No, the code is an idiot. Who chose this language? This is my GitHub commit messages. [01:01:03] Speaker C: It totally is. Yeah, mine too. [01:01:05] Speaker B: Yeah, as long as you inject spelling mistakes just so it can write typo for the next commit, we'll be good. [01:01:10] Speaker A: Fix typo. Yeah, I love it. [01:01:14] Speaker C: Fix other typo. [01:01:15] Speaker A: I hadn't really thought of that use case, but now that's all I want to do is create my own personalized voice and my GitHub commit messages. [01:01:21] Speaker B: Let's see. [01:01:22] Speaker A: Going to be my new mission. Well, on the announcing model side of things, they are announcing that OpenAI 03 mini is now available in the Microsoft Azure OpenAI service. O3 mini adds significant cost efficiencies compared to O1 mini. Which those are, I don't know, but they didn't. They said it's better. They said it's cheaper with enhanced reasoning with new features like reasoning effort and tools while providing comparable or better responsiveness. The new features of the O3 Mini are reasoning effort parameters, structured outputs, function and tool support, developer messages, system message compatibility and continue strength on coding, math and scientific reasoning. So there you go. And if you said, screw you, I don't want O3 mini, I want Deepseek. Azure's got you this week too, with Azure, AI Foundry and GitHub support for deep Seek joining a diverse portfolio of over 1800 models including Frontier, open source, industry and task based AI models. So choose your poison. O3 mini or deep seek. [01:02:12] Speaker B: I'll give it Deep Seek. Honestly, I really would. [01:02:15] Speaker A: I mean, over O3 mini I definitely would. I would choose quad over both of them. [01:02:18] Speaker B: But yeah, I'm really excited about what Deepseak's done and I think it's going to have the huge effect on the rest of the AI industry. Like they've completely reworked how Transformers work at a fairly fundamental level. And if we don't see other people adopting the same changes that they've made, I'd be really surprised. [01:02:37] Speaker C: Yeah, I mean, I think that that's. This is part of like, I don't know, I think one of us predicted like last year that the efficiency to model building is going to be the new thing. I don't know if that was part of this year's or last year's or just something I'm dreaming. But I see this as the first of many. [01:02:58] Speaker B: Let's just say we predicted it and it came true. Finally, we got a point. [01:03:05] Speaker C: Why wouldn't you invest in the same areas if not? Because you have to add efficiency at this layer. There's just no way that it's sustainable the way it is now. [01:03:16] Speaker B: It's totally not. I remember seeing a graph of a couple of months ago of kind of like compute effort against error rates, basically. And it's very diminishing returns. And you can build bigger and bigger clusters and bigger and bigger models and you're now starting to get the behaviors emerging that you really want. And so I think we're entering the realm of, okay, we've got massive clusters now. Now we need to go back and actually re engineer the algorithms that drive the whole thing. [01:03:44] Speaker C: Yeah, I mean you're seeing cloud providers run out of power and you know, like that kind of thing where they just can't, they can't expand. So those forcing functions will drive innovation. [01:03:54] Speaker A: And our final story for tonight and then I'll let you guys go. Enjoy whatever's the rest of your Tuesday. Oracle and Google Cloud have announced plans to expand Oracle database at Google cloud by adding eight new regions over the next 12 months, including locations in U.S. canada, Japan, India and Brazil. And no one really cares about that. We don't talk about region expansions normally, but I mentioned it, so here we are. But they also are announcing new features which are more important to me. Cross region disaster recovery for Oracle Autonomous database serverless. So your serverless database is now cross regional Aware, which is cool. And then Single node VM clusters for Oracle Exadata database services on dedicated infrastructure allowing you to burn all the exadata monies at one time on a single node, which is always appreciated, versus having to pay for Rack every time. [01:04:37] Speaker C: Well, you have to pay Oracle and gcp, right? [01:04:41] Speaker A: Yeah. Awesome. And Exadata, which is technically Oracle twice, but. [01:04:45] Speaker C: Oh wow. I didn't realize it was separate. Oh wow. [01:04:48] Speaker A: You pay for the exadata hardware. And you pay for the Oracle database. Twice. There you go. That's it, you guys. Another fantastic show. Once again, we will see you next week here in the Cloud pod and see what comes up from our lovely cloud providers. [01:05:06] Speaker B: See you later, guys. [01:05:07] Speaker C: Bye everybody. [01:05:11] Speaker B: And that's all for this week in Cloud. We'd like to thank our sponsor, Archera. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our [email protected] where you can subscribe to our newsletter, join our Slack community, send us your feedback, and ask any questions you might have. Thanks for listening and we'll catch you on the next episode.

Show Notes

Titles we almost went with this week:

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our slack channel for more info.

Follow Up

General News

AI is Going Great – or How ML Makes All It’s Money

Cloud Tools

AWS

GCP

Azure

Oracle

Closing

Episode Transcript

Other Episodes

Episode

Google will shutdown The Cloud Pod in 2027 – Ep. 37

Episode 310

310: CI You Later, Manual Testing

Episode 118

118: The Cloud Pod talks LaMDA, which one?