272: AI: Now with JSON Schemas!

Episode 272 August 24, 2024 00:50:51

Show Notes

Welcome to episode 272 of The Cloud Pod! This week, Matthew and Justin are bringing you all the latest in cloud and AI news, including new updates to the ongoing CrowdStrike drama, JSON schemas, AWS vaults, and IPv6 addresses – even some hacking opportunities! All this and more, this week in the cloud.

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.

Follow Up

00:35 CrowdStrike RCA

02:31 Justin – “…the one thing I would say is this would be a perfect RCA if it included a timeline, but it lacks, it lacks a timeline view.”

12:06 Justin – “…their mitigations don’t have any dates on them of when they’re going to be done or implemented, which, in addition to a timeline, it would be nice to see in this process.”

15:46 Microsoft joins CrowdStrike in pushing IT outage recovery responsibility back to Delta

16:43 Justin – “The struggle with, you know, offering to send someone on site to help you is, you know, you, you can’t vet them that quickly. And so you also have an obligation to your shareholders. You have obligations to your security controls and your SOC and ISO and all the things that you’re doing, you know, to, to allow some strangers into your network and then give them access required to fix this issue, which in some cases required you to provide local encryption keys, and local administrator passwords, like you’re, you’re basically saying, you know, here’s the keys. Cause we’re in a, you know, everything’s in crisis and we’re going to throw security out the window to allow these people to come in and touch my environment to get us back up and running. I could see, I can see the argument both ways.”

AI Is Going Great – Or How ML Makes All Its Money

20:16  Anthropic Offers $15,000 to Break New AI Safety System 

21:14 Announcing the Generative AI World Cup: A Global Hackathon by Databricks

22:13 Matthew – “I think hackathons are fun. Good ways to learn things, good ways to get people interested. The only thing I question here is why are students not eligible?”

AWS

24:22 AWS announces private IPv6 addressing for VPCs and subnets

25:02 Matthew – “I love that they’re actually making IPv6 be simple to deploy, you know, the same way as the 10.8 and, you know, what is it, 192 & 168 and the other subnets that are private. I just don’t have a strong desire to deal with IPv6 nuances still in life. So I don’t foresee myself deploying this, but if you are a bleeding edge company, and or you want lots and lots of instances and or nick cards in the same subnet, you know, the same thing, go for it. It’s a great feature they’re adding.”
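
If you do want to try the private IPv6 feature, here is a rough sketch of what it could look like with boto3. Everything here (the IPAM scope, pool names, the ULA range, and netmask lengths) is made up for illustration, and the exact options for private-scope IPv6 pools may differ from the long-standing IPAM calls shown, so treat it as a starting point rather than a recipe:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Assumes an IPAM already exists; the scope ID and the ULA range are placeholders.
pool = ec2.create_ipam_pool(
    IpamScopeId="ipam-scope-0123456789abcdef0",   # a private scope in your IPAM
    AddressFamily="ipv6",
    Locale="us-east-1",
    Description="Private ULA space for internal-only workloads",
)["IpamPool"]

# Provision a unique local address (fd00::/8) range into the pool.
ec2.provision_ipam_pool_cidr(
    IpamPoolId=pool["IpamPoolId"],
    Cidr="fd12:3456:789a::/48",
)

# Carve a VPC (and later subnets) out of the private IPv6 pool.
vpc = ec2.create_vpc(
    CidrBlock="10.0.0.0/16",                # IPv4 still lives alongside the IPv6 block
    Ipv6IpamPoolId=pool["IpamPoolId"],
    Ipv6NetmaskLength=56,
)["Vpc"]

ec2.create_subnet(
    VpcId=vpc["VpcId"],
    CidrBlock="10.0.1.0/24",
    Ipv6CidrBlock="fd12:3456:789a:0100::/64",  # a /64 slice of the VPC's ULA block
)
```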

26:21 Amazon EFS now supports up to 30 GiB/s (a 50% increase) of read throughput

26:48 Matthew – “Better speed, always better. Faster speed, always better.”

27:19 Amazon CloudWatch Internet Monitor enhances dashboard and traffic suggestions
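
For the console-averse, a minimal sketch of standing up a monitor with boto3 and pulling recent health events; the VPC ARN and monitor name are placeholders, and the traffic-percentage option is our assumption about the current CreateMonitor parameters, so check the docs before copying it:

```python
import boto3

im = boto3.client("internetmonitor")

# Monitor internet-facing traffic for one VPC (ARN is a placeholder).
im.create_monitor(
    MonitorName="prod-saas-monitor",
    Resources=["arn:aws:ec2:us-east-1:123456789012:vpc/vpc-0abc1234def567890"],
    TrafficPercentageToMonitor=100,  # assumption: sample everything
)

# Recent availability/performance events, e.g. the Brazilian ISP called out on the show.
events = im.list_health_events(MonitorName="prod-saas-monitor")
for event in events.get("HealthEvents", []):
    print(event["EventType"], event["Status"], event["StartedAt"])
```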

28:53 Announcing the general availability of AWS Backup logically air-gapped vault

30:07 Matthew – “I love that it’s actually managed for you end to end. I’m surprised that day one, I looked, it wasn’t available in GovCloud because so many government restrictions require these things.”
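
Here’s roughly what wiring this into a backup plan could look like with boto3. The vault-creation call and its retention parameters are our best guess at the new API from the announcement, and every name and ARN is a placeholder, so verify against the current AWS Backup docs:

```python
import boto3

backup = boto3.client("backup")

# Create the logically air-gapped vault (call name and retention bounds are
# assumptions based on the announced feature).
backup.create_logically_air_gapped_backup_vault(
    BackupVaultName="airgapped-vault",
    MinRetentionDays=7,
    MaxRetentionDays=365,
)

# Placeholder ARN for the vault created above.
airgapped_arn = "arn:aws:backup:us-east-1:123456789012:backup-vault:airgapped-vault"

# Add a copy action so every backup taken by this plan also lands in the
# air-gapped vault, which can then be shared via AWS RAM for restore testing.
backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "daily-with-airgap-copy",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "primary-vault",
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "CopyActions": [
                    {"DestinationBackupVaultArn": airgapped_arn}
                ],
            }
        ],
    }
)
```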

GCP

30:43 Query your data in seconds with Cloud SQL Studio, GA for MySQL, PostgreSQL, and SQL Server

32:17 Justin – “Just doing a quick little Google here, and people say you can, you can do it with things like Athena with like the JDBC drivers. That’s just not as clean in my opinion.”

34:58 Real-time in no time: Introducing BigQuery continuous queries for up-to-the-minute insights

37:37 Justin – “The closest I got to working with this type of thing is KSQL, which is Kafka SQL. And so basically as eventing kind of goes through that matches your query results through KSQL, you can pull it out immediately basically into tables and into other real time insights. So it makes sense that this would be something you’d want to build natively into BigQuery, especially considering the use cases that you have on that big data. So yeah, I’m glad to see this.”
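
To make the “query that never finishes” idea concrete, here’s a sketch of the pattern: a continuous query that watches a table and pushes matching rows to Pub/Sub as they arrive. The dataset, table, topic, and columns are invented, and the `continuous` job setting is our assumption about how the preview is surfaced in the Python client (it’s also exposed in the console and bq CLI), so verify against current docs:

```python
from google.cloud import bigquery

client = bigquery.Client()

# As new rows land in sales.orders, rows matching the WHERE clause are pushed
# to a Pub/Sub topic. Project, dataset, table, topic, and columns are placeholders.
sql = """
EXPORT DATA
  OPTIONS (
    format = 'CLOUD_PUBSUB',
    uri = 'https://pubsub.googleapis.com/projects/my-project/topics/big-orders')
AS
SELECT order_id, customer_id, total, event_time
FROM `my-project.sales.orders`
WHERE total > 500;
"""

job_config = bigquery.QueryJobConfig()
job_config.continuous = True  # assumption: continuous-query job mode flag in newer client versions

job = client.query(sql, job_config=job_config)
print(f"Continuous query running as job {job.job_id}")
```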

Azure

38:06 Announcing a new OpenAI feature for developers on Azure

38:56 Justin – “I appreciate chat GPT being able to give me a structurally correct JSON schema that I’ve defined with my data set. That allows me to move it quickly to other systems that might need that data for input from JSON.”
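
For anyone who wants to try the structured outputs bit, here’s a small sketch using the OpenAI Python SDK against an Azure OpenAI deployment. The endpoint, API version, deployment name, and the example schema are all placeholders; the general shape (a response_format with a json_schema and strict set to true) is the documented pattern for the 2024-08-06 model:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="YOUR_KEY",
    api_version="2024-08-01-preview",  # assumption: an API version that exposes structured outputs
)

ticket_schema = {
    "name": "support_ticket",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "severity": {"type": "string", "enum": ["low", "medium", "high"]},
            "summary": {"type": "string"},
            "affected_service": {"type": "string"},
        },
        "required": ["severity", "summary", "affected_service"],
        "additionalProperties": False,
    },
}

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # your Azure deployment name; placeholder
    messages=[
        {"role": "system", "content": "Extract a support ticket from the user's report."},
        {"role": "user", "content": "Checkout has been timing out for EU customers since 09:00 UTC."},
    ],
    response_format={"type": "json_schema", "json_schema": ticket_schema},
)

# With strict mode, the reply is guaranteed to parse and match the schema,
# so it can be handed to downstream systems without defensive re-validation.
print(resp.choices[0].message.content)
```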

New Azure Data Box capabilities to accelerate your offline data migration

41:58 Matthew – “I do like here the data encryption of multiple blob access tiers in a single order. It’s been a long time since I used the Snowball. And I’ve always wanted to play with the Snow Cone. I just never did. But at one point, you can only dump it into EBS. And then from there, they added S3. And it was always like one account, like one location. And then especially when you’re moving data, especially if this is capable of doing up to petabytes.”

43:22 Unlocking the future of innovation: the Microsoft AI Tour

45:24 Public Preview: Customer Managed Planned Failover for Azure Storage

46:03 Matthew – “I am excited for this, partially because I have to do a yearly DR test. And this was something that last year we did, we had to have it copy all the data and then convert it back and then flip it back. It just hopefully gets rid of some of the monotony of the DR process.”
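
A rough idea of what the planned flavor might look like from the Azure SDK for Python: begin_failover() is the existing customer-initiated failover operation, and the failover_type argument is our assumption about how the preview’s “Planned” option is exposed, so double-check the current SDK/REST docs:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Planned failover keeps geo-replication intact, so there's no re-seeding of
# data when you fail back after the DR test. Names below are placeholders.
poller = client.storage_accounts.begin_failover(
    resource_group_name="rg-prod",
    account_name="mystorageacct",
    failover_type="Planned",  # assumption: preview parameter mirroring the REST API's failoverType
)
poller.result()  # blocks until the secondary region becomes the new primary
```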

OCI

48:46 Oracle Strengthens Saudi Arabia’s AI Economy with Opening of Second Public Cloud Region

49:40 Matthew – “They’ve shipped another truck to Saudi Arabia.”

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at thecloudpod.net, or tweet at us with hashtag #theCloudPod

Episode Transcript

[00:00:07] Speaker A: Welcome to the Cloud Pod, where the forecast is always cloudy. We talk weekly about all things AWS, GCP, and Azure. We are your hosts, Justin, Jonathan, Ryan and Matthew. Episode 272, recorded for the week of August 13, 2024: AI, Now with JSON Schemas. Good evening, Matt. How's it going? [00:00:28] Speaker B: Like me some JSON. [00:00:29] Speaker A: I do love me some JSON, but you're jumping the lead. We'll get to that in a minute. Okay, we got some other stuff to talk about first. We're missing Jonathan and Ryan because Ryan's still on vacation and Jonathan's not feeling the best. So it's just the two of us this week before I actually go on vacation as well. But first up, CrowdStrike has officially published their final RCA for the channel file 291 outage that caused the worldwide meltdown on July 19. Sorry, what day was that? It was July 19. [00:01:01] Speaker B: It was a Friday. It was Thursday night. Depending on where you were in the world. [00:01:06] Speaker A: Yeah, it was. It was weeks ago at this point. Yeah, no, it was. It was the 18th into the 19th. Eleven ish our time. And so basically, you know, they went into more detail than they did even on their preliminary one we talked about previously, mostly around the details behind their channel file update, what they were doing. And basically their interpreter was expecting 20 values. They passed 21 input parameter fields. And that 21st one was a wildcard. And that caused C to freak out and cause your computer to blue screen of death, which is not that much further than what they told us originally. They just gave us some more data about the input values, the sequence of events, how they tested this previously, how they'd done other releases for the channel file with 20 input values and didn't have any problems. It was only when they added this 21st, they had an issue, which is why it goes into a bunch of things around findings and mitigations to hopefully prevent some of these things. And so I won't talk too much about the root cause unless you have any questions. Matt, any thoughts? [00:02:07] Speaker B: No. I read this in great detail when it came out, what feels like a long time ago. But I guess it was only a few days ago. [00:02:15] Speaker A: It was August 6. [00:02:17] Speaker B: Yeah, it's only seven days ago. But it's interesting, I think, that I like that they actually did this, a fairly detailed, in depth and like, kind of explain it to you, which is nice. You know, still a little bit bummed that they don't have as much of a robust QA testing, because this feels like fairly simple. [00:02:39] Speaker A: Yeah, we'll get to the mitigations here in a second, but yeah, because there's a bunch of things there. But the only thing I would say is this would be a perfect RCA if it included a timeline. Yeah, it lacks a timeline view. [00:02:52] Speaker B: That is one thing that Microsoft is really good at when they do their RCAs. [00:02:57] Speaker A: So is Amazon, frankly. They're very good at the timeline of the sequence of events. [00:03:01] Speaker B: Like code was released here. This happened, this happened, this happened down to the minute. I agree. I wish this had it even. This was only released in February, so they could have shown more detail. I would bet if you harass your account rep, you probably could get a little bit more. But I think it's good that they kind of did start to break it down.
[00:03:24] Speaker A: So to mitigate this from ever happening again, they gave us several mitigations that they're doing, basically six of them. The first one is the number of fields and the IPC template type was not validated at sensor compile time. And to mitigate this, they're basically going to validate the number of input fields and template type at a sensor compile time, which seems pretty obvious like, thanks, I appreciate that. You're going to do a basic bare minimum of QA. A runtime array bounds check was missing from content interpreter input fields on channel file 291. And they're going to add an input array bounds check this content interpreter for rapid response. Channel file 291. [00:04:02] Speaker B: It happens. [00:04:02] Speaker A: Cool, cool. As well as they're going to correct the number of inputs provided by the IPC template type so they don't have this problem so far. That's so great, to be honest. Template type testing should cover a wide variety of matching criteria, which is basically them saying they're going to increase test coverage. And then I think somewhere else in here they talk about adding fuzzing and things like that, which fuzzing should have definitely helped catch this because you would have put more data into the field than you were expecting and that could have potentially caused different things to happen. The content validator contained a logic error. Create additional checks in the content validator, and prevent the creation of problematic channel two nine one files. Seems pretty straightforward. Then template instance validation should expand to include testing with the content interpreter, which this one is the one that bugged. I think most people, when you really read this one, you basically created a kernel application that can pull files in from outside, but you didn't test any of the templates that came in from outside. You only really certified and tested the Falcon sensor itself, which is in the kernel which is kind of a silly model. I'm surprised this didn't burn them many times before. [00:05:10] Speaker B: Yeah, I mean, I also feel like Microsoft should have, like when they got certified to be part of the kernel, they should, you know, have looked at it as from a holistic point of view of like, this thing runs here, but it's gathering data from this broader audience and saw how they did that. But also, this has probably grown over the years since they were certified, so it's possible that these brought new things. [00:05:35] Speaker A: I think they've always had this ability to pull things in. That's how you update this thing, because to do kernel updates every time, you would have to do installs and reboots. And that's not going to fly for a tool like this that has as many updates as it needs to get. But overall, I struggle with these first five because it's sort of like the lack of creativity in testing. It's like either someone didn't think about all the ways it could have failed, and it's hard to do failure mode, event analysis, type nls things that go through that. But I struggle with because it seems like this wasn't fuzzing is not a new concept. Breaking QA processes, where you try to force it to break in different ways, doesn't seem like a new process to me. And so, like, I appreciate the transparency and I appreciate the mitigations they're going to do, but they seem like they should have kind of already been there. 
[00:06:29] Speaker B: Yeah, that's the one thing that kind of bugs me about this is this feels like up to here, maybe not QA 101, but maybe like 201 or 301. Like, these feel like, you know, for a company of that size with the, you know, market cap they have, these should really be things that they did. And like, like you said, like, I don't know how they weren't burned by some of these things in the past, whether it's pure luck or they caught them, you know, even though number six exists, which we'll talk about in a second, you know, it just kind of amazing that you missed the basic tests. [00:07:09] Speaker A: Well, then, even in the case of like, you know, you're doing C programming in kernel level, part of the reason why rust and things are getting more popular is because it's more kernel safe and memory safe, et cetera. That's why Microsoft's move in that direction and other companies are moving that direction for their kernels. But the reality is there are safe ways to implement C code that does memory checking and does certain validations as you go through the process. That is not 101 C coding. It is a little more advanced C coding, but it exists. And there's been a model and a solution for that kind of testing to basically guarantee inputs and outputs into input variables in C. So it's not like there aren't ways to solve this problem beyond testing. There's just a different level of C coding you need to be able to do that I can't do. I mean, I know I'm not skilled in C anymore at that level that I could code that, but I know it exists. So I know there's reference documentation. I'm sure there's people out there who know it quite well, who write very complicated software for things like jets and airplanes and NASA rockets that require those type of high quality resiliency and redundancy. And again, they're a slightly different scale of human safety that requires those things. But it does exist. And I think again, in a kernel system like this that's so critical to the security of your system and hacker attack target, you should be doing as much as possible to make sure you're writing really hard in C code. [00:08:38] Speaker B: Yeah. [00:08:40] Speaker A: So the last one, which is one that I know was my most annoyance out of this whole thing, is template instances should have a stage deployment. Each template instance should be deployed in a stage rollout, and the content configuration has been updated with additional deployment layers and acceptance checks. Basically they're saying, we didn't test these template instances, we just sent them out to production live, do it in produced, which is great. Now they're now adding in the steps where you can basically test it in their test environment, so that if they blue screen their test environment, we won't care because that code won't actually get out there. And as well as they're going to be working on providing additional canary testing so they don't actually roll out to 8.5 million windows boxes at one time. And they're doing that by giving more control to us, the customers, which apparently if you read people on Reddit, people have been asking for for a while, but basically they'll be giving you different models to support the rapid response content. And you can choose if you want it instantaneously, if you want it within a few hours or days or whatever that timeframe is going to be. 
They didn't give exact details what that's going to look like, but I appreciate the flexibility now, but I feel like they were forced to do it not because they wanted to do it. [00:09:46] Speaker B: Yeah, this is the one that really bugs me. The most of everything is pretty much we deployed in prod which my security, my compliance. Many different departments I have would lose their mind if I did this at my day job then essentially deployed it directly out. No testing whatsoever. No, let's roll out to a small group at once from an it. This is a tool that it departments use. 101 of it is don't deploy it to the entire fleet. Day one, because if it breaks, you have now broken everyone. Like roll it out in your small pilot group, then next level and then to everyone or however many rings or deployment groups or sets you want to have. Most companies don't play windows updates to everyone. Day one. They wait a week and then make sure that it doesn't break. Break everything. And the fact that they didn't do this is terrifying. And then even worse about this is that you thought you could have control because you could do agent and stuff like that. Minus one in the console. So I know a lot of people thought that they were safe from something like this because they had it set to minus one, minus two. And they weren't because it wasn't fully transparent, that this wasn't even enabled. So this is something that, you know, I know a few of my friends that were affected by this is they're implementing ASAP because they don't want to be stuck in the same thing again. [00:11:24] Speaker A: Yeah. As soon as it's available. I don't know if it's actually available quite yet, but it's coming very soon is what they've basically promised. [00:11:31] Speaker B: It's pieces of it from what I understand. So like you could set up like a whole other system for it, like, you know, differences around there. But I don't think the whole setup is yet, which if they were, did set it all up this quickly. One, it wasn't that difficult, so why don't you do it before? And two, either that or you rushed down and I really hope you didn't rush out something like this right after you just took down the world. [00:11:56] Speaker A: Exactly. I mean, it is interesting. You know, the other thing I guess that you mentioned dates. Their mitigations don't have any dates on them of when they're going to be done or implemented, which is our thing in addition to a timeline. Would be nice to see in this process. [00:12:11] Speaker B: I thought you saw. No, some of them do, some of them don't. So like number four, the logic error fix will be in production by August 19. Some of them say they were already in there. The release was generally available on August 9. Number two, some of them are in there. Some of them are nothing. [00:12:31] Speaker A: Yeah, you're right. Some of them do have it. [00:12:33] Speaker B: It's just buried in the words. So good luck. [00:12:36] Speaker A: Then. The last thing they're doing, which they didn't promise to provide the results of this, but they're doing two different independent third party reviews of their Falcon sensor code for both security and quality assurance, and those independent reviews of the end to end quality process for development through deployment. Both vendors have started the reviews that immediate folks on the July 19 impacted code in process number one. I don't know why you need two vendors to do that, unless you're hoping they're going to give you different answers. 
But to focus on the July 19 outage, that's fine. That outage already happened. It's now hindsight 20/20. I know how you messed up, but you really need someone, again, who's creative, who can think through all the different scenarios of failure that could occur. And so that's just a challenge that you need to solve and address. I hope they release the independent third party review, since you said you're going to do it. But I really want them to, again, look at their entire SDLC process, their entire end to end, and do they have the right standards? Do they have the right coding practices? Do they have the right things that make sense in a product of this sensitivity? [00:13:45] Speaker B: Yeah, like you said, sharing the reports would be interesting. Even a redacted copy, I think, would be good for, you know, the customer base. You know, people trusting the platform, customer trust, you know. So I think in that way, I don't think you're going to see the full report ever. The other piece that I think you touched on a little bit is the full quality assurance processes, you know, is like, who are these vendors? Are they actually vendors that, you know, know more of the QA side? Are they vendors that know more of the software development side and where they situate? And that's the only reason why I think of having two is if they hired one company that focuses on A and more of a company that focuses on B, and maybe that would give them a little bit more of a full view. But, yeah, the only other piece I'd ask is what's the timeline for this? This is still running in over 8 million production environments. Let's go over that right now. What is the next step? Like, what is the timeline that they're going to get the report? And then are they going to say, okay, here's the report, here's the findings, here's our timeline to remediate, like they provided, or is it just going to be like, cool? We PR'd that we're doing this thing and we're never actually going to tell anyone about it. [00:15:02] Speaker A: Well, we'll find out. These are all questions that I'm sure, you know, we'll get answered in the next few months. And I still think we're a long ways away from this being fully put to bed. I still think we're going to see changes in the way people treat vendor risk. I think the way that disaster recovery is handled at companies, I think this is one of those events that it was bad, it could have been worse. What do we learn from it and how do we improve as an industry as a whole? Yeah, well, in ways to not improve, Microsoft has joined CrowdStrike in pushing some of the responsibility back to Delta for their recovery from the CrowdStrike incident. You know, basically they felt they had to respond to the public comments that Delta Airlines' CEO and others have been making, blaming both CrowdStrike and Microsoft for their recent IT woes. And Microsoft says via their lawyer, our preliminary review suggests that Delta, unlike its competitors, apparently has not modernized its IT infrastructure, either for the benefit of its customers or for its pilots and flight attendants. That's from Mark Cheffo, from law firm Dechert, representing Microsoft. All I can say is it's going to get ugly before all this gets settled. That's where your opening position is at. [00:16:14] Speaker B: Yeah, if you, I think, I don't know if it's in this article or somewhere else I read it, was like Microsoft had reached out multiple times and had said, do you want help? And Delta was like, no, I got this. You know, and clearly they didn't, but.
[00:16:29] Speaker A: They, well, I think we talked about this maybe a little bit last week. The struggle with offering to send someone on site to help you is you can't vet them that quickly. You also have obligation to your shareholders, you have obligations to your security controls and your SoC and ISO and all the things that you're doing to allow some strangers into your network and then give them access required to fix this issue, which in some cases required you to provide local encryption keys and local administrator passwords. Like you're basically saying, here's the keys. Because we're in a, everything's in crisis and we're going to throw security out the window to allow these people to come in and touch my environment to get us back up and running. I can see the argument both ways. I get it. Things are on fire. You're trying to fix it. But also this is not a great scenario if you allowed it to happen either. And I hope that most companies who did get help from crowdstrike basically were getting it from or advice and consultative help versus actually getting people on site unless they don't have a high security requirement. [00:17:35] Speaker B: Yeah, I was thinking of it when they offer to help more as the, like, use of the second part, the consultative slash. You know, get on a call and walk through things, you know, especially with complex Microsoft environments. If you're running like SQL clusters and stuff like that, that's where I was thinking like, hey, this has crashed and you can't get it up. Let me get the Microsoft SQL SME or the whatever SME to come and help, you know, whether it's not to actually be hands on keyboard, but to be that consultative person. That's where I was thinking of it more as than like actually sending somebody or giving them the low collab. [00:18:12] Speaker A: Now, if, if they were calling and saying, hey, we'd like to have a conversation with you, we have some ways that you can recover. And then Delta wasn't even willing to listen to that. That is a little wrong because there was things that crowdstrike was able to offer to companies. Basically they produce a release that you could opt into to basically. And during one of the reboots, it would allow it to grab and invalidate that channel file. Before the full kernel boot up happened. There were some things that you could do that would help you potentially recover faster. Those are things that I would hope Delta did get access to. But if they were just refuse to talk to you because we're mad at you and we're going to take our ball and go home, then I could see kind of the argument about like, well, they didn't even take our help, but, you know, if it's just the on site people, I just, I just don't see that as that egregious. [00:19:01] Speaker B: Yeah, no, I get it. I mean, I don't know that we're going to know the full details of it, but I think it's in. [00:19:08] Speaker A: Oh, I bet we're going to because it's going to end up in court. [00:19:10] Speaker B: Oh, yes. You'll get all the depositions and. Yeah, yeah, you're. Yeah, all that will be covered under discovery and things like that. So, yeah, it will be public. You're right. [00:19:21] Speaker A: Yeah. So unless they all decide to settle, which would be the right thing to do, I think, in this particular case. But shots have been fired. You called them their antiquated it infrastructure, and Delta called out crowdstrike. And Microsoft pretty hard as well. So we'll see. 
Maybe after people have had some distance from this incident, they'll feel a little better about it. But there are also the, the court or the Senate hearings coming up very soon as well. I think they're scheduled for September where they're pulling in the Crowdstrike CEO, and so who knows what's going to be said there. We'll keep you posted here at the glad pod. Moving on to AI is going great. Anthropic is poking the hackers this week with Defcon going on, which is interesting timing, offering up to $15,000 for jailbreaks that bypass the anthropic safeguard and elicit prohibited content from the Claude chatbots. By inviting outside researchers to test the models, Anthropic is hopeful to identify problems the company couldn't find on its own, and they're hoping to attract hacker groups that post jailbreaks on Twitter to recruit for the program. So if you know how to jailbreak claude, you can go make some money. [00:20:26] Speaker B: I like this. I think it's like, just like a bug bounty program. I don't see why an AI system should be any different than, hey, I have a bug bounty up for my website and go at it so we can fix it. So I think that this is great to see, and I weirdly hope they give out lots of money to people to make the system be better overall. [00:20:49] Speaker A: Agreed. All right, if you are looking to learn a little bit about databricks and generative AI capabilities, and you're looking for an opportunity to get your company involved, Databricks is hosting a worldwide generative AI World cup hackathon, inviting participants to develop an innovative genai application that solves a real world problem. Participants will compete for a pool of over $50,000 in cash prizes, trophies, and passes for the data and AI summit 2025 put on by Databricks, as well as you'll get material to help you scale up on generative AI as part of the process and a $500 databricks basically credit. To participate, you must meet the eligibility requirements, which is that you must hold a data or AI role in your register with a corporate email address. Your team can only be two to four members. Databricks, staff, partners, and consultants and students are not eligible, and you must be over 20 years old. And you can start now, the deadline being October 18 at 05:00 p.m. to send in your two to four minute video with your generative AI application. [00:21:49] Speaker B: I think hackathons are fun. Good ways to learn things, good ways to get people interested. The only reason, the only thing I question here is why are students not eligible? [00:21:59] Speaker A: I don't understand the 20 year old age requirement either. [00:22:02] Speaker B: I assume there's some countries that's kind of where my head was at. [00:22:06] Speaker A: Yeah, they do have an eligible country list. You do have to check that as well. Well, when you're signing up to make sure you're allowed to do it. [00:22:12] Speaker B: But why not? Students? [00:22:14] Speaker A: Yeah, well, again, why is that to be a. It sounds like they only. They only want you to participate if you're a company, which is sort of like, is this a sales ploy to get leads? Which is sort of weird, but again, like, you know, there's definitely some questions about how they're trying to do it that I don't fully get it. But, you know, so far 42 people have signed up, so we'll see. [00:22:37] Speaker B: It's not too bad. [00:22:39] Speaker A: Yeah, you have to be in mostly major countries like Australia, Canada, France, Germany. 
You know, none of the smaller countries are listed here in the list of authorized companies or countries. [00:22:49] Speaker B: I assume there's like, tax reasons, like when you give money out that, like, so maybe they have to be registered in certain countries. Like, that's kind of where I was thinking a little bit. [00:22:58] Speaker A: Yeah. It says a purchase or payment will not increase your chances of winning, by the way. [00:23:03] Speaker B: Oh, by the way, New York and Florida are not eligible states. [00:23:08] Speaker A: Oh, interesting. [00:23:09] Speaker B: In the US. Weird. [00:23:11] Speaker A: Wonder where that is. [00:23:13] Speaker B: Yeah, I have a lot more questions now, like, more curious about the rules than anything. [00:23:21] Speaker A: All right, so if you're interested in that, and you're interested in Databricks, there's opportunity. They'll give you a bunch of free training, which might be worth it just to get that. Even if you don't actually plan to submit an entry at the end of it, they give you a bunch of access to free training, plus that $500 credit for using Databricks, which you will burn very quickly if using Databricks or any of these solutions. So do be careful. [00:23:41] Speaker B: Well, use your company credit card and just make sure that nobody, your CFO, doesn't get the bill for a few months. [00:23:48] Speaker A: Yeah, don't do that if you work for Matt or me, actually. AWS announces private IPv6 addressing for VPCs and subnets. This is with the VPC IP Address Manager, IPAM, on AWS. Private IPv6 addresses come in the form of unique local IPv6 unicast addresses, or ULAs, and global unicast addresses, or GUAs, and can only be used for private access. AWS does not advertise these IPv6 addresses to the Internet. Within IPAM, customers can configure those IPv6 addresses in a private scope, provision the ULA and GUA, and use them to create VPCs and subnets for private access. Customers can use these IPv6 addresses to boost security and ensure compliance, as they can demonstrate that their resources with private IPv6 addresses are not Internet accessible via a quick audit. [00:24:32] Speaker B: I love it. I love that they're actually making IPv6 be simple to deploy the same way as the 10/8 and the, was it 192.168? And the other subnets that are private. I just don't have a strong desire to deal with IPv6 nuances still in life. So I don't foresee myself deploying this, but if you are a bleeding edge company or, and, or you want to launch lots and lots of instances and or NIC cards in the same subnet, the same thing, go for it. It's a great feature they're adding. [00:25:10] Speaker A: Yeah, well, I mean if you're using, if you're selling something to mobile devices in particular, the need for IPv6 is pretty big in mobile. And so like, I mean the way you solve that mostly is you use a dual, dual stack load balancer to provide an IPv6 address at the load balancer, then you just do IPv4 behind there. But if you're doing some type of end to end situation where you need to be IPv6 all the way back, it's possible. But again, you have a pretty big zone if you need IPv6, but it's good to have it. I appreciate that they're getting more and more support for IPv6 since they started charging you for IPv4 addresses. [00:25:44] Speaker B: Yeah, well not in your subnet, but yeah, yeah. [00:25:49] Speaker A: Still, I still appreciate it.
Amazon EFS in March increased from 10 GiB/s to 20 GiB/s of read throughput, and they said that's not fast enough for the biggest workloads out there. So now they're further improving EFS performance to 30 GiB per second, all for the same basic pricing. EFS is a simple, fully elastic, and provisioning-free experience to support throughput intensive AI machine learning workloads for model training, inference, financial analytics and genomic data analysis. [00:26:17] Speaker B: Better speed, always better. Yep, faster speed, always better. Yeah. I mean, no extra costs. It's what I do love about the way AWS does a lot of these things is they just tack it right in and you never know the difference and you don't pay for it. It just, you naturally get it, which is nice. [00:26:34] Speaker A: Yeah. Well I would say you will pay for it if you use the guaranteed provisioned IOPS for EFS, but if you're just using the burstable, it's up to 30 gigs for free, which is nice. [00:26:44] Speaker B: Yeah. [00:26:45] Speaker A: So a feature that came out, I don't know, maybe six, eight months ago, I don't remember. It's Internet time. It takes forever. The Amazon CloudWatch Internet Monitor, which basically was an internal way to see how the health of the Internet is affecting your applications. At the time I was nonplussed with it. They've redone it. They've basically given you a new updated console experience, including new features for visualizing configuration changes that can help you reduce latency for your application. And with the refreshed dashboard, the Internet Monitor console lets you easily find and take advantage of Internet Monitor's breadth of capabilities, and they have added quite a bit. I checked it out before the show and yeah, you can create your monitors, you can put the plugin right into the, into the graphic of the world they have here, and you can see that right now in Brazil, Rlnet Internet ASN 268671 is having an availability issue which could be affecting your app if your customer happens to be on that provider. So this could be very helpful for your support teams, very helpful for troubleshooting latency problems that your customers may be experiencing on your SaaS app. [00:27:48] Speaker B: All I know is you just called out some poor ISP and yeah, sorry about that, wherever they were. So, you know, it's very nice of you. [00:27:55] Speaker A: I'm sure they're doing fine. Hug ops. [00:27:57] Speaker B: Yeah, someone's having a bad day, just calls fine, don't worry about it. No, I haven't actually played with this. I like the concept of this originally I played with it a little bit when it first came out. I haven't played with the new one yet, so I'll have to go actually log in and play with it. [00:28:14] Speaker A: AWS Backup is announcing the general availability of logically air-gapped vaults, a new type of AWS Backup vault that allows secure sharing of backups across accounts and organizations and also supports direct restore to help recovery times from a data loss event. A logically air-gapped vault stores immutable backup copies that are locked by default and isolated with encryption using AWS's own encryption keys. You get started with logically air-gapped vaults using the AWS Backup console, API, and CLI. Target backups to a logically air-gapped vault by specifying it as a copy destination in your backup plan, and share the vault for recovery or restore testing with other accounts using AWS Resource Access Manager.
So I know I've done this before where you create a separate organization and you create a backup s three bucket there and you send all your backups to it. It's nice to see this natively built right into AWS backup to allow you to do basically the same thing. And the walls that can be disconnected from the network so it's not online all the time waiting to be compromised by a ransomware. So overall, this is a nice common shared third doctor type backup solution. [00:29:14] Speaker B: Yeah, like you said, I've done the same to kind of hack it together and make it work. It got rougher once you tried to do like EBS and EFS backups. I love that it's actually managed for un ten. I'm surprised that day one I thought I looked it wasn't available in govcloud because so many government restrictions require these things. So I have a feeling this would be moving to gov cloud rather quickly because this was something that is a requirement for CMMC and a few other things. [00:29:46] Speaker A: Yeah, typically Govcloud stuff comes within a. [00:29:49] Speaker B: Couple of months, about nine to twelve months. Yeah, it has to go through all the extra certifications and everything. [00:29:55] Speaker A: Yes, the lengthy, lengthy Fedramp certification process. Moving on to GCP. This is a feature that I would love to see actually in all the clouds. The Cloud SQL Studio for MySQL Postgres SQL and SQL Server is a new, generally available in console lightweight tool to query your databases directly from the console without having to install things like SQL agent or SQL command studio. Cloud SQL gives you a consistent, intuitive user interface for all your databases regardless of the engine. In addition to ease of access, you get the ability to quickly and easily create, edit and manage your database using an AI assistant that helps you write queries with natural language. Cloud SQL Studio can help make your database administration skills much better and take you to the next level. [00:30:36] Speaker B: I actually thought AWS has this with Aurora IAm queries where you can actually query it through the console in your IAM user. [00:30:45] Speaker A: But isn't it only tied to Aurora where this is basically any MySQl postgres or SQL running through cloud SQL? You can use this on which you're correct. [00:30:54] Speaker B: Yeah, I think it's specific for Aurora, but I know I definitely did do that in the past. And like you said, this will save a lot of time and effort if they can do this for all of them because so often somebody needs to just run a select or run, run a query to see what's in there. So setting up a server jump box, giving access, MFA, all that jazz all the way across this pain. So I definitely would love for them to do this across the board more. [00:31:21] Speaker A: Well, that's one of the things that I quite often spin up a jump box for like jump box and all the SQL agent that I need and then just hop on and do things. Yeah, I just do a quick little google here. People say you can do it with things like Athena with like the JDBC drivers. That's just not as clean in my opinion. [00:31:40] Speaker B: No, no, this is nice and clean right in there. I didn't fully realize what this feature was until obviously you were reading it and I was like, oh that's nice. The other nice feature here is the AI assistant, which as I joke in my day job never let me touch a SQL server. I might actually be able to do better selects and anything in SQL with AI assistant. 
So that to me might be a good place to actually have AI, especially if it could understand your schemas and whatnot a little bit more and give you a little bit of natural language query for your own databases. But that's probably a later on feature that's going to come. [00:32:17] Speaker A: Yeah, let the rag models get built for that. So yours is SQL Server. You can't write a SQL server. Mine is LVM. No one lets me touch LVM configurations because I kill a lot of Linux boxes that way. [00:32:29] Speaker B: Hmm. No, I should never touch any SQL whether it is Microsoft postgres, MySQl. I can do the extremely basic things of it. But besides that, you know, I joke that as far as my SQL knowledge got, it was select from and where. I know there's joins and inners and outers but like no, select star from here. Maybe I'll get fancy and throw aware in there, but like it's not really good for anyone. [00:33:00] Speaker A: Yeah, okay, well that's fair. I've definitely caused some bad SQL queries in my past, but I've learned pretty well to not make really dumb mistakes like multiple left inner joins and things like that at this point. So I have. But LVM, yeah, corrupted many boot volumes. [00:33:17] Speaker B: I haven't dealt with LVM in years though. But it wasn't that bad. [00:33:21] Speaker A: I mean now I, now I just build a new box, just reattach the EBS the way I need to so I don't deal with it as much as I did back in the day. But yeah, that was a joke at the company. I was like, yeah, don't let Justin touch lvm on the servers because I blew up a couple boxes. I can do it at home now. I haven't had a problem in a while, but still, just for sanity's sake, just don't let me touch them. [00:33:40] Speaker B: I was just, I think last time I dealt with lvms was I had a customer that was spanning EBS volumes to get better throughput way back in the day. And that was the last time I think I did it. And that was on AWS like 1012 years ago. [00:33:55] Speaker A: Yeah, yeah. It's been quite a while since I've had to do it on anything on cloud. I have some stuff in my home lab that I've had to do it on and not corrupted it, so it's possible I can do it. [00:34:05] Speaker B: Yeah. [00:34:06] Speaker A: Don't, don't. [00:34:07] Speaker B: You got to know your limits. [00:34:08] Speaker A: Exactly. [00:34:09] Speaker B: That is one of them. [00:34:10] Speaker A: Yep. All right. Introducing bigquery continuous queries for up to the minute insights data analytics and engineers are increasingly demanding expanded real time capabilities to manage continuous data streams for both input and output data. To address this challenge for customers, Google has transformed Bigquery into a real time, event driven analytical platform. Now they're launching a bigquery continuous queries in preview. Bigquery continuous queries answers the challenge of cost and complexity of true real time data analysis. Historically, real time meant analyzing data that was minutes or even hours old. Both the demands for customer engagement, decision making and AI driven automation is now necessary to get this data in seconds. Bigquery continuous queries can execute SQL statements that can process, analyze and transform data as new events arrive into bigquery, ensuring your insights are always. To date, the needed integration with Google Cloud's ecosystem unlocks even more potential as you can harness the power of vertex, AI and Gemini to perform ML interference on incoming data in real time. 
Or perhaps you want to replicate the results of a continuous query to Pub/Sub topics, Bigtable instances, or even BigQuery tables for further processing and analysis. There's several use cases that this unlocks for you. Things like simplifying real time pipelines: express complex real time data transformations and analysis using the familiar language of SQL, removing the need for additional technologies or specialized programming skills. Unlocking your real time AI use cases via Vertex and Gemini, and streamlining reverse ETL from BigQuery, which allows you to seamlessly move your data back and forth through different systems. There's a quote here from Anthony Savio, data warehouse engineering lead at Bayer Pharmaceutical: at Bayer, we are under more pressure to deliver real time analytics, which has historically proven difficult. And now that we've had an opportunity to try BigQuery continuous queries, we are incredibly excited about the future possibilities this capability will unlock, from real time integration of ERP, CRM and IoT data to real time monitoring and alerting use cases. We believe continuous queries will be a game changer that will significantly expand the types of business challenges we can address within our data warehouse. [00:36:05] Speaker B: I love the concept here, especially the real time queries. Like as data comes in, it automatically runs queries to gather stuff. Obviously they had to throw the AI in there because, why? Yeah, it has to be in there. I think we should actually have like a running count maybe for like next year of the number of news stories that we could talk about that do not have AI, or do the inverse, see what it's going to be. But also the reverse ETL, I think that's kind of a nice feature too, where so many times people try to go one direction, but there are times that you need to invert it. So I think having that built in and that whole process automated should be very effective for companies. [00:36:48] Speaker A: Yeah, the closest I got to working with this type of thing is KSQL, which is Kafka SQL. Basically as eventing goes through that matches your query results through KSQL, you can pull it out immediately basically into tables and into other real time insights. It makes sense that this would be something you'd want to build natively into BigQuery, especially considering the use cases that you have on that big data. So yeah, I'm glad to see this. OpenAI is announcing the latest and greatest model for Azure, which is the optimally named, with extremely sexy name, GPT-4o 2024-08-06. Whoo. [00:37:25] Speaker B: The dash six really set it off. [00:37:27] Speaker A: Really set it off. Brings innovative features designed to elevate the developer experience on Azure. And the developer experience they are improving is one that I can get behind, and that is structured outputs via JSON schemas. You can now have it output the data from ChatGPT into the exact JSON schema that you provided. Or a more accurate tool output, which is strict mode. This lets developers define specific function signatures for tool use, supported by all models that support function calling directly. Yeah, so that's great because one thing I sometimes mess up is JSON schemas. So I appreciate ChatGPT being able to give me a structurally correct JSON schema that I've defined with my data set that allows me to move it quickly to other systems that may need that data for input from JSON.
[00:38:13] Speaker B: Yeah, I mean the JSON feature here, while it seems so simple, is so nice. They have everything to structure in that way. I played with some of the APIs of OpenAI and some of these things and it gets a little messy at times. So hopefully this becomes a nice standard that we're able to have. [00:38:30] Speaker A: It's sort of interesting to me that it's basically a fine tuning of GPT-4o that they're now saying is a new release, but it's really just a fine tuning to how it specializes in doing JSON. But I'm not mad at it, it's just sort of an interesting way they announced it. [00:38:46] Speaker B: So they essentially do the GPT-4o and then they'll have these revisions and they keep these revisions for only so long. So they essentially do the LTS kind of versions, which will probably be like the straight releases, and then these minor versions in between. [00:39:01] Speaker A: Interesting. So then they eventually get rolled into whatever GPT 5.0 becomes and then embedded into it natively at that point. Okay. [00:39:07] Speaker B: Or like 2024-09-04 might be something they keep for longer. [00:39:14] Speaker A: I see, got it. A little idiosyncrasy of ChatGPT on Azure AI that I did not know. [00:39:21] Speaker B: Clearly dealt with the date trap a little bit. [00:39:25] Speaker A: Azure Data Box, which is basically their answer to Snow... Snowmobile? No, Snowmobiles, they just discontinued. Yeah, but now Snowball. It's a Snowball. Azure Data Box is basically their version of a Snowball, if you're familiar with Amazon's, for offline data transfer solutions that help you send petabytes of data into Azure storage in a quick, inexpensive and reliable manner. Secure data transfer is accelerated by hardware transfer devices that enable offline data ingestion into Azure. Several new enhancements, including general availability of self-encrypted drives for Azure Data Box Disk SKUs that enables fast transfers on Linux systems. Support for data ingest into multiple blob access tiers in a single order. Preview of cross-region data transfers for seamless data ingest from a source country or region to select Azure destinations in different countries or regions, which I guess assumes you send the box back to them and they just attach it to their high speed network. Then support for Azure Storage Mover for online catch-up data copy of any changes to active workloads that may have been generated post offline migration with Azure Data Box, as well as now achieving HIPAA and BAA certification, as well as PCI 3DS and PCI DSS certifications, allowing you to send credit card data through the Azure Data Box. [00:40:36] Speaker B: Exactly what I want. A large sum of credit card data all being mailed through the mail. What could possibly go wrong? [00:40:44] Speaker A: Hey, it's self encrypted, though. [00:40:45] Speaker B: So I do like here the data encryption of multiple blob access tiers in a single order. It's been a long time since I used the Snowball, and I've always wanted to play with the Snowcone. I just never did. But at one point, you could only dump it into, like, EBS, and then from there, they added S3, and it was always, like, one account, like, one location. So. And then, especially when you're moving data, especially if this is capable of doing, like, up to petabytes, you know, there's probably a lot of archive data, so you probably want to directly dump it into archive storage or other tiers.
So it is nice that they can kind of just do it all in one order, not having to, like, okay, this Snowball or this Data Box is for archive. This one's for hot. This one's for warm. You know, get it done one time with all the data and move on. [00:41:40] Speaker A: Yeah, we, um. When the Snowcone first came out, we ordered one here at the Cloud Pod, and we played with it for a week or two. We had a container running on it. You know, it was slow as molasses, but it works pretty well. Like, it's a cute little box and solves a need when you have that requirement. [00:41:54] Speaker B: Didn't they send a Snowcone into outer space, too? [00:41:58] Speaker A: Uh, I think they did, yeah. [00:42:00] Speaker B: Yep. AWS sent a Snowcone to space. [00:42:05] Speaker A: They travel the globe and space. Well, Azure is going out on tour. Not a rock tour, but an AI tour. Of course, this is apparently the second year, and I apologize here at the Cloud Pod that we did not know about the Microsoft AI tour last year to tell you about this, but the AI tour will visit 60 cities starting September 24, offering a free one day in person experience on AI. Thought leadership sessions will help build AI skills. Hands on workshops will be offered, and connections will be made with other attendees who are practicing AI. I'm disappointed that they have not yet announced anything here on the west coast for Jonathan, Ryan and I to go to. But, Matt, choose your poison. New York or Boston? You can let us know all about the AI tour. [00:42:46] Speaker B: I will try to attend one of them, and I'll give a recap. Worst case, I will do New York. Maybe I will try to do Boston. Or I'll find a reason to have my company send me to, I don't know, insert other country here. There's Paris, there's Toronto. Yeah. [00:43:02] Speaker A: I mean, you could go international for sure. Paris just finished up the Olympics. You can go check out, you know, all of that, you know, if you're enticed, you know. Yeah. If I were to choose one, probably Sydney, it'd be kind of one I'm sort of interested in. [00:43:15] Speaker B: I've done Sydney before and, you know, it was a great city, don't get me wrong, but I feel like I would try Seoul, somewhere completely different, or Tokyo, like, just to give it a good go. [00:43:24] Speaker A: Yeah. [00:43:24] Speaker B: Somewhere I haven't been. [00:43:26] Speaker A: I'll keep an eye on this site because I assume, you know, there's only six... oh, sorry, 14. 14 cities of the 60 are launched so far, so I will keep an eye on that. I assume something will happen on the west coast or Chicago or other places I would be more interested in going to than Boston or New York. No offense. I would love to come see you, Matt. [00:43:47] Speaker B: I don't blame you. [00:43:47] Speaker A: It's cold there in the winter and Chicago is not. I mean, I wouldn't go to Chicago either if I didn't have to. Again, depends on time of year. Yeah. And then the last one is customer managed planned failover for Azure Storage. This is in preview. Azure is giving you the ability to do a planned failover of your Azure storage account. Over the past few years, Azure Storage has offered you an unplanned failover as a DR solution for geo-redundant storage accounts. This allowed you to meet business requirements for DR testing and compliance. But when you did that, you then had to reset up the failover back to the prior region.
You had to fail back, and there's a lot of time and cost associated with setting up and failing back and forth. With planned failover, you now get the same benefits while introducing the additional benefit of not having to swap the geo primary and secondary regions. And so you can basically flip over from west to east coast and then flip right back over to the west coast without having to reset up all, all of your systems and settings, saving you time and money.
As part of our wider investment in cloud ability in Saudi Arabia, the Oracle Cloud Riyadh region will help accelerate adoption of cloud and AI technologies to boost innovation across all sectors of the saudi economy, while helping organizations address local data hosting requirements, which have been written to be very strenuous and favorable to Saudi, which is why all these cloud providers are building out there. If you'd like to sell to companies. [00:48:13] Speaker B: Like Saudi Aramco, they've shipped another truck to Saudi Arabia. [00:48:19] Speaker A: You just got to send a couple trucks with some satellite Internet packaging from Starlink and you're good to go. [00:48:26] Speaker B: I mean, here's a place I would not want to build a data center, you know, just due to cooling costs. But, you know, Oracle is doing sort. [00:48:36] Speaker A: Of on the list of like, I don't want to build one in Arizona, UAE, anywhere really in the heat, in the hot zone of that part of the world. But that's just sort of the way that these things, data sovereignty requires it exists. [00:48:50] Speaker B: Yeah, I know that Azure had a power a cooling issue a few weeks ago also in UAE, too. [00:48:56] Speaker A: Did they ever give an RCI on that, I assume? [00:49:01] Speaker B: I don't know. I'll have to look that one up. It's one of those things that didn't fully affect me that much, so I had the redundancy. So when one zone went down, other zones picked it up and I stopped caring. It's the nice part about building things correctly at times is you have the ha, you accept the cloud, you design for failure, and you move on in life. [00:49:21] Speaker A: Yeah, it's true. Nice not to worry about some of these things. Yeah, well, good. That's another fantastic week here in the cloud. Matt, any big plans for you and Jonathan next week? Why? I'm out. [00:49:33] Speaker B: I'm working on one or two guest hosts, so we'll see if we can get somebody to come. That way you're not just hearing myself and Jonathan talk? Cause I think the cloud pod and doing it is a lot more fun and I think that people get a lot more value out with more people here. So hopefully we can get at least three of us on the call next week and do it together. [00:49:59] Speaker A: You should ping maybe a former co host, maybe he's available for. [00:50:03] Speaker B: I thought about it too. That was the other idea I had. So we'll see what I actually remember to do and I have time to do and then go from there. [00:50:11] Speaker A: Perfect. Well, you guys have a great show next week. I will miss you. Of course. And we'll listen to your episode when I get back. So see you in two weeks. [00:50:21] Speaker B: Bye, everyone. [00:50:26] Speaker A: And that is the week in cloud check out our website, the home of the cloud pod, where you can join our newsletter slack team. Send feedback or ask [email protected]. or tweet us with a hashtag, pound the clap pod.
