278: Azure is on a Bender: Bite my Shiny Metal FXv2-series VMs

Episode 278 | October 16, 2024 | 00:46:44

Show Notes

Welcome to episode 278 of The Cloud Pod, where the forecast is always cloudy! When Justin’s away, the guys will… maybe get a show recorded? This week, we’re talking OpenAI, another service scheduled for the grave over at AWS, saying goodbye to pesky IPv4 fees, Azure FXv2 VMs, Valkey 8.0 and so much more! Thanks for joining us, here in the cloud! 

Titles we almost went with this week:

A big thanks to this week’s sponsor: Archera

There are a lot of cloud cost management tools out there. But only Archera provides cloud commitment insurance. It sounds fancy but it’s really simple. Archera gives you the cost savings of a 1 or 3 year AWS Savings Plan with a commitment as short as 30 days. If you don’t use all the cloud resources you’ve committed to, they will literally put money back in your bank account to cover the difference. Other cost management tools may say they offer “commitment insurance”, but remember to ask: will you actually give me my money back? Archera will. Click this link to check them out

AI Is Going Great – Or How ML Makes All Its Money

00:59 Introducing vision to the fine-tuning API.

03:53 Jonathan – “I mean, I think it’s useful for things like quality assurance in manufacturing, for example. You could tune it on what your nuts and bolts are supposed to look like, what a good bolt looks like and what a bad bolt looks like coming out of the factory. You just stream the video directly to an AI like this and have it kick out all the bad ones. It’s kind of neat.”
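
For a concrete sense of what this unlocks, here is a minimal, hypothetical sketch of a vision fine-tuning job using the OpenAI Python SDK, along the lines of the nuts-and-bolts QA example above. The JSONL layout, image URL, and model snapshot name are illustrative assumptions; check OpenAI’s current docs before relying on them.

```python
# Hypothetical sketch: fine-tuning GPT-4o on images (e.g. good vs. bad bolts).
# Assumes the OpenAI Python SDK and a vision-capable fine-tuning JSONL format.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each training example pairs an image with the label we want the model to learn.
example = {
    "messages": [
        {"role": "system", "content": "Classify the bolt in the image as PASS or FAIL."},
        {"role": "user", "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/bolt_0001.jpg"}},  # placeholder
        ]},
        {"role": "assistant", "content": "FAIL"},
    ]
}

with open("bolts.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # a real dataset needs many such lines

training_file = client.files.create(file=open("bolts.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed vision-capable snapshot; verify in the docs
)
print(job.id, job.status)
```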

04:41  Introducing the Realtime API

05:54 Matthew – “Just think about how much time you’ll have left in your life when you don’t actually have to attend the meetings. You train a model, you fine-tune it based on Ryan’s level of sassiness and how crabby he is that day, and you just put it in the meeting so you can actually do work.”
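
The Realtime API keeps a persistent WebSocket open rather than making one-shot HTTP calls. Here is a rough sketch of what a session might look like from Python, assuming the beta endpoint, headers, and event names described in OpenAI’s announcement (all of which may change while this is in preview):

```python
# Rough sketch of a Realtime API session over a WebSocket.
# Endpoint URL, headers, and event names follow the beta announcement and are assumptions.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"  # assumed endpoint


async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer releases of the websockets package call this kwarg additional_headers.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the model for a response in both text and audio.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text", "audio"],
                "instructions": "Summarize this stand-up meeting in one sentence.",
            },
        }))
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break


asyncio.run(main())
```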

09:58  Introducing Canvas 

11:18 Jonathan – “I got my Pixel 9 phone, which comes with Gemini Pro for the year, and I noticed a shift in the way AI is being integrated with things. It used to be, do you want me to write the message for you? They’ve moved away from that now; I think there’s a little pushback against that. People want to feel like they’re still authentic. So now, once you’ve finished writing the message, it’s, would you like us to refine this for you? Like, yes, please, make it sound more professional.”

AWS

13:01 AWS Announces AWS re:Post Agent, a Generative AI-powered virtual assistant

14:06 Maintain access and consider alternatives for Amazon Monitron

15:11 Jonathan – “That’s a weird one, because I think they talked about this on stage at re:Invent a few years ago. It was a whole big industrial IoT thing. We have these devices that monitor the unique vibrations from each machine, and we can tell weeks in advance if some part’s going to fail or not. So it’s kind of weird that they’re killing it, but I guess the functionality can be built with other primitives that they have, and it doesn’t need to be its own service.”

17:05 Amazon Virtual Private Cloud (VPC) now supports BYOIP and BYOASN in all AWS Local Zones
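
For context on what BYOIP actually involves, here is a hedged boto3 sketch: provision your own range, then allocate an address from it in a Local Zone’s network border group. The CIDR, ROA authorization values, pool ID, and border group name are all placeholders.

```python
# Hedged sketch: bringing your own IPv4 range to a Local Zone with boto3.
# The CIDR, ROA authorization values, pool ID, and border group name are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

provisioned = ec2.provision_byoip_cidr(
    Cidr="203.0.113.0/24",  # documentation range, not a real allocation
    CidrAuthorizationContext={
        "Message": "<ROA authorization message>",
        "Signature": "<signature produced with your ROA key pair>",
    },
    Description="BYOIP range for a Local Zone workload",
)
print(provisioned["ByoipCidr"]["State"])

# After the range is provisioned and advertised, allocate from it inside the
# Local Zone's network border group (placeholder values below).
address = ec2.allocate_address(
    Domain="vpc",
    PublicIpv4Pool="ipv4pool-ec2-0123456789abcdef0",
    NetworkBorderGroup="us-west-2-lax-1",
)
print(address["AllocationId"])
```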

18:19 Amazon EC2 now supports Optimize CPUs post instance launch

18:53 Ryan – “Yeah, this is one of those things where it’s a giant pain if you have to completely relaunch your instance, or when you’re trying to upscale your instance to a new instance type to get more memory or what have you, and having that completely reset. So then not only are you trying to scale this, probably to avoid an outage, now it’s taking twice as long because you have to go redo the whole thing. So this is one of those really beneficial features that no one will ever mention again.”
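
A sketch of how this might look in practice, assuming boto3 exposes the ModifyInstanceCpuOptions API introduced with this launch (the instance ID and core counts are placeholders):

```python
# Sketch: trimming vCPUs on a stopped instance to reduce per-core database licensing.
# Assumes boto3 exposes the ModifyInstanceCpuOptions API introduced with this launch.
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder

# CPU options can only be changed while the instance is stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# For example: keep 8 physical cores and disable hyper-threading for SQL Server licensing.
ec2.modify_instance_cpu_options(
    InstanceId=instance_id,
    CoreCount=8,
    ThreadsPerCore=1,
)

ec2.start_instances(InstanceIds=[instance_id])
```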

21:36 Amazon WorkSpaces now supports file transfer between WorkSpaces sessions and local devices 

22:07 Jonathan – “So they re-implement RDP, they take out the feature, then they add it again, and then they give you a switch, which everyone’s going to switch on to stop you from using it. That’s fantastic.”

22:17 Matthew – “But they can check the box now saying it exists, which means they’ll pass some RFP. So now they’re more likely to be considered.”

GCP

25:30 Introducing Valkey 8.0 on Memorystore: unmatched performance and fully open-source

26:53 Ryan – “…when you see this type of change, especially right after a license kerfuffle, the kerfuffle that caused Valkey to come into existence, it’s kind of like, wow, the power of open source is really there. And why wasn’t this part of the Redis thing? It’s because people weren’t contributing to it when it was under that license. So it’s kind of a good thing in a lot of senses.”
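
Because Valkey keeps the Redis wire protocol, existing Redis clients should work against a Memorystore for Valkey endpoint unchanged. A minimal sketch with redis-py follows; the host and port are placeholders, and real Memorystore instances may also require TLS or IAM auth depending on how they are configured.

```python
# Minimal sketch: talking to a Memorystore for Valkey endpoint with redis-py.
# Valkey speaks the Redis protocol, so the existing client works unchanged.
# Host and port are placeholders; real instances may also require TLS and auth.
import redis

r = redis.Redis(host="10.0.0.3", port=6379, decode_responses=True)

r.set("session:42", "jonathan", ex=300)  # 5-minute TTL
print(r.get("session:42"))

server_info = r.info("server")
print(server_info.get("valkey_version", server_info.get("redis_version")))
```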

29:56 Understand your Cloud Storage footprint with AI-powered queries and insights 

31:42 Ryan – “…it’s insights into your storage data. There are performance tiers and the ability to migrate to a lower performance tier for cost savings. There are insights on the access model and the insecure sorts of attack vectors you could have: if it’s a publicly exposed bucket and it has excessive permissions or sensitive content in it, it’ll provide that level of insight.”

Azure

32:51 Announcing the General Availability of Azure CycleCloud Workspace for Slurm 

35:33 Announcing the public preview of the new Azure FXv2-series Virtual Machines

37:00 Ryan – “…you can deploy these VMs where you get a 21-to-one ratio of memory to vCPU. Yeah, it’s cool. They say these are best suited for balanced and compute-intensive workloads, but if you read further down the post, they get to the real answer, which is that this is purpose-built to address several requirements for Microsoft SQL Server, which totally makes sense.”
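
If you want to sanity-check that memory-to-vCPU ratio yourself once the preview sizes show up in a region, something like the following Azure SDK snippet would do it. The "_v2" size-name filter is a guess at how the new sizes will be labelled, not confirmed naming.

```python
# Sketch: list FX-family sizes in a region and compute the memory-to-vCPU ratio.
# Assumes azure-identity and azure-mgmt-compute; the "_v2" name filter is a guess.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for size in client.virtual_machine_sizes.list(location="eastus"):
    if size.name.startswith("Standard_FX") and size.name.endswith("_v2"):
        memory_gib = size.memory_in_mb / 1024
        ratio = memory_gib / size.number_of_cores
        print(f"{size.name}: {size.number_of_cores} vCPU, "
              f"{memory_gib:.0f} GiB ({ratio:.1f} GiB per vCPU)")
```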

38:42 General Availability: Azure confidential VMs with NVIDIA H100 Tensor Core GPUs

39:19 Jonathan – “How weird, though. The point of a confidential VM is that it has one hole that you put something in; it does some magic work on it and then spits an answer out, but you don’t get to see the sausage being made inside. The fact that they’re selling this for training or inference is really interesting.”

42:08 What’s new in FinOps toolkit 0.5 – August 2024
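
One of the additions discussed later in the episode is identifying the reservation break-even point. The underlying arithmetic is simple; here is a toy illustration with made-up rates (not actual Azure pricing):

```python
# Toy illustration of the reservation break-even idea; rates are made up, not Azure pricing.
def breakeven_months(on_demand_hourly: float, reserved_hourly_effective: float,
                     term_months: int = 12, hours_per_month: int = 730) -> float:
    """Months of continuous on-demand use that cost the same as the full reservation term."""
    reservation_total = reserved_hourly_effective * hours_per_month * term_months
    on_demand_monthly = on_demand_hourly * hours_per_month
    return reservation_total / on_demand_monthly


# A one-year reservation at an effective 35% discount breaks even at ~7.8 months,
# so "keep it running for nine months or more" comfortably justifies the commitment.
print(f"{breakeven_months(on_demand_hourly=1.20, reserved_hourly_effective=0.78):.1f} months")
```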

47:11 GPT-4o-Realtime-Preview with audio and speech capabilities 

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod.


Episode Transcript

[00:00:07] Speaker A: Welcome to the cloud pod, where the forecast is always cloudy. We talk weekly about all things AWs, GCP and Azure. [00:00:14] Speaker B: We are your hosts, Justin, Jonathan, Ryan, and Matthew. [00:00:18] Speaker A: Episode 278 recorded for the week of October 8, 2024. Azure is on the Bender by my shiny metal FX V two series, Vmsheen. Hey guys, how's it going today? [00:00:29] Speaker C: Hello. [00:00:30] Speaker B: Going well? [00:00:32] Speaker A: Awesome. Justin is not here. He's on a boat in the, I assume the Atlantic, actually. Hopefully he's not too affected by the hurricane. Or he's back by then. [00:00:42] Speaker B: Yeah, might be a way to do this podcast with just the three of us forever. [00:00:46] Speaker C: It's not going to end well for our listeners. They're just all going to be Futurama references from here on out. [00:00:53] Speaker A: It could be worse. [00:00:55] Speaker C: Yeah, touche. [00:00:59] Speaker A: All right, so no general news today, so we're going to head right into AI is going great, or AI is how ML makes money. So first off, introducing vision to the fine tuning API OpenAI has announced the integration of vision capabilities into its fine tuning API, allowing developers to enhance the GPT 4.0 model to analyze and interpret images alongside text and audio inputs. The update broadens the scope of applications for AI, enabling more multimodal interactions. The fine tuning API now supports image inputs, which means developers can train models to understand and generate content based on visual data in conjunction with text and audio. After October 31, 2024, training for fine tuning will be dollar 25 per million tokens, with inference priced at 375 per million input tokens and $15 per million output tokens. Images are tokenized based on size before pricing. The introduction of prompt caching and other efficiency measures could lower the operational costs for businesses deploying AI solutions, though the API is also being enhanced to introduce features like epoch based checkpoint creation, a comparative playground for model evaluation and integration, with third party platforms like weights and biases for detailed fine tuning. Data management what does it mean? It means developers can now create applications that not only process text or voice, but also interpret and generate responses based on visual cues. Importantly, fine tune the responses for domain specific applications. If you want to know what type of cheese somebody's putting their camera, which was the example in their blog post, you can certainly do that now. It was swiss, by the way. [00:02:31] Speaker B: I was curious. It's kind of funny because I realized I'm still feeling like a new baby when it comes to using aih. The use case of needing this fine tuning API, like, I read this and it's cool, but it's sort of like I can't envision needing it yet but I think as my usage becomes more mature, I think this will be one of those things like, oh, cool, it's already solved. [00:02:57] Speaker C: Yeah, I mean, I feel like it's the same type of thing as us getting into the cloud too many years ago to state, which is like, oh, what are these things? And things. Over time you start to really learn and as you get more advanced, you can leverage these features that you don't even realize that you're going to want right now. Hopefully the world of AI continues moving faster than I'm capable of right now. 
[00:03:23] Speaker A: I think it'd be useful for things like quality assurance in manufacturing. For example, you could tune it on what your nuts and bolts are supposed to look like and what a good bolt looks like and what a bad bolt looks like coming out of the factory. You just stream the video directly to an AI like this and have it kick out all the bad ones. It's kind of neat. [00:03:44] Speaker C: I mean, I feel like they already had that because I remember seeing those like how it's made episodes where it's like it takes 10,000 pictures a second and compares and kicks out the things that are not already there. So. [00:03:57] Speaker A: Yeah, but that didn't use AI. Come on. [00:03:59] Speaker C: Yeah, it must, it must be better now. [00:04:03] Speaker A: Definitely. [00:04:04] Speaker C: All right. Introducing real time APIs OpenAI has launched real time APIs in public beta. Designed to enable developers to create applications with real time, low latency multimodal interactions. This API facilitates speech to speech conversations, making user interactions more natural and engaging. The real time APIs use websockets for maintaining a persistent connection, allowing for real time input and output of both text and audio. This includes functions calling capabilities, making it versatile for various applications. It leverages the new GPT 4.0 model, which supports multimodal inputs, text, audio, and now with vision capabilities and fine tuning, use cases such as interactive applications. Developers can build applications where they have to go back and forth with voice conversations, or even integrate with visual data for more comprehensive interaction, customer service cases or voice assistance. [00:05:11] Speaker B: This is what's going to actually replace me in all the meetings. This is fantastic. [00:05:16] Speaker A: Yeah. [00:05:16] Speaker C: Just think about how much time you'll have left in your life when you don't actually have to attend the meetings. You train a model. You fine tuned it based on Ryan's level of sassiness and how crappy he is that day, and you just put in the meeting so you can actually do work. [00:05:30] Speaker B: It's kind of a fantastic idea is to actually have a daily prompt like impersonate me, but on a scale of one to ten, your amount of care is at a zero. I like it. [00:05:44] Speaker A: It would be super interesting to record your meetings, will record your day and then train a model to react like you do. It's coming. [00:05:55] Speaker B: Oh, I fully implanted. [00:05:56] Speaker A: Keep going. [00:05:57] Speaker B: Yeah, I definitely am gathering transcriptions and recordings to see if I can do it because it would be hilarious. And I would stop going to meetings. I really would. I wouldn't even wait for it to be realistic. I wouldn't even try to get away with it at all. It could be like Max headroom style and I would still be like, yeah, no, I was totally there. And just lie. It'd be awesome. [00:06:23] Speaker A: You can't prove it wasn't there. [00:06:24] Speaker B: Yeah, exactly. [00:06:25] Speaker A: I saw something hilarious the other day, as an aside. It was, it was a latex finger that you could buy and stick over your, like a ring over one of your other fingers on your hand. And so when people take pictures of you, it looked like you had a third finger or 6th finger. And I thought that's just fantastic. That wasn't me, that's AI. Look, you see an extra finger. 
[00:06:45] Speaker B: Yeah, I thought that was the funny, I saw that too and I was like, that's genius. Yep. That's not admissible. Clearly I generated, that was great. [00:06:56] Speaker A: I did look at the pricing for this and I didn't, I guess, put it in, but it works out something like six cents a minute to have it listening if you're interacting. Something like six cents a minute for listening. And I think it was like $0.25 for, for responses. So it's pretty expensive to run in real time. If you wanted to have it running constantly. I'm sure the price will come down eventually. [00:07:16] Speaker C: Yeah, I feel like this is like the AWS model, which is like they don't really know exactly how to set it up yet how to run it, but enough that they can cover their costs and figure out in the future how to actually handle this in a better pricing. Also, once they get probably more capacity, because something like this, they need to have the availability at all times. So assuming they're running in Azure and scaling or whatnot, they need to play those lines of how much time available. [00:07:45] Speaker A: Yeah, but tell us how. [00:07:47] Speaker B: Well, websockets scale. [00:07:51] Speaker C: I plead on it. [00:07:52] Speaker A: Yeah, I mean, to put it in some context though, if it's around about thirty cents a minute for both know, the input tokens and the output tokens for something. If you think about a customer service use case, and this is actually answering a phone call and talking to you over the phone, that's $18 an hour. I'm pretty sure you wouldn't hire anyone in a call center with all the overheads for $18 an hour. So I'd be kind of concerned if I was working with one of those industries right now. [00:08:20] Speaker B: Yeah, it's priced competitively when you think about that. [00:08:23] Speaker C: I just am now curious more and more about the websockets. Like front door doesn't support websockets, yet the application gateways do. There is the entire, I don't remember what it's called, but Azure has a websocket software that like a platform, so there's options to do it. But I'd be curious to see how they're actually implementing this under hood because web sockets are a real pain. [00:08:50] Speaker B: They're hard, and there's no real great option for persistent streaming connection where you could do something like this. [00:08:57] Speaker C: Yeah. Is it Azure R? Is that what it's called? Trying to remember what the service was? [00:09:06] Speaker A: It's the piracy service. [00:09:08] Speaker C: Yeah, we're not in the azure section yet. We only talk about this. [00:09:11] Speaker B: Yeah. Yeah. So next up is introducing OpenAI canvas. Canvas is an innovative interface designed to enhance collaboration with chat GPT for writing and coding projects. So this moves beyond the traditional chat format and it offers a much more interactive and dynamic workspace and code development. So looking at the blog post and the pictures looks more like, you know, an interaction that you would have with pair programming or, you know, being able to reference sort of larger projects holistically in many locations. That said, the draft, the blog goes on to say that, you know, in canvas you can use it also for drafting emails, writing articles, and other sort of collaborative actions that you might do with others. But now you do with an AI because who needs friends? 
Canvas is great at creating content and it can adjust your tone, which I use it extensively for, and provides real time edits, which I found very fascinating because that's one of the things that is a little bit draggy with a chat responses. Right? Like I want something. That's like letting me know that do you want to sound more assertive? Hey, maybe you shouldn't be such a jerk. It's nicer to do that in real time than for me having to remember to prompt and do all things. [00:10:27] Speaker A: Yeah, it's cool. I got my pixel nine phone, which comes with Gemini Pro for a year, and I noticed a shift in the way AI is being integrated with things. It used to be, hey, do you want me to write this message for you? They've moved away from that now things, a little pushback against that. People want to feel like they're so authentic. So now instead, once you finish writing a message, it's like, would you like us to refine this for you? Like, yes, please. Make it sound more professional. Add something else. [00:10:58] Speaker B: Yeah. I mean, it's a trick, right? It's a balance between being Microsoft's clippy, right. Which no one liked because it was just, not only was it particularly not really helpful, but it was just kind of up in your business. Right. And so AI is sort of like it's more useful, but it still can be sort of distracting. Like I know having it in my id, like the integration, like I turned off a lot of the auto complete prompts and stuff because it was bothering me. Yeah. [00:11:24] Speaker A: Clippy is one of those things that stuck with me for years. It's probably like 35 years at this point and I still, on my Windows computer they have, I still have like a meme image of Clippy and it's like, do you want me to know you now later? Or will you least expect it? It hasn't gone away. Like all these tools fulfill the same purpose basically. [00:11:51] Speaker B: Yeah. I always worry about younger generations not understanding what I'm talking about, but I think Clippy is still pretty generally common knowledge. [00:11:59] Speaker C: 28 years ago, 1996. [00:12:03] Speaker A: Wow. During the war. I'm just joking. [00:12:06] Speaker B: Yes. Seriously. [00:12:07] Speaker A: All right, let's move on to AWS. AWS announces repost agent, a generative AI powered virtual assistant. I guess they're starting to leverage Genai to auto respond to posts on repost. Maybe because of the huge brain drain they've suffered from recently. They don't have any staff to respond to things in person. But I'm looking forward to the hallucinations as it posts. [00:12:29] Speaker B: I think it's really hilarious. Right? Like you, it's kind of like a forum post. It's kind of like a interaction with, with, with developers directly on these services. But I've posted many things that just didn't get any responses and so like I wonder if they're, you know, this is their argument for their, but for that, but it's just going to be auto generated stuff that I probably could have found in the documentation. [00:12:55] Speaker C: It would give that feedback hopefully to people if something happening. So like now you're going to not look for answers with one any above, like four, to know that there's a real person that responded back to them. Maintain access and consider alternatives. For monitron, another Amazon service that's shutting down. Customers are being told to consider alternatives. Specifically, they reference some of their partners, tactical edge industrial AI and factory AIH. 
If you are a current customer and you've run any sensors in the last 30 days or 30 days prior to October 31, so aka in the month of October, you will still be able to use it for a period of time. Additionally, if you are a business customer, they have already added you to an allow list, so you continue to leverage your devices. And if you're a retail customer, you can have some way that they will kind of handle the ordering on individual requests. So it's just interesting to slowly see how they're shutting down these lesser known services that aren't as widely used. [00:14:02] Speaker A: That's a weird one because I think they talked about this onstage at reinvent a few years ago. It was a whole big industrial Iot thing. We have these devices that might unique vibrations from each machine and we can tell weeks in advance if some parts going to fail or not. So it's kind of weird that they're killing it. I guess the functionality can be built with other primitives that they have and it doesn't need to be its own service. [00:14:27] Speaker B: Well, that's the interesting part to me is that it seems to me that the partners that they're referencing, leveraging as an alternative are all have AI in the name. Could they not have in this realm where you're trying to attach AI moniker to everything? What about the service itself and how it was designed prevented them from doing that? Or was it just one of those things where no one was, no one was using the service, so it's not really worth the investment. [00:14:56] Speaker A: Yeah, yeah. [00:14:57] Speaker C: I wonder if it was something they were just using internally, you know, for their, you know, pulleys or whatever they have, like in the warehouse in the Amazon warehouses. And they were like, let's just try to sell this. And then it just didn't work? [00:15:10] Speaker B: Yeah, maybe. I mean, it makes sense. They do have a lot of treadmills and robots and I. [00:15:16] Speaker A: It was reinvent. They talked about the coke company, what's it called? Agronomic services. Agronomic and energy services. So Amazon used themselves. There's coke, there's Fender, because they were talking about the important manufacturer of guitars and QA around there and they're picking the right types of wood and things. I remember them discussing that on stage. A reinvent as a guitar fan. That's kind of interesting. Yeah. How strange. Yeah. [00:15:47] Speaker B: Anyway, well, moving on. Amazon virtual private cloud now supports. Bring your own IP and bring your own asn in all AWs local zones. Yeah, no, I mean, this is great. Assuming that you actually have ipv sports for ipv four space to actually, you know, load into the cloud and you're in a local zone so you're already trying to extend your local footprint. So yay. [00:16:18] Speaker A: I guess wondering it's gonna be before we won't see these announcements anymore and we will have fully migrated off to IPV six and this just won't be. [00:16:24] Speaker B: An issue considering I've already been talking about IPV six and the migration to it for almost 20 years now. I think I'll be dead before we retire or. [00:16:34] Speaker A: No, I don't know. [00:16:36] Speaker C: I don't think anything in here specifically calls out ipv four, but kind of. I feel like this is targeting ipv four customers still. Yep, not available in China. [00:16:50] Speaker A: Did they already move to IPV six? Who knows? All right, Amazon EC two now supports optimized cpu's post instance launch. 
This allows customers to modify an instance's cpu options. After launch, you can modify the number of ECP use and or disable the use of hyper threading on a stopped instance to save on the cpu based licensing costs. I guess that's particularly pertains to database licensing like SQL server and Oracle. In addition, an instance of cpu options are maintained when changing its instance type. This is beneficial to customers who bring your own license for commercial database workloads like SQL Server. [00:17:28] Speaker B: Yeah, this is one of those things where it's a giant pain if you have to completely relaunch your instance or when you're trying to upscale your instance to a new instance type to get more memory or what have you, and having that completely reset. And so then not only are you trying to scale this, probably to avoid an outage now it's taking twice as long because you're going to do things. So this is one of those really beneficial features that no one will ever mention again because it's just going to be table stakes. [00:17:59] Speaker A: Yeah. So frustrating that you have to do it at the VM level, though. You'd think that if licensing was based on VCPus, you could use the software itself to restrict the number of VCPus that actually it will consume a runtime rather than making it apply to the whole machine. [00:18:15] Speaker C: Jonathan. But you're not thinking about it from the legal perspective of if you're using more than your license for, they can get that money, extra money out of you because you've launched too powerful a machine. [00:18:27] Speaker A: I get that. I'm just being practical. You know, I might want to run up two things on the machine, I might want to use those extra courses, something else, and only run logic. I only run my database on the, on the 16 cores I can afford million dollars a year for. [00:18:39] Speaker B: So yeah, it's, I still, I balk at any software that's licensed on a VCPU basis. Like, it just doesn't make sense in this day and age. Yeah, but it's still a reality. [00:18:49] Speaker A: Business outcome pricing. [00:18:52] Speaker C: This is like you said, one of those features that I'm never going to think about again. But the biggest feature on EC two that I remember that was always a pain in the neck was not being able to change the IM role associated with the EC two instance, for the longest time, if you ever wanted to change it, you had to take a snapshot of the server, take an ambient relaunch in like, that's one of those core features now that I just don't think about because I'm like, oh, we can just change the role. So I feel like this is going to fall in that category. [00:19:23] Speaker B: No, you're absolutely right. I had forgotten about that until you mentioned it used to be terrible. [00:19:28] Speaker C: Oh my God, I hated it. But that must be seven, eight years ago now that was released. [00:19:36] Speaker A: Yeah. [00:19:36] Speaker C: And then I just realized how old I am. [00:19:39] Speaker A: Don't do that. That's not healthy. [00:19:42] Speaker C: Don't come to that realization. Got it? [00:19:43] Speaker B: Yeah. [00:19:44] Speaker A: We're still what, like 24, 25? I think I got to about 28 when I stopped counting. [00:19:48] Speaker C: My grandmother was 29 till the day she died, so I get it. [00:19:53] Speaker A: Was that by chance when she had her first child? There are a lot of cloud cost management tools out there, but only Archero provides cloud commitment insurance. It sounds fancy, but it's really simple. 
Archero gives you the cost savings of a one or three year AWS savings plan with a commitment as short as 30 days. If you don't use all the cloud resources you've committed to, they will literally put the money back in your bank account to cover the difference. Other cost management tools may say they offer commitment insurance, but remember to ask, will you actually give me my money back? Achero will click the link in the show notes to check them out on the AWS marketplace. [00:20:40] Speaker C: So features that should have been there for a long time. Amazon workspaces now supports file transfers between workspace sessions and local devices. Amazon workspaces now supports file transfers between personal sessions and local computers. Administrators do have the ability to control, upload, and download permissions to try to safeguard data. And your infosec department is really going to love the new ways that they have to deal with data loss prevention. [00:21:08] Speaker A: So they re implement RDP, they take out the feature, then they add it again, and then they give you a switch, which everyone's gonna switch on to stop you from using it. That's fantastic. [00:21:17] Speaker B: Yep, that's exactly what's going on. [00:21:18] Speaker C: But they can check the box now saying it exists, which means they'll pass some RFP. So now they're more likely to be able to be considered. [00:21:28] Speaker B: True. [00:21:29] Speaker A: Yeah. The lengths I've gone to to get data off a machine. [00:21:33] Speaker B: Yeah. It's just unbelievable when you're in a secure environment that you're trying to go through either a jump host or one of these workspaces as your access, and then you need to ship logs to a third party vendor for support or. Oh my. Ridiculous. [00:21:52] Speaker A: I wrote Python code. Honey, did you get a bunch of logs off? I wrote Python code to take logs and embed the data into QR codes and it would flash a QR code on the screen. [00:22:01] Speaker B: Wow. [00:22:02] Speaker A: For half a second each. And I recorded it with my phone and then I wrote another app that read the video stream and turned the QR codes back into data. [00:22:12] Speaker B: Wow, that's impressive. [00:22:15] Speaker C: I have a lot more questions about that, sir. [00:22:18] Speaker B: Yeah. [00:22:20] Speaker C: What, that was your, was that your first thought process of how to pull the data off? [00:22:28] Speaker A: Yes, it was. [00:22:29] Speaker C: Okay. All right. You live in a different world than I live in. [00:22:32] Speaker B: Jonathan has always been way smarter to me and I've always known it because of little anecdotes like this. It's like, yeah, there's nowhere in my brain that would land on, oh, I'm gonna flash a QR code on the screen because that's the only access I have is visual. [00:22:45] Speaker A: Yeah. [00:22:46] Speaker C: I'm like, dump to s three. I don't know, make it, make it EFS volume. Launch some other service that you launch a server that's public facing. [00:22:58] Speaker B: Oh yeah. If you have admin access in the environment, sure. But I mean, this is always in those environments. Yeah. They've regret, they've removed all egress to the Internet, so you can't hit any APIs and you don't have any ability to deploy resources in that environment. So you can't build your own file system that you can then access somewhere else. It's whenever you're doing this, like, it's always one of those that goes through the exercise of like, how did it do this? 
[00:23:20] Speaker C: Yeah, how do you get it off? [00:23:21] Speaker B: The craziest thing I ever thought of was, you know, trying to do it, you know, like I, in one case, like I could, I had the ability to update DNS zones and I was thinking about it, they always say that it's a, you know, an attack vector for, for exfiltration. And I'm like, yeah. But I was like, oh, this is going to be, I'm not, not spending this much time. [00:23:43] Speaker A: That is cool. [00:23:44] Speaker C: Yeah, my lateness kicks in before then. [00:23:46] Speaker A: I think I just like finding ways to break things, finding ways around things, ways to break things. I mean, it serves two purposes. One, it got the data off the server. Two, it's a vulnerability that I'd love to see someone trying to fix. How are you going to stop me from running this code? I can literally write it in Powershell, I can write it in python, I could write it in bash, I suppose, if I had to. And ASCII art the QR code on the screen. You cannot stop me from doing this. If you let me have access to, to look at the screen, I can pull the data off to make it fuzzy. [00:24:17] Speaker B: Making it fuzzy. That's so everyone has to squint now. [00:24:24] Speaker A: Everybody. All right, I guess we'll move on to GCP. Introducing Valky eight on memory store. Unmatched performance and fully open source. I'm super excited about this, personally. I have a project coming up that was going to use Redis, and this is a cheaper, faster, better, all around awesome alternative. So Google Cloud has introduced Memorystore for Valky Eight, marking it as the first major cloud platform to offer Valkyrie as a fully managed service. The launch signifies Google Cloud's commitment to supporting open source technologies. Well, let's be honest, it's because Redis changed their licensing terms in providing a high performance in memory key value store alternative to redis. Compared with Redis, Valkyrie aims to maintain full compatibility while offering improvements in performance and community governance, but has changes and features like better data availability during failover events, support for vector search, which is going to be great for rag in AI machine learning applications, and improved concurrency allows for parallel processing of commands using bottlenecks. There's some great performance improvements. So Valky Eight on Memorystore offers up to twice the queries per second compared with Memorystore for Redis cluster, a microsecond latency, which is unbelievable. I'd love to know what they fix, what bug they fixed to do something like that, but yeah, that's great. [00:25:45] Speaker B: Yeah, no, it's when you see this type of change, especially right after a license kerfuffle that caused Valkyrie to come into existence, it's kind of like, wow, the power of open search is really there. And why wasn't this, you know, part of, you know, the redis thing? It's because people weren't going through it, you know, when it was that license. And so, like, kind of. It's kind of a good thing in a lot of sense. I also want to know how they're 8.0 already. Did they start at one? [00:26:16] Speaker A: I think it's. I think it's parallel with the reddest version. [00:26:19] Speaker C: Okay. [00:26:19] Speaker B: It makes more sense. [00:26:20] Speaker C: I just said they went the opposite way of Hashicorp. [00:26:25] Speaker A: 0.1.505. [00:26:30] Speaker C: No, I mean, like you guys have said, these feature performances. Sorry. 
These feature and the performance improvements are massive over redis. And, you know, I use Redis at my day job, so I would love for them to. Sure to get fully on board with Valkyrie, but they backed Redis right when they came out with the license change, so I doubt I'll be getting anytime soon. [00:26:54] Speaker A: Yeah, Microsoft don't tend to. I don't think they tend to fork things and get involved in building their own versions. They're much, much more partners with other companies. [00:27:04] Speaker C: They do a lot of partnerships. A lot of the solutions either are tier one partners. Their Nginx integrations are all built into the Azure console and complete, but they have a very large marketplace. They push you towards, I feel like, at times, and I'm just like, I don't want to use the market. I want to use a cloud native solution, not a third party that is making themselves be cloud native. [00:27:30] Speaker A: Yep. I understand, I guess because a lot of companies are also Microsoft customers, so they don't bite the hand that feeds them. [00:27:39] Speaker C: Then again, do I really want to say that Azure is a cloud native company, and all the Windows solutions I use are cloud native. It's another story for another day. [00:27:49] Speaker B: Yeah. We have to do, like, an after show after dark, something like that. [00:27:53] Speaker C: I think it has to be in person, not recorded, because there's no way we all don't get in trouble in person. [00:27:59] Speaker B: Yes, but recorded. Yeah. Come on. [00:28:02] Speaker A: We just, you know, we need a Halloween special. [00:28:05] Speaker B: Yeah. [00:28:09] Speaker A: Nightmares from the cloud crypt. [00:28:11] Speaker B: Yeah. [00:28:12] Speaker C: Justin keeps talking about how, you know, the Spotify plus the Apple model, you know, the paid version of the podcast. That's what it is. It's gonna be, you know, the after dark, you know, version of all of our. The stuff that we all deal with. [00:28:29] Speaker A: Every day, all the stuff that Elliot cuts out for us, this whole conversation. [00:28:35] Speaker B: Yeah, yeah, yeah. This won't make it. [00:28:38] Speaker C: God willing, it. [00:28:42] Speaker B: All right. I can take a hint. Next up is understand your cloud storage footprint with AI powered queries and insights managing millions and billions of objects across numerous projects, hundreds of cloud engineers accessing night and day. It's fun, right? [00:28:57] Speaker A: Like an ad for some kind of pharmaceutical. Do you suffer from managing multiple projects with hundreds of cloud engineers at fun at night? [00:29:03] Speaker C: Tell me more. [00:29:05] Speaker B: Yeah, I do suffer from that. Google is very proud to announce that they're the first hyperscale cloud provider to generate storage insights specific to an environment by querying object metadata by using the power of large language models. There it is. They didn't say AI, but they meant it. Amazon has had a similar feature for quite a bit, so I was a little confused by that. You've been able to get insights on s three objects for a bit, but using the Google Cloud resources after the initial setup, which is a little bit hinky, you'll be able to access the enhanced user experience, which includes a short summary of your dataset and pretty shiny graphs. And bonus, you get pre curated prompts with validated responses, whatever that means. And when selected these prompts, there's various ways where it combats hallucinations. 
Such as, like every response includes a SQL query, and in the curated prompts there's a high accuracy tag, I guess if it's highly accurate, I don't know. And there's helpful information that displays the data freshness about the data it's querying. [00:30:17] Speaker A: So is it telling us what we've got or is it telling us how much it costs for what we've got? Or is it, what's, who cares about this? Exactly. [00:30:24] Speaker B: Yeah, all the above. Right? So it's a insights in your, your storage data. So there's, there's performance tiers, you know, the ability to migrate it to lower performance tier for cost savings. There's insights on the access model and insecure sort of attack vectors that you could have. Like if it's a publicly exposed bucket and it has excessive permissions or it has sensitive content in it, it'll, it'll sort of provide that level of insight. And I think there's also sort of a performance base wherever might recommend moving certain objects to a higher performant tier of storage as well. [00:31:02] Speaker A: Is it expensive? [00:31:04] Speaker B: I'm sure it is. I did not go through the exercise of pricing it out, but any AI thing so far that we've been looking at is pretty expensive. I know in AWS the storage insights was very expensive. [00:31:16] Speaker A: Yeah. I mean, when the use case starts with when you've got millions and millions of objects, and it's plus AI. I'm like, yeah, yeah. All right, so Matt's going to take the next story while I move my little pony pictures out of my Google storage pocket. [00:31:31] Speaker C: So on to Azure. Announcing the general availability of the Azure cycle cloud workspace for slurm. So I wanted to start with deconstructing this title. [00:31:42] Speaker B: Oh, good, because I have no idea what it means. [00:31:45] Speaker C: So announcing Ryan means we are telling you something new. The general availability is unlike most things that Azure does, it's actually ready for use, and they supported by an SLA. Then we had to figure out what Azure cycle cloud was. Azure Cycle cloud is an enterprise friendly tool for organic and managing HPC high performance computing environments. On Azure, it's in high performance cluster workspace for slurred and slur merge scheduler. So this is what I've got, and I'm not even sure I'm still correct. You are now able to with support because general availability means it should be supported buy and launch from the marketplace, an orchestration and management tool for HPCs that leverages slurm to actually schedule stuff to run on said high performance cluster. [00:32:43] Speaker B: Cool. I thought this was going to be like competition for Soulcycle, so I was really excited about like, oh, sure, is getting into like cycling classes. But no, no, we have to put this on our list of like the worst, poorly named cloud services. [00:32:59] Speaker A: I mean, we just can't satisfy some people. Like it's, oh, we complain on one hand, oh, it's AI, AI, AI. And then this is absolutely focused at people running AI, scheduling AI workloads and inference workloads. They don't mention AI at all in the title. And like, I don't know, screw you guys. [00:33:19] Speaker B: Yeah, yeah. There is no winning. You should have known that. [00:33:23] Speaker C: I just didn't know what this was when I first saw it. And all I could think of was slurm ball. And then this is how we digressed in life for the rest of the pre show. [00:33:33] Speaker A: Was it the slurm that made you think of Futurama? 
That made you think of the bite my shiny metal PM's? [00:33:39] Speaker C: Okay, this is how our conversation digressed in the pre show. No, I mean, it's kind of nice. It's them integrating this into the marketplace, and it's probably just a bunch of people riding bicycles to power the Azure region near by you. So, you know, we hit Ryan's thing. We understand that. And we got Futurama, which is what they're watching on their soul cycle bikes. So they'll get bored. So we're good. [00:34:03] Speaker B: And speaking of shiny metal FX v two vms, Azure is announcing a public preview of them. They talk about like all instance type announcements, you know that this is a, it's best suited to provide a balanced solution for computed sensitive workloads such as database data analytics, EDA workloads and anything that requires large amounts of memory and high performance storage and I O and presumably just a teeny bit of CPU. Although they do tout that this has up to one and a half times improvement in cpu performance and two times the vcpus in these instance types with 96 vcpus available in the largest vm size. You know, then there's you know, two x to local storage and IOP's going all the way to, you know, 5280gb of local SSD capacity, 400,000 IOP's and eleven gigabits throughput and the premium V two ultra disk support network. And I guess you can go up to 1800gb of memory and largest instance types. I like that they brought back the turbo. There's the VMS feature in all core turbo frequency up to 4.0 GHz. [00:35:21] Speaker C: Push the turbo button. [00:35:22] Speaker B: Yeah. Just made me laugh. Haven't thought of that in a while. But, and you can deploy these where you, these vms where you get a 21 to one ratio of memory to VCPU. [00:35:33] Speaker A: Wow. [00:35:34] Speaker B: Yeah, it's cool. So while they're best suited for balanced and compute intensive workloads, but if you read further down the post, they get to the real answer, which is this is purpose built to address several requirements for Microsoft SQL Server, which totally makes sense. [00:35:53] Speaker A: I've decided after hearing you read that last thing, we need to do like an episode which is like entirely poetic or something, like poems or everything. And if you know, you know Kemlab, Kemlab, Kemlab industrial music, they do some really boot stomping heavy industrial music, but they also have some really kind of like poetic thoughtful songs to some just quiet music. I'm like listen, listen to you read that. All those tech specs out. And I think I'm like this should be a, this should be a chemlab song. Like. [00:36:26] Speaker B: I like it to figure out. Yeah. [00:36:29] Speaker A: Cloud pod the musical. [00:36:31] Speaker B: Yeah. [00:36:31] Speaker C: Oh God, please no, no. [00:36:35] Speaker B: Maybe that's how we get our pro subscription. Well they'll have to, they'll pay us to not publish. [00:36:43] Speaker C: Yeah, it just, this is how I'm thinking they're running their hyperscale clusters now because I think as I read this and you mentioned this, I'm like cool. I now know what they're running hyperscale on. It's the only way they could physically do it at this point. Now, generally available azure confidential vms with Nvidia H 100 tensor core gpu's. These are based on AMD EpyCs processors with obviously the NVA H 100. They are set up in the confidential manner, aka is all set up securely. 
These are designed for interfacing, fine tuning and training small to medium sized models such as whisper stable diffusion and other variants, and large language models. [00:37:33] Speaker A: How weird though like the point of confidential vm is that it has like one hole in that you put something in it, does some magic work on it and then spits an answer out. But you don't get to see the sausage being made inside. So the fact that it's, that they're like selling this for training or inference is really interesting. [00:37:54] Speaker B: Yeah. What specific data? [00:37:57] Speaker A: The model itself is source, I guess so. So what? Okay. [00:38:02] Speaker B: Like pulling the model out of memory or something like that. [00:38:04] Speaker A: Yeah. Which kinds of customers are likely to have such sensitive, secure models that require this level of security? [00:38:13] Speaker C: Well, according to this log post f five servicenow OpenAI Cyborg edgeless systems and opaque definitely are the key targets for this 100%. Don't know what else you're talking, what else you're talking about here, guys. I could see though if you're training your data on information that is confidential. So thinking also like if you were Visa or anything else, do you want to train your model on customer credit card history, purchases, things along those lines? That's the only other things I can think of, but it's, it's, that's all. [00:38:51] Speaker B: Like just secure in flight. Like so many easier ways to get that data. [00:38:55] Speaker A: Yeah, the company computing was for building things, things like hsms, where, you know, it was important to be able to verify that you could not extract data once it was running. So that's kind of. Yeah, I'm going to keep my eye on this. There's going to be some, some large three letter customers I believe, of something. [00:39:15] Speaker C: Like this you will never actually know. Let's be honest here, especially you with your accent there. [00:39:21] Speaker A: Speaking of my large report, I haven't seen that movie in quite a long time, but it's becoming ever more relevant. [00:39:26] Speaker B: Yeah, it is. [00:39:27] Speaker A: And we thought it was going to be people sitting in a swimming pool, but I think actually it's not. It's going to be servers sitting in a, what's it called? Bath of something. [00:39:35] Speaker B: And if you, if, if I go just by the number of intrusive thoughts that I have that I don't share. Like, I am going to show up like 100%. [00:39:46] Speaker C: We really go off the rails without Justin here. [00:39:49] Speaker B: Yeah, no, it's true. He keeps us in line. [00:39:52] Speaker C: Well, yeah, we should look at like, how well this episode does without Justin versus the ones with and see which way is better. [00:40:02] Speaker B: I don't need that hit. [00:40:04] Speaker A: No, no, no. If you fire him, then you'll write the show notes in perpetuity. [00:40:07] Speaker C: Oh yeah. [00:40:09] Speaker B: Down and out. [00:40:09] Speaker C: I'm out. Nope. [00:40:11] Speaker A: All right, this may be our last story of the day. What's new in phenops toolkit 0.5? Like, I wish they'd just give it a whole number. Come on. Like if you, if it's, wait, wait, wait. [00:40:22] Speaker C: We complained about Valkyrie that started. We complained about terraform. That's zero dot, zero dot, 65,535. But like 0.5. Now you want a whole number. Like, we can't really appease us today, guys. [00:40:37] Speaker B: No, we're completely unsatisfied. There's no help. 
[00:40:42] Speaker C: Sorry. You can actually now read second to last story, Jonathan. [00:40:46] Speaker A: Okay. Is it the second? Oh really? [00:40:48] Speaker B: Oh no, there's no way. Oh wait, no, it is the second to last. [00:40:52] Speaker A: I just have to totally slow down because we've got to pad this out to a whole hour. Sorry. The Philips Toolkit 0.5, released in August 2024, introduces several enhancements. I didn't know it exists before once. What was it like 0.1 before? .4 I don't know. Aimed at improving cloud financial manage financial management through Microsoft's Philips framework, this update focuses on simplifying the process of cost management and optimization for Azure users, with new features for reporting, data analysis and integration with power Bi for better financial analytics. Some key updates users can now connect power BI reports directly to raw cast data exports in storage without needing pinops. Hubs simplifying the setup for cost analysis, the toolkit now supports the Focus 1.0 schema for costs and usage data, which is awesome, which aims to standardize phenomenal data across platforms for easier analysis and comparison. We also have improvements in the Azure optimization engine for better customer recommendations. And there are new tools for updating and reporting, including a guide on how to compare focus data with actual or amortized cost data, aiding in more accurate financial reporting. I never want to touch that stuff. I pity anybody who has to do stuff like that. [00:42:05] Speaker B: What you've done in the past, I've seen you. [00:42:07] Speaker A: I know, and I pity myself. [00:42:11] Speaker C: I still do it during my day job this is weirdly, I find some of this stuff kind of fun, like diving in and, like, trying to, like, figure out all these details. But I view it more or less from the, like, finance aspect of, like, it's a challenge. Like, okay, I've saved x percent. How do I now save the next 5%? And, like, it's, I challenge myself with it. I'm a crazy person. That's also why I did the podcast with you guys. [00:42:38] Speaker B: Yeah, I mean, I agree. I use the data the same, same way, which is usually I'm getting grilled for, why does this cost so much money? And how dare you allow this? And also get out of the developer's way. You're slowing business down. So that's fun. But I at least am not on the sharp end of the stick. I am enabling, you know, our finops teams with the data themselves, and then they can go crunch it and turn it into pretty pictures for me, which is nice. But, and, you know, these, you know, focus has just done so much to unify the data across so that it's all the same across the different cloud providers. And so, you know, we still have a ton of room in our day job to make that information more usable and some stuff, but it's just, it was impossible before to draw, like, for, like, comparisons for different workloads, different clouds. [00:43:31] Speaker A: Yep. [00:43:33] Speaker C: Some of the updates in here are kind of find elastic pools, instances with no databases. Like, cool. That didn't need to be there. I feel like that would be like an old trusted advisor on AWS or just the AWS, I think it's called, like, azure optimizer. I think. I don't remember what the tool is called. Like those, some of these things are just built into or should already be built in. Some of them are kind of cool though, like identifying the reservation break even point. 
You know, how many times have you said, like, hey, I don't actually know how long we're gonna need this up for. And I'm like, I just use 75%. I'm like, if you're gonna keep this running for nine months, it's worth the one year reservation. So kind of having that more detailed link in there, I think it's actually. [00:44:17] Speaker B: Gonna be interesting just having it built in. Yeah, it's very handy because that's a constantly asked question. Right. And it always has to be answered by the people who are in no control of the workload. [00:44:29] Speaker C: Yeah. [00:44:31] Speaker B: I do like, but I mean, the elastic pools without databases and stuff like that really focuses on, like, the sort of unmentioned part of bitops, which is a lot of the philosophy, I guess, for lack of a better word, is, is, you know, there's the saving money in the finance aspects of it, but, you know, a lot of it is optimization based. And so just, just getting rid of waste and getting rid of complexity is actually very much part of the Finops charter. So, you know, little things like that, I think, speak towards that and I think that's cool. [00:45:00] Speaker A: Awesome stuff. Well, that was the last story. [00:45:03] Speaker B: Nope, second to last. There's one more. [00:45:05] Speaker C: He really doesn't want to talk about this last one. [00:45:07] Speaker B: He really doesn't. He's tried to kill it like three times. [00:45:09] Speaker C: I know. I'm sure. Talking before he does. GPT 40 real time preview with audio and speech capabilities. It's on Azure now, guys, too. [00:45:20] Speaker B: Yeah. [00:45:20] Speaker A: Stories. That sentence. [00:45:22] Speaker B: No, no, no. It was. It's the me too feature of the one above that was as part of our AI news, and we just did that to truly, and we realized that no one listens this far, especially not this one. [00:45:39] Speaker C: They're like justice. I hear we're out. [00:45:41] Speaker A: I guess we should release the confidential clapboard episode where people don't actually get to see any of how the sausages made. [00:45:51] Speaker C: That's the premium. It's the un edited, pre story, actual show and post. If people really want to see how this is done, it's bad news bears. [00:46:03] Speaker A: All right, guys, have a fantastic week. [00:46:05] Speaker C: Have a great week, guys. [00:46:06] Speaker B: Bye everybody. [00:46:10] Speaker A: And that's all for this week in cloud. We'd like to thank our sponsor, Archero. Be sure to click the link in our show notes to learn more about their services. While you're at it, head over to our [email protected], where you can subscribe to our newsletter, join our slack community, send us your feedback, and ask any questions you might have. Thanks for listening, and we'll catch you on the next episode.
