236: We Now Measure the Largest Chips Used to Generate an LLM - or a 21st Century #$%& Measuring Contest

Episode 236 November 23, 2023 00:23:00

Show Notes

Welcome to episode 236 of the Cloud Pod Podcast, where the forecast is always cloudy! Are you wandering around every day wondering just who has the biggest one? Chips, we mean. Of course. Get your mind out of the gutter. Did you know Azure was winning that battle for like 8 whole minutes? Join us for episode 236, where we talk about chip size, LLMs, updates to Bedrock, and Toxicity Detection, something you will never find applied to this podcast. Not on purpose, anyway. Happy Thanksgiving!

Titles we almost went with this week:

A big thanks to this week’s sponsor:

Foghorn Consulting provides top-notch cloud and DevOps engineers to the world’s most innovative companies. Initiatives stalled because you’re having trouble hiring? Foghorn can start burning down your DevOps and cloud backlogs as soon as next week.

AI is Going Great! 

00:39 OpenAI’s New Weapon in Talent War With Google: $10 Million Pay Packages for Researchers (listener note: paywalled article)

01:30 Jonathan - “It’s actually quite a concern that since Google bought DeepMind, they have pretty much two-thirds of the entire global AI talent at their disposal. So I guess this is a desperate-times-call-for-desperate-measures kind of thing.”

01:49 Nvidia Unveils New AI Chip, Upping Ante with AMD (listener note: paywalled article)

02:29 Matthew - “I feel like we’re seeing the speed curve of processors again; we’re just watching the same thing that happened in the ’90s and 2000s happen with GPUs. It will double every 18 months, that’s fine. Or sooner.”

04:51 Report: Enterprise investment in generative AI shockingly low, while traditional AI is thriving

05:36 Ryan - “I don’t see any way this isn’t going to be a huge contributor to cloud spend in the coming years. I’m actually more surprised that traditional AI and machine learning is only 18%. But then you have to realize that we’re also an industry that’s still largely doing rented compute, so it makes sense.”

AWS

06:32 AWS Audit Manager now supports first third-party GRC integration

07:15 Justin - “Thank goodness, because I was kind of thinking this was a walled garden that didn’t make sense for a long time. So glad to see this one coming.”

07:42 Amazon Bedrock now provides access to Meta’s Llama 2 Chat 13B model

ADDITIONALLY - Amazon Bedrock now provides access to Cohere Command Light and Cohere Embed English and multilingual models

09:12 New for Amazon Comprehend – Toxicity Detection

09:47 Ryan - “My very first thought when I read this is, back in the day, I created a chatbot in IRC that would count swear words by user, and you could run a command and it would just print that out. So now I have an idea: plugging this into several team rooms in Slack or Teams and giving a toxicity score would be pretty sweet. It would be pretty funny.”

10:28 Jonathan - “It’s kind of interesting technology. I see use cases for it, for sure, for things like filtering reviews for online merchants, or things that users post that end up on other people’s websites. Makes a lot of sense. I’m a little concerned that this type of technology might end up in things like Teams or Zoom or Slack, and potentially report on a user’s behavior or attitude to their management. That’s quite a Big Brother-ish kind of technology, but the potential is there right now.”

12:59 Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available

13:51 Justin - “The most interesting thing about this to me is that it kind of breaks one of the main things about Amazon: their services are all independent of each other, and they don’t use the same storage subsystem. Now they’ve created a dependency where these things have to work together. That’s an interesting paradigm shift. I love it, because I hate running ETL jobs, and I can definitely see this being something I would use if I was on Aurora and needed Redshift. So bravo, but also, how does this work? I’m hoping there’s a re:Invent session that details this a bit more, and I’ll be keeping an eye out during re:Invent to learn how they’re doing this magic in the backend.”

GCP

15:16 Introducing Cloud SQL in-place upgrade: move from Enterprise to Enterprise Plus with ease

16:32 Ryan - “I think what I liked most about this announcement is that they gave you a rollback procedure. You want to play with the new Enterprise Plus, and I’ve done that, and then there’s no way to turn it off; it’s expensive, I don’t want to pay for it, and you have to kill the whole thing. So I like the fact that this can go both ways, and you can see if you really need those advanced features or not.”

17:09 Google Cloud demonstrates the world’s largest distributed training job for large language models across 50,000+ TPU v5e chips

Azure

18:39 Azure sets a scale record in large language model training

19:16 Justin - “So sorry, Azure, no record for you today.”

20:02 Matthew - “So I figured Oracle’s business model out. They’re just a layer on top of all the other hyperscalers, which breaks everything. It’ll be fine.”

20:13 Justin - “It’s really just a tech company on top of a bunch of lawyers.”

Closing

And that is the week in the cloud! We would like to thank our sponsor, Foghorn Consulting. Check out our website, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at thecloudpod.net, or tweet at us with the hashtag #thecloudpod.
