Welcome to episode 260 of the Cloud Pod podcast – where the forecast is always cloudy! This week your hosts Justin, Matthew, Jonathan, and Ryan are talking about changes in leadership over at Amazon, GPT-4o and its image-generating capabilities, and the new voice of Skynet, Amazon Polly! It’s an action-packed episode – and make sure to stay tuned for this week’s aftershow.
Titles we almost went with this week:
- Who eats pumpkin pie in May
- Bytes and Goodbyes: AWS CEO Logs Off
- AWS lets you know that you are burning money sooner than before
- High-Ho, High-Ho, It’s GPT-4-Ohhh
- The CloudPod pans for nuggets in the AI Gold rush
A big thanks to this week’s sponsor:
Big thanks to Sonrai Security for sponsoring today’s podcast! Check out Sonrai Security’s new Cloud Permissions Firewall. Just for our listeners, enjoy a 14-day trial at https://sonrai.co/cloudpod
General News
00:40 Terraform Enterprise adds Podman support and workflow enhancements
- The latest version of Terraform Enterprise now supports Podman with RHEL 8 and above.
- Originally, it only supported Docker Engine and cloud-managed Kubernetes services.
- With the upcoming EOL of RHEL 7 in June 2024, customers faced a lack of an end-to-end supported option for running Terraform Enterprise on RHEL.
- Podman support rectifies this.
01:18 Ryan – “This is for the small amount of customers running the enterprise either on-prem or in their cloud environment. It’s a pretty good option. Makes sense.”
01:42 Justin – “You know, the thing I was most interested in at this actually is that Red Hat Linux 7 is now end of life, which this is my first time in my entire 20 some odd career that I’ve never had to support Red Hat Linux in production because we use Ubuntu for some weird reason, which I actually appreciate because I always like Ubuntu best for my home projects, but I didn’t actually know Red Hat 7 was going away.”
AI Is Going Great (Or, How ML Makes All Its Money)
03:58 Hello GPT-4o
- OpenAI has launched their GPT-4o (“o” for “omni”) model, which can reason across audio, vision, and text in real time.
- The new model can accept input combinations of text, audio and image and generates any combination as output. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response time in conversation.
- It matches GPT-4 Turbo performance on text in English and code, with significant improvements on text in non-English languages, while also being much faster and 50% cheaper in the API.
- GPT-4o is especially better at vision and audio.
- Previously you could interact with ChatGPT using voice mode, but the latency was 2.8 seconds for GPT-3.5 and 5.4 seconds in GPT-4 on average.
- This was because the old voice mode was actually a pipeline of three separate models: audio-to-text transcription, the text GPT processing, and then text back to audio.
- This was not great for GPT-4, as it lost information like tone, multiple-speaker identification, and background noise, and it couldn’t output laughter, singing, or expressed emotion.
- OpenAI wants to point out that they have also assessed this model with their Preparedness Framework, in line with their voluntary commitments, as well as with extensive external red teaming by 70+ external experts in domains such as social psychology, bias and fairness, and misinformation, to identify risks that are introduced or amplified by the newly added modalities.
- They recognize that audio modalities present a variety of novel risks, so at launch they are publicly releasing only text and image inputs and text outputs.
- Over the coming weeks and months, they will be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.
- For example, at launch, audio output will be limited to a selection of preset voices and will abide by existing safety policies.
- This new model is their latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. They spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack. GPT-4o’s capabilities will be rolled out iteratively (with extended red team access.)
- GPT-4o’s text and image capabilities are rolling out in ChatGPT now; they will make GPT-4o available in the free tier, and to Plus users with 5x higher message limits.
- API access is available for GPT-4o as a text and vision model.
- GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo.
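- For the curious, here’s a minimal sketch of what a GPT-4o call with a mixed text-and-image prompt looks like through the Chat Completions API (the image URL is a placeholder, and you’ll need an OPENAI_API_KEY in your environment):

```python
# Minimal sketch: one GPT-4o request combining text and image input.
# Assumes OPENAI_API_KEY is set; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```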
08:20 Justin – “there’s so many opportunities for it to be abused. And then also with voicing, if you can start sending the voice to other people, then next thing you know, your mother is calling you desperately in need of money and asking you to wire her money because she ended up in Tahiti. And you’re like, how did you end up there? Don’t answer. I came here with your father and I need money. Weird stories are going to come out of this technology for sure.”
AWS
09:30 AWS CEO logs off after three years at the helm
- Adam Selipsky is stepping down from his role as CEO of Amazon Web Services. He is being replaced by Matt Garman.
- Adam was tapped 3 years ago by Andy Jassy as Jassy transitioned to replace Jeff Bezos as Amazon CEO.
- Prior to being the CEO, Selipsky spent a decade running AWS sales, marketing and support before leaving to run Tableau through their acquisition by Salesforce.
- “I’d like to thank Adam for everything he’s done to lead AWS over the past three years,” Jassy wrote in a letter. “He took over during the pandemic, which presented various leadership and business challenges. Under his direction, the team made the right long-term decision to help customers become more efficient in spending, even if it meant less short-term revenue for AWS.”
- Matt Garman has been with AWS for 18 years, starting as an intern in 2005 before leading several key product divisions, including Amazon EC2 and AWS Compute Services. He most recently served as SVP of AWS Sales, Marketing and Global Services.
- “Matt has an unusually strong set of skills and experiences for his new role,” Jassy said of AWS’s latest chief executive. “He’s very customer focused, a terrific product leader, inventive, a clever problem-solver, right a lot, has high standards and meaningful bias for action, and in the 18 years he’s been in AWS, he’s been one of the better learners I’ve encountered.”
10:44 Ryan – “I wish executive letters were truthful and honest in some way. Like, on one hand I hope it is just a matter of, of Adam saying, you know, this job’s hard and I only want to do it for a few years and move on. But I doubt it, just because it seems that it’s all performance based and, you know, probably not getting along or not making enough meaningful movements in the right areas.”
16:28 A new generative engine and three voices are now generally available on Amazon Polly
- AWS is announcing three new generative voices for Polly: Ruth and Matthew in American English, and Amy in British English. The new generative engine was trained on publicly available and proprietary data across a variety of voices, languages, and styles.
- Usually, I wouldn’t talk about this story… but these things are incredible.
- To show the difference, AWS provided sample prompts; the first voice is the 2019 neural TTS voice.
- This voice uses a sequence-to-sequence neural network that converts a sequence of phonemes into spectrograms and a neural vocoder that converts the spectrograms into a continuous audio signal.
- This provides a higher quality of humanlike voices than the prior 2016 version.
- This new model, Big Adaptive Streamable TTS with Emergent Abilities, creates a humanlike, authentically generated voice. You can use the voice as a knowledgeable customer assistant, virtual trainer or experienced marketer.
- When we can train a custom model of our voices, and AI can (do a subpar job of) writing our show notes, we can all retire and profit.
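- If you want to try the new voices yourself, here’s a minimal boto3 sketch, assuming the generative engine is available in your region:

```python
# Minimal sketch: synthesizing speech with one of the new generative
# voices. Assumes the generative engine is available in your region.
import boto3

polly = boto3.client("polly")

result = polly.synthesize_speech(
    Engine="generative",   # the new generative engine
    VoiceId="Ruth",        # one of the new American English voices
    Text="Welcome to episode 260 of the Cloud Pod podcast!",
    OutputFormat="mp3",
)

# The audio comes back as a streaming body; write it out to a file.
with open("intro.mp3", "wb") as f:
    f.write(result["AudioStream"].read())
```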
19:46 Jonathan – “Yeah… all I’m thinking is if Kindle doesn’t have this as an option to read any book as an audiobook within six months, I will eat my hat.”
20:55 Build RAG and agent-based generative AI applications with new Amazon Titan Text Premier model, available in Amazon Bedrock
- Amazon is welcoming the newest member of the Amazon Titan family of models: Amazon Titan Text Premier, which is now available in Bedrock.
- Following the previous announcements of Titan Text Lite and Express, Premier is the latest large language model in the family.
- Titan Text Premier has a maximum context length of 32K tokens and has been specifically optimized for enterprise use cases, such as building RAG and agent-based applications with Knowledge Bases and Agents for Amazon Bedrock. Titan was pre-trained on multilingual text data but is best suited for English-language tasks.
- You can further fine-tune with your own data in Bedrock to build applications that are specific to your domain, organization, brand style, and use cases.
- You can leverage Titan Text Premier in RAG use cases through Knowledge Bases for Amazon Bedrock, and automate tasks through integration with Agents for Amazon Bedrock.
- Want to see a demonstration? You can find a video here.
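- Or, as a rough sketch, invoking Titan Text Premier through the Bedrock runtime API looks something like this (the prompt and region are just examples; the request body follows the Titan text format):

```python
# Rough sketch: calling Titan Text Premier via the Bedrock runtime.
# The prompt and region are examples; check model availability first.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "inputText": "Summarize the benefits of RAG for enterprise search.",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5},
})

response = bedrock.invoke_model(
    modelId="amazon.titan-text-premier-v1:0",
    body=body,
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```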
21:44 Jonathan – “I mean, thanks Amazon for playing. I guess I see why Adam is no longer there, because the other guys just kick the crap out of you. That Polly thing was pretty good, but everything else… small text LLMs are not pretty exciting right now.”
23:36 Build generative AI applications with Amazon Bedrock Studio (preview)
- Amazon is previewing Amazon Bedrock Studio, a new web-based generative AI development experience.
- Bedrock Studio accelerates the development of generative AI applications by providing a rapid prototyping environment with key Amazon Bedrock features, including Knowledge Bases, Agents, and Guardrails.
- As a developer, you can build applications using a wide array of top-performing models and evaluate and share your generative AI apps with Bedrock Studio. The user interface guides you through various steps to help improve a model’s responses.
- You can quickly experiment with model settings, securely integrate your company’s data sources, tools, and APIs, and set guardrails. You can collaborate with team members to ideate, experiment, and refine your gen AI applications, all without advanced ML expertise or AWS Management Console access.
24:24 Matthew – “They’ve always tried to build these like web console studios for stuff. I’ve never seen them fully take off, like Cloud9, and then they release another one later on, like the studio, these web portals that kind of act as your editors of sorts. And they all seem good, but I feel like most people I know never actually fully get all the way into them.”
28:20 AWS Cost Anomaly Detection reduces anomaly detection latency by up to 30%
- I can now find out Jonathan left that ML workload running in the TCP account 30% faster.
- AWS Cost Anomaly Detection will now detect cost anomalies up to 30% faster.
- Customers can identify and respond to spending changes more quickly.
- AWS Cost Anomaly Detection analyzes cost and usage data up to 3 times a day, instead of daily, to detect anomalies.
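- Anomalies surface through the Cost Explorer API, so (as a hedged sketch) finding out what Jonathan left running looks roughly like this:

```python
# Rough sketch: listing the past week's cost anomalies via the
# Cost Explorer API, where Cost Anomaly Detection reports findings.
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")

response = ce.get_anomalies(
    DateInterval={
        "StartDate": str(date.today() - timedelta(days=7)),
        "EndDate": str(date.today()),
    },
    MaxResults=10,
)
for anomaly in response["Anomalies"]:
    impact = anomaly["Impact"]["TotalImpact"]
    print(f"{anomaly['AnomalyId']}: ~${impact:.2f} of unexpected spend")
```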
29:32 AWS CISO tells The Reg: In the AI gold rush, folks are forgetting application security
- At RSA last week, AWS Chief Information Security Officer Chris Betz shared some thoughts on AI.
- Companies forget about the security of the application in their rush to use generative AI. Shocking, no?
- There need to be safeguards and other protections around advanced neural networks – from training to inference – to avoid them being exploited or used in unexpected and unwanted ways.
- Betz described securing the AI stack as a cake with three layers: the bottom layer is the training environment, where the LLM is built.
- How do you make sure you are getting the right data, that the data is protected, that you’re training the model correctly and that you have the model working the way that you want?
- The middle layer provides access to the tools needed to run and scale generative AI apps.
- This ensures that you run the model in a protected way, especially as these models get handed increasingly sensitive data.
- Finally, the top layer is the applications that use the LLM.
- Betz points out the first two layers are new and novel for customers, but the third is susceptible to standard security attacks.
31:06 Ryan – “Yeah, it’s a whole new world out there. And companies are racing for adoption, right? Because if you’re behind, it feels really behind because of how fast everything’s moving. And so it’s a tricky thing because it’s short-cutting security about it. But there’s also sort of like, we’re having to figure out what the right way to secure development processes for AI are, right? Like, how do you train against, you know, exploiting at the prompt level. How do you segregate data access that’s within the model training? And these are all new questions that we’re sort of in the early days of figuring out. And we’re having to do that while trying to reach into the market at record pace. And something bad is going to happen and reset this.”
33:53 Amazon S3 will no longer charge for several HTTP error codes
35:21 New compute-optimized (C7i-flex) Amazon EC2 Flex instances
- Amazon is releasing the new c7i-flex instances, which are great for workloads that don’t require full compute power 100 percent of the time.
- The flexibility of the M7i-flex resonated with customers, so expanding the concept to the C7i makes sense.
- The c7i-flex is available in sizes from 2 vCPUs with 4 GiB of memory up to 32 vCPUs with 64 GiB.
- For sporadic compute needs, the C7i-flex is a great choice, but it is not a good fit for HPC, multiplayer gaming, or video encoding.
- The standard C7i comes in more shapes, with higher network bandwidth and storage I/O capabilities.
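- Launching a flex instance works like any other instance type; here’s a minimal boto3 sketch (the AMI ID is a placeholder):

```python
# Minimal sketch: launching a c7i-flex instance. The AMI ID is a
# placeholder; flex sizes run from large (2 vCPU) to 8xlarge (32 vCPU).
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="c7i-flex.large",
    MinCount=1,
    MaxCount=1,
)
```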
36:47 Justin – “Yeah, so that’s the T, they don’t have it on the C7s, but they do on the Ts still. But you know, on the T side, even they got rid of the guaranteed CPU time, but you could basically click a box and you wouldn’t get burst credits anymore. Yeah, on the T3s. So this is kind of like a next evolution of it where, you know, it’s a workload that is more sporadic and you don’t need guaranteed throughput or capacity, but you do want it when you need it for short bursts.”
GCP
39:31 What’s new with Active Assist: New Hub UI and four new recommendations
- The Active Assist portfolio of intelligent tools can help you reduce costs, increase performance, improve security, and even make more sustainable decisions. Google is excited to announce new Active Assist features that address their customers’ largest concerns.
- Revamped Recommendation Hub, with a new organization-level view of all your projects’ recommendations in one UI.
- Pre-filtered recommendations by value category – You can now view all of your recommendations under one category in a simple table view, so you can prioritize and focus on the recommendations that are the most relevant and important to you.
- Custom sorting and filtering – with the new table views, you can sort and filter by different fields, such as product category, recommendation, cost savings, priority, etc.
- They now have new recommendations as well:
- Cloud Deprecation and breaking change recommendations
- IAM for BigQuery recommendations
- Advisory notifications recommendations
- Recent change recommendations
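- The same recommendations are available programmatically; a minimal sketch with the Recommender client library, where the project, zone, and idle-VM recommender are illustrative:

```python
# Minimal sketch: listing Active Assist recommendations via the
# Recommender API. Project, zone, and recommender ID are illustrative.
from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()

parent = (
    "projects/my-project/locations/us-central1-a/"
    "recommenders/google.compute.instance.IdleResourceRecommender"
)
for rec in client.list_recommendations(parent=parent):
    print(rec.name, "-", rec.description)
```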
41:06 Ryan – “I like that one because I like this optimization overall. I laugh at anything that’s optimization because it’s a great way to build a report that no one’s going to read or act on. But you still have to do it because otherwise you’ll get no traction. But some of the other optimizations and insights that they’re building into this experience, I think, will be really powerful for cloud providers and practitioners.”
42:36 Kubernetes 1.30 is now available in GKE in record time
- Kubernetes 1.30 is now available in GKE Rapid Release less than 20 days after the OSS release. (AWS Cries in the corner)
- 1.30 has several enhancements, including ValidatingAdmissionPolicy, which allows many admission webhooks to be replaced with policies defined using the Common Expression Language (CEL) and evaluated directly in the kube-apiserver. This feature benefits extension authors and cluster administrators by dramatically simplifying the development and operation of admission extensions.
- Validation Ratcheting, which makes custom resource definitions even safer and more accessible to manage.
- Aggregated discovery graduates to GA, improving the performance of clients, particularly kubectl, when fetching the API information needed for many common operations.
44:29 Announcing Trillium, the sixth generation of Google Cloud TPU
- Google I/O started today, so now I have to tell you about… you guessed it … AI stuff from Google.
- First up, Google is announcing the Trillium TPU, which achieves a 4.7x increase in peak compute performance per chip compared to the TPU v5e.
- Is it just us, or did they JUST announce this?
- They doubled the high bandwidth memory capacity and bandwidth, and also doubled the interchip interconnect bandwidth over the TPU v5e.
- Trillium is equipped with third generation SparseCore, a specialized accelerator for processing ultra-large embeddings common in advanced ranking and recommendation workloads.
- Trillium can scale up to 256 TPUs in a single high-bandwidth, low-latency pod.
45:17 Justin – “I mean, I’m sure someone cares who’s building really large models like Cohere or OpenAI or, you know, people who build models probably care about these because like every second, you know, you save, the scale they’re trying to build these models in, you know, can result in days of savings in building a new model.”
47:01 Vertex AI at I/O: Bringing new Gemini and Gemma models to Google Cloud customers
- Vertex AI has some updates today, including new models from Google DeepMind.
- Available today:
- Gemini 1.5 Flash, in public preview, offers their groundbreaking 1-million-token context window but is lighter weight than 1.5 Pro, designed to serve efficiently at speed and scale for tasks like chat apps.
- PaliGemma, available in Vertex AI Model Garden, is the first vision-language model in the Gemma family of open models, and is well suited for tasks like image captioning and visual question answering.
- Coming Soon:
- Imagen 3 is their high-quality text-to-image generation model
- Gemma 2 is the next gen of open models built for a broad range of AI use cases
- Gemini 1.5 Pro with an expanded 2-million-token context window.
- Vertex AI gets three new capabilities:
- Context caching lets customers actively manage and reuse cached context data.
- Controlled generation lets customers define Gemini model outputs according to specific formats or schemas.
- Finally, Batch API, available in preview, is a super-efficient way to send large numbers of non-latency-sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation.
- Agent Builder, announced at Next ’24, has some new enhancements, like support for Firebase Genkit and LlamaIndex on Vertex AI.
- Genkit, announced by Firebase today, is an open-source TypeScript/JavaScript framework designed to simplify the development, deployment, and monitoring of production-ready AI agents.
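- And if you want to poke at Gemini 1.5 Flash yourself, here’s a minimal sketch via the Vertex AI SDK (the project, location, and exact preview model name are assumptions):

```python
# Minimal sketch: one Gemini 1.5 Flash call via the Vertex AI SDK.
# Project, location, and the preview model name are assumptions.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content(
    "Summarize this week's cloud news in one sentence."
)
print(response.text)
```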
50:01 Justin – “Imagen 3 reminded me of something about GPT-4o. So one of the things that annoys me the most about taking text and generating images is that if you give it text, it never produces the text correctly. So you’d be like, hey, I want you to draw me a Sherpa climbing Mount Everest, and below it, I want you to write the words ‘Cloud Sherpa,’ right? Cause you’re trying to make a sticker or something. And it will be like, ‘clud serpa.’ Even though you spelled it exactly right, what you wanted, it is never correct. So in the new GPT-4o, one of the demos they showed is actually where you gave it text input to have it generate into the image, and it actually uses the proper text. It doesn’t modify it or change anything.”
Azure
52:23 Bringing generative AI to Azure network security with new Microsoft Copilot integrations
- Azure is announcing Azure Web Application Firewall and Azure Firewall integrations in the Microsoft Copilot for Security standalone experience. This is the first step toward bringing interactive, generative AI-powered capabilities to Azure network security.
- Organizations can empower their analysts to triage and investigate hyperscale data sets seamlessly to find detailed, actionable insights and solutions at machine speeds using a natural language interface with no additional training.
- Copilot automates manual tasks and helps upskill Tier 1 and Tier 2 analysts to perform tasks that would otherwise be reserved for more experienced tier 3 and 4 professionals.
- Azure WAF today detects a variety of web application and API security attacks, generating terabytes of logs that are ingested into Log Analytics.
- While the logs give insights into the WAF, it’s a non-trivial and time-consuming activity for analysts to understand them and gain actionable insights.
- The Copilot helps analysts perform that analysis in minutes. Specifically, it synthesizes data from Azure Diagnostics logs to generate summaries of the Azure WAF rules triggered, investigate security threats such as malicious IP addresses, and analyze the SQL injection and cross-site scripting attacks blocked by the WAF.
- Azure Firewall has similar use cases, with analysts needing to look through large amounts of allow and deny logs; Copilot adds metadata about the traffic, including IPs, sources, destinations, and the vulnerabilities and CVEs associated with signatures.
55:38 Matthew – “That’s one of the big things they’re pushing is copilot helps all your tier one and tier two people really get, you know, solve all the problems that your tier three and four, you know, and that’s one of the things that they’re pushing is, Hey, this is something that will actually like help your tier ones get done more of the work. And that’s where they’re stating that copilot will actually help everything. Also, have you ever met a junior developer or really a senior developer that actually looks at logs? So like, yeah, sometimes you just have to tell people to look at logs.”
58:00 Announcing the General Availability of GPT-4 Turbo with Vision on Azure OpenAI Service
- GPT-4 Turbo with Vision is now available on the Azure OpenAI Service; it processes both text and image inputs and replaces several preview models. Customers in various industries have already used this multimodal model to enhance efficiency and innovate, with case studies to be featured at the upcoming Build conference.
58:44 Introducing GPT-4o: OpenAI’s new flagship multimodal model now in preview on Azure
- Microsoft is thrilled to announce that GPT-4o, OpenAI’s new flagship model, is now available in preview with Azure AI.
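- The Azure flavor uses the same OpenAI client library pointed at your own resource; a minimal sketch where the endpoint, API version, and deployment name are placeholders:

```python
# Minimal sketch: calling a GPT-4o deployment on Azure OpenAI.
# Endpoint, API version, and deployment name are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

response = client.chat.completions.create(
    model="gpt-4o",  # your deployment name
    messages=[{"role": "user", "content": "Hello from the Cloud Pod!"}],
)
print(response.choices[0].message.content)
```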
58:59 Matthew – “Well, it was in preview for a while. And then if you really look at this, turbo is really available, I think in like Sweden and East two and like maybe one other region with very limited quantities.”
59:43 Microsoft and LinkedIn release the 2024 Work Trend Index on the state of AI at work
- Microsoft and LinkedIn are releasing their 2024 Work Trend Index focusing on AI at work.
- This is the fourth annual Work Trend Index and the first time they’ve partnered with LinkedIn on the joint report. They surveyed 31,000 people across 31 countries, identified labor and hiring trends from LinkedIn, analyzed trillions of MS365 productivity signals, and conducted research with Fortune 500 customers.
- Findings
- Employees want AI at work – and won’t wait for companies to catch up. (Oh yeah, says the company that makes an AI assistant for employees.) Per the report, 3 out of 4 knowledge workers now use AI at work. Employees, overwhelmed and under duress, say AI saves time, boosts creativity, and allows them to focus on important work.
- For employees, AI raises the bar and breaks the career ceiling
- The rise of AI Power users and what they reveal about the future.
- The prompt box is the new blank page.
1:01:11 Ryan – “No, I mean, there’s no doubt that it’s useful and there’s a whole bunch of really mundane things that this is going to get rid of. And I’m pretty stoked about that, right? Like I’m not too jazzed about, you know, everyone who’s an entry level position being sort of wiped out and, you know, they still don’t know what to do about like, you know, if you don’t have any entry level positions, how do you get to the next level? But, you know, at a certain point, like these things are, you know, like every automation enhancement, they replace a very fundamental part that’s easy to recreate and saves a lot of time. And I think it’s great. But yeah, shadow AI is going to be a big problem for a long time to come just because there’s a whole lot of legal ground that has to be developed to even know how to handle this.”
1:02:06 Public preview: Azure Application Gateway v2 Basic SKU
- Microsoft Azure introduces public preview of Azure Application Gateway v2 Basic SKU.
- Yes. This is just a cheap load balancer.
- Enhanced features promise improved performance, scalability, and cost-effectiveness.
- Features that are missing:
- URL rewrite
- mTLS
- Private Link
- Private-only
- TCP/TLS Proxy
Oracle
1:06:50 It’s here—Red Hat OpenShift on OCI!
- Apparently customers are excited about the pending announcement from CloudWorld 2023 that Red Hat OpenShift was coming to OCI. And now it’s officially here. Woohoo.
- Red Hat OpenShift versions 4.14 and 4.15 are validated on OCI Compute VMs.
- Like RHEL, they are working on validating bare metal shapes as a fast follow.
- Several editions are available, including OpenShift Platform Plus, OpenShift Container Platform, and OpenShift Kubernetes Engine.
1:07:31 Ryan – “What a strange announcement. Because it’s not quite a managed OpenShift service. It’s just we’ve proven we can run it on there. So it’s like, cool.”
1:08:40 Elon Musk’s xAI nears $10 bln deal to rent Oracle’s AI servers, The Information reports
- Elon and Oracle seem like a match made in evil genius heaven.
- xAI is in negotiations to spend $10bn to rent cloud servers from the company over a period of years, the Information reported.
- The deal would make xAI one of Oracle’s largest customers.
- Negotiations are ongoing and not concluded yet.
1:09:17 Matthew – “Wasn’t there a news article where Elon like picked up a bunch of servers from AWS and like, just like V to C P to V them all back to on-prem and then like drove them from one data center to the other or something like that. Yeah. So it feels like why would he want to run it on the cloud if he thinks data centers make sense for everything.”
Aftershow
1:11:07 Bringing Project Starline out of the lab
- In 2021, Google shared their vision for Project Starline, a breakthrough technology project that enables friends, families and co-workers to feel like they’re together from any distance.
- Using advancements in AI, 3D Imaging and other technologies, Starline works like a magic window. You can talk, gesture and make eye contact with another person, just as if you were in the same room.
- Google is finally bringing this technology out of the lab with a focus on connecting distributed teams and individuals in the workplace.
- They are partnering with HP to start commercialization of this unique experience in 2025 and are working to enable it directly from the video conferencing services you use today such as Google Meet and Zoom.
Closing
And that is the week in the cloud! Go check out our sponsor, Sonrai, and get your 14-day free trial. Also visit our website, the home of the Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with hashtag #theCloudPod.