Amazon is racing to catch up in generative AI with custom AWS chips
In an unmarked office building in Austin, Texas, a handful of Amazon employees work in two small rooms designing two types of microchips for training and running generative AI. These custom chips, Inferentia and Trainium, offer AWS customers an alternative to training their large language models on Nvidia GPUs, which have become difficult and expensive to buy.
“The whole world would like more chips for generative AI, whether that’s GPUs or whether that’s Amazon’s own chips that we’re designing,” Amazon Web Services CEO Adam Selipsky told CNBC in an interview in June. “I think we’re better positioned than anybody else on Earth to supply the capacity that our customers collectively are going to want.”
But others have acted faster, and invested more, to capture business from the generative AI boom. When OpenAI launched ChatGPT in November, Microsoft gained widespread attention for hosting the viral chatbot and for investing a reported $13 billion in OpenAI. It was quick to add generative AI models to its own products, bringing them to Bing in February.
That same month, Google launched its own large language model, Bard, and followed that with a $300 million investment in OpenAI competitor Anthropic.
It wasn’t until April that Amazon announced its own family of large language models, called Titan, along with a service called Bedrock to help developers build software using generative AI.
“Amazon is not used to chasing markets. Amazon is used to creating markets. And I think for the first time in a long time, they’re finding themselves on the back foot and they’re working to catch up,” said Chirag Dekate, VP analyst at Gartner.
Meta also recently released its own LLM, Llama 2. The open-source ChatGPT competitor is now available for people to test on Microsoft’s Azure public cloud.
Chips as a ‘real difference’
In the long run, Dekate said, Amazon’s custom silicon could give it an edge in generative AI.
“I think the real differentiator is the technical capabilities that they bring,” he said. “Because guess what? Microsoft doesn’t have Trainium or Inferentia.”
AWS quietly started custom silicon production back in 2013 with a specialized piece of hardware called Nitro. It is now the highest-volume AWS chip. Amazon told CNBC in August that every AWS server has at least one, with a total of more than 20 million in use.
In 2015, Amazon bought Israeli chip startup Annapurna Labs. Then in 2018, Amazon launched its Arm-based server chip, Graviton, a rival to x86 CPUs from giants like AMD and Intel.
“Arm is probably in the high single digits to maybe 10% of server sales, and a good portion of those are going to be Amazon. So on the CPU side, they’ve done quite well,” said Stacy Rasgon, senior analyst at Bernstein Research.
Also in 2018, Amazon launched its AI-focused chips. That came two years after Google announced the first Tensor Processing Unit, or TPU. Microsoft has yet to announce the Athena AI chip it is working on, reportedly in partnership with AMD.
CNBC got a behind-the-scenes tour of Amazon’s chip lab in Austin, Texas, where Trainium and Inferentia are developed and tested. Matt Wood, vice president of product, explained why there are two chips.
“Machine learning breaks down into these two different stages. You train the machine learning models and then you run inference against those trained models,” Wood said. “Trainium provides about a 50% improvement in price performance compared to any other way of training machine learning models on AWS.”
Trainium first came on the market in 2021, following the 2019 release of Inferentia, which is now on its second generation.
Inferentia allows customers to deliver “very low-cost, high-throughput, low-latency machine learning inference, which is all the predictions of when you type in a prompt into your generative AI model — that’s where all that gets processed to give you the response,” Wood said.
For now, however, Nvidia GPUs are still king when it comes to training models. In July, AWS launched new AI acceleration hardware powered by Nvidia H100s.
“Nvidia chips have a huge software ecosystem that’s been built up around them over the last 15 years that nobody else has,” Rasgon said. “Nvidia is the big winner from AI right now.”
Amazon’s custom chips, from left to right, Inferentia, Trainium and Graviton are seen at Amazon’s headquarters in Seattle on July 13, 2023.
Benefiting from cloud leadership
AWS’s cloud dominance, however, is a big differentiator for Amazon.
“Amazon does not need to win headlines. Amazon already has a really strong cloud installed base. They just need to figure out how to enable their existing customers to expand into value-creating motions using generative AI,” Dekate said.
When choosing between Amazon, Google and Microsoft for generative AI, millions of AWS customers may be drawn to Amazon because they already know the platform, run other applications there and store their data there.
“It’s a question of speed. How quickly these companies can move to develop these generative AI applications is driven by starting first with the data they have in AWS and using the compute and machine learning tools that we provide,” explained Mai-Lan Tomsen Bukovec, VP of technology at AWS.
AWS is the world’s largest cloud computing provider, with 40% of the market share in 2022, according to technology industry researcher Gartner. Although operating income has been down year over year for three quarters in a row, AWS still accounted for 70% of Amazon’s $7.7 billion total operating profit in the second quarter. AWS operating margins have historically been much wider than Google Cloud’s.
AWS also has a suite of developer tools focused on generative AI.
“Let’s rewind the clock even before ChatGPT. It’s not like after that happened, suddenly we hurried and came up with a plan, because you can’t engineer a chip in that quick of a time, let alone build a Bedrock service in a matter of two to three months,” said Swami Sivasubramanian, VP of AWS database, analytics and machine learning.
Bedrock gives AWS customers access to large language models produced by Anthropic, Stability AI, AI21 Labs and Amazon’s own Titan.
“We don’t believe that one model is going to rule the world, and we want our customers to have the latest models from multiple providers, because they are going to pick the right tool for the right job,” Sivasubramanian said.
An Amazon employee works on custom AI chips, wearing a jacket bearing the logo of the AWS chip Inferentia, at the AWS chip lab in Austin, Texas, on July 25, 2023.
One of Amazon’s latest AI offerings is AWS HealthScribe, a service released in July to help doctors summarize patient visits using generative AI. Amazon also has SageMaker, a machine learning hub that offers algorithms, models and more.
Another major tool is coding companion CodeWhisperer, which Amazon said has enabled developers to complete tasks 57% faster on average. Last year, Microsoft also reported a productivity boost from its own coding companion, GitHub Copilot.
In June, AWS announced a $100 million generative AI innovation center.
“We have so many customers who are saying, ‘I want to do generative AI,’ but they don’t necessarily know what that means for them in the context of their own businesses. And so we’re going to bring in solutions architects and engineers and strategists and data scientists to work with them one on one,” AWS CEO Selipsky said.
While AWS has so far focused largely on tools rather than building a competitor to ChatGPT, a recently leaked internal email shows that Amazon CEO Andy Jassy is directly overseeing a new central team building expansive large language models as well.
In the second quarter earnings call, Jassy said that a “very significant amount” of AWS’s business is now driven by AI and the more than 20 machine learning services it offers. Examples of customers include Philips, 3M, Old Mutual and HSBC.
The explosive growth in AI has come with a wave of security concerns from companies worried that employees are putting proprietary information into the training data used by publicly available large language models.
“I can’t tell you how many Fortune 500 companies I’ve talked to that have banned ChatGPT. So with our approach to generative AI and our Bedrock service, anything you do, any model you use through Bedrock will be in your own isolated virtual private cloud environment. It’ll be encrypted, it’ll have the same AWS access controls,” Selipsky said.
For now, Amazon is only accelerating its push into generative AI, telling CNBC that “more than 100,000” customers are using machine learning on AWS today. While that’s a small percentage of AWS’s millions of customers, analysts say that could change.
“What we’re not seeing is enterprises saying, ‘Oh, wait a minute, Microsoft is so far ahead in generative AI, let’s just go out and change our infrastructure strategies, migrate everything to Microsoft,’” Dekate said.
— CNBC’s Jordan Novet contributed to this report.
CORRECTION: This article has been updated to reflect Inferentia as the chip used for machine learning inference.