AWS and Cerebras Partner to Deliver Faster AI Inference on Amazon Bedrock

Written by: Mane Sachin

Amazon Web Services (AWS) has announced a new partnership with Cerebras Systems to introduce an advanced AI inference solution expected to roll out within the next few months.

The upcoming system will pair AWS’s Trainium chips with Cerebras’s CS-3 hardware and will be delivered through Amazon Bedrock, running inside AWS data centers.

According to the companies, the new architecture splits AI inference into two stages: one processes the input prompt, while the other generates the response tokens. Each stage will run on hardware specifically suited to that workload.

In this setup, AWS Trainium chips will handle prompt processing, while the Cerebras CS-3 system will generate the output tokens. Both systems will communicate using AWS’s Elastic Fabric Adapter (EFA) networking technology.

David Brown, Vice President of Compute and Machine Learning Services at AWS, said inference is the stage where AI delivers the most value to users, but performance limitations still slow down demanding applications such as real-time coding tools and interactive AI systems.

He explained that the collaboration with Cerebras aims to address that issue by distributing inference across two specialized systems connected through high-speed networking. This design lets each system focus on the work it handles best, which the companies say could make inference up to ten times faster than current solutions.

The technology relies on a method known as inference disaggregation, which divides inference into two phases: prefill and decode. The prefill stage processes the input prompt and requires heavy parallel computation, while the decode stage generates tokens sequentially and depends more on memory bandwidth.
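
The contrast between the two phases is easy to see in code. The sketch below is a minimal, illustrative single attention layer in NumPy (the dimensions and weights are invented for illustration and have nothing to do with AWS’s or Cerebras’s actual implementations): prefill computes keys and values for every prompt token in one batched matrix multiply, while decode produces tokens one at a time, rereading the entire KV cache at each step.

```python
import numpy as np

# Toy single-layer attention, just to contrast the two phases.
# All dimensions and weights are illustrative, not any real model's.
D = 64                                        # hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def prefill(prompt_embeddings):
    """Process the whole prompt in one batched pass (compute-bound).

    Keys and values for every prompt token come from large matrix
    multiplies that parallelize well, then get cached for decode.
    """
    k_cache = prompt_embeddings @ Wk          # (T, D): all tokens at once
    v_cache = prompt_embeddings @ Wv
    return k_cache, v_cache

def decode_step(token_embedding, k_cache, v_cache):
    """Generate one token from the cached keys/values (memory-bound).

    Each step does little new arithmetic but must reread the entire
    KV cache, which is why decode hinges on memory bandwidth.
    """
    q = token_embedding @ Wq                  # (D,)
    k_cache = np.vstack([k_cache, token_embedding @ Wk])
    v_cache = np.vstack([v_cache, token_embedding @ Wv])
    scores = k_cache @ q / np.sqrt(D)         # attend over every cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ v_cache                   # next token's representation
    return out, k_cache, v_cache

# Prefill once over the whole prompt, then decode strictly sequentially.
prompt = rng.standard_normal((16, D))         # 16 prompt tokens
k_cache, v_cache = prefill(prompt)
tok = prompt[-1]
for _ in range(4):                            # one generated token per step
    tok, k_cache, v_cache = decode_step(tok, k_cache, v_cache)
```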

AWS said assigning the prefill stage to Trainium chips and the decode stage to the Cerebras CS-3 enables both architectures to operate more efficiently based on their strengths.
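
Neither company has published how the two stages hand off state, so the sketch below assumes the common disaggregated-serving pattern, in which the prefill node serializes its KV cache and ships it to the decode node over the interconnect. The class names and byte-level transfer are hypothetical stand-ins, not the actual AWS or Cerebras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class PrefillNode:
    """Stand-in for the prompt-processing side (Trainium, in this design)."""

    def process(self, prompt_embeddings):
        # A real system would run a heavy, parallel forward pass here;
        # we fabricate a KV cache of plausible shape instead.
        num_tokens, hidden = prompt_embeddings.shape
        kv_cache = rng.standard_normal((2, num_tokens, hidden))  # stacked K and V
        # Serialize so the cache can cross the network to the decode node.
        return kv_cache.tobytes(), kv_cache.shape

class DecodeNode:
    """Stand-in for the token-generating side (CS-3, in this design)."""

    def generate(self, kv_bytes, kv_shape, steps):
        # Reconstruct the cache shipped over from the prefill node.
        kv_cache = np.frombuffer(kv_bytes).reshape(kv_shape)
        tokens = []
        for step in range(steps):
            # Each step reads the whole cache but adds little compute,
            # so this loop is bound by bandwidth, not arithmetic.
            tokens.append(int(abs(kv_cache.sum() * (step + 1))) % 50_000)
        return tokens

prompt = rng.standard_normal((16, 64))               # 16 prompt tokens
kv_bytes, kv_shape = PrefillNode().process(prompt)   # would run on the Trainium node
tokens = DecodeNode().generate(kv_bytes, kv_shape, steps=4)  # would run on the CS-3
```

Moving the KV cache between machines is the main overhead this split introduces, which is presumably why both companies emphasize the high-speed EFA link connecting the two systems.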

Andrew Feldman, founder and CEO of Cerebras Systems, said the partnership will help bring high-speed AI inference to organizations around the world. He noted that enterprises will be able to access this performance improvement while continuing to operate within their existing AWS environments.

The system will run on AWS infrastructure powered by the Nitro System, ensuring the same security standards and operational framework used across AWS services.

AWS also said that in the future it plans to offer open-source large language models and its Amazon Nova models on Cerebras hardware through Amazon Bedrock.

Trainium is AWS’s custom AI chip designed for both model training and inference tasks. The company said several AI labs, including Anthropic and OpenAI, already use Trainium capacity within AWS infrastructure.

Meanwhile, Cerebras’s CS-3 system is currently used by companies such as OpenAI, Cognition, and Mistral for AI workloads including code generation and reasoning models.
