Apple adopts Nvidia GPUs to accelerate LLM inference with its open-source ReDrafter technology


  • ReDrafter delivers 2.7x more tokens per second than traditional autoregressive decoding
  • ReDrafter could reduce latency for users while using fewer GPUs
  • Apple has not said when ReDrafter will be implemented on rival AI GPUs from AMD and Intel

Apple has announced a collaboration with Nvidia to accelerate large language model (LLM) inference using Apple's open-source Recurrent Drafter technology, or ReDrafter for short.

The partnership aims to address the computational cost of autoregressive token generation, in which each new token depends on every token produced before it. Reducing that cost is crucial to improving efficiency and cutting latency in real-time LLM applications.
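To make the bottleneck concrete, the toy Python sketch below contrasts plain autoregressive decoding, which needs one pass through the large model per token, with a generic draft-and-verify loop of the kind used in speculative decoding, the family of techniques ReDrafter belongs to. It is an illustration only, not Apple's or Nvidia's implementation: `target_next`, `draft_next`, and the draft length `k` are hypothetical stand-ins, and the article does not describe ReDrafter's internals.

```python
def target_next(tokens):
    """Hypothetical stand-in for the large model's greedy next-token choice."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_next(tokens):
    """Hypothetical cheap drafter: agrees with the target except at every 4th position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 50


def autoregressive(prompt, n):
    """Generate n tokens, one large-model pass per token."""
    tokens, passes = list(prompt), 0
    while len(tokens) - len(prompt) < n:
        tokens.append(target_next(tokens))
        passes += 1
    return tokens, passes


def draft_and_verify(prompt, n, k=4):
    """Generate n tokens; the drafter proposes k at a time, the target checks them."""
    tokens, passes = list(prompt), 0
    while len(tokens) - len(prompt) < n:
        # The cheap drafter proposes k tokens in a row.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # A real system scores all k positions in ONE batched large-model pass;
        # here that pass is simulated position by position but counted once.
        passes += 1
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] != expected:
                # Keep the agreed prefix, then the target's own correction.
                tokens.extend(draft[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(draft)  # every drafted token was accepted
    return tokens[: len(prompt) + n], passes


baseline_tokens, baseline_passes = autoregressive([1, 2, 3], 20)
spec_tokens, spec_passes = draft_and_verify([1, 2, 3], 20)
print(spec_tokens == baseline_tokens, baseline_passes, spec_passes)
```

In this sketch the output is identical to the baseline because every accepted token is one the large model would have chosen itself; the saving comes purely from needing fewer sequential large-model passes. How large the saving is in practice depends on how often the drafter's guesses are accepted.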
