Llama 3.1 405B runs at nearly a thousand tokens a second on Cerebras Inference, and took a quarter of a second to get the first token.
© Cerebras
Llama 3.1 405B runs at nearly a thousand tokens a second on Cerebras Inference, and took a quarter of a second to get the first token.
© Cerebras