These are the FP8 distilled versions of DeepSeek we've started testing at Jam & Tea.
We use LLMs for real-time gameplay. We released Retail Mage last year on Steam as a tech demo of what we can do. We've found FP8 to be the current sweet spot between accuracy and performance for our use case. It's one of the many techniques we applied last year to bring our real-time inference costs down by three orders of magnitude and make Retail Mage releasable.
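If you want to try one of these checkpoints, here's a minimal serving sketch using vLLM. The model ID is a placeholder for whichever FP8 distill you pull, and the `quantization="fp8"` flag is only relevant if you're quantizing an FP16 checkpoint on the fly rather than loading pre-quantized weights; this isn't our production setup, just a quick way to poke at the model.

```python
# Minimal sketch: serving an FP8 DeepSeek distill with vLLM.
# The model ID below is a placeholder -- swap in the FP8 checkpoint you actually download.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # placeholder model ID
    quantization="fp8",  # only needed when quantizing FP16 weights on the fly
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["A shopper asks the mage behind the counter for a potion that cures hiccups."],
    params,
)
print(outputs[0].outputs[0].text)
```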
Anyway, we're just sharing this with anyone who finds it useful.
You can read a few more notes by our ML engineer, Yudi, on LinkedIn here:
https://www.linkedin.com/feed/update/urn:li:activity:7290455...