opinionsreddit2026年4月21日下午06:07

Llama.cpp 的自動適配功能表現遠超預期

Llama.cpp's auto fit works much better than I expected

I always thought with 32GB of VRAM, the biggest models I could run were around 20GB, like Qwen3.5 27B Q4 or Q6. I had an impression that everything had to fit in VRAM or I'd get 2 t/s. Man was I wrong. I just tested Qwen3.6 Q8 with 256k context on llama.cpp, with `--fit` on, the weights alone are bigger than my VRAM, and my 5090 is hooked up via Oculink, but I’m still getting 57 t/s! This is literally magic. If you’ve been stuck in the same boat as me thinking it’s all VRAM or nothing, you shou

閱讀原文 →

相關報導

OpenAI 直播活動

ChatGPT 圖像 2.0 來了，生成圖片的能力大升級

「再等六個月就會變好」的說法只撐過一輪就破功了

Mistral Medium 3.5 在 AMD Strix Halo 上跑起來超慢，準備好熬夜吧