
OpenAI 直播活動
OpenAI 將舉辦一場直播活動。在直播期間將揭露具體的公告、產品發布或示範內容。
上一次 OpenAI 突然搞直播,他們直接丟出 GPT-4 Turbo,然後一夜之間改掉所有定價

I always thought with 32GB of VRAM, the biggest models I could run were around 20GB, like Qwen3.5 27B Q4 or Q6. I had an impression that everything had to fit in VRAM or I'd get 2 t/s. Man was I wrong. I just tested Qwen3.6 Q8 with 256k context on llama.cpp, with `--fit` on, the weights alone are bigger than my VRAM, and my 5090 is hooked up via Oculink, but I’m still getting 57 t/s! This is literally magic. If you’ve been stuck in the same boat as me thinking it’s all VRAM or nothing, you shou