
OpenAI Livestream
OpenAI is holding a livestream event. Specific announcements, new product launches, or demos are expected to be revealed during the broadcast.
The last time OpenAI did an unannounced livestream, they dropped GPT-4 Turbo and changed pricing overnight.

I always thought that with 32GB of VRAM, the biggest models I could run were around 20GB, like Qwen3.5 27B at Q4 or Q6. I was under the impression that everything had to fit in VRAM or I'd get 2 t/s. Man, was I wrong. I just tested Qwen3.6 Q8 with 256k context on llama.cpp with `--fit` on. The weights alone are bigger than my VRAM, and my 5090 is hooked up via Oculink, but I’m still getting 57 t/s! This is literally magic. If you’ve been stuck in the same boat as me thinking it’s all VRAM or nothing, you shou