opinions · reddit · April 21, 2026, 12:44 AM

Qwen3.5-27B on RTX 5090 served via vLLM @ 77 tps

After maxing out my $20 Cursor subscription and my $10 zai subscription for this month, I resorted to a local LLM setup. I got good results on an RTX 5090 running Qwen3.5 27B, with very good tps and a 218k context window. It can even run 2 concurrent sessions with this config, although per-session speed drops as expected. For some reason I can't get it to work at the full 256k context window on vLLM 0.19. It works on vLLM 0.17 per the guide below, but tps suffers because 0.17 doesn't have many of the optimizations that v
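The post does not include the exact launch flags or checkpoint, so the following is only a minimal sketch of how the settings it describes (a 218k context cap and 2 concurrent sessions) might map onto a `vllm serve` command line. The model ID `Qwen/Qwen3.5-27B`, the reading of "218k" as 218,000 tokens, and the 0.90 memory fraction are assumptions for illustration, not values taken from the post. The helper `build_vllm_command` is hypothetical; the three flags it emits are standard vLLM engine arguments.

```python
import shlex


def build_vllm_command(model, max_model_len, max_num_seqs,
                       gpu_memory_utilization=0.90):
    """Assemble the argv list for `vllm serve` (hypothetical helper)."""
    return [
        "vllm", "serve", model,
        # Upper bound on prompt + generated tokens per request.
        "--max-model-len", str(max_model_len),
        # Maximum number of sequences batched together (concurrent sessions).
        "--max-num-seqs", str(max_num_seqs),
        # Fraction of GPU memory vLLM may claim for weights and KV cache.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


# Assumed values: placeholder model ID, "218k" read as 218,000 tokens.
cmd = build_vllm_command("Qwen/Qwen3.5-27B", 218_000, 2)
print(shlex.join(cmd))
```

Lowering `--max-model-len` or `--max-num-seqs` reduces the KV-cache memory vLLM has to budget for, which is the usual first thing to try when a larger context fails to start.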
