opinionsreddit2026年4月21日上午09:23

Qwen3.6 35B MoE 在 8GB VRAM 上運行 — 可行的 llama-server 配置 + 我遇到的 max_tokens / thinking 陷阱

Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into

Hi all, I wanted to share a setup that’s working for me with Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB VRAM) + 96GB RAM. This is not an interactive chat setup. I’m using it as a coding subagent inside an agentic pipeline, so some of the choices below are specific to that use case. TL;DR - Qwen3.6 35B A3B runs fine on 8GB VRAM + RAM as coding subagent - my real bug was not a crash: unlimited thinking consumed the whole max_tokens budget - disabling thinking fixed it - better fix: use per

閱讀原文 →

相關報導

OpenAI 直播活動

ChatGPT 圖像 2.0 來了，生成圖片的能力大升級

「再等六個月就會變好」的說法只撐過一輪就破功了

Mistral Medium 3.5 在 AMD Strix Halo 上跑起來超慢，準備好熬夜吧