opinionsreddit2026년 4월 21일 오전 09:23

Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into

Hi all, I wanted to share a setup that’s working for me with Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB VRAM) + 96GB RAM. This is not an interactive chat setup. I’m using it as a coding subagent inside an agentic pipeline, so some of the choices below are specific to that use case. TL;DR - Qwen3.6 35B A3B runs fine on 8GB VRAM + RAM as coding subagent - my real bug was not a crash: unlimited thinking consumed the whole max_tokens budget - disabling thinking fixed it - better fix: use per

원문 보기 →

관련 기사

OpenAI 라이브스트림

ChatGPT Images 2.0 출시, 이미지 생성 기능 대폭 업그레이드

2025년의 "6개월만 더 기다려" 주장이 단 한 번의 업데이트로 무너졌다

AMD Strix Halo에서 Mistral Medium 3.5 돌려봤더니 느려 죽겠네—밤새 돌려야 함