MiniMax-M2.7 Local Results on Terminal-Bench. Dud. Anyone using this for agent coding in Claude?

I just finished a full Terminal-Bench 2.0 run (445 trials) with MiniMax-M2.7 (Q8_0, unsloth GGUF) running locally on a Mac Studio M3 Ultra with 512GB unified memory. The result: 41.3% mean, which is actually worse than the 42.7% I got with M2.5 on the same hardware and config.

The numbers:

- 434 trials, 184 solved, 250 failed
- 198 errors, 187 of which were AgentTimeoutError (the model running out of clock, not crashing)
- Mean reward: 0.413
- 10-17 tokens/second

For comparison, M2.5 on the