MiniMax2.7 Local Results on Terminal Bench. Dud. Anyone using this for agent coding in Claude?



I just finished a full Terminal-Bench 2.0 run (445 trials) with MiniMax-M2.7 (Q8_0, unsloth GGUF) running locally on a Mac Studio M3 Ultra with 512GB unified memory. The result: 41.3% mean, which is actually worse than the 42.7% I got with M2.5 on the same hardware and config.

The numbers:

- 434 trials, 184 solved, 250 failed
- 198 errors, 187 of which were AgentTimeoutError (the model running out of clock, not crashing)
- Mean reward: 0.413
- 10-17 tokens/second

For comparison, M2.5 on the