opinions | reddit | April 20, 2026 at 10:29 PM

MiniMax-M2.7 Local Results on Terminal-Bench: A Dud. Anyone Using This for Agent Coding in Claude?


I just finished a full Terminal-Bench 2.0 run (445 trials) with MiniMax-M2.7 (Q8_0, unsloth GGUF) running locally on a Mac Studio M3 Ultra with 512GB unified memory. The result: 41.3% mean, which is actually worse than the 42.7% I got with M2.5 on the same hardware and config.

The numbers:

- 434 trials: 184 solved, 250 failed
- 198 errors, 187 of which were AgentTimeoutError (the model running out of clock, not crashing)
- Mean reward: 0.413
- Throughput: 10-17 tokens/second

For comparison, M2.5 on the
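For anyone who wants to sanity-check the arithmetic, here is a minimal sketch that recomputes the ratios from the figures quoted above. It assumes each trial is scored pass/fail, so the mean reward is just solved trials divided by a trial count; that scoring assumption and the variable names are mine, not something the post states.

```python
# Figures quoted in the post.
TOTAL_TRIALS = 445     # size of the full run
REPORTED_TRIALS = 434  # trials listed in the breakdown
SOLVED = 184
FAILED = 250
ERRORS = 198
TIMEOUTS = 187         # AgentTimeoutError count


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to three decimals."""
    return round(numerator / denominator, 3)


# Assumption: binary reward per trial, so mean reward = solved / trials.
mean_over_total = rate(SOLVED, TOTAL_TRIALS)
mean_over_reported = rate(SOLVED, REPORTED_TRIALS)
timeout_share_of_errors = rate(TIMEOUTS, ERRORS)

print(f"solved + failed      = {SOLVED + FAILED}")
print(f"mean over 445 trials = {mean_over_total}")
print(f"mean over 434 trials = {mean_over_reported}")
print(f"timeouts / errors    = {timeout_share_of_errors}")
```

Under that assumption, the quoted 0.413 matches 184 / 445 rather than 184 / 434 (which gives 0.424), so the mean looks like it was taken over the full 445-trial run. Timeouts account for about 94% of the reported errors.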
