
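OpenAI Livestream Event
OpenAI will hold a livestream event. Specific announcements, product launches, or demos will be revealed during the stream.
The last time OpenAI suddenly went live, they just dropped GPT-4 Turbo and then changed all their pricing overnight.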

More info, including charts, per-case metrics, raw judge outputs, and the parsed answer dump: https://github.com/lechmazur/position_bias

This benchmark isolates one basic and frustrating failure mode: position bias, where a model favors a story because of the order in which it is shown. Averaged across models, the first-shown story is picked 63% of the time. GPT-5.4 (high) is the most position-sensitive model in the run.

Many models don't just pick the first story more often; they also rate it higher. The average first-position rating bonus is +0.26 points on a 1-7 scale. Mistral Large 3 is the outlier in the op