
Hello GPT-4o
We're announcing GPT-4o ("o" for "omni"), our new flagship model that can reason across audio, vision, and text in real time.
This isn't just a better chatbot: GPT-4o can see your screen, hear your frustration, and respond instantly, without the awkward text-to-speech delays.


Tech Blogger Take
Someone just made every other LLM look like it's running on dial-up internet
Inception Labs dropped Mercury 2 yesterday, and the numbers are absolutely bonkers: 11,000 tokens per second on H100 GPUs. To put that in perspective, most production LLMs crawl along at a few hundred tokens per second, tuned for accuracy while your users tap their fingers waiting. But here's the kicker: Mercury 2 generates text with a diffusion model, the same family of techniques behind image generators like DALL-E, applied to text instead of pixels. It's like someone took a wrong turn in the AI research lab and accidentally solved the speed problem everyone else ignored.
If this actually works at production quality (and that's still a big if), we're looking at real-time AI conversations, dirt-cheap API costs, and a lot of 'too slow for production' AI features suddenly becoming viable. The entire inference cost equation just got flipped on its head.
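To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes a 300-token paragraph and takes 300 tokens per second as a stand-in for "a few hundred"; both figures are illustrative assumptions, not measurements, and the calculation ignores network latency and time to first token.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to produce num_tokens at a steady rate (ignores network and queueing)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Illustrative assumptions, not benchmarks.
PARAGRAPH_TOKENS = 300
TYPICAL_RATE = 300       # stand-in for "a few hundred tokens per second"
MERCURY_2_RATE = 11_000  # the figure quoted in the post

typical = generation_seconds(PARAGRAPH_TOKENS, TYPICAL_RATE)
fast = generation_seconds(PARAGRAPH_TOKENS, MERCURY_2_RATE)
print(f"typical: {typical:.2f}s, quoted Mercury 2 rate: {fast:.3f}s, "
      f"ratio: {typical / fast:.0f}x")
```

On those assumptions, raw generation time for a paragraph drops from about a second to under 30 milliseconds. Whether output quality holds up at that speed is a separate question the arithmetic can't answer.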
Action
Before: Waiting 3-5 seconds for your LLM to generate a paragraph while users get impatient and API costs pile up
After: Getting instant AI responses that feel like talking to a human, with API costs that actually make sense for high-volume applications
AI Analysis
Cloud Computing & APIs
High: Start benchmarking your current token costs against this 11K/sec baseline; if Mercury 2 delivers on production quality, your API bills are about to get slashed
While everyone's been chasing smarter models, Inception Labs just made the entire inference cost equation obsolete by borrowing image generation tech
Your customers expect instant responses, and you're paying premium prices for models that think too slowly — this could flip both problems overnight
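If you want to act on the benchmarking advice, a minimal throughput harness might look like the sketch below. It is deliberately provider-agnostic: `generate` is any callable you supply that takes a prompt and returns a list of output tokens, and `fake_generate` is a hypothetical stand-in so the example runs without an API key. Neither name comes from Mercury 2 or any real SDK.

```python
import time
from typing import Callable, List


def measure_tokens_per_second(
    generate: Callable[[str], List[str]], prompt: str, runs: int = 3
) -> float:
    """Average output tokens per wall-clock second across several calls."""
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    if total_seconds <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return total_tokens / total_seconds


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real model call: sleeps, then echoes words."""
    time.sleep(0.01)
    return prompt.split() * 10


rate = measure_tokens_per_second(fake_generate, "benchmark this prompt please")
print(f"measured throughput: {rate:.0f} tokens/sec")
```

Swap `fake_generate` for a wrapper around your current provider, run the same prompts through both, and compare the measured rate with what you pay per token today. Wall-clock timing like this includes network overhead, which is what your users actually feel.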
Job Impact Analysis
DevOps Engineer
Role Shift: 11,000 tokens per second means your current scaling assumptions for LLM workloads just became ancient history
Download Mercury 2 and stress-test it against your production traffic patterns — if it holds up, your infrastructure costs are about to plummet
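To see how much those scaling assumptions could move, here is a rough capacity-planning sketch. It assumes the quoted rate can be treated as sustained throughput per replica, which the post does not confirm, and the 50,000 tokens/sec peak demand and 70% utilization target are made-up planning inputs.

```python
import math


def replicas_needed(
    peak_tokens_per_second: float,
    per_replica_tokens_per_second: float,
    target_utilization: float = 0.7,
) -> int:
    """Replicas required to serve peak demand while leaving some headroom."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if per_replica_tokens_per_second <= 0:
        raise ValueError("per_replica_tokens_per_second must be positive")
    usable = per_replica_tokens_per_second * target_utilization
    return math.ceil(peak_tokens_per_second / usable)


# Hypothetical planning inputs, not measured values.
PEAK_DEMAND = 50_000

current_fleet = replicas_needed(PEAK_DEMAND, 300)     # "a few hundred" tokens/sec
fast_fleet = replicas_needed(PEAK_DEMAND, 11_000)     # the rate quoted in the post
print(f"replicas at 300 tok/s: {current_fleet}, at 11,000 tok/s: {fast_fleet}")
```

The point is the ratio, not the absolute counts: per-replica throughput sits in the denominator, so a large jump in speed shrinks the fleet roughly in proportion. Replace the inputs with numbers from your own stress tests before trusting any of it.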
Product Manager
Opportunity: Real-time AI features that were too expensive or slow before suddenly become viable product opportunities
Dust off those 'AI-powered live chat' and 'instant content generation' features you shelved — it's time to revisit the roadmap