On June 10, OpenAI slashed the list price of its flagship reasoning model, o3, by roughly 80%, from $10 per million input tokens and $40 per million output tokens to $2 and $8, respectively. API resellers reacted immediately: Cursor now counts one o3 request the same as a GPT-4o call, and Windsurf lowered its "o3-reasoning" tier to a single credit as well. For Cursor users, that's a ten-fold price cut overnight.
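To put the new rates in perspective, here is a minimal back-of-the-envelope sketch. Only the per-million-token prices come from the announcement; the per-request token counts are made-up assumptions for illustration:

```python
# Illustrative cost comparison for a single o3 request.
# Rates are dollars per million tokens; the token counts below are hypothetical.

OLD_RATES = {"input": 10.00, "output": 40.00}   # $/1M tokens before June 10
NEW_RATES = {"input": 2.00, "output": 8.00}     # $/1M tokens after the cut

def request_cost(input_tokens: int, output_tokens: int, rates: dict) -> float:
    """Cost in dollars for one request at the given per-million-token rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Hypothetical "typical" agentic coding request: 20k prompt tokens, 3k completion tokens.
old = request_cost(20_000, 3_000, OLD_RATES)   # $0.32
new = request_cost(20_000, 3_000, NEW_RATES)   # $0.064
print(f"old: ${old:.3f}  new: ${new:.3f}  savings: {1 - new / old:.0%}")  # savings: 80%
```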
Latency improved in parallel. OpenAI hasn't published new latency figures, and third-party dashboards still show time to first token (TTFT) in the 15s to 20s range for long prompts. But thanks to fresh Nvidia GB200 clusters and a revamped scheduler that shards long prompts across more GPUs, o3 feels snappier in real use. It's still slower than lightweight models, but no longer coffee-break slow.
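If you'd rather sanity-check the TTFT numbers yourself than trust third-party dashboards, a rough measurement is easy to script. The sketch below assumes the official OpenAI Python SDK, that streaming is enabled for the model, and that the model identifier is o3; the prompt and timing method are illustrative, not a rigorous benchmark:

```python
import time
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def measure_ttft(prompt: str, model: str = "o3") -> float:
    """Return seconds from request start until the first streamed content chunk."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk carrying actual content marks time-to-first-token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start  # no content streamed; fall back to total time

# Hypothetical long prompt to approximate the "long prompt" case discussed above.
print(f"TTFT: {measure_ttft('Summarize this diff...' * 200):.1f}s")
```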
Claude 4 is fast but sloppy
Much of the community's oxygen has gone to Claude 4. It's undeniably fast, and its 200k context window feels luxurious. But in day-to-day coding, I, along with many Reddit and Discord posters, keep tripping over Claude's action bias: it happily invents stubbed functions instead of real implementations, fakes unit tests, and rewrites mocks it was told to leave alone. The speed is great; the follow-through often isn't.