Out of Fireworks and into the fire
However, my start with Cerebras's hosted Qwen was not the same as what I experienced (for a lot more money) on Fireworks, another provider. Initially, Cerebras's Qwen didn't even work in my CLI. It also didn't seem to work in Roo Code or any other tool I knew how to use. After taking a bug report, Cerebras told me it was my code. The same CLI that worked on Fireworks, for Claude, for GPT-4.1 and GPT-5, for o3, and for Qwen hosted by Qwen/Alibaba was at fault, said Cerebras. To be fair, my log did include misleading artifacts when Cerebras fragmented the stream, putting out stream parts as messages (which Cerebras still does occasionally). Still, this has generally been their approach: don't fix their so-called OpenAI compatibility; blame and/or adapt the client. I took the challenge and adapted my CLI, but it took a lot of workarounds. This was a massive contrast with Fireworks. I had issues with Fireworks when it started and showed them my debug output; they immediately acknowledged the problem (sometimes it would spit out corrupt, native tool calls instead of OpenAI-style output) and fixed it overnight. Cerebras repeatedly claimed their infrastructure was working perfectly and that all requests were successful, in direct contradiction to most commentary on their Discord.
Feeling like I had finally cracked the nut after three weeks of on-and-off testing and adapting, I grabbed a second Cerebras Code Max account when the window opened again. This was after discovering that for part of the time, Cerebras had charged me for a Max account but given me a Pro account. They fixed it but offered no compensation for the days my service was set to Pro rather than Max. It is also difficult to prove, because their analytics console is broken, partly because it reports measurements in local time while the limits are in UTC.
Then I did the math. One Cerebras Code Max account is limited to 120 million tokens per day at a cost equal to four times that of a Cerebras Code Pro account. The Pro account is limited to 24 million tokens per day. If you multiply that by four, you get 96 million tokens. However, the Pro account is limited to 300k tokens per minute (TPM), compared to 400k for the Max. Using Cerebras is a bit frustrating. For 10 to 20 seconds, it really flies, then you hit the cap on tokens per minute, and it throws 429 errors (too many requests) until the minute is up. If your coding tool is smart, it will just retry with an exponential back-off. If not, it will break the stream. So, had I bought four Pro accounts, I would have had 1,200,000 TPM in theory (4 × 300k), a much better value than the Max account.
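For readers who have not seen the pattern, here is a minimal sketch of what retrying a rate-limited request with exponential back-off can look like. It is an illustration under stated assumptions, not the code from my CLI or from any particular coding tool: `send` is a hypothetical zero-argument function that performs one request, and `RateLimited` is a hypothetical exception it raises when the provider answers with a 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by send() when the provider returns HTTP 429."""


def call_with_backoff(send, max_retries=6, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call send() until it succeeds or the retries run out.

    send is an assumed zero-argument callable that performs one request
    and raises RateLimited on a 429 response.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay * (1 + 0.1 * jitter()))
```

With these default values the waits are roughly 1, 2, 4, 8, 16, and 32 seconds, about 63 seconds in total, which is enough to outlast a one-minute rate-limit window before giving up.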