The Reasoning Models Race Is Getting Weird
o1, Claude's extended thinking, Gemini's reasoning mode. Everyone's building chain-of-thought into the model. Here's what's actually going on.
A year ago, chain-of-thought was a prompting technique. Now it's a product category.
OpenAI has o1. Anthropic has extended thinking. Google has... whatever they're calling it this week. Everyone's racing to build models that "think" before they answer.
I've been using all of them. Here's what I've noticed.
What reasoning models actually do
They generate intermediate steps before giving you an answer. Sometimes you see those steps. Sometimes you don't.
The idea is simple: harder problems need more thinking. A model that works through the problem step by step makes fewer errors than one that jumps straight to the answer, because each intermediate step is easier to get right than the whole leap at once. This works. On math, coding, and logic benchmarks, reasoning models beat their base versions by wide margins.
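To make "intermediate steps" concrete, here's a minimal sketch of the difference between a direct prompt and a step-by-step one. The function names are illustrative, not any vendor's real API; reasoning models bake this behavior into training rather than relying on the prompt, but the shape of the output is the same.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer with no intermediate steps."""
    return question

def reasoning_prompt(question: str) -> str:
    """Ask the model to show its work before answering."""
    return (
        f"{question}\n\n"
        "Work through this step by step. Show each intermediate step, "
        "then give the final answer on its own line prefixed with 'Answer:'."
    )

q = "A train travels 120 km in 1.5 hours. What is its average speed?"
print(reasoning_prompt(q))
```

The whole product category is, roughly, this prompt pattern moved inside the model and scaled up.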
The weird part is how much variation there is in execution.
The approaches diverge
OpenAI hides the chain of thought. You get the answer. The reasoning happens somewhere you can't see. They say this is for safety. Maybe. It also means you can't debug it when something goes wrong.
Anthropic shows you the thinking, at least partially. You can watch the model work through the problem. When it makes a mistake, you can see where.
Google changes their approach every few months, so I've stopped trying to keep track.
Where it matters
Reasoning models shine on problems with clear right answers. Math proofs. Code that needs to compile. Logic puzzles.
They're less obviously better on fuzzy problems. "Write me a marketing email" doesn't have intermediate steps the same way "solve this equation" does.
I've found them most useful for code review. Point a reasoning model at a pull request and ask what could go wrong. It catches things I miss. The extended thinking time is worth it.
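Here's a sketch of what that workflow looks like. The `build_review_prompt` helper is hypothetical, and the final model call is left as a placeholder since every vendor's SDK differs; the point is just how little framing the task needs.

```python
def build_review_prompt(diff: str) -> str:
    """Wrap a PR diff in an open-ended 'what could go wrong' question."""
    return (
        "Review this pull request diff. List everything that could go "
        "wrong: bugs, edge cases, race conditions, missing error handling.\n\n"
        f"```diff\n{diff}\n```"
    )

diff = """\
-    timeout = 30
+    timeout = None
     resp = requests.get(url, timeout=timeout)
"""

prompt = build_review_prompt(diff)
print(prompt)
# Send `prompt` to a reasoning model via your vendor's SDK (not shown).
```

The open-ended question is deliberate: the extended thinking budget is what lets the model enumerate failure modes instead of rubber-stamping the change.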
The cost question
Reasoning takes tokens. Lots of them. The thinking isn't free.
For one-off complex problems, this is fine. For high-volume production use, you're paying 5-10x more per request. That changes the math on a lot of applications.
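The multiplier comes straight from the token counts. A back-of-envelope sketch, with a made-up price (the $/token figure and token counts below are placeholders, not any vendor's real pricing):

```python
PRICE_PER_1M_OUTPUT_TOKENS = 15.00  # hypothetical price, USD

def request_cost(output_tokens: int, thinking_tokens: int = 0) -> float:
    """Thinking tokens are billed like output tokens, so they add up fast."""
    total = output_tokens + thinking_tokens
    return total / 1_000_000 * PRICE_PER_1M_OUTPUT_TOKENS

base = request_cost(output_tokens=500)
reasoning = request_cost(output_tokens=500, thinking_tokens=4000)
print(f"base: ${base:.4f}, reasoning: ${reasoning:.4f}, "
      f"ratio: {reasoning / base:.1f}x")
# → base: $0.0075, reasoning: $0.0675, ratio: 9.0x
```

At one request a day, nobody notices. At a million requests a day, that ratio is the whole business case.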
What I expect next
Reasoning will get faster and cheaper. It always does.
The more interesting question is whether the thinking becomes something you can inspect and trust, or stays a black box. Right now we're heading toward black boxes. I hope that changes.