Prompt Engineering Is Mostly Not Engineering

The title sounds good on LinkedIn. The work is usually trial and error with no clear methodology. That's a problem.

Every other LinkedIn profile says "prompt engineer" now. I'm not sure most of them could explain what that means.

Prompting is real. Getting an LLM to do what you want takes skill. But calling it engineering implies a rigor that doesn't usually exist.

What engineering means

Engineering has testable hypotheses. Repeatable experiments. Clear metrics. You change one thing, measure the result, draw conclusions.

Most prompting I see is: try something, eyeball the output, decide it's better or worse, ship it. No baseline. No systematic comparison. No regression testing.

That's not engineering. That's vibes.

Why this matters

LLMs are non-deterministic. The same prompt can give different outputs. If you're comparing prompts by running each one twice and picking the one that "seems better," you're not learning anything reliable.
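You can see why two runs each is not enough with a quick simulation. This is a sketch with made-up numbers: two hypothetical prompts where one genuinely passes 80% of the time and the other 60%, graded pass/fail.

```python
import random

random.seed(0)

def run_prompt(pass_rate: float, n: int) -> float:
    """Simulate n graded runs of a prompt with a fixed true pass rate."""
    return sum(random.random() < pass_rate for _ in range(n)) / n

def wrong_winner_rate(n: int, trials: int = 10_000) -> float:
    """How often n runs of each prompt make the worse prompt look better."""
    # Hypothetical prompts: B is genuinely better (80% vs 60% pass rate).
    wrong = sum(run_prompt(0.60, n) > run_prompt(0.80, n) for _ in range(trials))
    return wrong / trials

print(wrong_winner_rate(2))    # two runs each: the worse prompt often "wins"
print(wrong_winner_rate(100))  # a hundred runs each: almost never
```

With two runs each, the worse prompt comes out ahead a meaningful fraction of the time, and ties are common. That's the "seems better" trap in one function.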

Prompts also interact with the model they run on. A prompt that works great on GPT-4 might fail on Claude. A prompt that worked in March might stop working after a model update. If you don't have a test suite, you don't know when things break.

What actual prompt engineering looks like

You define success criteria before you start. Not "good responses" but specific, measurable outcomes. Accuracy on a test set. Format compliance rate. Latency.
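"Format compliance rate" only works as a metric if the criterion is mechanical. A minimal sketch, assuming a made-up spec (output must be valid JSON with `summary` and `tags` keys — the spec and key names here are illustrative, not from any real system):

```python
import json

def meets_format_spec(output: str) -> bool:
    """Hypothetical criterion: valid JSON with "summary" and "tags" keys.

    A run either passes or fails — no eyeballing.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"summary", "tags"}.issubset(data)

def format_compliance_rate(outputs: list[str]) -> float:
    """Fraction of a batch of model outputs that meet the spec."""
    return sum(map(meets_format_spec, outputs)) / len(outputs)
```

The point is that the success criterion is code, written before you start tweaking, so two prompt variants are scored by the same rule.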

You build an evaluation dataset. Real examples. Edge cases. Failure modes you've seen before.

You test systematically. Change one thing at a time. Run against your eval set. Record the results.
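A minimal harness for that loop might look like this. `call_model` and `grade` are placeholders for whatever your stack provides (an API client and a pass/fail grader); the CSV schema is just one reasonable choice.

```python
import csv
import datetime

def evaluate(prompt_id, prompt, cases, call_model, grade, results_path="results.csv"):
    """Run one prompt variant against the whole eval set and append every result.

    call_model(prompt, case_input) -> output string (your API client).
    grade(output, case) -> bool (your pass/fail check).
    """
    rows = []
    for case in cases:
        output = call_model(prompt, case["input"])
        rows.append({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "prompt_id": prompt_id,          # one variable changed per run
            "case_input": case["input"],
            "passed": grade(output, case),
        })
    with open(results_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        if f.tell() == 0:                    # new file: write the header once
            writer.writeheader()
        writer.writerows(rows)
    return sum(r["passed"] for r in rows) / len(rows)
```

Appending rather than overwriting is deliberate: the log becomes a history of every variant against every case, which is what makes before/after comparisons possible.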

You version control your prompts. You track what changed and when. You can roll back if something breaks.
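The simplest version of this is prompts as files in the repo, with a content hash logged next to every eval result. A sketch (the `prompts/` layout and 12-character digest are arbitrary conventions, not a standard):

```python
import hashlib
import pathlib

PROMPT_DIR = pathlib.Path("prompts")  # one file per prompt, committed to git

def load_prompt(name: str, prompt_dir: pathlib.Path = PROMPT_DIR) -> tuple[str, str]:
    """Return the prompt text plus a short content hash to log with results."""
    text = (prompt_dir / f"{name}.txt").read_text()
    digest = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text, digest
```

Log the digest with every run. When a metric regresses, the file's git history tells you exactly which wording change caused it, and `git checkout` rolls it back.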

Most teams don't do this. They should.

The reality

Prompting well is a skill. It requires understanding how models work, what they're good at, and where they fail. But skill isn't the same as engineering.

Until you have tests, metrics, and systematic experimentation, you're just guessing. Sometimes you guess well. But you can't explain why, and you can't reliably repeat it.

Written by Rajkiran Panuganti