Strategy

Judgment engineering, not prompt engineering

If you're over 40 and already good at hard decisions, your edge with AI isn't knowing magic words. It's spelling out the judgment the model can't see. Judgment engineering means taking the criteria, constraints, and standards that live in your head and making them explicit enough that Claude can reason with them. That's the part no prompt library can fake.

Key Takeaways

  • The core idea: Generic prompts produce generic work. Feeding AI the context, constraints, and standards you already carry in your head produces output you can defend to clients.
  • Why it matters: The scarce resource is not prompt syntax but professional judgment: the politics, failure modes, and tacit knowledge accumulated over decades that juniors cannot replicate.
  • How it works: Treating AI as a capable junior rather than an autonomous partner unlocks its value. Supply the real chessboard, not the sanitized version, and reasoning improves dramatically.
  • What to do: Four explicit inputs, criteria, context, constraints, and standards, consistently turn mediocre AI drafts into work requiring minimal revision. Writing them down matters more than keeping them mental.

Source: The Leveraged Years Briefing. Permalink

Post illustration

Stop chasing prompt tricks. The models are now good enough that a plain request framed around a well-shaped decision beats a "perfect" prompt stapled to a fuzzy problem. What determines whether you'd sign your name under Claude's output is the quality of judgment you feed it: what good looks like, what's off-limits, what the politics are, which failure modes you flatly refuse to tolerate. None of that is prompt engineering. It's the work you've been doing since before anyone mentioned tokens.

Here's the shift. A prompt is the wrapper; judgment is the substance. Prompt engineering polishes the request. Judgment engineering defines the decision. Spend an afternoon learning basic Claude patterns and you'll save some keystrokes. Spend that same afternoon getting painfully precise about the answer you actually need, and every serious task you run through AI gets better. One is decoration. The other changes the outcome.

What's the difference, in real work?

Take a request I watch fail constantly: "Write a proposal for a manufacturing client." The prompt-focused version adds a persona, a tone, "think step by step," maybe a couple of examples from past wins. It produces something that looks fine, reads smoothly, and could have been written for any client on earth. The text is polished. The thinking is generic.

Now the same request, judgment-engineered. Plain language, but packed with what actually matters: "This client fired their last vendor for slipping deadlines and over-promising throughput, so credibility beats ambition. The buyer is a COO who distrusts consultants and respects people who've run plants. We win on implementation, not strategy, so lead with that and drop the 'transformation' language. Our price will be about 20% above the likely competitor; justify the gap through reduced downtime and fewer changeovers. Don't propose anything we can't staff by Q3, and assume the plant union will scrutinize any workforce change."

That's not a cleverer prompt; it's putting judgment on the table. Claude can now reason with something real: trade-offs, politics, staffing, buyer psychology. The same model that produced harmless oatmeal from the first request drafts something you'd actually send, because you handed it the pieces you normally juggle in your head. The model was never the ceiling. The missing judgment was.

If you're 45 to 62, this is where you lean forward. The scarce input here isn't syntax. It's the tacit stuff you already own: where this client got burned, what "good enough" really means in your practice, which quiet sins get you uninvited next quarter. A 26-year-old who memorized every prompt pattern can't supply that, because they haven't watched deals die over it. Your experience isn't being displaced by AI. It's the raw material the AI is starving for.

Treat Claude like a sharp junior, not a partner

The most useful mental model: Claude is a permanently available, reasonably bright junior who doesn't know your world until you teach it. Treat it like a partner and you'll be disappointed. Treat it like a junior and it gets terrifyingly useful. Ask it to "draft a change management plan" and you'll get the textbook answer. Ask instead: "Draft a change plan for a 600-person financial-services firm that has already failed two CRM rollouts. The COO has publicly said this is the last attempt. Front-line advisors are paid on volume and see the CRM as overhead. We have four months, a three-person project team, and we can't touch comp. The plan has to be something a skeptical line manager would call 'this might actually work.'" Now you're giving Claude the real chessboard, not the brochure version, and it can help you think through options, risks, and messaging because of it.

The four inputs of judgment engineering

When Claude hands you something flat, it's rarely because the model "isn't smart enough." It's because one of four judgment inputs never made it into the request. Run them as a checklist.

I keep those four on a sticky note next to my screen. For a month I told myself I had them "in my head" and skipped writing them out. I was wrong. The correlation between "I sketched the four inputs in 60 seconds" and "I barely had to edit the output" was almost embarrassing. Prompt polish moved the needle a little. Explicit judgment moved it a lot.

The Four-Input Pass (and the "could I defend this?" test)

The framework fits on a Post-it: before you hit enter on any serious request to Claude, write one line for each input, one sentence on what success looks like, one on the history or politics that matter, one on the hard limits, one on the standard it'll be judged against. Sixty seconds. Then apply the test that tells you whether it worked: could you defend this output, line by line, to a sharp peer who knows the domain and doesn't care that "the AI wrote it"? If yes, you supplied enough up front; the bones are sound and you're only tweaking language. If your honest answer is "I'd blame the model if this backfired," you offloaded judgment instead of engineering it, and Claude filled the gaps with training-data averages. The fault there isn't the model's; it's the brief.

Prompt engineeringJudgment engineering
OptimizesHow the request is wordedWhat the model is asked to reason about
Scarce inputKnowledge of prompt patternsExperience, tacit knowledge, standards
Who can do itAnyone who skims prompt guidesPeople with real domain judgment
Fixes a bad output byRephrasing, adding tricksClarifying criteria, context, constraints, standards
Breaks whenThe model's quirks changeRarely; good judgment ages slowly

Does this mean prompt skills are worthless?

No, but they're table stakes, not an edge. Basic clarity, giving Claude an example, asking it to show its reasoning so you can check the logic: useful, learnable in an afternoon, then move on. The mistake is treating that afternoon's worth of technique as the whole skill. It's the easy 10%. The hard, durable 90% is knowing what to ask about, what good looks like, and which constraints are load-bearing, and that doesn't come from a prompt library. It comes from having done the work for twenty years.

So the next time Claude gives you something flat, resist the reflex to fiddle with wording. Ask which of the four you withheld: Did I define good? Did I give it what I know? Did I name the hard limits and the standard? Nine times in ten the fix isn't a cleverer prompt. It's the judgment you forgot to write down. Build that habit and you stop renting the model's generic judgment and start scaling your own, which is the only version of AI leverage that's actually yours to keep.

Frequently Asked Questions

Why does my AI output still feel generic even after I refine the prompt?

You are likely polishing the request without feeding in the judgment layer: the client history, political constraints, and specific success criteria that distinguish your work from a textbook answer. Write one sentence each for criteria, context, constraints, and standards before you submit the request.

How do I know if I have supplied enough judgment to the model?

Apply the defense test: could you justify each line of the output to a knowledgeable peer without hiding behind 'the AI wrote it'? If not, you omitted context or standards the model needed to reason at your level.

Is learning advanced prompt techniques worth the time investment?

Basic patterns save keystrokes; an afternoon is sufficient. The higher return comes from articulating the tacit knowledge you already own so the model can apply it consistently across every task you delegate.