AI Workflows · HR manager playbook · Updated June 2026

AI Performance Reviews: The Manager Workflow That Keeps You Accountable

Your managers are already drafting reviews with AI, whether you sanctioned it or not. HR inherits the bias, the inconsistency, and the documentation risk. Here is the workflow that keeps the human in charge.

To use AI safely on a performance review, feed the model your own evidence and notes, have it organize and draft the narrative, then verify every claim, set the rating yourself, and own the document. AI structures what is true. The manager makes the judgment. Never let AI generate the score or invent accomplishments from thin inputs.

Key takeaways

The rating stays human, always. AI may organize evidence and draft the narrative, but a manager sets the rating and owns it. A score generated by a model is a documentation and fairness problem waiting to happen.
Evidence in, draft out. The reliable pattern is to give the model the manager's real notes and specifics, not to ask it to invent a review from a name and a number.
The danger is fabricated specifics. AI will happily invent accomplishments, metrics, and incidents that never happened. Every factual claim in an AI-drafted review gets verified before it reaches the employee.
Consistency cuts both ways. AI can make reviews more consistent in structure, which helps, but it can also smuggle in generic praise that hides real performance differences. Use it for structure, not for substance.

The problem HR already has

Performance review season arrives and managers are overloaded. Writing fifteen thoughtful reviews is hours of work most of them do not have, so the quiet shortcut has become obvious: paste a few bullet points into a chatbot and ask it to "write a performance review for a senior analyst." The output reads fluently, fills the form, and saves an evening. It is also, in that form, a real risk that lands on HR's desk.

The trouble is what the model does with thin inputs. Asked to write a review from a job title and a rating, it invents. It produces plausible accomplishments the employee never achieved, soft generic praise that flattens real differences between performers, and language that can carry subtle bias from its training data. None of that is the manager's honest assessment. It is a fluent average. When that document becomes the basis for a promotion, a raise, or eventually a termination, HR inherits a record that does not reflect reality and cannot be defended.

A review the manager did not really write is a record the company cannot really defend. Fluency is not the same as truth, and a performance file has to be true.

So the answer is not to ban AI from reviews, which will not work because managers will use it anyway. The answer is to give them a workflow that captures the genuine time savings (the drafting and structuring) while keeping the judgment, the evidence, and the rating firmly with the human who actually manages the person.

What AI may do, and what it may never do

The line in performance reviews
Task	AI role
Organizing the manager's notes and evidence	Yes, this is a strong use
Drafting the narrative from real specifics	Yes, with verification after
Tightening tone and removing vague filler	Yes, a genuine improvement
Suggesting clearer, more specific phrasing	Yes, the manager edits
Generating the rating or score	Never, the manager owns it
Inventing accomplishments or metrics	Never, this corrupts the record
Comparing or ranking employees	Never, a human judgment about people

Read the pattern. AI works on material the manager already has and assessments the manager has already made. The moment it starts producing the substance (the score, the comparison, the accomplishments), it has crossed from helpful tool to liability. That single line, between organizing what is true and generating what should be judged, is the whole discipline.

The safe workflow, step by step

Run this with Claude or your sanctioned tool. The steps keep the manager as the author and the model as the assistant.

Step 1: The manager gathers real evidence first

Before the tool opens, the manager collects the genuine inputs: specific projects and outcomes, concrete examples of strengths and gaps, goals set at the start of the period, and any feedback gathered along the way. This is the manager's actual knowledge of the person, and it is the one thing the model cannot supply. A review built on real evidence is defensible. A review built on a chatbot's guesses is not.

Step 2: Feed evidence, ask for structure not judgment

Give the model the evidence and ask it to organize and draft, explicitly withholding the rating. Example prompt: "You are helping a manager draft a performance review. Here are my notes on this employee's work this period: [paste specific, factual notes with no confidential identifiers beyond what your policy allows]. Organize this into a clear review with sections for accomplishments, strengths, areas for development, and goals for next period. Use only the facts I gave you. Do not invent accomplishments, metrics, or examples, and do not suggest a rating or score."

Step 3: Pressure-test for fairness and vague filler

Use the model to improve the draft, then judge the result. Example prompt: "Review this draft for vague or generic praise that does not point to specific evidence, and flag any language that could read as biased or based on personality rather than performance. Suggest more specific, evidence-grounded phrasing. Do not add any new facts." This is where AI earns its place: it is good at catching the empty "great team player" filler that hides real signal.

Step 4: The manager verifies every claim and sets the rating

Now the human takes over for good. The manager reads every line and confirms each factual claim against what actually happened, deletes anything the model added that is not true, and writes the rating themselves based on their own judgment and the company's calibration standard. The rating is never the model's output. The manager also adds the things only a human knows: context, trajectory, and the honest developmental conversation the number alone cannot carry.

Step 5: Calibrate and document

Run the review through your normal calibration so ratings stay consistent across managers, a human process the model does not touch. Then keep a simple note that the narrative was AI-drafted and manager-verified, with the manager named as the author of record. That record is the difference between a defensible review and a vague "the system generated it." Save the prompt structure as a template so next cycle starts from your standard.

Paste-ready: review verification and bias-check checklist

Run every AI-drafted review through this before it reaches the employee. If any line fails, fix it before the review ships.

1. Every factual claim is true. Each accomplishment, metric, and example traces to something that actually happened. Anything the model added that you cannot confirm is deleted.

2. No invented specifics. No numbers, projects, or incidents appear that you did not provide.

3. Praise is grounded. No generic "great team player" filler. Every strength points to specific evidence.

4. No bias or personality language. Nothing reads as based on personality, demographics, or tone rather than performance.

5. The rating is yours. You set the score from your own judgment and the company's calibration standard. The model did not suggest it.

6. Human context is added. Trajectory, situation, and the honest developmental message the number alone cannot carry are in the review.

7. The record is documented. A short note states the narrative was AI-drafted and manager-verified, with you named as author of record.

Honest usage notes

The time savings are real and they live in the drafting, not the judging. Turning a manager's messy bullet points into a clean, well-structured narrative is genuinely faster with AI, and that is most of the pain in review season. The part that matters, the honest assessment and the rating, still takes human time and judgment, and it should. Managers who expect the model to deliver a finished review from a name and a number are the ones who generate fiction.

The bias question deserves a straight answer. AI does not remove bias from reviews; it can introduce its own, and it can launder a manager's bias into more polished language. Used well, for structure and to flag vague or personality-based phrasing, it can make reviews a little more consistent and a little more evidence-grounded. Used badly, to generate substance, it adds a new layer of risk on top of the old one. The tool is neutral. The workflow is what makes it safe. For the broader picture of where AI fits across the HR function, see our AI in HR workflow playbook.

Guardrails

Never let AI generate the rating

The score is a judgment about a person's career and the company's accountability for it. A manager sets it, owns it, and can explain it. A rating produced by a model from thin inputs is indefensible the moment it is challenged, and promotion disputes and terminations are exactly where it will be challenged.

Verify every specific before it reaches the employee

AI invents accomplishments, metrics, and incidents that sound plausible and never happened. Every factual claim in an AI-drafted review gets checked against reality. A review with a single fabricated achievement is worse than no review, because it destroys the credibility of the whole document.

Keep confidential and sensitive details handled per policy

Performance notes, health-related accommodations, and investigation records are sensitive. Follow your company's AI policy on what may be entered into a tool, and when in doubt, work with the structure rather than the raw confidential specifics. If you do not yet have that policy, our guide to writing an employee AI use policy shows how to build one.

How we built this workflow

This workflow reflects hands-on use of AI to draft and structure narrative documents from real source notes, where the reliable pattern is evidence-in, draft-out, human-verifies. The line between what to automate and what to keep human (organize what is true, never generate the judgment) reflects the practical risk pattern in performance documentation, where fabricated specifics and model-generated ratings are the failure modes that create real exposure. We do not publish invented statistics or fabricated outcomes. Where a use touches employment law or accommodation, confirm against current guidance and counsel.

What to do this cycle

You do not need to roll out a tool to make this safe. You need to give managers one clear rule before the next review cycle: AI may draft from your real evidence, but you set the rating and you verify every word. Pair that with the five-step workflow and a short note in each file that the review was AI-drafted and manager-verified. You will get most of the time savings managers are already chasing, without the fabricated-record risk they are quietly creating.

That discipline (AI organizes, a human judges) is the whole game in performance management. Get it right and review season gets lighter without a single indefensible file. Get it wrong, by letting the model generate the substance, and you trade a few saved evenings for a record that cannot survive a challenge.

Part of TLY's AI Workflows: workflow playbooks for senior professionals.

Frequently asked questions

Can AI write a performance review?

AI can draft and structure the narrative of a review from a manager's real evidence and notes, which saves genuine time. It should not generate the rating, invent accomplishments, or produce the assessment from thin inputs like a job title and a score. The safe pattern is evidence-in, draft-out, with the manager verifying every claim, setting the rating themselves, and owning the document.

Is it safe to use ChatGPT or Claude for performance reviews?

It is safe for drafting and structuring, and unsafe for judging. Use the tool to organize the manager's notes into a clear narrative and to flag vague or biased phrasing. Do not use it to generate the rating, compare employees, or invent specifics. Follow your company's AI policy on what confidential information may be entered, and verify every factual claim before the review reaches the employee.

Does AI make performance reviews biased?

AI does not remove bias and can add its own, including biased phrasing absorbed from training data and polished language that launders a manager's existing bias. Used carefully, to structure the review and to flag vague or personality-based statements, it can make reviews slightly more consistent and evidence-grounded. The protection is the workflow: real evidence in, human verification and rating, and a calibration step, not the tool itself.

What part of a review must stay human?

The rating, the comparison or ranking of employees, the honest developmental judgment, and the verification of every factual claim. AI may organize evidence and draft language, but the assessment of a person's performance and the score that follows are human judgments the manager must own and be able to explain. A model-generated rating is indefensible when challenged.

How do I document an AI-assisted review?

Keep a short note in the file that the review narrative was AI-drafted and manager-verified, with the manager named as the author of record. Run the rating through your normal calibration process, which AI does not touch, so scores stay consistent across managers. This record is what makes the review defensible, turning "the system wrote it" into "a named manager assessed and owns this."

Build the habit, not just the shortcut

Knowing that AI should draft while a manager judges is the easy part. Building it into how an entire org runs reviews, with the prompts, the verification step, the calibration, and the documentation that make it defensible, is the skill. That is what we teach: a practical system for putting AI to work across HR without ever handing it a judgment about a person.

Go deeper with The Leveraged HR Professional course.

Join The Leverage Club for $49 and get the review prompts, verification checklist, and calibration guide.

Not sure where to start? Take the 2-minute course finder.

Sources: TLY hands-on use of AI to draft and structure narrative documents from source notes (June 2026); general guidance on bias and documentation risk in performance management, confirmed against current practice. Uses touching employment law or accommodation vary by jurisdiction; verify against current guidance and counsel before relying on this page.