Accuracy and Review

An audit found every AI scribe made errors. Here is the 90 second check.

When Ontario tested 20 AI scribe systems, all 20 produced inaccuracies, including invented orders and wrong drugs. This is the fast read-back protocol that catches the three failure modes before you sign.

Key Takeaways

  • The finding is stark: on May 12 to 14, 2026, Ontario's Auditor General reported that all 20 AI scribe systems it procurement-tested showed inaccuracies, despite roughly 5,000 Ontario physicians already using them.
  • The three failure modes: 9 systems hallucinated, including fabricated referrals or blood-test orders the doctor never made; 12 captured the wrong drug; and 17 missed key mental-health details.
  • This is a Canadian audit, not US law: treat it as a universal accuracy warning about how these tools fail, not a regulation you must follow.
  • The fix is a 90 second read-back: a short, targeted review aimed squarely at wrong drug, phantom order, and dropped mental-health detail catches the exact errors the audit found before your signature goes on the note.

The Leveraged Years Briefing.

What the Ontario audit actually found

This one deserves attention because it was not a vendor demo or a press release. It was an independent government audit, and the result was blunt.

On May 12 to 14, 2026, Ontario's Auditor General reported on AI scribe systems used in the province's healthcare system. Across 20 systems put through procurement testing, every single one showed inaccuracies. Not most. All twenty. And this is not a fringe tool: roughly 5,000 Ontario physicians were already using these systems when the audit ran.

To be clear about scope, this is a Canadian audit. It is not US law and it does not bind your practice. But accuracy is not a jurisdiction. The way these tools fail in Ontario is the way they fail everywhere, because it is baked into how the underlying technology works. So read this as a warning about the tool category, not a foreign regulation you can ignore.

The three ways the notes went wrong

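The audit did not just say errors happened. It found patterns, and the patterns are specific enough to defend against. Three failure modes stood out.

  • Hallucinated content: 9 of the 20 systems invented material, including fabricated referrals or blood-test orders the doctor never made.
  • Wrong drug: 12 of the 20 systems captured the wrong drug.
  • Dropped mental-health detail: 17 of the 20 systems missed key mental-health details.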

Notice the shape of these. They are not random typos. They are confident, plausible-looking errors: an order that reads like a real order, a drug name that looks right, a clean note that is clean because something important fell out of it. That is exactly the kind of error a quick glance misses, which is why a glance is not enough.

There is a reason these three failure modes cluster the way they do. Drug names are short, similar-sounding, and spoken fast in a real visit, which is hard for any transcription system to get right. Orders and referrals are easy for a fluent model to invent because they fit the expected pattern of a note, so a plausible one can slip in even when nothing in the audio prompted it. And mental-health detail is exactly the kind of nuanced, lightly stated, easily paraphrased content that gets smoothed away when a model compresses a conversation into a tidy note. Knowing why each one happens tells you precisely where to point your attention.

The 90 second read-back protocol

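You do not need to re-document the visit to catch these. You need a fast, targeted read-back aimed at the three failure modes the audit found. Do it before you sign, every time. It runs in about 90 seconds.

  • Drugs: check every drug name in the note against what you actually prescribed.
  • Orders: scan for any referral or order and confirm you made it.
  • Mental health: verify that any mental-health detail from the visit is captured accurately.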

Then read the note once, top to bottom, for anything that simply does not match your memory of the visit. Then sign.
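
For readers who think in checklists, the read-back can be sketched as a small comparison between what the draft note says and what you know happened in the visit. This is only an illustration in Python under loud assumptions: the `DraftNote` structure, the `read_back` function, and the sample visit are all hypothetical, no real scribe product exposes its notes this way, and the drug and order names are made-up example data. It does not replace reading the note yourself.

```python
from dataclasses import dataclass, field


@dataclass
class DraftNote:
    """Hypothetical structured view of an AI-drafted note (not a real scribe format)."""
    medications: list[str] = field(default_factory=list)
    orders: list[str] = field(default_factory=list)  # referrals, labs, imaging
    mental_health: list[str] = field(default_factory=list)


def _norm(items):
    """Lowercase and trim so trivial formatting differences are not flagged."""
    return {s.strip().lower() for s in items}


def read_back(note, prescribed, ordered, mental_health_discussed):
    """Compare the draft note with the clinician's own record of the visit.

    Returns a list of plain-language flags to resolve before signing.
    """
    flags = []

    # Failure mode 1: wrong drug (or a prescribed drug that never made it in).
    note_meds, real_meds = _norm(note.medications), _norm(prescribed)
    for drug in sorted(note_meds - real_meds):
        flags.append(f"wrong drug? note lists '{drug}' but you did not prescribe it")
    for drug in sorted(real_meds - note_meds):
        flags.append(f"missing drug: '{drug}' was prescribed but is not in the note")

    # Failure mode 2: phantom order or referral the clinician never made.
    for order in sorted(_norm(note.orders) - _norm(ordered)):
        flags.append(f"phantom order? '{order}' is in the note but you did not place it")

    # Failure mode 3: mental-health detail discussed but absent from the note.
    for detail in sorted(_norm(mental_health_discussed) - _norm(note.mental_health)):
        flags.append(f"dropped mental-health detail: '{detail}'")

    return flags


# Made-up example visit, for illustration only.
note = DraftNote(
    medications=["hydroxyzine"],
    orders=["CBC", "cardiology referral"],
    mental_health=[],
)
flags = read_back(
    note,
    prescribed=["hydralazine"],
    ordered=["CBC"],
    mental_health_discussed=["low mood for two weeks"],
)
for flag in flags:
    print(flag)
```

The sketch compares in both directions for drugs but only one direction for orders and mental-health detail, mirroring the failure modes described here: an invented order is something extra in the note, while a dropped mental-health detail is something missing from it. An empty list means only that these three checks passed, which is why the full top-to-bottom read still follows.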

Why this is different from the consent question

A boundary worth drawing. This protocol is about accuracy: making sure the note is true before you attest to it. That is a separate problem from whether your patient knows about or agreed to the AI listening, which is consent. If consent is your concern, the briefing on when AI scribes need patient consent covers that directly. Here we are only catching errors. The companion briefing on how doctors use AI for clinical notes safely gives the broader safe-use habit that this 90 second check slots into.

The honest limit of these tools

The useful thing about the Ontario audit is that it kills a comfortable assumption: that the good scribes are accurate and only the cheap ones fail. All 20 failed. That means accuracy review is not a vendor-selection problem you can buy your way out of. It is a permanent part of using the category.

This is not an argument against AI scribes. They save real time, and the time saved is worth keeping. It is an argument that the time saved on drafting has to be partly reinvested in review. The 90 seconds you spend on the read-back is the cheapest insurance you will buy all day.

The skill under the tool

The pattern across every briefing in this series is the same. The tool gets faster, more fluent, more confident, and none of that makes it more accurate. What protects the patient and your license is a human who reads what the machine produced with a trained, skeptical eye and knows exactly where these tools tend to break.

That skill is learnable and it does not expire when the next scribe version ships. AI for Physician Notes builds the verify-before-you-sign workflow, including the targeted read-back this audit calls for, and the two minute course quiz will point you to the right starting place for your practice.

Frequently Asked Questions

How bad were the AI scribe errors in the Ontario audit?

Significant and universal. On May 12 to 14, 2026, Ontario's Auditor General reported that all 20 procurement-tested AI scribe systems showed inaccuracies. Specifically, 9 hallucinated content like fabricated referrals or blood-test orders, 12 captured the wrong drug, and 17 missed key mental-health details, even though roughly 5,000 Ontario physicians were already using them.

Does this Ontario audit apply to me if I practice in the US?

It is a Canadian audit, so it is not US law and does not bind your practice. But it is a credible, independent warning about how AI scribes fail as a tool category, and those failure modes are not specific to one country. Treat it as a universal accuracy warning worth acting on regardless of where you practice.

What is the fastest way to catch these errors before I sign?

A targeted read-back of about 90 seconds aimed at the three failure modes. Check every drug name against what you actually prescribed, scan for any referral or order and confirm you made it, and verify any mental-health detail is captured accurately. Then read the note once for anything that does not match the visit, and sign.

Is this briefing legal or medical advice?

No. The Leveraged Years is an education company, not a law firm or a medical practice. This is a plain summary of an independent audit and a practical review habit, and tools and findings can change. Treat it as background, and confirm anything affecting your documentation, patient safety, or liability with a qualified professional.