How Health Systems Are Measuring Ambient AI Scribe ROI: What Kaiser, Sutter and Mass General Found

Ambient AI scribes are the most widely deployed generative AI tool in medicine, and named health systems are now publishing hard numbers on time, burnout and revenue. Here is how leaders define the return, where it shows up, and where the evidence still has holes.

By Anthony Guerriero · Reviewed by The Leveraged Years Editorial Desk · Published June 28, 2026

Health systems measure ambient AI scribe ROI across three buckets: clinician time and documentation burden (note time, after-hours "pajama time," total EHR time); clinician well-being and retention (burnout and satisfaction); and direct financial productivity (work relative value units, visit volume, and evaluation-and-management coding accuracy). Published results from named systems including The Permanente Medical Group, Sutter Health, Mass General Brigham, Emory Healthcare, and St. Luke's Health System show documentation-burden and burnout reductions commonly in the 20 percent to 40 percent range, with early single-site evidence of roughly a 5.8 percent gain in weekly RVUs. This is an analysis of reported outcomes, not medical, legal, or financial advice, and AI scribes draft notes for clinician review rather than replacing clinical judgment.

Key Takeaways

The Permanente Medical Group reported that AI scribes saved its Northern California physicians an estimated 15,791 hours of documentation time over one year, equal to about 1,974 eight-hour workdays, across 7,260 physicians and roughly 2.58 million encounters, per its NEJM Catalyst analysis (CAT.25.0040).
A Mass General Brigham-led study in JAMA Network Open found a 21.2 percent absolute reduction in burnout prevalence at 84 days at MGB, while Emory Healthcare saw a 30.7 percent absolute increase in documentation-related well-being at 60 days.
Sutter Health, using Abridge, published a JAMA Network Open study showing mean note time per appointment fell from 6.2 to 5.3 minutes (p<0.001), with burnout declining from 42.1 percent to 35.1 percent (p=0.12, not statistically significant).
A UCSF single-site study of nearly 1.2 million encounters found ambient AI scribe access was associated with a 5.8 percent increase in weekly RVUs (about 1.81 RVUs, roughly $3,044 per physician per year at 2025 Medicare rates) and a 2.8 percent increase in encounters, with no rise in claim denials.
KLAS validation studies of St. Luke's Health System (Ambience) and of FMOL Health, McLeod Health, and Rush (Suki) tie ambient AI to E/M coding accuracy gains and incremental per-provider revenue, with Suki sites averaging $1,223 per provider per month.
A JAMA Network Open editorial frames the full picture: subscriptions run roughly $200 to $600 per clinician per month, while burnout-driven turnover costs an estimated $500,000 to $1 million per departing physician.

The question every CMIO is now asking

A physician finishes her last visit of the day, and for the first time in years she does not open her laptop again at 9 p.m. That small change, repeated across thousands of clinicians, is what hospital executives are now trying to price. Ambient AI scribes, the tools that listen to a clinical visit and draft a note for the clinician to review, have become the most widely implemented generative AI application in health care. Their spread has changed the conversation in the C-suite. The early question was whether the technology worked at all. The current one, posed directly in a 2025 JAMA Network Open editorial by Shreya J. Shah and Patricia Garcia titled "Ambient AI Scribes, What Is the Return on Investment?", is whether the math holds up. Most health systems pay subscription fees of roughly $200 to $600 per clinician per month, a recurring cost that has to be justified as operating margins tighten.

The return is slippery because it shows up in three different places at once, and no single number captures it. Leaders at named systems have settled on three measurement buckets: clinician time and documentation burden, clinician well-being and retention, and direct financial productivity. Each has its own data, its own caveats, and its own published evidence from real health systems. Treat any one bucket in isolation and the case looks thin. Stack all three and the picture sharpens.

Bucket one: time and documentation burden

Time is the most consistent and best-documented return, and it is where the strongest evidence lives. The Permanente Medical Group (TPMG), the physician group for Kaiser Permanente in Northern California, ran one of the largest evaluations to date. In an analysis published in NEJM Catalyst (CAT.25.0040) and covered by the American Medical Association, TPMG tracked AI scribe use over a 63-week period from October 2023 through December 2024. A total of 7,260 Permanente physicians used the technology across 2,576,627 patient encounters. The headline figure is striking: an estimated 15,791 hours of documentation time saved for users compared with non-users, the equivalent of about 1,974 eight-hour workdays in just over a year. Physicians using the scribes saw statistically significant reductions in note-taking time, time spent per appointment, and "pajama time," the after-hours work done outside 7 a.m. to 5:30 p.m. that clinicians and researchers repeatedly link to burnout.

The Permanente data surfaced something else that matters for ROI modeling. The top third of users accounted for 89 percent of all AI scribe activations, and heavy users saved more than double the time per note compared with lower-frequency users. That dose-response pattern, described by TPMG co-authors Kristine Lee, MD, and Vincent Liu, MD, tells operations leaders the return scales with adoption, not just deployment. Buying the licenses is not the win. Getting clinicians to reach for the tool at every visit is.
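The Permanente totals are easy to sanity-check, and the same three numbers give a rough sense of scale. The sketch below uses only the figures quoted above. The per-physician and per-encounter averages are simple divisions added here for illustration, not values TPMG reported, and they flatten the heavy skew toward frequent users.

```python
# Back-of-envelope check on the TPMG figures quoted in the text.
HOURS_SAVED = 15_791      # estimated documentation hours saved
PHYSICIANS = 7_260        # physicians who used the AI scribe
ENCOUNTERS = 2_576_627    # patient encounters in the analysis
WORKDAY_HOURS = 8

workdays = HOURS_SAVED / WORKDAY_HOURS

# Illustrative simple averages; these are NOT reported by TPMG.
hours_per_physician = HOURS_SAVED / PHYSICIANS
minutes_per_encounter = HOURS_SAVED * 60 / ENCOUNTERS

print(f"{workdays:,.0f} eight-hour workdays")
print(f"{hours_per_physician:.1f} hours per physician (simple average)")
print(f"{minutes_per_encounter:.2f} minutes per encounter (simple average)")
```

The averages look small precisely because usage is concentrated. A model that spreads the savings evenly across every licensed physician will understate what heavy users gain and overstate what light users gain.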

Other systems point the same way. Sutter Health, which uses Abridge, published a JAMA Network Open study (jamanetworkopen.2025.8614) led by Cheryl Stults, PhD, with chief medical information officer Veena Jones, MD, as senior author. Across 100 clinicians spanning specialties in Northern California, mean time in notes per appointment fell from 6.2 to 5.3 minutes (p<0.001), alongside less mental demand and less of the rushed feeling clinicians describe. The Cleveland Clinic, deploying Ambience, reported that clinicians cut the average time spent writing and reviewing notes by about 14 minutes per day. Cooper University Health Care in Camden, New Jersey, using Microsoft Dragon Copilot, saved clinicians about 4.15 minutes of documentation time per patient, which it said adds up to roughly an hour or more reclaimed daily. Intermountain Health in Salt Lake City reported a 27 percent reduction in time in notes per appointment among clinicians who used Dragon Copilot for 10 or more encounters. Different vendors, different cities, one direction.
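Because these systems report in different units (minutes per appointment, minutes per day, percent change), it helps to put them on a common footing before comparing. A minimal sketch, using only the Sutter and Cooper figures quoted above; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from a baseline value, e.g. 0.25 for a 25% cut."""
    return (before - after) / before

# Sutter Health: mean note time per appointment, in minutes.
sutter_cut = relative_reduction(before=6.2, after=5.3)

# Cooper: 4.15 minutes saved per patient. How many patients per day
# would it take for that to add up to one hour?
patients_for_one_hour = 60 / 4.15

print(f"Sutter relative reduction: {sutter_cut:.1%}")
print(f"Patients per day for one hour saved at Cooper: {patients_for_one_hour:.1f}")
```

Read this way, Sutter's change is a relative reduction of roughly 14.5 percent, and Cooper's "hour or more" implies a clinician seeing about 15 or more patients a day.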

Bucket two: burnout, well-being and retention

The second bucket is harder to put a dollar figure on, yet it may carry the largest financial weight. A Mass General Brigham-led study, published in JAMA Network Open (jamanetworkopen.2025.28056), drew on surveys of more than 1,400 physicians and advanced practice providers at MGB and Atlanta's Emory Healthcare. It found that ambient documentation use was associated with a 21.2 percent absolute reduction in burnout prevalence at 84 days at Mass General Brigham, while Emory Healthcare saw a 30.7 percent absolute increase in documentation-related well-being at 60 days. Rebecca Mishuris, MD, MPH, chief medical information officer at Mass General Brigham and a study co-senior author, said physicians told the team they had "their nights and weekends back," adding that "there is literally no other intervention in our field that impacts burnout to this extent." That is a remarkable sentence from someone whose job is to be skeptical of vendor promises.

The MGB authors were candid about the limits. The findings came from pilot users with modest survey response rates, about 22 percent at 84 days at MGB and 11 percent at 60 days at Emory, so they likely reflect the experience of the more enthusiastic adopters. The team also disclosed that Mass General Brigham holds an institutional investment in Abridge, exactly the kind of conflict a careful reader should weigh. The program scaled from an 18-physician proof-of-concept in July 2023 to more than 3,000 routine users by April 2025, and the system has said it will keep tracking whether burnout gains hold, plateau, or reverse over time. Honest measurement does not stop at the press release.

Sutter's study showed the same direction with more caution on significance. Burnout fell from 42.1 percent to 35.1 percent, but the change carried a p-value of 0.12, meaning it did not reach statistical significance in that 100-clinician sample. That candor is itself instructive. Well-being signals are real and consistent across systems, yet sample sizes and survey response rates often leave individual results short of statistical proof. The JAMA editorial connects the well-being story to the balance sheet, citing evidence that burnout doubles or triples the likelihood of physician turnover, with each departing physician costing an estimated $500,000 to $1 million in recruitment, onboarding, and lost productivity. Run that math and even modest retention gains can dwarf the subscription line.
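The retention math can be made concrete with a break-even count: how many departures a system would need to avoid each year to cover its subscription bill. The sketch below uses the editorial's fee and turnover-cost ranges. The 1,000-clinician system size is a hypothetical chosen for illustration, and the function name is introduced here, not taken from any source.

```python
def breakeven_departures(clinicians: int,
                         fee_per_month: float,
                         turnover_cost: float) -> float:
    """Avoided departures per year needed to cover annual subscription cost."""
    annual_subscription = clinicians * fee_per_month * 12
    return annual_subscription / turnover_cost

CLINICIANS = 1_000  # hypothetical system size, for illustration only

# Most favorable case: cheapest subscription, costliest departure.
best_case = breakeven_departures(CLINICIANS, 200, 1_000_000)
# Least favorable case: priciest subscription, cheapest departure.
worst_case = breakeven_departures(CLINICIANS, 600, 500_000)

print(f"Break-even: {best_case:.1f} to {worst_case:.1f} avoided departures per year")
```

Under these assumptions the break-even sits between roughly 2 and 15 avoided departures per 1,000 clinicians per year. Whether a scribe program actually prevents that many is the open question, since none of the studies cited here measured turnover directly.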

Bucket three: financial productivity and coding

The newest and most scrutinized bucket is direct revenue. A University of California, San Francisco study by Holmgren and colleagues, published in JAMA Network Open (jamanetworkopen.2025.43524) and analyzing nearly 1.2 million ambulatory encounters across roughly 1,565 physicians over two years, found that ambient AI scribe access was associated with a 5.8 percent increase in weekly work RVUs, about 1.81 additional RVUs per week, and a 2.8 percent increase in encounters, with no increase in claim denials. The accompanying editorial translated that weekly RVU gain to roughly $3,044 per physician per year at 2025 Medicare rates. Small in isolation, but potentially enough to offset subscription fees once it is multiplied across hundreds or thousands of clinicians. The RVU gain also grew over time, consistent with a learning curve, which suggests the early numbers understate the ceiling.
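The dollar translation can be reproduced in a few lines. The sketch below assumes the figure comes from multiplying the weekly RVU gain by 52 weeks and by the 2025 Medicare Physician Fee Schedule conversion factor, taken here as $32.3465 per RVU. That conversion factor and the 52-week year are assumptions, not values stated in the text. The subscription range is the editorial's $200 to $600 per clinician per month.

```python
WEEKLY_RVU_GAIN = 1.81          # additional work RVUs per week (from the study)
WEEKS_PER_YEAR = 52             # assumption: gain applies every week of the year
CONVERSION_FACTOR = 32.3465     # assumption: 2025 Medicare dollars per RVU

annual_revenue = WEEKLY_RVU_GAIN * WEEKS_PER_YEAR * CONVERSION_FACTOR

# Annual subscription cost at both ends of the editorial's range.
subscription_low = 200 * 12
subscription_high = 600 * 12

print(f"Added revenue per physician per year: ${annual_revenue:,.0f}")
print(f"Subscription per clinician per year: ${subscription_low:,} to ${subscription_high:,}")
print(f"Share of high-end subscription covered: {annual_revenue / subscription_high:.0%}")
```

Under these assumptions the result matches the quoted figure of about $3,044. On its own that covers a subscription near the low end of the range but well under half of one at the high end, which is why the revenue bucket is usually weighed together with the other two.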

Vendor-commissioned but independently conducted KLAS Research validations have pushed the financial story further by isolating coding accuracy. A KLAS ROI validation of St. Luke's Health System's enterprise deployment of Ambience Healthcare reported per-clinician annual revenue generated through enhanced Hierarchical Condition Category capture and greater evaluation-and-management coding accuracy, framing the gains as offsetting the cost of the technology and easing downstream burden on revenue cycle teams. A separate KLAS validation of Suki across FMOL Health, McLeod Health, and Rush University System for Health found a 35 percent to 65 percent reduction in after-hours documentation and an average incremental revenue of $1,223 per provider per month. The mechanism was a coding shift. At McLeod Health, level 3 codes for established patients dropped 18.2 percent while level 4 rose 7.3 percent and level 5 rose 5 percent, which McLeod CMIO Bryon Frost, MD, attributed to "more complete" notes. McLeod reported a net gain that started at $1,004 per provider per month after subscription cost and later climbed to $2,629. FMOL Health reported a 21 percent decrease in time in notes, a 43 percent decrease in notes left open more than seven days, and incremental revenue of $862 per user per month.

Coding-based ROI carries the heaviest asterisk of the three. The JAMA editorial cautions that higher RVUs could reflect more services, better documentation accuracy, or fuller capture of complexity, and warns that downcoding policies from payers such as Aetna and Cigna could swallow the coding gains. Systems measuring this bucket should separate honest documentation of work actually performed from anything that smells like a coding arms race, and they should watch denial rates as a guardrail. Revenue that arrives through a back door tends to leave the same way.

What good measurement looks like

The systems that report credibly share a few habits. They define a pre-period baseline and measure the same clinicians before and after, as TPMG, Sutter, and MGB did. They report response rates and statistical significance honestly instead of cherry-picking the flattering number. They separate adoption from deployment, since the Permanente dose-response shows the return concentrates among heavy users. And they disclose financial conflicts, as MGB did regarding its Abridge stake. The JAMA multisite picture is a useful reality check: across five academic medical centers using Ambience, DAX Copilot, and Abridge with Epic, AI scribes modestly cut total EHR time by 13.4 minutes and documentation time by 16.0 minutes, with 0.49 more visits per week. Real, measured gains, not a revolution. For CMIOs and clinical operations leaders, the lesson is that the return here is genuine and lives in several places at once, but it has to be measured deliberately, with the rigor a system would bring to any capital investment, and with clear eyes about pilot bias and shifting payer rules.
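The first habit, measuring the same clinicians before and after, amounts to a paired comparison. A minimal sketch of that calculation follows. Every number in the two lists is hypothetical, invented only to show the mechanics, and the approach shown is a generic paired analysis rather than the method any of the cited studies used.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL minutes in notes per appointment for the same six clinicians,
# in the same order, before and after scribe rollout.
pre = [6.8, 5.9, 7.1, 6.0, 5.5, 6.3]
post = [5.9, 5.6, 6.0, 5.8, 5.1, 5.4]

# Per-clinician change; negative values mean time saved.
diffs = [after - before for before, after in zip(pre, post)]

mean_change = mean(diffs)
std_error = stdev(diffs) / sqrt(len(diffs))
t_stat = mean_change / std_error   # compare against a t distribution, n - 1 df

print(f"Mean change: {mean_change:.2f} minutes per appointment")
print(f"Paired t statistic: {t_stat:.2f} on {len(diffs) - 1} degrees of freedom")
```

Pairing matters because clinicians differ widely in baseline note time. Comparing each clinician with themselves removes that between-person variation, which is one reason a small sample can still detect a change in note time while falling short on a noisier measure such as burnout.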

Health system	AI scribe / outcome reported	Source
The Permanente Medical Group (Kaiser Permanente, N. California)	Est. 15,791 documentation hours saved in one year (about 1,974 workdays); 7,260 physicians, ~2.58M encounters; significant cuts in note time and pajama time	NEJM Catalyst (CAT.25.0040); AMA; permanente.org
Mass General Brigham	21.2% absolute reduction in burnout prevalence at 84 days; >3,000 routine users by April 2025	JAMA Network Open 2025.28056; massgeneralbrigham.org
Emory Healthcare	30.7% absolute increase in documentation-related well-being at 60 days	JAMA Network Open 2025.28056; AHA Market Scan
Sutter Health (Abridge)	Note time per appointment fell 6.2 to 5.3 min (p<0.001); burnout 42.1% to 35.1% (p=0.12)	JAMA Network Open 2025.8614; vitals.sutterhealth.org
UCSF (multivendor)	5.8% increase in weekly RVUs (~1.81 RVUs, ~$3,044/physician/yr); 2.8% more encounters; no rise in claim denials	JAMA Network Open 2025.43524 (Holmgren et al)
St. Luke's Health System (Ambience)	Per-clinician annual revenue via enhanced HCC and E/M coding accuracy, offsetting tech cost	KLAS Research ROI Validation 2025; ambiencehealthcare.com
McLeod Health (Suki)	27% less documentation time; ~3.6 provider hours saved/month; net gain $1,004 then $2,629/provider/month	KLAS Suki ROI Validation 2026; Fierce Healthcare
FMOL Health (Suki)	21% less time in notes; 43% fewer notes open >7 days; +$862/provider/month	KLAS Suki ROI Validation 2026; Fierce Healthcare
Mass General Brigham (hybrid AI + virtual scribe)	41% reduction in after-hours work, 66% fewer delayed note closures, 12% wRVU increase at 50 days	J Gen Intern Med (10.1007/s11606-025-09979-5); massgeneralbrigham.org
Cleveland Clinic (Ambience)	~14 minutes/day less time writing and reviewing notes	AHA Market Scan; Cleveland Clinic Consult QD

Frequently Asked Questions

What is the single most reliable ROI metric for ambient AI scribes?

Time on documentation is the most consistently measured and reproducible return. Named systems including The Permanente Medical Group, Sutter Health, Cleveland Clinic, and Intermountain Health all reported reductions in note time or after-hours work, and the multisite JAMA study found total EHR time fell by 13.4 minutes and documentation time by 16.0 minutes across five academic centers. Financial and burnout metrics are real but more variable.

Do ambient AI scribes actually generate revenue, or just save time?

There is early evidence of direct revenue, but it is single-site or vendor-validated and should be read carefully. A UCSF study of nearly 1.2 million encounters found a 5.8 percent increase in weekly RVUs with no rise in claim denials, and KLAS validations of Ambience at St. Luke's and Suki at FMOL, McLeod, and Rush tied the tools to E/M coding accuracy and incremental per-provider revenue. The JAMA editorial cautions that payer downcoding policies could offset coding gains.

How much do ambient AI scribes cost?

According to the 2025 JAMA Network Open editorial by Shah and Garcia, most health systems pay subscription fees of roughly $200 to $600 per clinician per month. Leaders weigh that recurring cost against time savings, coding-driven revenue, and the avoided cost of physician turnover, which the same editorial estimates at $500,000 to $1 million per departing physician.

Do these tools replace clinical judgment or human review?

No. Ambient AI scribes draft a note from the visit conversation, and the clinician reviews and edits it before it enters the medical record. As the Permanente authors noted, the tools do not make diagnoses or treatment recommendations.