May 8, 2026 · 6 min read

AI scribes and the documentation-burden question.

TL;DR

AI ambient scribes (Abridge, Nabla, Suki, Augmedix) save real documentation time. The interesting question is which of three buckets the saved minutes actually go into (patient attention, capacity, or clinician recovery), and that allocation tracks the financial incentives of the deploying organization, not any property of the technology.

*Where the saved minutes go.* By Thomas Jankowski, aided by AI (TJ x AI).

The story everyone wants to tell about AI ambient scribes is the saved-time story. A clinician spends sixty to ninety minutes per day on documentation. The ambient-scribe vendor records the visit, generates a draft note, hands it to the clinician for review, and the clinician spends some smaller amount of time editing rather than composing. Multiply by the visit count, multiply by the clinician count, and the marketing deck has a number that looks like a category-defining productivity gain. The deployments at The Permanente Medical Group, the early reads from UNC, Emory, KUMC, and Mayo, and the pilots at Stanford and the University of Pennsylvania all show measurable time reduction. The Abridge, Nabla, Suki, and Augmedix logos start appearing on health-system slides. The category is real. The numbers are real.

The interesting question is the one nobody is asking yet, which is what the saved minutes get spent on. There are only three places they can go. They can go to the patient. Same visit count, more attention per visit, better connection, better adherence, eventually better outcomes. They can go to capacity. Same minutes-per-visit, more visits per day, more patients seen by the same provider. Or they can go to the clinician’s recovery: same visit count, same attention-per-visit, but the clinician goes home at six instead of eight, finishes the inbox before bed instead of in bed, and is less burned out next month. These three are not the same outcome. They have very different downstream effects on patient care, on health-system economics, and on the long-run sustainability of the clinical workforce. The category will succeed or fail not on whether it saves minutes, but on which of those three buckets the saved minutes actually land in.

The read that survives is that the answer will vary by deployment, and the variation will track the financial incentives of the deploying entity rather than any property of the technology. A fee-for-service organization with a production model and slot-fill pressure on the schedule will, all else equal, push the saved minutes toward capacity. The visit count will inch up. The minutes-per-visit will stay flat. The patient experience will be roughly unchanged because the per-visit attention budget is roughly unchanged, only now the EHR is no longer eating the gaps between visits. The clinician’s after-hours load will drop somewhat, because some of the documentation has moved in-visit, but the gain will be partial. The economic proposition for the health system is positive: more revenue per provider FTE, and a small reduction in the burnout-driven attrition rate that has been a P&L line item since 2021.

A capitated organization with a fixed panel will push the saved minutes toward recovery and patient attention. The visit count cannot grow without growing the panel, which is a separate and slower decision. The economic value of the saved minute is recovered through reduced attrition and a lower clinical-error rate, not through additional billed visits. The clinician goes home earlier. The next-day clinician is less degraded. The compounding effect over a year is real, and the health-system math values it differently than the fee-for-service math does. Permanente is a useful early case here. The early TPMG read on the ambient-scribe rollout (October 2023 onward across Northern California) reports patient-attention and clinician-wellbeing gains that look real, against a backdrop where the organization does not have the same incentive to convert those gains into throughput as a fee-for-service shop would. The salary structure is not pegged to RVUs, the panel size is the constraint, and the productivity gain falls to the system in a different shape.

The third bucket, the clinician-recovery bucket, is the one the discourse is most likely to misframe. There is a version of this where saved-minutes-as-recovery looks like soft money: nothing is billed differently, no new patients are seen, the system’s P&L line item that is moving is the recruiting-and-retention spend on the medical staff. That movement is real, and the dollars are real, but they are diffuse and delayed and the CFO will discount them next to a hard volume number. The temptation under production pressure will be to recapture the recovery as capacity, and the way that happens in practice is by adding slots to the daily schedule once the documentation overhead is visibly down. This is the failure mode to watch. The saved-minutes-as-capacity outcome is fine on paper and measurably worse for clinician retention than the saved-minutes-as-recovery outcome, because the per-visit attention budget remains compressed and the per-day cumulative load is roughly the same. The clinician notices. The vendor does not.

Two operator-grade observations follow. First, the right unit of analysis is not the time saved but the time reallocated. Any health-system rollout that does not explicitly designate where the saved minutes are supposed to go is making that choice by default, and in a fee-for-service environment the default is capacity. The rollouts that will look like long-run wins on the clinician side are the ones that pre-commit a portion of the saved minutes to a non-throughput outcome: capping the daily visit count, structurally protecting recovery-time slots, or routing the saved minutes explicitly into longer patient slots for chronic-care work that historically does not get the attention it warrants. None of those moves are technical. They are operational decisions made by health-system leadership, and they will determine the durability of the scribe deployment more than any model improvement will.

Second, the right number to watch over the next eighteen months is not the time-saved figure that the vendors will report. The vendors have every incentive to report a large time-saved figure, and the figure is not actually the outcome metric. The outcome metrics are clinician retention, patient-reported attention or connection scores, same-day note closure rate, after-hours EHR time, and the specific bucket assignment of the saved minutes. Two early signals worth tracking: the gap between documentation-time-saved (which the vendor can measure) and after-hours-EHR-time-saved (which the EHR system can measure independently); and the change in same-day-note-closure rate, which is the single best leading indicator that the saved minutes are going into recovery rather than into capacity. If after-hours EHR time falls and same-day closure rises, the saved minutes are going into recovery and patient attention. If after-hours EHR time falls but the visit count rises and same-day closure stays flat, the saved minutes are going into capacity.
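The two-signal reading above is effectively a decision rule, so it can be written down as one. What follows is a minimal sketch in Python, assuming a health system can pull before/after-rollout deltas for after-hours EHR minutes, visits per day, and same-day note-closure rate for a clinician group. The names `SignalDeltas`, `classify_saved_minutes`, and the `closure_tol` threshold are hypothetical and illustrative; they are not part of any vendor or EHR API, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class SignalDeltas:
    """Before/after-rollout changes for one clinician group (hypothetical schema)."""
    after_hours_ehr_min: float  # change in after-hours EHR minutes per day; negative = fell
    visits_per_day: float       # change in visits per clinician per day; positive = rose
    same_day_closure: float     # change in share of notes closed same day (0.05 = +5 points)


def classify_saved_minutes(d: SignalDeltas, closure_tol: float = 0.02) -> str:
    """Apply the two-signal reading to one set of deltas.

    closure_tol is an assumed threshold: closure-rate changes smaller than
    this in either direction are treated as flat.
    """
    after_hours_fell = d.after_hours_ehr_min < 0
    closure_rose = d.same_day_closure > closure_tol
    closure_flat = abs(d.same_day_closure) <= closure_tol
    visits_rose = d.visits_per_day > 0

    if after_hours_fell and closure_rose:
        return "recovery / patient attention"
    if after_hours_fell and visits_rose and closure_flat:
        return "capacity"
    # Any other combination is not covered by the two-signal rule.
    return "unclear"


# Illustrative, made-up deltas; not data from any deployment.
print(classify_saved_minutes(SignalDeltas(-25.0, 0.0, 0.12)))  # recovery / patient attention
print(classify_saved_minutes(SignalDeltas(-10.0, 2.0, 0.0)))   # capacity
```

The third return value matters as much as the first two: a rollout that reports a large time-saved figure but moves neither signal has not yet said which bucket the minutes went into.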

The vendor landscape sorts naturally against this frame. Abridge ($150M Series C in early 2024, deployments across UNC, Emory, KUMC, Mayo) is the most enterprise-mature of the standalone vendors and is being adopted into environments that span the production-pressure spectrum. Nabla published the Permanente results in NEJM and is positioned in environments where the deploying organization has more discretion over how the saved minutes are allocated. Suki is selling deeper into integrated delivery networks. Augmedix has the longest operating history (the company was founded in 2012, originally as remote-human scribes, and has progressively layered AI underneath the product) and brings the operational discipline of having run scribe workflows at scale before generative AI was a line of business. The EHR vendors, Epic in particular, are entering the category, and the entry will compress pricing and shrink the standalone-vendor market within two years for the deployments where the EHR-native option is good enough. The standalone vendors that survive will do so by being meaningfully better at one of the three saved-minute buckets, not by being marginally better at note generation.

A short note on the shape of the technology. Ambient scribes are a routing-and-rules problem with a generative component, which is the same observation that holds for the pharmacy-operations category. The generative model produces the draft note. The hard part of the product is not the draft. The hard part is the integration with the EHR (note placement, template alignment, billing-code attribution, problem-list updates), the speaker-attribution and chart-context layer that makes the note clinically usable, and the review-and-edit workflow that respects the clinician’s time. The vendors who are winning the category in 2024 are the ones who treated the LLM as a component and built the surrounding system to be defensible at the integration layer. The vendors who built around a fine-tuned model and a thin integration story are not in the deployments.

The piece I keep coming back to is that the saved-minutes question is, at its core, an organizational-design question. The technology is the easy part. It works. It will keep working better. The harder question, the one that determines whether the category produces durable clinical value or quietly becomes another throughput-compression tool, is which health systems use the technology to invest in their clinical workforce and patient relationships, and which use it to recover the minutes as billable visits. Both populations of health system exist. Both will deploy ambient scribes. Their five-year outcomes will not look the same, and the gap between them will be substantially wider than the gap between vendors. The clinicians on the ground will tell us first. The signal will show up in retention numbers before it shows up anywhere else, and well before any model-quality benchmark notices the difference.

—TJ