
FDA Guidance: When Clinical AI Is a Regulated Medical Device

FDA has clarified the four-part test that separates safe-to-deploy clinical AI from a regulated medical device, and one criterion decides most cases.

The Leveraged Years AI Regulation Tracker

FDA's Clinical Decision Support Software guidance, revised in January 2026, sets a four-criteria test under Section 520(o)(1)(E) of the FD&C Act for when clinical software is non-device CDS rather than a regulated medical device. The software must not analyze medical images or signal patterns, must present recognized medical information, must direct recommendations to a healthcare professional rather than a patient, and must let that professional independently review the basis of each recommendation. FDA has confirmed that LLM-enabled tools can qualify only if they sufficiently enable that independent review, and it extends enforcement discretion to software offering a single clinically appropriate recommendation.

A physician who pastes a case into a general chatbot and one who buys a slick AI triage product are standing in very different regulatory positions, and until recently the line between them was fuzzy. In January 2026, FDA revised its Clinical Decision Support Software guidance and made that line easier to read. The document is nonbinding, but it tells you exactly how the agency decides whether a piece of clinical software is a regulated medical device or a tool you can deploy without any FDA submission at all (FDA, Clinical Decision Support Software guidance).

This matters for practice owners for a practical reason. If a tool is a device, someone has to clear it, validate it, and stand behind it under FDA's device rules. If it qualifies as non-device CDS, it sits outside that regime. Knowing which bucket a product falls into tells you how much scrutiny to apply before it touches a patient, and whether a vendor's marketing claims are quietly writing a regulatory check they cannot cash.

The statute behind the guidance

The whole framework traces back to the 21st Century Cures Act, which amended the Federal Food, Drug, and Cosmetic Act to carve five categories of software out of the medical device definition. Clinical decision support is one of them. The operative text lives at Section 520(o)(1)(E) of the FD&C Act, and FDA first finalized its interpretation in 2022 before revising it in January 2026 (MD+DI).

Here is the part that trips people up. Software is not automatically a device just because it gives medical advice, and it is not automatically safe just because a human reads the output. Congress drew a specific test, and FDA's guidance is the agency explaining how it reads each word of that test. A tool has to satisfy all four criteria to escape the device definition. Miss one, and it is a device.

The four-criteria test, in plain terms

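FDA's guidance walks through each criterion in Section 520(o)(1)(E). Stripped to what a physician actually needs to check, the four are these:

1. The software does not analyze medical images or signal patterns.
2. It displays or analyzes recognized medical information, such as patient data, clinical studies, and practice guidelines.
3. It directs its recommendations to a healthcare professional rather than a patient.
4. It lets that professional independently review the basis of each recommendation.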

Read those together and a pattern emerges. The first three are mostly about inputs and audience. The fourth is about trust, and it is where most modern AI tools live or die.

Criterion 4 is the one that decides your case

If you remember one thing, remember this: Criterion 4 is the pivot. FDA's own guidance says the software must let the clinician independently review the basis for a recommendation so the clinician does not rely primarily on it. That single requirement is what pushes a black box off the safe side of the line.

Large language models make this hard by their nature. They generate fluent output without transparent reasoning, and FDA has flagged exactly this problem. At a March 2026 CDRH town hall, as reported by MD+DI, the agency confirmed that LLM-enabled software can qualify as non-device CDS, but only if it sufficiently enables the professional to independently review the basis for any recommendation. The opacity of a black box model is precisely what makes Criterion 4 difficult to meet (MD+DI).

FDA also weighs two related factors under Criterion 4: how automated the software is, and how time-critical the clinical decision is. The agency's stated concern is automation bias, the human tendency to over-trust a machine's answer. A tool that fires a recommendation into a fast-moving decision, with no room for the physician to check the reasoning, is a tool the physician is likely to rely on primarily. That is the outcome Criterion 4 is built to prevent.

What FDA added in the January 2026 revision

The revision did more than restate old text. FDA extended enforcement discretion to a category that used to sit in a gray zone: software that produces only one clinically appropriate recommendation. The guidance states that when a single recommendation is clinically appropriate and the function otherwise meets the criteria, FDA intends to exercise enforcement discretion, meaning it does not intend to enforce device requirements against that function (FDA guidance).

Enforcement discretion is not the same as "not a device." It is FDA choosing not to police a low-risk category while retaining the right to step in if problems appear. For a practice owner, the practical effect is similar, but the label matters if you are the one building or buying the tool. This change is aimed at clinical-pathway and treatment-selection tools, the kind that narrow to one recommended order set.

FDA paired the same guidance revision with an updated take on low-risk general wellness products, the framework that governs consumer wearables. That is a separate lane, but it is a useful contrast: a wrist wearable that estimates blood pressure can stay a wellness product if its values are validated and it makes no medical claims, while a tool that reads as diagnostic crosses into device territory. The 2025 WHOOP matter, where FDA issued a warning letter stating the wearable displayed blood pressure indicators that qualified as an unapproved device (FDA Warning Letter to WHOOP, Inc., 07/14/2025), shows how fast a "wellness" claim can tip over the line.

A decision test you can run today

You do not need a regulatory lawyer to do a first-pass screen on a clinical AI tool. You need the four criteria and an honest read of your own workflow. Run any tool through this sequence before you rely on it:

1. Does it ingest images or continuous signals? If the tool interprets an ECG waveform, a scan, or a repeated sensor pattern, treat it as a likely device. Single entered values are fine.
2. Does it draw on recognized medical information? Criterion 2 asks whether the tool displays or analyzes medical information about the patient, or other medical information such as clinical studies and practice guidelines. A tool grounded in well-understood, accepted clinical sources and patient data fits Criterion 2. One with no basis in recognized medical information does not. (Whether you can see and check those inputs is a separate question, and it belongs to Criterion 4 below.)
3. Does it speak to the clinician, not the patient? Provider-facing is required. A patient-facing diagnostic app fails Criterion 3.
4. Can you independently check its reasoning? This is the decider. If the tool shows its sources, its inputs, and enough of its logic that you could reach the same conclusion on your own, it supports Criterion 4. If it hands you a confident answer you cannot verify, it does not, and you should treat it as a regulated-device candidate.
5. Is the decision time-critical or does the tool replace your judgment? Either factor weighs against Criterion 4 and toward device status.
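For practices that keep a written inventory of their AI tools, the five screening questions above can be recorded as a simple checklist. The Python sketch below is one hypothetical way to do that. The names `ToolProfile`, `screen`, and `verdict` are illustrative assumptions, not anything FDA publishes, and the yes/no answers still have to come from your own honest read of the tool and your workflow.

```python
from dataclasses import dataclass


@dataclass
class ToolProfile:
    """Hypothetical record of one tool's answers to the five screening questions."""
    analyzes_images_or_signals: bool          # question 1
    uses_recognized_medical_info: bool        # question 2
    clinician_facing: bool                    # question 3
    basis_independently_reviewable: bool      # question 4
    time_critical_or_replaces_judgment: bool  # question 5


def screen(tool: ToolProfile) -> list[str]:
    """Return the concerns raised by the screen; an empty list means none."""
    concerns = []
    if tool.analyzes_images_or_signals:
        concerns.append("Criterion 1: analyzes images or signal patterns")
    if not tool.uses_recognized_medical_info:
        concerns.append("Criterion 2: not grounded in recognized medical information")
    if not tool.clinician_facing:
        concerns.append("Criterion 3: not directed to a healthcare professional")
    if not tool.basis_independently_reviewable:
        concerns.append("Criterion 4: basis cannot be independently reviewed")
    if tool.time_critical_or_replaces_judgment:
        concerns.append("Criterion 4 factor: time-critical use or replaces judgment")
    return concerns


def verdict(tool: ToolProfile) -> str:
    """Summarize the first-pass screen as a short label (illustrative wording)."""
    return "likely non-device CDS" if not screen(tool) else "treat as device candidate"


# Illustrative example: a clinician-facing guideline lookup that shows its sources.
guideline_lookup = ToolProfile(
    analyzes_images_or_signals=False,
    uses_recognized_medical_info=True,
    clinician_facing=True,
    basis_independently_reviewable=True,
    time_critical_or_replaces_judgment=False,
)
print(verdict(guideline_lookup))
for concern in screen(guideline_lookup):
    print(" -", concern)
```

Any single concern flips the label, which mirrors the all-four-criteria rule: missing one criterion is enough to put a tool in device territory.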

If a tool clears all four criteria, it is likely non-device CDS you can deploy as genuine decision support. If it fails Criterion 4 in particular, the honest conclusion is that you are either looking at a regulated device or a tool you should not lean on for clinical calls. For physicians building documentation and reasoning workflows around general models, our reporting on why [general LLMs are outperforming purpose-built clinical AI tools](/ai-workflows/general-llms-beat-clinical-ai-physician-tools-2026) is a useful companion read, because the same transparency question drives both stories.

Where this leaves practice owners

The comfortable takeaway is that most of the AI a physician uses day to day (drafting notes, summarizing records, surfacing guideline references for the doctor to weigh) sits in non-device territory as long as the physician stays the decision-maker and can check the work. The uncomfortable takeaway is that the moment a tool starts functioning as the decision-maker, the regulatory picture changes, and no vendor disclaimer rewrites the four criteria.

The safest posture is also the best clinical practice: use AI as a fast, checkable first draft, and never let it be the last word. That is not just risk management. It is the exact behavior Criterion 4 is designed to protect. Physicians who want to build that habit into their documentation workflow can start with our [AI for physician notes course](/ai-for-physician-notes), and if you are not sure where your AI skills stand, the [two-minute quiz](/quiz) is a quick gut check. For the parallel story on federal transparency rules for clinical AI, see our coverage of the [ONC HTI-5 repeal and DSI model-card requirements](/ai-regulation-news/onc-hti-5-repeal-dsi-ai-transparency-model-card).

Frequently Asked Questions

Does FDA guidance being "nonbinding" mean I can ignore it?

No. The guidance is nonbinding in the sense that it does not carry the force of a regulation, but it states how FDA reads the statute and how it decides classification. If a tool fails the four criteria, the underlying device requirements in the FD&C Act still apply, so the guidance is your clearest signal of where a product actually stands.

Which criterion should I focus on first?

Criterion 4, the independent-review requirement. The first three criteria usually resolve quickly based on inputs and audience. Criterion 4 is where most AI tools, especially LLM-based ones, either qualify or fall out, because it turns on whether you can check the basis of a recommendation rather than relying on it blindly.

Can an LLM-based tool ever be non-device CDS?

Yes. As reported by MD+DI, FDA confirmed at a March 2026 CDRH town hall that LLM-enabled software can qualify, but only if it sufficiently lets the clinician independently review the basis for its recommendations. Features like source attribution, visible inputs, and structured reasoning summaries are what make that possible; a pure black box output does not.

What does "enforcement discretion" actually change for me?

It means FDA does not intend to enforce device requirements against a qualifying low-risk function, such as software offering a single clinically appropriate recommendation. It is not a permanent exemption. FDA keeps the right to act if safety issues emerge, so treat it as a lighter-touch category rather than a clean pass.

Why does time-critical use push a tool toward device status?

Because in a fast decision, the clinician has little room to independently review the software's reasoning and is more likely to rely on it primarily, which is the automation-bias problem Criterion 4 targets. FDA weighs both the level of automation and the time-critical nature of the decision when judging whether real independent review is possible.

If a tool only supports diagnosis and does not make the final call, is it automatically safe?

Not automatically. Supporting rather than replacing your judgment helps with Criterion 4, but the tool still has to clear the other three criteria, including not analyzing images or continuous signals and staying provider-facing. Run the full four-part test rather than stopping at the "it just supports me" claim.


Informational analysis for working professionals, not legal advice. Confirm how any rule applies to your situation with qualified counsel.