NAIC AI Exam Pilot: What Insurance Regulators Now Demand
State insurance examiners are field-testing a standardized AI evaluation tool, and the shift that matters is from writing a policy to proving one on demand.
The NAIC AI Systems Evaluation Tool is a standardized framework that state insurance examiners use to review how insurers govern their AI systems during market conduct and financial exams. A multistate pilot runs from March 2, 2026 through September 2026 in 12 states: California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin. The tool organizes review into four exhibits covering AI usage, governance risk assessment, high-risk systems, and data details, so insurers must be able to produce a written AI program, a risk-ranked model inventory, third-party audit rights, and testing records on request.
For three years the story in insurance AI was about writing rules. States adopted a model bulletin, insurers drafted governance policies, and the open question was whether anyone would ever check the homework. That question now has an answer. On March 2, 2026, the National Association of Insurance Commissioners started a multistate pilot of a standardized AI Systems Evaluation Tool, and twelve state insurance departments are using it to examine how carriers actually govern their algorithms. The policy era is ending. The evidence era has started.
If you run compliance, actuarial, or finance at a carrier, the practical change is blunt. It is no longer enough to have an AI governance program. You have to be able to hand an examiner a structured account of it, on their form, in their categories, on their timeline.
What the pilot actually is
The AI Systems Evaluation Tool is a common framework that gives examiners a repeatable way to review an insurer's AI governance during market conduct and financial exams. Instead of every state improvising its own document request, participating regulators work from the same template. The pilot began March 2, 2026 and runs through September 2026, and according to analysis from Fenwick, participating states will use the tool for market conduct exams and reviews, financial analysis, and financial examinations.
Twelve states are in: California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin. The NAIC has said that based on the pilot experience, the tool is anticipated to be taken up for adoption at the 2026 Fall National Meeting. Read that as a preview, not a fire drill. Whatever the tool asks for during the pilot is a strong signal of what a national standard will ask for after it.
One honest caveat up front: the tool is a pilot, and the model guidance behind it is a bulletin, not a statute. It does not carry its own new penalty. But it plugs into the exam authority regulators already hold, which is exactly why the softer legal label understates the operational bite.
The four exhibits: the evidence examiners now want
The tool organizes the review into four exhibits, and each one maps to a document set you either have or you do not.
- Exhibit A quantifies AI usage. This is the inventory question: where in the business are AI systems making or supporting regulated decisions, and how material are they? If you cannot list your models and where they touch underwriting, pricing, claims, or marketing, you fail the first exhibit before the substantive review starts.
- Exhibit B is the governance risk assessment framework. This is where your written program lives: how you assess and mitigate risk, who owns oversight, and what controls sit around each system.
- Exhibit C covers details on high-risk AI systems. Not every model gets the same scrutiny. The tool wants you to have already identified which systems are high risk and to document them in depth.
- Exhibit D covers AI data details, including data lineage and quality, plus a field for recording reasonable accommodations or policy modifications.
Notice what these four exhibits assume. They assume you have already inventoried, already risk-ranked, and already documented. The exhibits are not a questionnaire you fill in from memory during the exam. They are a request to produce a file you were supposed to be keeping all along.
The Model Bulletin is the baseline everyone now shares
The tool does not appear out of nowhere. It operationalizes the NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, which states have been adopting steadily. By 2025 the count had reached at least 24 states plus the District of Columbia, and it has kept climbing since, a spread the law firm Quarles characterized as nearly half the country. The NAIC maintains a live implementation map, so treat any single number as a floor that moves.
What the bulletin expects is the substance the exam tool now checks for. Insurers are told to develop, implement, and maintain a written program for the responsible use of AI systems that make or support regulated insurance decisions. Regulators may request governance, risk management, and internal control documentation in an investigation or market conduct action. They may ask about your processes for acquiring and relying on third-party data and third-party AI systems, including contract terms that give you audit rights. And they expect records of validating, testing, and retesting your models. If that reads like the four exhibits in prose form, that is the point.
The distinction worth holding onto: the bulletin told you what to build. The evaluation tool is the moment someone shows up to see whether you built it.
Colorado is the sharp edge
Colorado is in the pilot, and Colorado is also where the evidence standard is hardest, because it is backed by binding law rather than guidance. Under SB21-169, signed into law on July 6, 2021, insurers may not use external consumer data and information sources, known as ECDIS, or the algorithms and predictive models built on them, in a way that unfairly discriminates against protected classes.
Colorado turned that principle into a documentation duty. Its quantitative testing regulation for life insurance underwriting, which the Division of Insurance finalized effective November 14, 2023, set a regression-based framework requiring annual testing of whether ECDIS-driven underwriting or pricing produces unfairly discriminatory outcomes by race or ethnicity. Life insurers licensed in Colorado had to file a progress report by June 1, 2024 and an attestation of full compliance by December 1, 2024, and annually after that. The Colorado Division of Insurance publishes the underlying rules directly.
Put the two regimes side by side and the trajectory is clear. The NAIC tool asks you to describe and document your governance. Colorado asks you to test, attest, and correct, on the record, every year. If you operate in Colorado, the exam-readiness bar is not a policy binder. It is a dated evidence trail.
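To make "dated evidence trail" concrete, here is a minimal Python sketch that flags Colorado attestation deadlines with nothing on file. It assumes, for illustration only, that after the first December 1, 2024 attestation each later one falls due on December 1 of the following year, and that a filing counts toward a deadline if it lands in the 12 months ending on that date. The function name `missing_attestations` is hypothetical; confirm the actual schedule against the Division's published rules.

```python
from datetime import date

# Assumption: first attestation due December 1, 2024, then every December 1 after.
FIRST_ATTESTATION_DUE = date(2024, 12, 1)


def missing_attestations(filed: list[date], as_of: date) -> list[date]:
    """Return each deadline on or before `as_of` with no filing on record for it."""
    missing = []
    due = FIRST_ATTESTATION_DUE
    while due <= as_of:
        # Simplifying assumption: a filing covers a deadline if it falls in the
        # 12 months ending on that deadline.
        window_start = date(due.year - 1, due.month, due.day)
        if not any(window_start < f <= due for f in filed):
            missing.append(due)
        due = date(due.year + 1, due.month, due.day)
    return missing


# One filing in November 2024, checked in March 2026: the 2025 deadline is uncovered.
print(missing_attestations([date(2024, 11, 20)], as_of=date(2026, 3, 1)))
```

A checker like this only proves something was filed on time; the testing results behind each attestation still need to sit in the same file.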
How to build an exam-ready AI evidence file
Here is a framework you can act on this quarter. Treat it as building the file an examiner will eventually request, not as a one-time cleanup.
1. Inventory every AI system that touches a regulated decision. Underwriting, rating, claims, fraud, and marketing all count. Name each model, its purpose, its vendor if any, and the business decision it affects. This is your Exhibit A.
2. Risk-rank the inventory. Flag the high-risk systems, the ones that materially affect consumer outcomes or pricing. Write down why each is or is not high risk. This feeds Exhibits B and C.
3. Pull your written AI use program into one governed document. If your policy is scattered across slide decks and emails, an examiner will read that as no program at all. State who owns oversight, how you assess risk, and what controls apply.
4. Document third-party dependencies with audit rights. For every vendor model or external data feed, record the contract terms, the audit rights you hold, and how you validate what the vendor gives you. Missing audit rights is a common and fixable gap.
5. Keep dated testing and validation records. Save your pre-deployment testing, your retesting cadence, and any bias or performance testing. In Colorado life underwriting, keep the annual quantitative testing and the attestation trail specifically.
6. Map data lineage and quality. For each high-risk system, be able to show where the data came from, how it is checked, and what accommodations or modifications you make. This is your Exhibit D.
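Steps 1, 2, and 4 amount to keeping a structured inventory. The Python sketch below is one hypothetical way to hold it: the `AISystem` fields and the `build_exhibit_views` helper are illustrative names of our own, not the NAIC's template, and the decision categories simply mirror step 1. It filters to regulated decisions and pulls out the lists an examiner is likely to ask for first.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical inventory record; field names are illustrative, not the NAIC template.
@dataclass
class AISystem:
    name: str
    purpose: str
    decision: str              # regulated decision it affects, e.g. "underwriting"
    vendor: Optional[str]      # None for a system built in-house
    high_risk: bool
    risk_rationale: str        # why it is or is not high risk (step 2)


# Decision categories mirroring step 1; adjust to your own lines of business.
REGULATED_DECISIONS = {"underwriting", "rating", "claims", "fraud", "marketing"}


def build_exhibit_views(inventory: list[AISystem]) -> dict[str, list[str]]:
    """Group in-scope systems into the lists steps 1, 2 and 4 call for."""
    in_scope = [s for s in inventory if s.decision in REGULATED_DECISIONS]
    return {
        "inventory": [s.name for s in in_scope],                  # step 1 / Exhibit A
        "high_risk": [s.name for s in in_scope if s.high_risk],   # step 2 / Exhibit C
        # Step 2 also asks for a written reason; flag systems that lack one.
        "missing_rationale": [s.name for s in in_scope if not s.risk_rationale.strip()],
        "vendor_models": [s.name for s in in_scope if s.vendor],  # step 4: audit rights
    }


systems = [
    AISystem("life-uw-score", "accelerated underwriting", "underwriting",
             "ExampleVendor", True, "Drives approve/refer decisions on applications"),
    AISystem("claims-triage", "route claims to adjusters", "claims", None, False, ""),
]
views = build_exhibit_views(systems)
print(views["high_risk"])          # ['life-uw-score']
print(views["missing_rationale"])  # ['claims-triage']
```

Keeping the rationale as a required field, rather than a note in a spreadsheet comment, is what lets the gap in step 2 surface automatically.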
A useful decision rule for the gray cases: for any AI system, ask whether you could hand an examiner a single folder that answers what it does, why it is or is not high risk, what data feeds it, and what testing proves it behaves. If the folder does not exist, the system is not exam ready, no matter how good the model is.
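The single-folder rule translates directly into a check. In this minimal sketch, the four question keys and the `EvidenceFolder` and `exam_gaps` names are hypothetical labels; a folder is treated as exam ready only when all four answers are present and non-blank.

```python
from dataclasses import dataclass, field

# The four questions from the decision rule; these key names are illustrative labels.
REQUIRED_ANSWERS = ("what_it_does", "risk_rationale", "data_sources", "testing_evidence")


@dataclass
class EvidenceFolder:
    system: str
    answers: dict[str, str] = field(default_factory=dict)


def exam_gaps(folder: EvidenceFolder) -> list[str]:
    """Return the questions this folder cannot answer; an empty list means exam ready."""
    # Blank or whitespace-only answers count as missing.
    return [q for q in REQUIRED_ANSWERS if not folder.answers.get(q, "").strip()]


folder = EvidenceFolder(
    "claims-triage",
    {"what_it_does": "Routes new claims to adjuster queues", "data_sources": "Claims history"},
)
print(exam_gaps(folder))  # ['risk_rationale', 'testing_evidence']
```

Returning the missing questions instead of a pass/fail flag tells you what to go build, which matters more when you are triaging dozens of systems at once.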
What to watch
Do not over-rotate, and do not under-rotate. The pilot is not binding on non-pilot states today, and the tool could change before any national adoption vote. But the direction is not ambiguous. Standardized examiner review of insurer AI is arriving, the bulletin behind it already covers roughly half the country, and states like Colorado show what the demanding version looks like. The insurers who will move fastest through an exam are the ones who treated the bulletin as a build order, not a memo. If you want to build the AI documentation and testing habits this now rewards, our [Leveraged CPA and Finance course](/leveraged-cpa-finance) is where finance and compliance professionals start, and the [two-minute quiz](/quiz) will point you to the right track. For the underlying rulebook the exam tool operationalizes, see our explainer on the [NAIC Model Bulletin for insurer AI governance](/ai-regulation-news/naic-ai-model-bulletin-insurer-governance-2026).
Frequently Asked Questions
Is the NAIC AI Systems Evaluation Tool a binding requirement I have to comply with today?
Not on its own. It is a pilot running March 2 through September 2026 in 12 states, and it operationalizes the NAIC Model Bulletin, which is guidance. The practical force comes from the exam authority regulators already hold, so in a pilot state an examiner can use it in a live market conduct or financial exam now.
My state is not one of the 12. Can I ignore this?
No. The bulletin behind the tool has already been adopted by at least 24 states plus the District of Columbia, and the NAIC anticipates considering the tool for broader adoption at the 2026 Fall National Meeting. Building to the tool's four exhibits now is the low-cost way to be ready when your state moves.
What is the single most common gap examiners will find?
Two show up repeatedly: no consolidated written AI program, and missing audit rights over third-party models and data. Insurers often have governance in practice but cannot produce it as one governed document, and they often rely on vendor AI without contractual audit rights. Both are fixable before an exam.
How is Colorado different from the general model bulletin?
Colorado's SB21-169 and its life insurance testing regulation are binding law, not guidance. They require annual quantitative testing of ECDIS-driven underwriting for unfairly discriminatory outcomes, plus a dated compliance attestation filed with the Division of Insurance. That is a higher, evidence-first bar than the bulletin alone.
Does this apply to AI systems we bought from a vendor rather than built?
Yes. The bulletin and the tool both reach third-party AI systems and third-party data. You are expected to document how you assess, rely on, and audit vendor models, which is why audit rights in your vendor contracts matter for exam readiness.
What should I do first if I have limited time?
Build the inventory. List every AI system that touches a regulated decision, note its purpose and vendor, and flag the high-risk ones. Exhibit A is the foundation the other three exhibits and the whole exam depend on, and it is the fastest thing to produce.
Informational analysis for working professionals, not legal advice. Confirm how any rule applies to your situation with qualified counsel.