Most writing about the problems with AI lands in one of two camps. One side says the machines are about to take every job and end the world. The other says anyone worried is a luddite who will be left behind. Neither is much help on a Tuesday when you have a client deliverable due and a tool on your screen that is right most of the time and confidently wrong some of the time.
This briefing is for the people stuck in the middle, doing real work. The honest truth is that AI has a specific, knowable set of limitations, and none of them are mysterious. They are predictable enough that you can build habits around each one. That is the whole game for a working professional: not deciding whether AI is good or bad in the abstract, but knowing exactly where it fails and having a routine that catches the failure before it reaches a client, a court, or a regulator.
We will go through the real problems one at a time. For each, two things: why it happens, in plain terms, and the concrete habit professionals use to manage it. The examples lean toward regulated work, finance and accounting especially, because that is where getting this wrong stops being embarrassing and starts being expensive.
Problem 1: Confident wrongness, also known as hallucination
The single most important thing to understand about a large language model is that it does not know things the way a database knows things. It predicts plausible text. Ask it for a fact and it will produce something that looks exactly like a fact, in the right shape, with the right tone, whether or not it is true. The technical term is hallucination, but that word undersells how convincing the output is. The model is not confused or hedging. It is confidently, fluently wrong, and it sounds identical to when it is right.
This is not a rare glitch you can engineer away. It is a property of how the technology works. The models have gotten much better, but "much better" is not "never." Even a tool that is wrong only a small fraction of the time, while sounding certain every single time, is more dangerous than a tool that is wrong often and obviously, because the rare confident error is the one that slips through unchecked.
The professions are already full of cautionary tales. Since 2023, courts across the United States have repeatedly sanctioned lawyers who filed briefs containing case citations that an AI tool simply invented, complete with realistic case names, plausible volume numbers, and quotations that did not exist. In at least one reported matter, a judicial officer acknowledged that fabricated citations came close to slipping into a court order before they were caught. Consequences have included monetary sanctions, mandatory additional legal education, referrals to disciplinary authorities, and the professional embarrassment of a published opinion with your name on it. The same failure mode applies to a finance professional who asks for a summary of an accounting standard, or a tax treatment, or a figure from a filing. The answer will look authoritative either way.
The habit: verify anything checkable, and treat AI output as a draft, never a source. This is the foundational discipline, and it is simpler than it sounds. Any specific, falsifiable claim the model produces, a citation, a number, a regulation, a quote, a date, gets checked against a real source before it leaves your desk. You are not verifying the AI's vibe. You are verifying the load-bearing facts. The mental reframe that makes this stick is to stop treating the model as a knowledgeable colleague and start treating it as a fast, fluent intern who is sometimes making things up to please you. You would never file an intern's first draft without reading it. Same rule.
In finance work this is concrete. If the model cites a tax code section, you open the primary source and read it. If it hands you a ratio or a reconciliation, you recompute it in your own spreadsheet rather than trusting the number it typed. If it summarizes a standard, you confirm the summary against the standard itself. The practical version of the rule is to use AI where verification is cheap and avoid relying on it where verification is expensive or impossible. Asking it to restructure a document you wrote is low risk, because you can see whether it mangled your meaning. Asking it to recall a specific tax figure from memory is high risk, because the cost of a wrong number you did not catch is enormous. Push the work toward tasks where you remain the source of truth.

Problem 2: Confidentiality and data risk
The second problem has nothing to do with whether the AI is correct. It is about where your information goes when you type it in. Many consumer AI tools retain what you submit, and some use it to improve future models. For a person summarizing a recipe, that is irrelevant. For a professional who holds other people's secrets, it is a serious exposure.
When an accountant pastes a client's bank statements into a free chatbot to summarize them faster, or a lawyer drops a privileged memo in to "clean up the language," the convenience is real and so is the breach. The data has now left the building, and you cannot pull it back. Your duty of confidentiality does not care that the tool was convenient. In regulated work, that duty attaches to the underlying information, not to the software you happened to use, which means the tool is never a loophole around the obligation.
The habit: a hard line on what data goes into which tool. Mature professionals decide this once and never improvise it case by case. Confidential and regulated information, client financials, tax data, anything personally identifiable, privileged material, protected health information, goes only into a tool that has been vetted and contractually approved for that purpose, in writing. Everything else can flow more freely. The difference between a free consumer tier and a paid business or enterprise tier is often exactly this contractual data protection, so knowing which tier and which retention setting each approved tool uses is part of the job now.
This is where the practitioner's discipline meets the firm's policy. The individual habit, never feed regulated data to an unvetted tool, only scales if the firm has decided which tools are approved and trained everyone on the line. That firm-level question, the policy, the approved-tool list, the training and the audit trail, is a different job from the one this briefing is about. We cover it in depth in our briefing on AI governance training, which is the firm-and-policy counterpart to this practitioner piece: that one is for whoever sets the rules, this one is for the individual at the keyboard living inside them. If you run or help run a firm, read that one next.
Problem 3: Over-reliance and skill erosion
This problem is quieter than the others and, over a career, possibly more damaging. When a tool can produce a competent draft of almost anything, the temptation is to stop doing the underlying thinking yourself. At first that feels like pure productivity. Over months and years, it can hollow out the very judgment that made you valuable.
The risk is sharpest for people early in their careers, who are building the instincts that let a senior professional glance at a number and feel that it is wrong. Those instincts come from doing the work the slow way enough times that the patterns become intuition. If a junior analyst lets the model do the reasoning from day one, the reasoning muscle never develops, and they are left able to operate the tool but unable to catch it when it fails. That is a bad trade, because catching the failure is the entire value a professional adds over the raw model.
The habit: do the thinking first, then use AI to check, extend, or accelerate, not to replace. The sequence matters. Form your own view of the answer, then bring in the tool to pressure-test it, find what you missed, or speed up the mechanical parts. This keeps your judgment in the loop and sharp, and it has a useful side effect: when you already have a view, you are far more likely to notice when the model's answer is off, because it will clash with what you expected. People who outsource the thinking entirely lose exactly that early-warning system.
There is a deeper skill hiding inside this habit. Using AI well is not about memorizing clever prompts. It is about knowing what to delegate, what to keep, how to frame a problem, and how to judge an answer. That higher-order skill, deciding and verifying rather than just asking, is what we call judgment over prompting, and it is the difference between professionals who get leverage from these tools and those who get burned. Our briefing on judgment over prompt engineering goes deep on that distinction if you want to build the skill deliberately.
Problem 4: Bias and uneven reliability
Problem 1 was about facts the model gets wrong. This problem is about judgments the model gets skewed. AI models learn from large bodies of human-produced text and data, and they absorb the assumptions and the historical patterns baked into that material. When you ask a model not for a fact but for a judgment, who looks risky, which items deserve scrutiny, what a reasonable assumption is, it can quietly carry forward a bias that was never yours and that you would reject if it were stated out loud.
For finance and accounting work this is concrete, not abstract. A model trained on historical lending or credit data can reproduce the patterns that disadvantaged certain borrowers in that history. A model suggesting which accounts to pull for an audit sample may lean toward the familiar and underweight the newer or non-standard accounts that are exactly where the surprises hide. A model summarizing a contested or recent regulatory question may be working from sparse, one-sided material while sounding just as certain as it does about basic arithmetic. The danger is not that the model is loudly prejudiced. It is that the skew is quiet, plausible, and easy to adopt without noticing.
The habit: treat every judgment call as a candidate answer, not a conclusion, and run it past the test you would apply to a junior colleague. Would I accept this reasoning if a first-year analyst handed it to me? Where could this recommendation be skewed by what was common in the past rather than what is right now? Does it carry an assumption I would not endorse if it were stated plainly? If the logic is opaque or the answer leans on patterns you would not defend, do not use it. The model has earned exactly the trust of a fast, useful junior colleague whose work you still read with a skeptical eye.
Problem 5: The accountability gap
Here is a problem the technology cannot solve, because it is not a technical problem. When AI-assisted work goes wrong, the AI is not accountable. You are. The firm is. The professional who signed, filed, or sent it is. "The AI told me" has been tried as a defense and it does not work, with clients, with regulators, or with courts. Responsibility does not delegate to a tool.
This is precisely why the regulated professions cannot treat AI as a black box that produces finished work. A number on a financial statement, a position in a tax return, a citation in a brief, each carries a human's name and a human's liability. The model can draft it. It cannot own it. And the gap between what the model produces and what a human is willing to put their name on is exactly the space where professional judgment lives.
The habit: a named human owns every output, and ownership means having actually checked it. Before anything leaves the building, a specific person has reviewed it to the standard they would apply to their own unaided work, and is prepared to stand behind it. This sounds obvious until you watch how fast it erodes when a tool makes output cheap and plentiful. The discipline is to keep review attached to accountability even when, especially when, the volume of AI-generated material makes skimming tempting. The output got faster. The standard did not move.
Problem 6: The verification cost is real
The last problem is the one the productivity headlines ignore. Verifying AI output takes time and skill. For some tasks, checking the machine's work carefully takes nearly as long as doing it yourself would have. When that is true, the AI has not actually saved you anything once you account for the review, and pretending otherwise is how errors slip through, because the time pressure to skip the check is always there.
A mature professional treats verification cost as a real input to the decision of whether to use AI at all. Where the work is easy to verify, restructuring text you wrote, generating options you will judge, handling a first pass on something mechanical, the leverage is enormous and the verification is cheap. Where the work is hard to verify, a specific factual claim in a domain where you would have to research from scratch to check it, the math can flip, and the tool can quietly cost you more than it saves.
The habit: match the task to its verification cost. Reach for AI hardest where you can confirm the output quickly and where you remain the expert who can spot the error: rewriting your own draft for clarity, structuring a memo on a topic you know cold, generating a first-pass checklist from rules you can confirm at a glance, drafting options you will judge. Be far more cautious in the opposite zone: a novel tax position you do not already understand, detailed calculations you cannot independently recompute, citations you will not actually check. Where verification is slow, where you lack the expertise to catch a subtle mistake, or where a missed error is catastrophic, the tool can quietly cost you more than it saves. This is the single judgment that separates professionals who get durable leverage from these tools from those who get an occasional disaster. It is learnable, and it is mostly about being honest with yourself about how you would actually catch a mistake.
This is also worth distinguishing from a related but narrower worry. Whether a given piece of text was written by AI, and how the detection tools that claim to tell you that actually work, is a separate question we handle in our briefing on how AI detectors work. That piece is about the detection sub-problem specifically. This briefing is about the broader set of limitations a working professional has to manage, of which "was this AI-written" is only one small corner.
Putting the habits to work in real professional practice
None of these six habits is hard to understand. The difficulty is doing them consistently, under deadline pressure, with a tool that keeps making it tempting to skip the check. That is a skill, and like any skill it is best built on your actual work rather than on generic principles. A finance team learns to manage AI's limits by drilling the verification habit on real client financials and the data line on the kind of confidential information they handle every day, not by sitting through an abstract ethics module.

That is the whole design idea behind The Leveraged CPA and Finance program. It teaches accounting and finance professionals to get real leverage from AI on the work they actually do, with the verification discipline, the data-handling rules, and the judgment habits built into the workflow rather than bolted on as warnings. The point is not to fear the tool or to trust it blindly, but to use it the grown-up way: fast where it is safe, skeptical where it is not, and always with a human who owns the result. Our full course catalog is organized the same way, by the work you actually do, so the habits transfer directly into your day.
Not sure where your own biggest exposure is? Our two-minute AI readiness quiz points you to the right starting place based on how you are using these tools today.