The Difference Between a Citation and a Verification
Legal research has always been a verification problem masquerading as a search problem.
For decades, the technology available to lawyers optimized for search. Find the case. Locate the statute. Surface the relevant secondary source. The implicit assumption was that once you found the right authority, the hard work was done.
This assumption was always wrong, but it was a tolerable kind of wrong. When research required physically pulling reporters from shelves or carefully constructing Boolean queries, the friction itself imposed a kind of discipline. You had to read the case because you were already there, book in hand. You developed judgment about whether a holding actually supported your argument because developing that judgment was inseparable from the mechanical act of research.
The new generation of AI-powered legal research tools has removed that friction entirely. You can now get from question to citation in seconds. The problem is that the tools have optimized for the wrong endpoint. They deliver citations with unprecedented speed and confidence. What they do not deliver—what most of them do not even attempt—is verification that those citations do what a lawyer needs them to do.
This gap between citation and verification is where the actual risk lives.
The layers of verification
When a lawyer cites a case in a motion, they are making a series of implicit claims. Each claim is a layer of verification, and each layer is harder than the last.
Layer one: existence. Does this case exist? Is the citation formally correct—right volume, right reporter, right page number? This is the layer that catches outright hallucinations, the fabricated cases that made headlines when lawyers first started using ChatGPT for research. It is also the easiest layer to check. A database lookup can confirm existence in milliseconds.
Layer two: validity. Is this case still good law? Has it been overruled, abrogated, or limited by subsequent decisions? This is what Shepard’s and KeyCite have done for years. It is harder than existence—you need a citation graph and some way to interpret the significance of subsequent treatment—but it is a solved problem in the sense that reliable tools exist.
Layer three: authority. Is this case binding on the court where you are filing? A Ninth Circuit decision means nothing to a judge in the Second Circuit except as persuasive authority. A state appellate decision does not bind a federal court applying state law in the same way it binds a lower state court. This layer requires understanding jurisdictional hierarchy, the relationship between the court you are in and the court that decided the case you are citing.
Layer four: application. Does this case actually stand for the proposition you are citing it for?
This is where everything breaks down.
The application problem
The first three layers can be checked mechanically. Existence is a lookup. Validity is a graph traversal with some classification. Authority requires mapping jurisdictional relationships, which is complex but static.
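The three mechanical layers can be sketched as data lookups. This is a toy illustration with made-up placeholder data (the citation, court names, and dictionaries here are invented stand-ins, not a real legal-data API): existence is set membership, validity is a check against recorded negative treatment, and authority is a lookup in a jurisdictional hierarchy.

```python
# Toy sketch of the three mechanical verification layers.
# All data below is invented for illustration; real systems would
# query authoritative citation databases and citator services.

# Layer 1: existence is a membership test against an authoritative index.
KNOWN_CITATIONS = {"123 F.3d 456"}

# Layer 2: validity is a check of subsequent treatment in the citation graph.
NEGATIVE_TREATMENT = {
    "123 F.3d 456": [],             # no overruling or abrogation recorded
    "789 F.2d 101": ["overruled"],  # flagged by a later decision
}

# Layer 3: authority is a lookup in the jurisdictional hierarchy.
BINDING_ON = {
    "CA9": {"CA9", "N.D. Cal.", "D. Ariz."},  # a circuit binds its district courts
}

def check_mechanical_layers(citation, deciding_court, filing_court):
    """Check layers one through three; layer four still requires reading the case."""
    exists = citation in KNOWN_CITATIONS
    valid = exists and not NEGATIVE_TREATMENT.get(citation, ["unknown"])
    binding = filing_court in BINDING_ON.get(deciding_court, set())
    return {"exists": exists, "valid": valid, "binding": binding}

result = check_mechanical_layers("123 F.3d 456", "CA9", "N.D. Cal.")
print(result)  # every mechanical layer passes; application remains unchecked
```

The point of the sketch is the asymmetry: each of these checks terminates in a lookup, while the fourth layer has no comparable shortcut.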
Application is different. Application requires reading the case, understanding what the court actually held, distinguishing that holding from dicta, distinguishing the court’s reasoning from the arguments made by the parties, and then evaluating whether the proposition in your brief is a fair characterization of what the case establishes.
This is not a search problem. This is a reasoning problem.
Consider what it means for a case to “support” a proposition. In the simplest scenario, the court states a clear rule and your brief quotes it accurately. But legal reasoning rarely works this way. More often, you are extracting a principle from the court’s analysis, applying a holding to facts that differ from the original case, or synthesizing a rule from multiple authorities. Each of these moves requires judgment about what the case actually decided and how far that decision extends.
Now consider how AI systems typically handle this. A large language model prompted to find cases supporting a legal proposition will surface cases that discuss that proposition. Cases where the issue was raised. Cases where a party argued for the rule you want. Cases where the court considered and rejected the position you are advocating.
The model does not distinguish between these scenarios because, at a fundamental level, it is pattern-matching on text similarity. A case where the court extensively discusses why your proposed rule is wrong will often score higher than a case with a terse holding in your favor—because the rejecting case contains more text that matches your query.
This is not a bug in any individual system. It is a structural feature of how retrieval and generation interact in current AI architectures. And it means that the citations these systems produce with the most confidence are sometimes the citations most likely to undermine your argument if anyone reads them carefully.
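The retrieval failure described above can be demonstrated with a deliberately naive scorer. The case snippets and the bag-of-words overlap function below are invented for illustration; production systems use embeddings rather than word overlap, but the same dynamic applies: a lengthy rejection of a rule shares more surface text with the query than a terse holding in its favor.

```python
import re

def overlap_score(query, text):
    """Naive similarity: count of distinct words shared between query and text."""
    tokenize = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(tokenize(query) & tokenize(text))

query = "economic loss rule bars tort recovery for purely economic damages"

# Invented snippets: one terse favorable holding, one detailed rejection.
terse_holding = "We hold the economic loss rule bars recovery here."
lengthy_rejection = (
    "Defendant argues the economic loss rule bars tort recovery for "
    "purely economic damages. We reject that argument: the rule does "
    "not bar recovery for economic damages in this context."
)

print(overlap_score(query, terse_holding))      # fewer shared words
print(overlap_score(query, lengthy_rejection))  # more shared words, adverse outcome
```

Under this scorer the rejecting case outranks the favorable one, because discussing a rule at length, even to reject it, generates more matching text than applying it in a sentence.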
What misapplication looks like
We have reviewed thousands of AI-assisted legal documents at this point. The taxonomy of application errors is remarkably consistent.
The party-versus-court confusion. The AI cites a case for a proposition that appears in the opinion, but the proposition is drawn from a party’s argument rather than the court’s analysis. The court may have rejected the argument entirely. This error is particularly common because judicial opinions often summarize party arguments in detail before disposing of them.
The dictum-versus-holding confusion. The AI cites a case for a statement the court made, but the statement was not necessary to the decision. It may have been a hypothetical, an observation about a related issue, or an explicit acknowledgment that the court was not deciding the question. Dicta can be persuasive, but citing it as if it were a holding misrepresents the authority.
The factual distinction problem. The AI cites a case with a holding that sounds applicable but turns out to depend on facts materially different from your situation. The court held that a particular contract was unenforceable, but the holding rested on specific provisions that do not exist in your contract. The citation is not wrong exactly—but it does not do the work your brief claims it does.
The superseded standard problem. The AI cites a case applying a legal standard that has since been modified by statute or later case law. The case has not been “overruled” in the way that would trigger a validity flag. It is still good law on its facts. But the framework it applies is no longer the controlling framework.
Each of these errors can survive every check short of actually reading the case and comparing it to the proposition in the brief. Each of them can make it into a filed motion. And each of them can be found by opposing counsel, or worse, by the judge.
Why this matters now
There is a temporal dimension to this problem that makes it urgent.
For most of legal history, the production of legal documents was slow enough that verification happened naturally. Associates read the cases because reading the cases was the research. Partners reviewed briefs because briefs took weeks to produce and the investment justified scrutiny.
AI-assisted drafting has compressed this timeline dramatically. A document that once took days now takes hours. The research that once required reading thirty cases now surfaces as a list of citations in seconds. The velocity has increased, but the verification capacity has not.
What we are seeing in practice is predictable: lawyers using AI tools to draft more documents faster, with less time spent on each one. The citations look right. The analysis sounds right. And the verification step gets compressed or skipped because the output is so fluent that it feels verified.
This is the confidence trap. The better AI gets at producing plausible legal prose, the more dangerous it becomes when that prose contains subtle errors. A hallucinated case is obvious. A real case, correctly cited, that does not actually support your proposition? That takes careful reading to catch.
What verification requires
Genuine verification—the kind that actually protects lawyers and clients—requires checking each layer independently.
It requires confirming existence against authoritative databases rather than trusting that a plausible-sounding citation is real. It requires tracing subsequent treatment to identify not just direct overrulings but also the subtler forms of erosion: questioned holdings, distinguished facts, limited scope. It requires mapping jurisdictional authority to understand what weight a given case carries in a given court.
And it requires—this is the hard part—actually evaluating whether the case does what the brief says it does. Not whether the case mentions the relevant concept. Not whether the case contains sentences that sound supportive. Whether the holding, properly understood in context, supports the proposition for which it is cited.
This is the work that matters. Not finding citations. Verifying them.
The standard we should expect
The legal profession is beginning to recognize this problem. Bar associations are issuing guidance on AI use. Courts are imposing disclosure requirements. The question of what constitutes reasonable verification when using AI tools is becoming a professional responsibility issue.
We think the standard should be straightforward: if you cite a case, you should be able to represent that you have verified it actually supports your argument. Not that a tool found it. Not that it looked right. That someone—human or machine—actually checked.
This is not a higher standard than lawyers have always been held to. It is the same standard, applied to new tools. The difference is that the tools themselves now need to be built for verification rather than just search.
That is the work we are focused on. Not faster citation finding. Genuine verification at every layer.
The profession deserves tools that take accuracy as seriously as the discipline demands.