Westlaw Deep Research and the cost of category errors
When legal AI gets the law backwards
January 2026
In November 2025, Thomson Reuters demonstrated its Westlaw Deep Research AI. They asked it a straightforward evidence question: can a laboratory director’s opinion be admitted as lay testimony in the Ninth Circuit?
Westlaw cited United States v. Holmes, the Ninth Circuit’s recent decision in the Theranos fraud appeal. It answered with confidence:
No. Laboratory directors generally cannot offer lay testimony. Their specialized knowledge makes them expert witnesses under Rule 702.
The answer was clear. The citation was real. The case was on point.
And the legal conclusion was completely backwards.
Why this matters
This isn’t a hallucination problem. Westlaw didn’t invent a case or fabricate a quote. It found the right case, pulled real language from the opinion, and fundamentally misunderstood what the court decided.
This is a category error. And it’s far more dangerous than a hallucinated citation.
A lawyer who catches a fake case will know immediately something is wrong. But a lawyer who receives a confidently stated legal rule, backed by a real citation to a relevant case, has every reason to trust it.
And if that rule is wrong, they’ll make strategic decisions based on bad law. They might concede arguments they could have won. They might pursue theories the law doesn’t support.
When we tested the same query using the legal research tools in the Citational Framework, it got both the case and the legal principle correct. The difference reveals a fundamental gap in how AI systems understand legal reasoning.
The case: United States v. Holmes
The question in Holmes turned on whether three former Theranos laboratory directors could testify as lay witnesses under Federal Rule of Evidence 701, or whether they needed to be disclosed as expert witnesses under Rule 702.
Elizabeth Holmes and Sunny Balwani argued that these witnesses - scientists with doctoral degrees, testifying about laboratory science - were obviously experts. Their testimony should have been excluded because the government failed to make the required Rule 702 disclosures.
The Ninth Circuit rejected this argument, but not in the way Westlaw’s flagship AI product suggests.
The court’s actual holding
The court explicitly refused to adopt a status-based rule:
“But the fact that a witness’s testimony pertains to scientific matters, or conveys opinions drawn from the witness’s own experiences with such matters, does not automatically render it expert testimony within the ambit of Rule 702.”
Instead, the court applied a functional test: What matters is not the witness’s credentials or the subject matter, but the basis of the opinion being offered.
- If an opinion is based on specialized knowledge (statistical analysis, technical interpretation, scientific methodology), it’s expert testimony under Rule 702.
- If an opinion is based on personal perception (what the witness saw, heard, or directly experienced), it’s lay testimony under Rule 701.
To illustrate this principle, the court used what we might call the “toaster test”:
“To borrow the district court’s analogy, if a certain model of a toaster consistently burned bread or short-circuited when run on regular settings, and those problems consistently manifested across multiple toasters of the same model, a lay person using the toaster could reasonably reach the conclusion that there is a problem with the design or manufacturing of the toaster.”
You don’t need an engineering degree to know the toast is burnt. You just need eyes.
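The court’s functional test is, at bottom, a decision procedure. A minimal sketch in Python makes the structure explicit (the names and categories here are our own illustrative framing, not the court’s language):

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """A toy model of an offered opinion. Field names are our own invention."""
    witness_title: str  # e.g. "laboratory director" -- deliberately unused below
    basis: str          # "personal_perception" or "specialized_knowledge"

def classify(opinion: Opinion) -> str:
    """The functional test: classification turns on the basis of the opinion,
    never on the witness's title, credentials, or subject matter."""
    if opinion.basis == "personal_perception":
        return "lay testimony (Rule 701)"
    return "expert testimony (Rule 702)"

# A lab director reporting what she saw on a readout: lay testimony.
print(classify(Opinion("laboratory director", "personal_perception")))
# The same lab director performing a statistical analysis: expert testimony.
print(classify(Opinion("laboratory director", "specialized_knowledge")))
```

Note that `witness_title` never appears in the body of `classify`. That is the whole point of the holding: under the functional test, the title is an input the rule ignores.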
What actually happened in Holmes
When the court applied this functional test to the Theranos laboratory witnesses, it found:
Some testimony was properly admitted as lay opinion. For example, Erika Cheung, a lab associate, testified that quality control tests repeatedly failed. She didn’t need specialized knowledge to observe that samples failed; she just needed to look at the readout. The court upheld this testimony as proper lay opinion.
Some testimony crossed into expert territory. Dr. Kingshuk Das performed a comprehensive statistical analysis of Theranos’s quality control data (the “Patient Impact Assessment”). Dr. Adam Rosendorff explained the physiological mechanism of hemolysis and why it interfered with test accuracy. These were technical interpretations requiring specialized knowledge, not mere observations.
The court found errors in admitting some expert testimony without proper disclosure, but held those errors harmless because the witnesses clearly had the necessary qualifications.
Crucially, the court never said laboratory directors “generally cannot” testify as lay witnesses. It said the opposite: admissibility depends on what they’re saying, not who they are.
What Westlaw got wrong
Westlaw Deep Research read the same opinion and concluded:
“No, a laboratory director’s opinion generally cannot be admitted as lay testimony in the 9th Circuit. The 9th Circuit has established that laboratory directors typically possess specialized scientific and technical knowledge that places their opinions within the scope of expert testimony under Federal Rule of Evidence 702…”
This interpretation commits a fundamental error in legal reasoning: it mistakes a specific factual finding for a universal legal rule.
Yes, in this case, certain lab directors gave expert testimony. But the court explicitly rejected the idea that a person’s title or profession determines whether their testimony is expert or lay.
The real-world consequences
Imagine you’re defending a pharmaceutical company in a products liability case. You have a laboratory director who can testify: “I personally reviewed the batch records for the disputed lot. I saw that the temperature logs were blank for three consecutive days.”
That’s pure observation: exactly the kind of lay testimony Holmes permits.
But if you relied on Westlaw’s interpretation, you might believe this testimony is categorically inadmissible because the witness is a “laboratory director.” You might:
- Fail to call a critical witness
- Concede an evidentiary point you could have won
- Spend unnecessary time and money qualifying the witness as an expert
- Trigger additional discovery obligations under Rule 26
All because an AI tool confidently told you a legal prohibition exists when it doesn’t.
What Citational got right
When we ran the same query through our legal research and verification framework, the system correctly identified:
The functional test: The distinction depends on the basis of the opinion (perception vs. specialized knowledge), not the subject matter or the witness’s credentials.
The “no on-the-job exception” nuance: Although the court declined to carve out an automatic exception for on-the-job observations, that doesn’t mean job-related observations are automatically expert testimony. The test remains functional.
The actual holding: In Holmes, certain lab directors gave expert testimony because they performed statistical analyses and explained technical mechanisms, not because they were lab directors.
The difference isn’t that Citational found different text. Both systems read the same opinion.
The difference is that Citational understood what the court decided versus what the court said happened in this particular case.
Why this error pattern matters
Large language models are particularly vulnerable to this type of category error. They’re trained to find patterns in text, and “court finds testimony inadmissible” is a strong textual pattern.
The model sees:
- Laboratory director
- Scientific testimony
- Court finds testimony was expert
- Conclusion: Laboratory directors = experts
But legal reasoning doesn’t work through pattern matching. It works through ratio decidendi — identifying the principle that drives the decision.
The Holmes court’s actual principle: The nature of the opinion determines its classification, not the nature of the person offering it.
Westlaw’s inferred principle: Laboratory directors are experts.
One is a legal rule you can apply to future cases. The other is a case-specific fact that tells you nothing about the next case.
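The gap between the two inferred principles can be made concrete with a toy comparison (hypothetical code in our own framing; neither function is anything Westlaw or the court actually wrote):

```python
def status_based_rule(witness_title: str, basis: str) -> str:
    """Westlaw's inferred principle: the witness's title decides. (Wrong.)"""
    if witness_title == "laboratory director":
        return "expert testimony (Rule 702)"
    return "lay testimony (Rule 701)"

def functional_rule(witness_title: str, basis: str) -> str:
    """The Holmes court's principle: the basis of the opinion decides."""
    if basis == "specialized_knowledge":
        return "expert testimony (Rule 702)"
    return "lay testimony (Rule 701)"

# A lab director testifying "the temperature logs were blank": pure observation.
case = ("laboratory director", "personal_perception")
print(status_based_rule(*case))  # expert testimony (Rule 702) -- misclassified
print(functional_rule(*case))    # lay testimony (Rule 701) -- what Holmes permits
```

The two rules agree on the facts of Holmes itself, which is exactly why pattern matching can latch onto the wrong one: only a case with a different basis of opinion, like the hypothetical above, separates them.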
The broader problem
This isn’t just about one wrong answer on one case. It reveals a systematic weakness in how AI tools approach legal research.
Legal research isn’t just about finding relevant cases. It’s about:
- Distinguishing holdings from dicta
- Separating case-specific facts from governing principles
- Understanding when a court rejects a bright-line rule in favor of a functional test
- Recognizing when an opinion explicitly warns against the very interpretation an AI might naturally gravitate toward
Traditional legal research databases built their reputations on accuracy because they didn’t interpret. They just provided the raw materials. Lawyers did the interpretation.
But AI legal research tools must interpret. They’re selling answers, not documents.
And when the interpretation is wrong - but confidently stated, properly cited, and superficially plausible - the danger multiplies.
What this means for the legal profession
We built our product on a foundational framework precisely because of this problem. The Citational Framework doesn’t just retrieve cases; it’s designed to understand legal logic:
- What is the court’s actual holding versus what happened in this case?
- What rules does the court adopt versus reject?
- What distinctions does the court draw, and why?
- What level of generality does the principle operate at?
These aren’t exotic questions. They’re the baseline requirements for competent legal research. And until AI systems can reliably answer them, the gap between confident and correct will remain dangerously wide.
Thomson Reuters is one of the most sophisticated legal technology companies in the world. If Westlaw Deep Research makes this kind of error on a high-profile, recently decided case in a core area of law, what errors is it making on more obscure questions?
And more importantly: how would you know?
The bottom line
The Holmes opinion runs 54 pages. Buried in those pages is a critical distinction between two types of knowledge: specialized expertise and ordinary observation. The Ninth Circuit devoted substantial analysis to explaining why this distinction matters and how courts should apply it.
Westlaw found the case. It found relevant passages. But it fundamentally misread what the court decided, transforming a functional test into a categorical prohibition.
Citational got it right not because it used different data, but because it was built to understand legal reasoning rather than just legal text, through practices such as adversarial testing and deliberate prompt engineering.
When the stakes are client outcomes, professional responsibility, and the basic integrity of legal argument, “close enough” isn’t good enough.
Legal accuracy demands more than finding cases. It demands understanding what they actually say.
Luke Cohen is the CEO of Citational and former Head of AI R&D at LegalZoom - luke.cohen@citation.al