December 2024
Stanford Study Reveals Hallucinations in Lexis & Westlaw Tools
Artificial intelligence (AI) in law practice continues to make headlines, with the technology morphing at an ever-accelerating pace. “Proponents of AI tout its potential to increase access to justice,” Chief Justice John Roberts observed in his 2023 Year-End Report on the Federal Judiciary, but he also acknowledged disturbing reports of “hallucinations” in the work product of generative artificial intelligence (GAI). We have all read such accounts. These so-called “hallucinations” typically take the form of false (or worse yet, nonexistent) legal authority cited to support a particular argument or proposition in a filing before a tribunal (as that term is defined in the Terminology section of the Model Rules of Professional Conduct). Many of these phantom citations were cut and pasted from drafts authored by GAI platforms (ChatGPT and the like) and incorporated uncritically into such filings. Reported incidents of hapless (or brain-dead) lawyers facing sanctions for such misconduct are already too numerous to count on one’s fingers.
Until recently, however, it was supposed that legal research conducted using Lexis or Westlaw was reliable. In an era in which recourse to the physical books that were once the mainstays of the legal profession (e.g., Shepard’s, U.S. Reports, the West Reporter System) is rapidly diminishing, LexisNexis and Thomson Reuters represent the “gold standard” for legal research. Nothing could be more fundamental to the practice of law than reliance on such longstanding and well-established research services. What is more, LexisNexis has asserted that its GAI-produced citations are “hallucination free,” and Thomson Reuters has claimed that it avoids hallucinations “by relying on the trusted content within Westlaw and building in checks and balances that ensure our answers are grounded in good law.”
But what if these extravagant claims cannot be borne out and the gold standard has been transmuted by AI alchemy into a base metal? Recently, researchers at Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) conducted a study posing a variety of research questions to the two providers’ AI-driven research tools. The results are disconcerting. While the gold standard undoubtedly performs much more reliably than other GAI platforms, an alarmingly high percentage of results contained hallucinations: over 17 percent for Lexis and over 34 percent for Westlaw.
To approximate the most significant types of legal research, the HAI researchers posed queries in four categories: (1) general research questions (e.g., questions about doctrine, case holdings, or the bar exam); (2) jurisdiction- or time-specific questions (e.g., questions about circuit splits and recent changes in the law); (3) false premise questions (questions that mimic a user having a mistaken understanding of the law); and (4) factual recall questions (questions about simple, objective facts that require no legal interpretation).
The HAI study distinguishes between two species of “hallucinations”: results that are “incorrect” and those that are “misgrounded.” The former describes the law incorrectly or makes a factual error; the latter describes the law correctly but cites a source that does not support the legal assertion being made.
Technological developments within just a single generation of practicing lawyers—email, mobile phones, cloud computing, and virtual law offices, to name a few—have created a variety of ethical challenges. GAI, however, threatens to be (thus far, at least) the most disruptive technological development of all. Comment [8] to Model Rule 1.1 interprets the requirement of “competence” to mean that lawyers should “keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology.” Likewise, more senior lawyers must supervise junior lawyers and nonlawyer assistants (whether employees or outsourced), pursuant to Model Rules 5.1 and 5.3, respectively, to ensure that they adhere to the requirements of the Rules of Professional Conduct, including that of technological competence.
Algorithms do not have ethical obligations. But you do. Trust, but verify!