How to Protect Your Law Firm’s Data in the Era of GenAI

By: Dr. Ilia Kolochenko

Today, the technical, legal, and business risks associated with generative AI (GenAI) are widely publicized and familiar to most legal professionals: AI hallucinations, privacy issues, infringement of third-party intellectual property rights, possible antitrust issues, leaks of confidential information, poisoning of training datasets, and theft of proprietary technology, to name just a few. However, the AI governance strategies of many US law firms are either still at a nascent stage of conceptualization or early implementation, or simply don’t exist. This article discusses key steps lawyers and law firms should consider to preserve the confidentiality of client data as this always-important goal faces further challenges in the turbulent era of automation and GenAI.

While cybersecurity is increasingly a top priority of large and medium-sized law firms, the rise of AI has increased the incentives to obtain firms’ data, frequently by improper or illicit means. For instance, creating competitive and trustworthy LegalTech AI solutions requires high-quality training data—including sensitive, privileged, and confidential legal documents. Now, not only sophisticated cybercrime actors and malicious insiders, but also numerous technology startups and large tech vendors, actively seek access to law firms’ data, albeit for different purposes.

In this context, the (oftentimes clandestine or stealthy) integration of GenAI into numerous platforms, tools, and technologies used by lawyers on a daily basis, and the potential for data exposure this poses, deserves special attention. Lawyers need to attend to the risks that arise when implementing new technology, yet risks related to unauthorized information disclosure to legitimate third parties often go unidentified or underestimated.

On July 29, 2024, the American Bar Association released a long-awaited and much-needed ethics opinion on generative artificial intelligence tools, Formal Opinion 512. Section B of the Opinion’s discussion is dedicated to the duty of confidentiality, elaborating on the protection of prospective, current, and former clients’ data from unauthorized use and access both within and outside of a law firm. Several state and local bars have also released their own guidelines on the use of GenAI in legal practice, many of which, like those of the California State Bar and the New York City Bar Association, similarly include significant discussion of confidentiality issues.

It is important to note that data risks are not limited to GenAI: Other types of architectures and AI models usually share the same or similar risks. High-quality training data is the precious fuel of any contemporary AI technology; without it, even the most powerful and wealthy AI tech giants will be technically unable to innovate. The legal industry is as affected by this as any other, with the mushrooming of AI-enabled legal software for both lawyers and nonlawyers, ranging from e-discovery triage tools and contract review assistants to more complex systems that may predict the outcome of a trial based on underlying facts and relevant case law. As a result, demand for legal data—including memos, briefs, lawsuits, motions, depositions, contracts, and settlements—is surging amid modest supply.

Despite these challenges, a proper implementation of well-established and time-tested data protection best practices will address many AI-related risks and threats.

First, lawyers should bear in mind that even if their law firm does not use specific GenAI tools or solutions, their data—including work product and privileged and confidential client data—may be stealthily utilized by third parties for unauthorized or unexpected purposes, such as commercial large language model (LLM) training. (In simple terms, an LLM is the “brain” of GenAI technology, trained on huge amounts of human-created and other data.) Some vendors, desperate for high-quality AI training data, creatively update their terms of service, playing with semantics to make the terms as broad or ambiguous as possible without arousing suspicion, so that the permitted use of customer data can later be stretched to cover training of their proprietary LLMs. Less scrupulous vendors simply update their terms with immediate or even retroactive effect to allow use of customer data for AI training, and then send an unobtrusive notice to customers, for instance, concealed inside a monthly newsletter to distract attention from the perilous change.

Therefore, in the era of AI, it is indispensable to have a comprehensive and up-to-date inventory of technology vendors with access to law firm data, together with their current terms of service. Importantly, this list of vendors should also encompass the numerous online and software-as-a-service (SaaS) solutions the firm uses, spanning from Google Workspace, which is often favored by solo practitioners and small firms, to complex customer relationship management (CRM) or enterprise resource planning (ERP) platforms tailored for Big Law firms. Even tools like Google Translate or online grammar correction software, which can seem safe and innocent at first glance, may pose a hidden risk if used by law firm employees or external consultants, such as expert witnesses, to process legal or judicial documents, as the documents’ content may end up in a place where it should never be. To prevent such incidents, law firms should consider implementing and enforcing a written policy that addresses permitted use of their data, expressly prohibiting all tools and services that are not on the list of authorized solutions.
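
For firms that track this inventory in software, the minimal Python sketch below shows the kinds of fields each vendor record might capture; the VendorRecord structure and its field names are illustrative assumptions, not a prescribed schema.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class VendorRecord:
        """One entry in the firm's inventory of vendors with access to its data."""
        name: str                   # vendor or tool name, e.g., a SaaS grammar checker
        data_categories: list[str]  # firm data the vendor can touch (emails, briefs, backups)
        tos_url: str                # link to the vendor's current terms of service
        tos_last_reviewed: date     # when those terms were last read in full
        allows_ai_training: bool    # do the terms permit training on customer data?
        authorized: bool            # is the tool on the firm's approved list?

    def needs_attention(vendor: VendorRecord) -> bool:
        """Flag unauthorized tools and vendors whose terms permit AI training."""
        return (not vendor.authorized) or vendor.allows_ai_training

Even a spreadsheet with these columns, reviewed on a fixed schedule, achieves the same goal; what matters is that the inventory exists and stays current.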

Firm-wide data minimization, or limiting the collection and retention of data to the minimum needed for a specific purpose, is arguably even more crucial to reduce a wide spectrum of cybersecurity and privacy risks, including those related to GenAI. If data does not exist, it simply cannot be misappropriated, even in the case of the most sophisticated data breach or flagrant human error. Moreover, data minimization is the cornerstone of many emerging privacy laws and regulations. Data minimization is, however, virtually impossible without a clear understanding of data inventory and data flows in the first place. Thus, the very first step is to document what data a law firm stores and processes, for what purposes, and where, capturing that information in a corporate data management program. Once a firm’s data is mapped and the underlying data flows are identified, data minimization can be thoroughly and thoughtfully implemented.
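
As an illustration of what such a data map can look like in practice, the sketch below (again, an assumed format rather than a mandated one) records where each category of data lives, why it is kept, and for how long that purpose justifies retention, so that over-retained data can be flagged for secure deletion or archival.

    from dataclasses import dataclass
    from datetime import date, timedelta

    @dataclass
    class DataAsset:
        """One line of the firm's data map."""
        description: str     # e.g., deposition transcripts for closed matters
        location: str        # system or repository where the data lives
        purpose: str         # business or legal reason for keeping it
        retention_days: int  # how long that purpose justifies retention
        last_needed: date    # last date the data actually served its purpose

    def past_retention(asset: DataAsset, today: date) -> bool:
        """True when the stated purpose no longer justifies keeping the data."""
        return today - asset.last_needed > timedelta(days=asset.retention_days)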

Data minimization strategies help ensure that all data necessary for business, as well as documents that must be preserved as a matter of law, will be duly safeguarded and readily available, while also enabling and facilitating secure deletion of obsolete or redundant data. Data minimization also drives operational costs down by reducing data storage, processing, and backup bills. Additionally, any data that must be preserved but is not required in daily operations may be securely sent to so-called cold storage, from which it can later be retrieved if necessary. Cold storage facilities are remarkably cost-efficient and are usually beyond the reach of malicious insiders, disgruntled employees, or external cybercriminals. In sum, data minimization is a cybersecurity principle that has been known for decades, and it remains a potent tool to reliably address risks when interacting with emerging technology such as AI.
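
For firms that already keep archives in AWS, one common way to implement cold storage is an S3 lifecycle rule that automatically transitions aging objects to an archival storage class. The bucket name and prefix below are hypothetical placeholders, and the sketch assumes the boto3 SDK with appropriate credentials; retention obligations should be confirmed with counsel before anything is moved or deleted.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and prefix; adjust to the firm's actual layout.
    s3.put_bucket_lifecycle_configuration(
        Bucket="firm-archives",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-closed-matters",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "closed-matters/"},
                    # After 90 days, move objects to Glacier Deep Archive,
                    # AWS's lowest-cost cold-storage tier.
                    "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
                }
            ]
        },
    )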

Another business-critical best practice to maintain data privacy is to establish separate data protection agreements with all external parties that may have access to a firm’s data, explicitly prohibiting any unauthorized use of the data. The agreement should have a conspicuous clause providing that, in case of any conflict with clickwrap agreements or similar terms of service from a vendor, the agreement shall always prevail. Notably, data protection agreements are needed not only with those vendors that by design ingest a law firm’s data for storage or processing but with all vendors that may occasionally or tangentially have access to the data or any part of it. For instance, cybersecurity vendors that scan a firm’s laptops, servers, or emails for malware may legitimately send suspicious files to their cloud for further analysis, unbeknownst to the law firm. Solo practitioners and boutique law firms, which typically cannot afford to invest many resources in a comprehensive vendor management program, may at least minimize their number of third-party data processing vendors and carefully review the terms of service of those that remain, as well as minimize or anonymize any data that they submit for external processing. Importantly, legal professionals should remember that lack of time or budget is virtually never a valid excuse for breach of ethical or fiduciary duties related to use of AI or any other technologies.

Use of public cloud providers, such as Amazon Web Services (AWS) or Microsoft Azure, deserves a dedicated mention within the context of law firm cybersecurity. According to Gartner, through 2025, 99 percent of cloud security failures will be the customer’s fault, caused by human error or misconfiguration of cloud services. Unsurprisingly, cybercriminals and unscrupulous data brokers vigorously go after misconfigured cloud storage to access exposed data without any hacking and sometimes, debatably, without even breaking the law. Such carelessly exposed data may be exploited for all imaginable and unimaginable nefarious purposes, including LLM training by unprincipled tech vendors or even sovereign states amid the global race for AI supremacy. To avoid falling victim to a cloud data breach, law firms should maintain a comprehensive inventory of their cloud-stored data and cloud resources and have those resources regularly tested by specialized cloud security providers for possible misconfigurations, vulnerabilities, and weaknesses.
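
By way of illustration, even a short script can catch the most common storage misconfiguration in AWS: an S3 bucket without a public-access block. This is a minimal sketch using the boto3 SDK, assuming credentials that can list and inspect the account’s buckets; it complements, rather than replaces, testing by a specialized cloud security provider.

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    for bucket in (b["Name"] for b in s3.list_buckets()["Buckets"]):
        try:
            cfg = s3.get_public_access_block(Bucket=bucket)[
                "PublicAccessBlockConfiguration"
            ]
            # All four flags must be enabled to fully block public access.
            if not all(cfg.values()):
                print(f"WARNING: {bucket} does not fully block public access")
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                print(f"WARNING: {bucket} has no public-access block configured")
            else:
                raise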

Notably, all of the abovementioned challenges to data confidentiality also silently lurk at law firms’ trusted third parties that have legitimate access to firms’ data under a proper data protection agreement. To illustrate this convoluted problem, consider a law firm whose affiliated law firm uses a cloud backup service provider. Despite a properly implemented data protection agreement between the two law firms, the cloud provider may share, sell, or otherwise exploit the backup data unbeknownst to both law firms. Worse, this practice is not necessarily illegal: For example, the affiliated law firm could simply overlook a tiny clause in its contract with the vendor authorizing use of backup data for LLM training.

Because of the potential for data breaches via trusted third parties, law firms should consider implementing a comprehensive and risk-based third-party risk management (TPRM) program. One of the key purposes of a modern-day TPRM program is to assess, understand, and monitor how trusted third parties protect themselves and the data in their possession. Whenever sensitive data is shared with third parties, preference should be given to entities with mature data protection and information security management programs. The strength of such programs can be evidenced and partially validated by conformity with global technical standards and frameworks like ISO 27001 or SOC 2. A truly robust TPRM program should, however, go beyond superficial examination of entities’ certifications. It should meticulously inspect their risk catalogues and cybersecurity policies and procedures, audit their compliance with those policies, and regularly review a list of security incidents (including those that may not reach the level of a reportable data breach), together with documentation of each incident’s aftermath and the third party’s response. Holistic implementation of TPRM will not only help mitigate AI-specific risks but also reduce a broad spectrum of technical risks and threats stemming from more conventional IT tools and solutions.

Another class of high-frequency and high-impact data risk for law firms is human error while using AI. According to Verizon’s 2024 Data Breach Investigations Report, as many as 68 percent of data breaches involved a nonmalicious human error. The current situation in the realm of AI is analogous: Many legal professionals working in law firms are still unaware of the broad and continually growing spectrum of risks created in their office environment by AI technologies. For instance, a paralegal may see nothing wrong in submitting a highly confidential memo to an online chatbot for a quick spell-check, trying to produce an impeccable document. Likewise, a busy associate may innocently upload a confidential brief to an online platform to get a cogent summary of the brief, trying to accomplish a long list of tasks in a timely manner. This is why it is crucial to create, promulgate, and enforce a firm-wide AI use policy specifying permitted and prohibited ways to utilize AI in the workplace. Last but not least, ongoing training on the risks, threats, and benefits of AI can serve as a powerful enhancement of such a policy, which may otherwise simply gather dust on a bookshelf.

Another GenAI-related concern is that even publicly accessible data may be misused by GenAI vendors or their suppliers of training data. Some law firms generously share their expert knowledge, unique know-how, and analytical insights on corporate websites and blogs, providing high-quality articles or presentations on recent developments in the law. For legal technology AI vendors, such data is gold. Obviously, few authors would consent to have their work ingested by an LLM to be later exploited as part of a commercial product without any compensation or credit to the original content creator. However, valuable data can be vacuumed from trustworthy websites without notice through the common method of data scraping. The author has elaborated elsewhere on techniques for investigating and proving unauthorized data scraping in court, but prevention tends to be a better solution than an after-the-fact response.

Reviewing a law firm website’s terms of use is a sound starting point, as AI vendors—partially due to better self-regulation and partially due to emerging AI legislation, namely the EU AI Act and US state laws on AI—are increasingly paying attention to terms of service. With the exception of some “good bots,” like Google’s search crawler, the terms should prohibit automated scanning and crawling of the firm’s website and expressly ban data scraping for AI training. It may also be a good idea to add a liquidated damages clause, if enforceable under applicable law. Next, modern anti-bot protection, such as Cloudflare’s bot management or a comprehensive web application firewall (WAF), can help protect the website from being crawled by malicious automated bots while ensuring a smooth experience for human visitors.
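
A complementary, low-effort signal is the website’s robots.txt file under the Robots Exclusion Protocol. It binds only well-behaved crawlers and is no substitute for the terms of use or a WAF, but it documents the firm’s intent. The user-agent tokens below are those published by the respective vendors at the time of writing (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google’s AI training) and should be verified against current documentation.

    # robots.txt, served from the website's root
    # Allow Google's search crawler.
    User-agent: Googlebot
    Allow: /

    # Expressly disallow known AI-training crawlers.
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Default rule: all other bots are not welcome.
    User-agent: *
    Disallow: /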

To summarize, though law firms face substantial risks to their data as technology evolves, protection of a law firm’s data in the GenAI era is not rocket science. While many of the foregoing challenges are amplified by the rise of GenAI, ongoing attention to data risk management best practices provides corresponding solutions. Law firms should consider implementing and continuously improving the following instruments discussed above as part of a comprehensive and firm-wide data protection program:

  • Data inventory program
  • Data minimization strategy
  • Third-party risk management (TPRM) policy
  • Inventory of third parties with access to firm’s data
  • Inventory of third-party terms of service and data protection agreements
  • Cloud security and authorized cloud use policies
  • Authorized use of AI tools and solutions policy
  • Technical controls protecting firm’s public data
  • Terms of use protecting firm’s public data