The information universe is expanding in truly mind-numbing ways. There is a new exabyte of data created every few hours across the globe. (One exabyte of data is the equivalent of 50,000 years of continuous movies.) That Mount-Everest-sized pile of information is replicated many times every day and continues to grow faster and faster. Big companies typically have millions or billions of files stored in multiple locations, including third-party-owned Clouds. For many companies, that means they can’t keep all their information forever because they are collapsing under its weight. So why are companies hard-pressed to clean house of unneeded information?
Companies historically had records-management programs so that they could manage records and properly dispose of them in accordance with the company retention policy at some future date. At the time, making retention rules work meant that employees had to apply the rules to their records. That was simple when each employee boxed their paper records annually and applied a retention rule to each box. However, having employees apply business rules to millions or billions of files from various systems is like drinking from a firehose through a straw. In other words, for most businesses, cleaning house by applying the retention schedule to each record one by one is no longer feasible.
The current business environment is like information’s “perfect storm”—more data in more formats and systems with less visibility into what information assets exist, more laws directing how it must be managed, more consequences for mismanagement, and more challenges in managing it according to old company rules with much of it floating in a Cloud.
Why Does Information Just Pile Up?
Companies relied for years on paper and electronic information, sometimes duplicating each other over and over. Although electronic information legally is on par with its paper counterparts for almost all purposes, lawyers fallaciously believed paper was the “best evidence,” and thus the two piles grew even though paper printouts of electronic records could be legally destroyed.
Today, much of the growth in information volumes comes from communications, social media, and collaboration technologies, the output of which may not rise to the level of a company record. Thus, the pile grows further with information that may be “nonrecord,” meaning it need not be retained to satisfy legal or business needs.
As litigants began to focus on electronic information for discovery purposes, company lawyers sometimes over-preserved information so as not to worry about its destruction during the pendency of a matter. As a result, everything continued to be preserved, even information ready for destruction under the retention rules. Wide-sweeping legal holds that took precedence over retention rules stopped the proper destruction of records in the ordinary course of business according to company policy. Thus, the pile grew larger still because employees couldn’t classify or manage the growing amount of information, given that the sheer volume of files, documents, and e-mail became overwhelming.
Compounding matters, there was a need to manage information according to other information-related policy regimes, like information security, privacy, attorney-client privilege, etc., which often impacted the same information. Further compounding the problem was that information classification couldn’t be easily accomplished given limited functionality in most technology unless information was being purposefully stored in document and records-management applications. In other words, if employees were so inclined (and they generally weren’t), most technology in use didn’t allow for such compliance rules to be easily applied or applied at all. Thus, the pile grew.
The fallacious belief that storage is cheap further fueled storage growth. Although storage costs per terabyte are decreasing by a few percentage points, any savings are dwarfed by company information footprints doubling every year or two. With storage costs of $5 million to $10 million per petabyte per year, storage is now a huge expense for companies with big information footprints. Thus, the pile grew larger.
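The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting footprint of one petabyte, the 3 percent annual decline in unit cost, and the two-year doubling period are hypothetical inputs chosen to match the ranges described above, not measured figures.

```python
def projected_storage_cost(
    start_petabytes: float,
    cost_per_pb_year: float,
    annual_price_decline: float,
    doubling_period_years: float,
    years: int,
) -> list[float]:
    """Return the projected annual storage bill for year 0 through `years`."""
    costs = []
    for year in range(years + 1):
        # The footprint doubles once every `doubling_period_years`.
        volume = start_petabytes * 2 ** (year / doubling_period_years)
        # The unit cost shrinks by a fixed fraction each year.
        unit_cost = cost_per_pb_year * (1 - annual_price_decline) ** year
        costs.append(volume * unit_cost)
    return costs


# Hypothetical inputs: 1 PB today, $5 million per PB per year,
# unit cost falling 3% a year, footprint doubling every 2 years.
costs = projected_storage_cost(1.0, 5_000_000, 0.03, 2.0, 6)
for year, cost in enumerate(costs):
    print(f"year {year}: ${cost:,.0f}")
```

Under these assumed inputs, the annual bill grows more than sixfold over six years even though the price per petabyte falls every year, because volume growth outpaces the price decline.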
Then Big Data happened. Big Data is not about large piles of information. It’s about using analytics or artificial intelligence (AI) software to crawl through large piles of information to answer a business question. Suddenly there was even less pressure to clean house. Business folks want more information for longer periods of time to run queries and see what they learn from a business perspective.
In 2018 the tide seems to be turning in that less information may be retained given significant compliance events. First, with endless information security and privacy failures, companies realize their risk profile declines with smaller information footprints, which can be accomplished by keeping less and for a shorter period. As Jeff Stone et al. put it in a May 29, 2018 article in the Wall Street Journal:
Cybersecurity threats are relentless, they’re getting stronger and they are coming from more directions than ever . . . more, the consequences of a breach can be disastrous, with staggering losses of customer data and corporate secrets—followed by huge costs to strengthen security, as well as the threat of regulatory scrutiny and lawsuits.
Further, the EU’s General Data Protection Regulation (GDPR) became law and is forcing companies to rethink what information they keep and for how long because GDPR requires it.
What Is Defensible Disposition and How Will It Help?
A solution to the unmitigated data sprawl is to “defensibly dispose” of the business content that no longer has business or legal value to the organization. Defensible disposition is a way to take on piles of information without employees classifying it.
Applying a retention rule to large chunks of information in order to make a business decision to dispose of it requires different diligence depending upon the content; thus, there is no one-size-fits-all approach to defensible disposition. In some cases, a software analytics tool may need to crawl the contents looking for specific terms, and in others, knowing the age of the information pile, the business unit that created it, the lack of active litigation, and so on might be enough to purge the entire contents without looking at each file. Experience working with many companies cleaning up stored information shows that the amount of diligence needed to make a company comfortable purging an information pile varies considerably.
In any event, lawyers’ input will be essential to help define a reasonable diligence process to assess the legal requirements for continued information retention and/or preservation, based on the information at issue. Thereafter, lawyers can also help select a practical information assessment and/or classification approach, given information volumes, available resources, and risk profile.
Does Litigation Profile Matter?
A good time to clean up outdated information is when there are fewer legal or compliance issues that require continued preservation of information. Disposing of information when no litigation or government investigations or audits exist is less risky. Otherwise, before information can be purged, the company must conduct sufficient diligence to ensure that nothing is destroyed that will give rise to a spoliation claim. That, of course, raises the question of how diligence will be performed when it’s impractical to review millions or billions of files or documents.
Can Technology Help?
There are all kinds of analytics and classification technologies that can help analyze information and may help with defensible disposition. However, years of using these technologies to help companies deal with dead data show that their expense and complexity should not be underestimated. Putting cost aside, these technologies are better and faster than employees at classifying information. As Maura R. Grossman, JD, Ph.D., et al. described in the Richmond Journal of Law and Technology, “[t]his work presents evidence supporting the contrary position: that a technology-assisted process, in which only a small fraction of the document collection is ever examined by humans, can yield higher recall and/or precision than an exhaustive manual review process, in which the entire document collection is examined and coded by humans.”
Studies and courts make clear that, when appropriate, companies should not fear using technologies to help manage information. For example, in Moore v. Publicis Groupe, Judge Andrew Peck made clear in the discovery context that “[c]omputer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases . . . Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer assisted review.”
Can I Clean House with Methodology Alone?
If information has piled up and you don’t think it makes sense to crawl it for records or preservation obligations, then there are other ways to get rid of content.
For example, if your company has 100,000 back-up tapes from 20 years ago, minimal review might be required before the whole lot of tapes can be comfortably disposed of. On the other hand, if you have an active shared drive with records and information that is needed for ongoing litigation, there must be deeper analysis with analytics and/or classification technologies. In other words, the facts surrounding the information will help inform whether the information can be properly disposed of with minimal analysis or whether it requires deep diligence.
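The fact-based triage described above can be sketched as a simple rule set in Python. This is a hypothetical illustration only: the `Repository` fields, the `triage` function, and the ten-year staleness threshold are assumptions made for the example, and any real criteria would need to be defined with legal counsel.

```python
from dataclasses import dataclass
from enum import Enum


class Diligence(Enum):
    MINIMAL = "minimal review before disposal"
    DEEP = "deeper analysis with analytics and/or classification tools"


@dataclass
class Repository:
    name: str
    age_years: float
    actively_used: bool
    litigation_relevant: bool  # holds content needed for ongoing litigation


def triage(repo: Repository, stale_after_years: float = 10.0) -> Diligence:
    """Pick a diligence level from facts about the repository as a whole."""
    # Active or litigation-relevant content always gets the deeper look.
    if repo.litigation_relevant or repo.actively_used:
        return Diligence.DEEP
    # Old, inactive content with no known litigation need may be cleared
    # with minimal review. The threshold is a hypothetical default.
    if repo.age_years >= stale_after_years:
        return Diligence.MINIMAL
    # When the facts are not clearly in favor of disposal, stay cautious.
    return Diligence.DEEP


tapes = Repository("20-year-old back-up tapes", 20, False, False)
shared_drive = Repository("active shared drive", 3, True, True)

for repo in (tapes, shared_drive):
    print(f"{repo.name}: {triage(repo).value}")
```

The sketch defaults to the deeper review whenever the facts are ambiguous, reflecting the point that the diligence must be enough to make the company comfortable before anything is purged.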
Defensible disposition is needed like never before, given that information is growing unfettered for most businesses and impacting their ability to function. In addition, a bloated information footprint further increases a business’s privacy and information security risk profile. Although there are many reasons why retention is no longer happening as it used to, what is clear is that keeping everything forever is not without great costs or risks that must be addressed. In the end, lawyers must find a way to get rid of information without creating greater business and legal issues for their clients. Without their guidance, no one will destroy data, and it will continue to overwhelm.