Imagine being denied a car insurance policy because an algorithm determined, based on your publicly available social media posts, that you were not a safe driver. This example highlights one of a number of difficult legal and ethical issues surrounding the data generated by processing consumer data. The prediction about whether you are a safe driver is often referred to as “inferred data” to separate it from data that companies collect directly from users or indirectly from external sensors and sources (such as a user’s public social media feed). But what exactly is inferred data, and why is it the subject of so much debate?
Users directly provide data about themselves (and others), and sensors indirectly provide data about people, often without their direct involvement (or consent). Companies and governments perform analytics on this directly and indirectly provided data to infer other characteristics of the data subjects. Inferred data is the result of this processing. There has been huge growth in laws regulating data provided directly by people and indirectly by sensors. Now, however, regulators and legal scholars are beginning to ask whether, and to what extent, privacy laws and intellectual property laws ought to apply to inferred data.
It is important to understand how inferred data is created in order to determine whether it is the type of personal information that should be subject to different legal theories. We can think of data analytics as an input-output process in which people or sensors provide data that is analyzed to create a new output (data) that is inferred from that analysis. This analysis is often based not on a single input but on large datasets, using techniques like regression, classification, clustering, and machine learning. For ease of understanding, think of the process like cooking a meal. The input data is the ingredients, the analytical method is the recipe, and the actual analysis is the cooking. The output—the inferred data—can be thought of as the final, cooked meal.
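This input-to-output pipeline can be sketched in a few lines of code. The sketch below is purely illustrative: the field names, the threshold, and the hand-written rule are all invented for this example, and real analytics would use a trained statistical model rather than a fixed rule.

```python
# Toy sketch of the input -> analysis -> inferred-output pipeline.
# All names and values are invented; a real system would apply a trained model.

provided_data = {"name": "A. Driver", "age": 24}            # provided directly by the user
sensor_data = {"posts_per_day": 11, "late_night_posts": 6}  # observed indirectly

def analyze(provided, observed):
    """A stand-in 'recipe': a hand-written rule, not any real insurer's model.

    (The provided data is passed in for realism but this toy rule only
    uses the observed data.)
    """
    risk_score = observed["late_night_posts"] / max(observed["posts_per_day"], 1)
    return {"inferred_risk": "high" if risk_score > 0.4 else "low"}

# The output is new data that appears nowhere in the inputs: it was inferred.
inferred_data = analyze(provided_data, sensor_data)
print(inferred_data)  # {'inferred_risk': 'high'}
```

Note that the inferred "high" risk label appears nowhere in the input data; it exists only as the output of the analysis, which is precisely why its legal status is contested.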
It is also important to understand how the analysis process works to set expectations for characterizing inferred data under different legal regimes. In one type of analysis process called machine learning, for example, an algorithm is first trained on a large dataset to generate a machine learning model. This training process results in a model that has a unique set of variables and weighting factors. Using the cooking analogy, you can think of the algorithm as the recipe and the model as a recipe variation that creates different outputs—like New York pizza versus Chicago pizza. Then, the machine learning model is used to analyze new input data to create inferences based on that input data. The same input data could result in different inferences depending on the version of the machine learning model used and how it was trained. This means that inferred data is often more like a prediction than a result. A model that was trained with Chicago pizza training data might interpret a set of underlying ingredients like flour, mozzarella cheese, tomato sauce, and mushrooms to predict a deep-dish pizza, while a model that was trained with New York pizza training data might predict a thin-crust pizza from the same underlying ingredients. Because of the potential for variations of models, the accuracy of inferred data is the subject of much debate.
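The point that the same input can yield different inferences depending on the training data can be shown with a toy model: a one-nearest-neighbor classifier in plain Python. The feature names and numbers below are invented, and each toy dataset happens to contain examples of only one pizza style, so each "model" can only predict the style it was trained on.

```python
# Toy illustration: identical input, two models trained on different data,
# two different inferences. A 1-nearest-neighbor "model" stands in for
# real machine learning; all features and numbers are invented.

def train(dataset):
    """'Training' here just stores labeled examples; real ML fits weights."""
    return list(dataset)

def infer(model, features):
    """Predict the label of the stored example closest to the input."""
    def distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(model, key=distance)[1]

# Features: (crust_thickness_cm, cheese_grams, sauce_on_top 0/1)
chicago_data = [((4.0, 300, 1), "deep-dish"), ((3.5, 280, 1), "deep-dish")]
new_york_data = [((0.5, 120, 0), "thin-crust"), ((0.7, 150, 0), "thin-crust")]

chicago_model = train(chicago_data)
new_york_model = train(new_york_data)

# The same underlying ingredients produce different predictions
# depending on which training data shaped the model.
same_input = (2.0, 200, 1)
print(infer(chicago_model, same_input))   # deep-dish
print(infer(new_york_model, same_input))  # thin-crust
```

The disagreement between the two models given identical input is the crux of the accuracy debate: an inference reflects the training data as much as it reflects the person it describes.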
Inferred Data as Intellectual Property
Companies have argued that inferred data is the knowledge a company generates from its processing activities and, therefore, is intellectual property owned by the creating company. However, inferred data can also be thought of as new data that is the output of processing. This difference is critical to intellectual property law, where data alone is generally not an appropriate subject for copyright or any other intellectual property protection. Under current U.S. copyright and patent law, a work can be an original work of authorship or an invention, respectively, only if it was created by a human being. Thus, when an AI system generates an output and the computer, rather than a human being, is considered the author or inventor of that result, the output cannot be given copyright or patent protection. However, not all countries share this view. In the United Kingdom, for example, the author of the AI program may be eligible for the copyright on the output of the AI.
As a result of the current U.S. stance on IP protection for inferred data, the creators of inferred data often argue that this data is their trade secret and therefore owned by them. An analysis of whether inferred data can be considered a trade secret would need to be done under each state’s version of the Uniform Trade Secrets Act, which defines a trade secret, in essence, as information that derives independent economic value from not being generally known to the public or to others who might use it for economic advantage. Ownership of the inferred data would then remain with the entity that takes steps to protect its secrecy. In addition, companies that create inferred data have suggested that because it is a trade secret, it should not be subject to privacy and other laws that would force disclosure of that information.
Inferred Data as New Data
Proponents of inferred data as new data argue that if the source data was covered by privacy laws, then this new data ought to also be covered by the same regulations as the base data from which it is derived, regardless of its IP designation. They argue that the purpose of data protection and privacy laws is to protect consumers from the misuse or publication of their personal information, and that this purpose applies as much to personal information that results from an analytics process as it does to personal information that is directly obtained. However, this means that inferred data may need to be evaluated based on the context of its use and how it is generated to determine whether that use triggers the protections that data protection and privacy laws offer.
How Inferred Data is Used Matters
Inferred data could be used to optimize internal business processes, in which case it may not have any relevance to consumers. But when inferred data is used to profile a person, it may have serious implications for that person. Because inferred data often represents predictions rather than facts, its potential for harm may be greater than that of data provided by the person directly. In the context of profiling individuals by identifying or predicting sensitive information about them, privacy regulations intended to protect consumers would seem to apply to the inferred data. Similarly, when creditworthiness or likelihood of flight before trial is the prediction being inferred, other consumer protection regulations would seem to apply strongly. It is important to note that there are laws that allow the input data to be corrected, such as reported credit data. But models could still produce a biased or unfair prediction even from corrected inputs.
Furthermore, predictions from machine learning models can be difficult to assess for accuracy because the models’ behavior depends heavily on the training dataset used to generate them. These models act like black boxes, where it is nearly impossible to understand how the unique variables and weighting factors were created. As a result, interpreting or correcting a prediction that is false or biased can be very difficult. Worse yet, these mistakes are difficult to litigate because the model cannot be cross-examined in court. Many privacy regulations, including the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), provide for a consumer right to correct data. If inferred data is subject to privacy regulations, this right of correction could be very difficult to apply.
In California, the attorney general has recently issued an opinion on the interpretation of inferred data under the CCPA. Specifically, the attorney general was asked whether a consumer’s right of access under the CCPA encompasses inferred data that a business has created about them. As is so often the case with a legal question, the answer is “it depends.” The attorney general determined that inferred data was within the definition of “personal information” under the CCPA only if it met two requirements. First, the inferred data must have been generated from specific categories of data that are identified in the statute regardless of whether that information was private or public, and regardless of how it was obtained. Second, the inferred data must have been generated for the purpose of creating a profile about a consumer that reflected their “preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.”
How Inferred Data is Generated Matters
Inferred data may be subject to privacy rules not only based on how it is used, but also based on how it is generated. For instance, the Federal Trade Commission (FTC) has signaled through recent decisions that inferred data is sufficiently tied to the processing of input source data, even for training purposes, that if that processing is tainted by fraud, the machine learning algorithms and models that process the tainted data are also tainted, as is any inferred data that results. In one recent decision, Everalbum was accused of collecting input data without proper consent for use in training a facial recognition algorithm. As part of the decision, the FTC required Everalbum to delete the machine learning model that was trained with the faces, the algorithm used to create the model, and the output data created by that tainted model’s processing of new facial images. Thus, inferred data generated through fraud or misrepresentation was treated as the product of misuse and brought within the reach of consumer protection laws.
In summary, inferred data is widely agreed to be data that is the output of processing, rather than data that is provided directly or indirectly by a person. That may be where the agreement ends. Questions of whether and to what extent inferred data is subject to privacy regulations, and whether it can be treated as intellectual property, remain undecided, as do questions about automated decision-making based on inferred data. These issues will, in all likelihood, be the subject of much discussion as the amount and uses of inferred data continue to grow. For companies whose business models depend on their ability to generate and use inferred data, the outcome of these discussions could be critical to their future.
Graham Ruddick, “Admiral to Price Car Insurance Based on Facebook Posts,” The Guardian, Nov. 1, 2016, https://www.theguardian.com/technology/2016/nov/02/admiral-to-price-car-insurance-based-on-facebook-posts. ↑
See Compendium of U.S. Copyright Office Practices, Chapter 300, pg. 307, 3rd ed., Jan. 28, 2021 (https://copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf); see also Naruto v. Slater, 888 F.3d 418 (9th Cir. 2018); Thaler v. Hirshfeld, 2021 WL 3934803 (E.D. Va. Sep. 2, 2021); and “AI Machine Is Not an ‘Inventor’ Under the Patent Act: E.D. Va.,” Legal Update (https://us.practicallaw.thomsonreuters.com/w-032-5362). ↑
Uniform Trade Secrets Act With 1985 Amendments. ↑
Cal. Op. Att’y. Gen. No. 20-303 (Cal. A.G.). ↑
Cal. Civ. Code § 1798.140, subd. (o)(1)(A)-(K). ↑
Cal. Civ. Code § 1798.140, subd. (o)(1). ↑
Everalbum, Inc., Docket No. C-4743, Decision and Order, F.T.C. (2021). ↑