Data as Labor

An analysis of the economic function of data

The central thesis of this publication is that data should be considered property, though only in the colloquial sense that individuals should have a recognized claim over the data they produce. However, for the purposes of legal definitions and treatments of data, it is useful to compare it to labor and property and their corresponding legal precedents.

This essay will expand on ideas from “Is Data Labor?” by Julian Jonker and “Data as Commodity” by Sam Popowich, both of which provide excellent analyses of the relationship between data and the individual.

Note: this essay refers to "data" under the assumption of personal data, ignoring other forms of data — e.g., weather reports.

I will approach the treatment of data from three angles (as commodity, capital, and labor) and compare how intuitive each is as a classification for data. By the end, we will see that data is not quite any one of them, but some mix.

Commodity

Nearly a decade ago, major news media began publishing the headline “Data is the New Oil.” This was meant to capture the idea that data was the newest and most valuable resource to enter the economy. While the intuitive comparison is easy to digest, there are a few key differences between data and oil.

First, oil is a limited, non-renewable resource, while data is practically limitless. As such, the laws of supply and demand would suggest that the price of data should approach zero, yet that is clearly not the case when we observe the modern economy. Second, data is not an exhaustible resource like oil or wood; it is more like air or dirt. Data is sometimes considered non-rival in that one party using it does not prevent another party from doing the same; however, this is not always the case, as elaborated later. Finally, data exhibits network effects that physical resources do not: 100 barrels of oil have 100 times the value of a single barrel, while 100 data points are worth far more than 100 times the value of a single datum.
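The scaling contrast in the last point can be made concrete with a small sketch. The functional forms here are assumptions for illustration only: oil is valued linearly, and data is valued with a hypothetical power law whose exponent (1.5) is an arbitrary stand-in for network effects, not an empirical estimate.

```python
def oil_value(barrels: int, price_per_barrel: float = 1.0) -> float:
    """Commodity-style valuation: total value grows linearly with quantity."""
    return barrels * price_per_barrel


def data_value(points: int, value_per_point: float = 1.0,
               exponent: float = 1.5) -> float:
    """Hypothetical superlinear valuation.

    An exponent above 1 is an assumed stand-in for network effects;
    the specific value 1.5 is illustrative, not measured.
    """
    return value_per_point * points ** exponent


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"n={n:>3}  oil={oil_value(n):8.1f}  data={data_value(n):8.1f}")
```

Under these assumed numbers, 100 barrels are worth exactly 100 times one barrel, while 100 data points are worth ten times more than 100 isolated data points would be. Any exponent above 1 gives the same qualitative picture.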

The result of these differences is that it is difficult to view data properly as a resource; rather, it more closely aligns with capital due to its longevity, ubiquity, and network properties. Suffice it to say, viewing data as a commodity or resource only makes sense on a surface level, and while it makes for an easy economic argument (that platforms are simply gathering the data as a resource), this view ignores the more intertwined relationship between individuals and their data.

On the non-rivalry of data. Data is often considered a non-rival asset: using a dataset to train an AI model does not diminish its value to others and does not consume the data at all, which makes the supply of data effectively “infinite.” While this is generally true, some caveats are becoming more prevalent as AI models improve. Broadly, there is a corporate effort to privatize or otherwise restrict access to high-quality data, making it a competitive asset, though in principle this doesn’t affect the value of the data itself.

More recently, however, the supposedly infinite supply of data has itself come into question. New models require fresher, higher-quality data, which creates a “saturation” effect that bounds the marginal value of a data point. The older a dataset is, the less valuable it is to new models, which have already encoded the relevant behaviors; this creates a sort of self-rivalry, where each firm can use a dataset only a limited number of times, even though any number of competitors can use the same data without exhausting it. All this is to say that data has some unique qualities that separate it from the class of economic objects known as “commodities.”

Capital

If data is not a commodity, is it instead capital? While its network effects would usually point to this, it differs greatly from other forms of capital when considering its relationship to the final product.

Consider a textile factory and its machines: the production of cotton fabric occurs within these bits of capital (the looms) without changing them, while the output is necessarily a transformation of the input (the cotton). In an AI datacenter, the electronics, server racks, and processors clearly constitute capital, and the output (the AI model) is definitively a transformation of the data. However, the data itself remains in its original state, thanks to its intangibility. Jonker posits that, because of this, data is more like human capital, where the benefit of the data is similar to the benefit of a worker’s skill. However, data can be separated from the human who produced it, unlike skills and knowledge.

Thus, if we equate data directly to capital and consider it as some kind of digital machine, we risk erasing the fact that data originates from users. This enables an exploitative claim that users are not entitled to recognition or compensation for their contributions, since the data is treated as an autonomous asset rather than a trace of human activity. In doing so, we lose the inherent connection between users and the data they produce.

This leaves us with an incomplete picture of data in its relationship to the economy. It is certainly not a commodity, and while it shares features with capital, treating it as such ignores its human origin. Data is better understood as an input to a business than as the means of production itself, which leaves the remaining factor of production: labor.

Labor

While classifying data as a form of labor is counterintuitive, it is the remaining economic object that we can compare it to in hopes of deriving some legal precedent for ownership. Labor, like data, originates in human activity, carries the imprint of individual contribution, and raises natural questions of ownership and compensation. Beyond these shared traits, data also creates a similar dynamic between the user and platform, that is, between worker and owner. Per Jonker, data is most like labor in that:

  • Nearly everyone prefers compensation over free contributions,
  • Buyers of data have a systemic bargaining advantage over individual sellers of data,
  • The terms of authority between the platform and user are open-ended and therefore open to abuse. For example, a platform may use your data for political advertising without your knowledge; this is un- or under-specified in the terms and conditions to allow the platform to adapt to new business conditions.

Clearly, data occupies the same socioeconomic niche as labor, even though the “physical” manifestation differs. Still, there are some differences. Labor is best understood as a process, while data is a thing. Labor is limited and rival, while data is portable and often non-rival.

Conclusion

We are left at a crossroads: recognizing data as capital ignores the intuitive claim we have over our own behavior and privacy, while recognizing data as labor captures the political relationship between user and platform but fails to account for the fact that data is an object.

As such, perhaps we can stake out a new treatment of data, wherein it is indeed capital but necessarily an outcome of some human effort. That is, it is a distinct, transferable economic entity with a utility and an exchange value, but its first owner is the individual from whom it is derived. Under this treatment, we can recognize the moral claim an individual has over their “labor,” without degrading the utility of data by removing its transferability.

To operationalize this, it would be prudent to set up the organizational frameworks described elsewhere in this publication — that is, data cooperatives — both to serve as a legal precedent for data ownership and to empower the sovereign individual.