
AI Companies Want to Colonize Our Data. Here’s How We Stop Them.

Artificial Intelligence companies are imposing a new “Doctrine of Discovery” on our digital commons, but we can resist.


In recent months, a number of novelists, artists and newspapers have sued generative artificial intelligence (AI) companies for taking a “free ride” on their content. These suits allege that the companies, which use that content to train their machine learning models, may be breaking copyright laws.

From the tech industry’s perspective, this content mining is necessary in order to build AI tools that will supposedly benefit all of us. In a recent statement to legislative bodies, OpenAI claimed that “it would be impossible to train today’s leading AI models without using copyrighted materials.” It remains to be seen whether courts will agree, but it’s not looking good for content creators. In February, a California court dismissed large portions of a case brought by Sarah Silverman and other authors.

Some of these cases may amount to ongoing negotiations, as companies figure out how to pressure one another into sharing a piece of the AI pie. Publisher Axel Springer and the social media platform Reddit, for example, have recently struck profitable deals to license their content to AI companies. Meanwhile, a legislative attempt in the United Kingdom that would have protected content generated by the creative industries has been abandoned.

But there is a larger social dilemma involved here that might not be as easy to detect: What about our content — content that we don’t usually associate with copyright laws, like emails, photos and videos uploaded to various platforms? There are no high-profile court cases around that. And yet, the appropriation of this content by generative AI reveals a monumental social and cultural transformation.

It’s easy to miss this transformation because, after all, this kind of content is considered a sort of commons that nobody owns. But the appropriation of this commons entails a kind of injustice and exploitation that we are still struggling to name, one not captured in the copyright cases. It’s a kind of injustice we’ve seen before in history, whenever someone claims ownership of a resource simply because it was there for the taking.

In the early phases of colonialism, colonizers such as the British claimed that Australia, the continent they had recently “discovered,” was in legal terms “terra nullius” — no one’s land — even though it had been inhabited for millennia. This was known as the Doctrine of Discovery, a colonial version of “finders, keepers.”

Such claims have been echoed more recently by corporations that want to treat our digital content and even our biometric data as mere exhaust that’s just there to be exploited. The Doctrine of Discovery survives today in a seamless move from cheap land to cheap labor to cheap data, a phenomenon we call “data colonialism.” The word “colonialism” is not being used metaphorically here, but to describe a very real emerging social order based not on the extraction of natural resources or labor, but on the continuous appropriation of human life through data. Data colonialism helps us understand today’s transformations of social life as extensions of a long historical arc of dispossession.

All of human culture becomes the raw material that is fed to a commercial AI machine from which huge profits are expected. Earlier this year, OpenAI began a fundraising round for $7 trillion, “more than the combined gross domestic products of the UK and France,” as the Financial Times put it.

What really matters is not so much whether generative AI’s outputs plagiarize the work of famous authors or powerful media groups. The real issue is a whole new model of profit-making that treats our lives in data form as its free input. This profitable data grab, of which generative AI is just an egregious example, is really part of a larger power struggle with an extensive history.

To challenge this, we need to go beyond the narrow lens of copyright law and recover a broader view of why extractivism, under the guise of discovery, is wrong. Today’s new — and so far largely uncontested — conversion of our lives and cultures into colonized data territories will define the relations between Big Tech and the rest of us for decades, if not centuries. Once a resource has been appropriated, it is almost impossible to claim it back, as evidenced by the fact that the Doctrine of Discovery is still cited in contemporary government decisions to deny Indigenous people rights over their lands.

As with land, so too with data. Do nothing, and we will count the costs of Big Tech’s Doctrine of Discovery for a long time to come.

Applying Historical Lessons in the Age of AI

Unfortunately, one-track approaches to confronting these problems, like quitting a particular social media platform, will not be enough. Since colonialism is a multifaceted problem with centuries of history, fighting back against its new manifestations will also require multifaceted solutions that borrow from a rich anti-colonial tradition.

The most important tool in this struggle is our imagination. Decolonizing data needs to become a creative and cultural movement. It is true that no colonized society has managed to decisively and permanently undo colonialism. But even when colonial power could not be resisted with the body, it could be resisted with the mind. Collective ingenuity will be our most valuable asset.


In our recent book Data Grab: The New Colonialism of Big Tech and How to Fight Back, we outline a number of practical ways in which we can begin to apply this kind of creative energy. We borrow a model from Latin American and Latine activists, who encourage us to act simultaneously across three different levels: within the system, against the system and beyond the system. Limiting ourselves to only one of these levels will not be enough.

What might this look like in practice? Working within the system might mean continuing to push our governments to do what they have so far largely failed to do: regulate Big Tech by passing antitrust laws, consumer protection laws and laws that protect our cultural work and heritage. It might seem tempting to abandon mainstream politics, but doing so would be counterproductive in the long term.

But we cannot wait for the system to fix itself. This means we need to work against the system, embracing the politics and aesthetics of resistance as decolonial movements have done for centuries. There are plenty of inspiring examples, including those involving unionization, workers’ rights, Indigenous data sovereignty, environmental organizing, and movements against the use of data technologies to carry out wars, surveillance, apartheid and the persecution of migrants.

Finally, we need to think beyond the system, building ways of limiting data exploitation and redirecting the use of data toward more social, democratic goals. This is perhaps the most difficult but most important task. It will require new technologies as well as new ways of rejecting technology. A large collective and imaginative effort is needed to resist data colonialism’s new injustices. This effort is a crucial step on the longer journey to confronting and reversing colonialism itself.

