AI Companies Want to Colonize Our Data. Here’s How We Stop Them.

Artificial Intelligence companies are imposing a new “Doctrine of Discovery” on our digital commons, but we can resist.

In recent months, a number of novelists, artists and newspapers have sued generative artificial intelligence (AI) companies for taking a “free ride” on their content. These suits allege that the companies, which use that content to train their machine learning models, may be breaking copyright laws.

From the tech industry’s perspective, this content mining is necessary in order to build the AI tools that tech companies claim will benefit all of us. In a recent statement to legislative bodies, OpenAI claimed that “it would be impossible to train today’s leading AI models without using copyrighted materials.” It remains to be seen if courts will agree, but it’s not looking good for content creators. In February, a California court dismissed large portions of a case brought by Sarah Silverman and other authors.

Some of these cases may reveal ongoing negotiations, as some companies figure out how to pressure others into sharing a piece of the AI pie. Publisher Axel Springer and the social media platform Reddit, for example, have recently made profitable deals to license their content to AI companies. Meanwhile, a legislative attempt in the United Kingdom that would have protected content generated by the creative industries has been abandoned.

But there is a larger social dilemma involved here that might not be as easy to detect: What about our content — content that we don’t usually associate with copyright laws, like emails, photos and videos uploaded to various platforms? There are no high-profile court cases around that. And yet, the appropriation of this content by generative AI reveals a monumental social and cultural transformation.

It’s easy to miss this transformation, because after all, this kind of content is considered a sort of commons that nobody owns. But the appropriation of this commons entails a kind of injustice and exploitation that we are still struggling to name, one not captured in the copyright cases. It’s a kind of injustice that we’ve seen before in history, whenever someone claims ownership of a resource because it was just there for the taking.

In the early phases of colonialism, colonizers such as the British claimed that Australia, the continent they had recently “discovered,” was in legal terms “terra nullius” — no one’s land — even though it had been inhabited for millennia. This was known as the Doctrine of Discovery, a colonial version of “finders, keepers.”

Such claims have been echoed more recently by corporations wanting to treat our digital content and even our biometric data as mere exhaust that’s just there to be exploited. The Doctrine of Discovery survives today in a seamless move from cheap land to cheap labor to cheap data, a phenomenon we call “data colonialism.” The word “colonialism” is not being used metaphorically here, but to describe a very real emerging social order based not on the extraction of natural resources or labor, but on the continuous appropriation of human life through data. Data colonialism helps us understand today’s transformations of social life as extensions of a long historical arc of dispossession. All of human culture becomes the raw material that is fed to a commercial AI machine from which huge profits are expected. Earlier this year, OpenAI’s chief executive reportedly sought to raise as much as $7 trillion — “more than the combined gross domestic products of the UK and France,” as the Financial Times put it.

What really matters is not so much whether generative AI’s outputs plagiarize the content of famous authors owned by powerful media groups. The real issue is a whole new model of profit-making that treats our lives in data form as its free input. This profitable data grab, of which generative AI is just an egregious example, is really part of a larger power struggle with an extensive history.

To challenge this, we need to go beyond the narrow lens of copyright law and recover a broader view of why extractivism, under the guise of discovery, is wrong. Today’s new — and so far largely uncontested — conversion of our lives and cultures into colonized data territories will define the relations between Big Tech and the rest of us for decades, if not centuries. Once a resource has been appropriated, it is almost impossible to claim it back, as evidenced by the fact that the Doctrine of Discovery is still cited in contemporary government decisions to deny Indigenous people rights over their lands.

As with land, so too with data. Do nothing, and we will count the costs of Big Tech’s Doctrine of Discovery for a long time to come.

Applying Historical Lessons in the Age of AI

Unfortunately, one-track approaches to confronting these problems, like quitting a particular social media platform, will not be enough. Since colonialism is a multifaceted problem with centuries of history, fighting back against its new manifestations will also require multifaceted solutions that borrow from a rich anti-colonial tradition.

The most important tool in this struggle is our imagination. Decolonizing data needs to become a creative and cultural movement. It is true that no colonized society has managed to decisively and permanently undo colonialism. But even when colonial power could not be resisted with the body, it could be resisted with the mind. Collective ingenuity will be our most valuable asset.

In our recent book Data Grab: The New Colonialism of Big Tech and How to Fight Back, we outline a number of practical ways in which we can begin to apply this kind of creative energy. We borrow a model from Latin American and Latine activists, who encourage us to act simultaneously across three different levels: within the system, against the system and beyond the system. Limiting ourselves to only one of these levels will not be enough.

What might this look like in practice? Working within the system might mean continuing to push our governments to do what they have so far largely failed to do: regulate Big Tech by passing antitrust laws, consumer protection laws and laws that protect our cultural work and heritage. It might seem tempting to abandon mainstream politics, but doing so would be counterproductive in the long term.

But we cannot wait for the system to fix itself. This means we need to work against the system, embracing the politics and aesthetics of resistance as decolonial movements have done for centuries. There are plenty of inspiring examples, including those involving unionization, workers’ rights, Indigenous data sovereignty, environmental organizing, and movements against the use of data technologies to carry out wars, surveillance, apartheid and the persecution of migrants.

Finally, we need to think beyond the system, building ways of limiting data exploitation and redirecting the use of data toward more social, democratic goals. This is perhaps the most difficult but most important task. It will require new technologies as well as new ways of rejecting technology. A large collective and imaginative effort is needed to resist data colonialism’s new injustices. This effort is a crucial step on the longer journey to confronting and reversing colonialism itself.
