James Kwak | Bad Data

By: James Kwak, The Baseline Scenario | Op-Ed

Published: February 14, 2011

To make a vast generalization, we live in a society where quantitative data are becoming more and more important. Some of this is because of the vast increase in the availability of data, which is itself largely due to computers. Some is because of the vast increase in the capacity to process data, which is also largely due to computers. Think about Hans Rosling’s TED Talks, or the rise of sabermetrics (the “Moneyball” phenomenon) not only in baseball but in many other sports, or the importance of standardized testing scores in K-12 education, or Karl Rove’s use of data mining to identify likely supporters, or the FiveThirtyEight revolution in electoral forecasting, or the quantification of the financial markets, or zillions of other examples. I believe one of my professors has written a book about this phenomenon.

But this comes with a problem. The problem is that we do not currently collect and scrub good enough data to support this recent fascination with numbers, and on top of that our brains are not wired to understand data. And if you have a lot riding on bad data that is poorly understood, then people will distort the data or find other ways to game the system to their advantage.

Readers of this blog will all be familiar with the phenomenon of rating subprime mortgage-backed securities and their structured offspring using data exclusively from a period of rising house prices — because those were the only data that were available. But the same issue crops up in many different stories covering different aspects of society.

CompStat, an approach to policing that focuses on tracking detailed crime metrics, was widely credited with helping New York and other cities reduce crime in the 1990s. Last year, This American Life ran a story, based on a police officer’s secret recordings, detailing how in at least one precinct officers were pressured to boost their numbers through dubious arrests and citations. The story also described another precinct where serious crimes were reported as less serious ones in order to make the numbers look better than they really were.

In a recent New York Times story, David Segal describes how law schools massage their metrics to score higher in the U.S. News & World Report rankings. Segal focuses on the tricks that some schools seem to use to boost the number of graduates employed nine months after graduation; for example, some schools apparently hire their own graduates into temporary positions that happen to span the date on which employment rates are measured. The rankings are based on statistics that are defined by the American Bar Association but are self-reported by the schools and not audited by anyone.

The big, well-known example of how the importance of data breeds data manipulation is standardized testing. In the early days of the standardized testing boom, the key statistic was the percentage of students at or above grade level, defined as the fiftieth percentile on some standardized test. (For those wondering if this is circular, the scaled score required to be at the fiftieth percentile is set before the test based on the attributes of the questions included in the test; it is not set after the test based on students’ actual performances.) So one obvious tactic would be to focus on students in roughly the thirtieth to sixtieth percentiles while ignoring the others. Another, more problematic tactic would be to classify as many low-performing students as possible into special education so that they would not be in the denominator. (Then there is blatant cheating, like giving your students more time to take the test or simply correcting their answers afterward; Freakonomics has a chapter on this. Few if any school districts have the capacity or the motivation to oversee the tests rigorously.) Even leaving aside data manipulation, there is also the basic problem that test difficulty varies from year to year. The test in year N + 1 is calibrated to be of the same difficulty as the test in year N, but that calibration is itself based on statistics, and there is this thing called random variation to deal with.
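
To make the arithmetic of those two tactics concrete, here is a minimal Python sketch. It assumes ten hypothetical students with invented percentile ranks and a hypothetical helper, share_at_grade_level, that counts anyone at or above the fiftieth percentile as "at grade level." It is a toy illustration of the denominator effect, not a model of any real testing program.

```python
# Hypothetical illustration only: the percentile ranks are invented, and
# share_at_grade_level is not taken from any real testing system.

def share_at_grade_level(percentiles, cutoff=50):
    """Return the fraction of tested students at or above the cutoff percentile."""
    if not percentiles:
        raise ValueError("no students left in the denominator")
    at_level = sum(1 for p in percentiles if p >= cutoff)
    return at_level / len(percentiles)

# Ten hypothetical students' percentile ranks.
scores = [12, 18, 25, 41, 48, 52, 55, 63, 70, 88]
baseline = share_at_grade_level(scores)

# Tactic 1: concentrate on "bubble" students just below the cutoff,
# assuming the extra attention lifts each one in the 30th-49th range to the 51st.
bubble = [51 if 30 <= p < 50 else p for p in scores]
bubble_share = share_at_grade_level(bubble)

# Tactic 2: move the lowest scorers (below the 20th percentile here) out of
# the tested population, so they no longer count in the denominator.
excluded = [p for p in scores if p >= 20]
excluded_share = share_at_grade_level(excluded)

print(f"baseline {baseline:.3f}, bubble {bubble_share:.3f}, excluded {excluded_share:.3f}")
# prints "baseline 0.500, bubble 0.700, excluded 0.625"
```

In both cases the headline statistic rises while the students at the very bottom of the distribution get nothing; in the second case not a single score changed, only who was counted.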

And I recently read Natalie Obiko Pearson’s story in Bloomberg on the problems with greenhouse gas emissions data. Most of the numbers we read are self-reported by countries and the companies in those countries, and even if they are honest (a big if) they are “bottom up” estimates, based on how much fossil fuel is being consumed. But when scientists actually measure changes in greenhouse gases in the atmosphere, they get results that differ from the bottom-up estimates. And in all the examples cited in Bloomberg, actual atmospheric measurements are higher than bottom-up estimates. This could simply be because the article didn’t mention atmospheric measurements that were lower than predicted by official data. But it could also be because both the companies burning the fossil fuels and the countries aggregating the data have the same incentive to underreport: companies because it means they don’t have to buy as many carbon permits, and countries because it means they can claim to be under their Kyoto Protocol targets.

Greenhouse gases are a good example of how we think data will help save us — if we can track how much carbon dioxide each company is producing, we can make it pay for that carbon — but we may just not have good enough data. In general, I think the current trend toward using more and more data is a good thing. I mean, what’s the alternative: gut intuition? But this only increases the importance of having good data to begin with. And when some parties benefit from bad data, this can be a big challenge with no easy solution.