The Census Bureau is rolling out a new algorithm intended to protect respondents’ privacy — but experts warn the change will significantly miscount minority communities and rural areas.
Specifically, the Census Bureau plans to use a new “differential privacy” algorithm to obscure respondents’ identities, yet state experts warn that the resulting data could contain population errors of 25 percent or more and misrepresent certain groups by 100 percent or more. This would have dramatic effects on redistricting and funding.
The data released by the bureau is expected to be accurate at the state level, but its sub-state data (region, county, city, town) will be intentionally distorted. In the past, the bureau used “data swapping” to ensure that individuals in small populations could not be identified from certain statistics, aggregating their data with that of similar individuals while keeping population totals accurate, according to the National Conference of State Legislatures (NCSL). But concerns that the data could be cross-referenced with other information to identify individuals led the bureau to adopt a “differential privacy” algorithm that will “inject noise” into the raw data.
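To make “injecting noise” concrete, here is a minimal sketch of the textbook Laplace mechanism, a standard way to achieve differential privacy for a simple count. It is not the bureau’s actual algorithm, whose details are not described here. The function names, the `epsilon` value of 0.1, and the two example populations are all assumptions chosen for illustration. The sketch shows why the same amount of noise matters far more for a small place than for a large one.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Textbook Laplace mechanism for a counting query (sensitivity 1).
    # A smaller epsilon means stronger privacy and more noise.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials, rng):
    # Average of |released - true| / true over repeated noisy releases.
    total = 0.0
    for _ in range(trials):
        released = noisy_count(true_count, epsilon, rng)
        total += abs(released - true_count) / true_count
    return total / trials


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# Hypothetical populations, not figures from any census release.
for label, population in [("small place", 90), ("large place", 900_000)]:
    err = mean_relative_error(population, epsilon=0.1, trials=2000, rng=rng)
    print(f"{label}: population {population}, mean relative error {err:.6f}")
```

Because the noise does not scale with the size of the count, its relative effect shrinks as the population grows, which is consistent with the expectation that state totals stay accurate while small localities do not.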
Though the bureau is still working out how it will implement this, the move immediately raised concerns.
“Differential privacy will mean that, except at the state level, population and voting age population will not be reported as enumerated. And, race and ethnicity data are likely to be farther from the ‘as enumerated’ data than in past decades, when data swapping was used to protect small populations,” according to the NCSL. “This may raise issues for racial block voting analyses.”
The bureau released demonstration data to states to test the new method using data from the 2010 census, and experts quickly realized that the output differed sharply from the original 2010 numbers, particularly in rural areas.
Meredith Strohm Gunter of the Weldon Cooper Center for Public Service at the University of Virginia warned Gov. Ralph Northam in a January memo that data on the sub-state level “will be sacrificed” for privacy, which could lead to “misallocation of funds, poor capacity for planning… and a competitive disadvantage in economic and workforce development.”
Gunter told Northam that the demonstration provided by the bureau spit out inaccurate and likely impossible data.
“For example… we found the total number of girls ages 15-19 in the City of Emporia were decreased from the actual 185 to only 30,” she wrote. “Applying this number to the teen pregnancy rate for Emporia increased the rate from 10 percent to 66 percent. This is not only ludicrous, but, if consistent across localities and subject areas, deeply damaging to the ability of state and local governments and non-profits to accurately address the needs of Virginians.”
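The arithmetic behind Gunter’s example can be checked with a short sketch. The memo gives the two population figures and the two rates but not the number of pregnancies; the value of 20 used below is an assumption, chosen only because it is roughly consistent with both reported rates. The function name is likewise hypothetical.

```python
def rate_percent(events, population):
    # A rate expressed as a percentage of the population at risk.
    return 100.0 * events / population


actual_girls = 185        # actual count cited in the memo
demo_girls = 30           # count in the demonstration data
assumed_pregnancies = 20  # ASSUMPTION: not stated in the memo

actual_rate = rate_percent(assumed_pregnancies, actual_girls)
demo_rate = rate_percent(assumed_pregnancies, demo_girls)

print(f"rate with actual population:        {actual_rate:.1f}%")
print(f"rate with demonstration population: {demo_rate:.1f}%")
```

Under that assumption the rates come out to about 10.8% and 66.7%, close to the 10 percent and 66 percent quoted. The numerator never changes; shrinking the denominator alone multiplies the rate roughly sixfold.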
Other errors were similarly egregious. For example, the demonstration showed 716 people living on the Hawaiian island of Molokai when the actual population in 2010 was just 90, according to an op-ed by The New York Times’ Gus Wezerek and University of Minnesota data scientist David Van Riper. Small Native American reservations with fewer than 5,000 residents saw their populations decline by an average of 34% in the demonstration. Small Alaskan villages showed population declines even though they had continued to grow.
An analysis by Utah officials found that 15,000 actual residents disappeared from the count. Two cities lost more than 50% of their populations, 20 cities lost 20% of their populations, and 43 cities lost 10% of their populations, Utah House Speaker Brad Wilson and Senate President Stuart Adams said in a letter to Steven Dillingham, the director of the Census Bureau. Another city, on the other hand, saw a 253% population increase.
“Not only will this alter basic demographic information in both rural and urban areas of the state, but it may also adversely affect longitudinal studies about health, safety, and welfare,” they wrote.
An analysis by Washington state’s Office of Financial Management said the demonstration’s household data for eight counties showed occupancy rates “at or near 100%, which is illogical and historically implausible.”
“There is bias in the demonstration data that causes areas with small populations to get larger while areas with larger populations get smaller,” state demographer Mark Mohrman said in a letter to Dillingham. The data also deviated when it came to racial demographics, he said.
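Mohrman’s letter, as quoted here, does not say what causes this bias. One way such a pattern can arise, offered purely as an assumption and not as a description of the bureau’s method, is post-processing: if noisy counts are clamped so they cannot go negative and then rescaled so the overall total stays fixed, tiny counts drift upward and large counts are scaled down to compensate. The sketch below uses hypothetical numbers (200 areas of one person each plus one area of 100,000) and hypothetical function names.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two exponential draws gives zero-centered Laplace noise.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(true_counts, epsilon, rng):
    # Step 1: add noise, then clamp at zero (counts cannot be negative).
    noisy = [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in true_counts]
    # Step 2: rescale so the grand total matches the true total exactly.
    factor = sum(true_counts) / sum(noisy)
    return [n * factor for n in noisy]


rng = random.Random(1)  # fixed seed for a repeatable illustration
small_areas = [1] * 200   # ASSUMPTION: hypothetical tiny areas
large_areas = [100_000]   # ASSUMPTION: one hypothetical large area
released = release(small_areas + large_areas, epsilon=0.1, rng=rng)

print("small areas: true total", sum(small_areas),
      "released total", round(sum(released[:200])))
print("large area:  true", large_areas[0],
      "released", round(released[200]))
```

Clamping removes only the negative errors, so the small areas gain population on average, and the rescaling step takes that surplus back from the large area.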
Beyond miscounts, these errors could completely misrepresent entire communities.
“A rural, declining, old, predominant white community, for example, may appear instead growing, younger, and more diverse,” Gunter wrote. As a result, redistricting data “will be inaccurate” and “majority-minority districts could lose their status,” she warned. The distorted data could also cost communities funding, which would affect housing, transportation, emergency management, and numerous other services.
Qian Cai, the director of the Weldon Cooper Center’s Demographics Research Group, warned in an op-ed in the Richmond Times-Dispatch that while the move is “well-intended,” the bureau “believes data distortion prevents reconstruction of individual records including age, gender, race and homeownership, even though that basic information already is easily accessible through the internet.”
The “consequences are disastrous,” she said.
“We no longer will have accurate information about our communities. The data distortion might misrepresent a city’s population size by 25% or more, or in the case of an age group… by more than 100%,” she wrote. As a result, data necessary for things like enforcing voting rights, funding schools, planning for emergencies, tracking opioid addiction, and city planning will be inaccurate and meaningless.
Officials in Maine also expressed concern to Dillingham in a letter last month after seeing the demonstration data.
“Our analyses show that small, rural places suffer the most in terms of inaccurate estimates. In Maine’s case, that means a majority of our counties and sub-county geographies are subject to unacceptably high levels of error… The repercussions for our state and nation are considerable,” wrote Angela Hallowell, the state’s data center lead, and Maine State Economist Amanda Rector. The proposal, they said, would “throw into doubt any redistricting, funding decisions, or analysis done using census data.”
Maine’s analysis found that the Census Bureau’s demonstration data on certain age and gender groups had error rates of more than 100%, which the letter warned would leave areas “vulnerable to large miscounts.” And while data on white populations was largely accurate, minority populations had error rates of more than 25%, and higher still when looking only at black populations.
In Maine’s Franklin County, for example, “the count of households with a black… householder was more than 11 times” higher in the demonstration than in the original data.
“This will have myriad financial and economic repercussions for the ‘winners’ and ‘losers’ that municipalities will randomly become,” the letter said.
John Abowd, the associate director for research and methodology and chief scientist of the US Census Bureau, said in a letter to officials in Nevada that the algorithm was “written specifically for the 2020 Census and cannot be directly applied to any other data.”
“The Census Bureau is committed to publishing accurate data for the 2020 Census, however our obligations to protect privacy mean that we cannot publish perfectly accurate data for every conceivable use case,” Abowd wrote. He argued that the bureau expects the “impact of the error introduced by the use of formal privacy will be less than the error resulting from other factors.”
“We know of no other statistical technique that can be reliably employed to assure the confidentiality of the underlying data while simultaneously assuring the highest quality statistical product for our data users,” he wrote.
Abowd said that as the bureau works to improve the algorithm, “we are also researching a variety of contingency plans to ensure that the 2020 Census Data Products meet the Census Bureau’s data quality standards.”
“Because of the impact of differential privacy on data accuracy for small geographies or populations, however, the Census Bureau is evaluating what tables to release and at what geographic levels to ensure that our data products meet fitness-for-use standards,” he added. “More generally, the Census Bureau is eager to engage with federal, state and local programs to learn more of how they use census data and their requirements for accuracy. The Census Bureau is also eager to engage with stakeholders to understand the privacy expectations, requirements, and concerns of the American public.”
But state officials worry that even minor errors could result in significant long-term consequences.
“Inaccuracy in the decennial census will flow through ten full years of data,” Hallowell and Rector warned Dillingham. “The current implementation of [differential privacy] creates a group of regions and people, predominantly rural and already marginalized, that are left behind; they will continue to be left behind for the remainder of the decade unless action is taken to improve the algorithm. Without resolution… it will be impossible to measure the magnitude of these errors, resulting in further challenges for these places and communities.”