It’s a common internet experience: throw a foreign phrase into Google Translate or any other online translation tool and out comes a farcical approximation of the real thing.
That’s why many experts, and even Google itself, caution against relying on the popular Google Translate for complex tasks. Google advises users that its machine translation service is not “intended to replace human translators.”
Yet the U.S. government has decided that Google Translate and other machine translation tools are appropriate for one task: helping to decide whether refugees should be allowed into the United States.
An internal manual produced by U.S. Citizenship and Immigration Services, the federal agency charged with admitting immigrants, instructs officers who sift through non-English social media posts of refugees that “the most efficient approach to translate foreign language contents is to utilize one of the many free online language translation services provided by Google, Yahoo, Bing, and other search engines.” The manual includes step-by-step instructions for Google Translate.
The manual was obtained by the International Refugee Assistance Project through a public records request and shared with ProPublica.
Language experts said the government’s reliance on automatic translation to dig into refugee social media posts was troubling and likely to be error-prone because the services are not designed to parse nuance or recognize slang. The government may misconstrue harmless comments or miss a genuinely threatening one.
“It’s naive on the part of government officials to do that,” said Douglas Hofstadter, a professor of cognitive science and comparative literature at Indiana University at Bloomington, who has studied language and analogies. “I find it deeply disheartening and stupid and shortsighted, personally.”
Asked about the agency’s use of machine translation tools, USCIS spokeswoman Jessica Collins said in an emailed statement that review of publicly available social media information “is a common sense measure to strengthen our vetting procedures.”
USCIS has stated that “information collected from social media, by itself, will not be a basis to deny refugee resettlement.”
In 2017, Facebook apologized after its machine-translation service translated a post by a Palestinian man that said “good morning” as “hurt them” in English or “attack them” in Hebrew.
As a test, ProPublica asked language professors to copy and paste tweets written in casual language into Google Translate and compare the results with how they would interpret the tweets.
One recent Urdu-language post on Twitter included a sentence that Mustafa Menai, who teaches Urdu at the University of Pennsylvania, translated as “I have been spanked a lot and have also gathered a lot of love (from my parents).”
Google translated the sentence as “The beating is too big and the love is too windy.”
The Trump administration has vastly expanded the role of social media in deciding whether people can move or travel to the United States. Refugee advocates say the government’s reliance on machine-translation tools raises further concerns about how immigration officers make important decisions affecting applicants’ lives and U.S. national security.
USCIS has itself found that automated translation falls short in understanding social media posts. An undated draft internal review of a USCIS pilot social media vetting program concluded that “automatic foreign language translation was not sufficient.”
A separate pilot review conducted in June 2016 stated that “native Arabic language and subject matter expertise in regional culture, religion, and terrorism was needed to fully vet” two cases in which potentially derogatory social media information was found. The documents were published by the Daily Beast in January 2018.
The manual, much of which is redacted, addresses procedures only for a narrow subset of refugees: people whose spouses or parents have already been granted refugee status in the U.S., known as follow-to-join cases. In 2017, 1,679 follow-to-join refugees were admitted to the U.S., about 3% of total refugee admissions, according to government data.
“It defies logic that we would use unreliable tools to decide whether refugees can reunite with their families,” said Betsy Fisher, strategy director at IRAP. “We wouldn’t use Google Translate for our homework, but we are using it to keep refugee families separated.”
In a federal lawsuit in Washington state that is now in the discovery phase, IRAP is challenging the Trump administration’s suspension of the follow-to-join refugee program.
It is unclear how widely the manual’s procedures are used throughout USCIS, or whether they are identical to those used for vetting all refugees or other types of immigrants.
The manual is undated, but it was released to IRAP in response to a request for records created on or after Oct. 23, 2017.
USCIS did not respond to questions about whether the manual’s procedures are used to vet other refugees, when the manual was put into use or whether it is still in use.
“The mission of USCIS first and foremost is to safeguard our homeland and the people in it,” Collins said. “Our first line of defense in these efforts is thorough, systematic vetting.”
In the 2018 fiscal year, USCIS conducted 11,740 social media screenings, according to an agency presentation.
The USCIS manual acknowledges that “occasionally,” online translation services may not be adequate for understanding “foreign text written in a dialect or colloquial usage,” but it leaves it up to individual officers to decide whether to request expert translation services.
Without foreign language fluency, an officer is unlikely to know whether a post needs additional review, said Rachel Levinson-Waldman, senior counsel at the Brennan Center for Justice.
Google and Verizon, which owns Yahoo, did not respond to questions about the use of their services when vetting refugees. Emily Chounlamany, a Microsoft spokeswoman, said “the company has nothing to share on the matter.”
Language experts say satire is another problematic area. A recent satirical Persian-language tweet showed a picture of Iranian elites raising their hands, with text stating, “Whose child lives in America?” (The tweet is commentary on a recent controversy in Iran regarding high-ranking officials’ close relatives living in the West.) The text was translated by Google as “When will you taste America?” Microsoft’s result was: “Who is the American?”
“The thing about Persian and the Iranian culture is that people love to make jokes about anything,” said Sheida Dayani, who teaches Persian at Harvard University and instructs her students to avoid using Google Translate or similar tools for their assignments. “How are you going to translate it via Google Translate?”
Automated translation services are the “absolute wrong technology” for immigration officers making important decisions, Dayani said.
The use of translation tools has come up in other contexts. In 2017, a highway patrol trooper in Kansas used Google Translate to ask a Mexican man in Spanish for consent to search his car, then conducted a warrantless search. A U.S. district judge threw out the search evidence, finding that the defendant did not fully understand the officer’s commands and questions.
Google has touted improvements in its translation tool in recent years, most notably its use of “neural machine translation,” which it has gradually rolled out for more languages. Researchers in the Netherlands have found that while the neural machine translation method improves quality, it still struggles to accurately translate idioms.
One major problem with machine translation is that such tools do not understand text in the same way that a person would, Hofstadter said. Rather, they are engaged in “decoding” or “text substitution,” he said.
“When it involves anything that is subtle, you can never rely on it because you can never know if it’s going to make grotesque errors,” Hofstadter said.
Machine-translation services are typically trained by using texts that have already been translated, which tend to use more formal speech, for instance official United Nations documents, said David Guy Brizan, a professor at the University of San Francisco who researches natural language processing and machine learning.
Language iterates too quickly, especially among young people, for even sophisticated machine-translation services to keep up, Brizan said. He pointed to examples of English-language phrases currently popular on social media such as “low-key” or “being canceled” as ones that automated services could struggle to convey.
He added that nontextual context like videos and pictures, the parties involved in a conversation and their relationship, and cultural references would be completely lost on machine translation.
“It requires a cultural literacy across languages, across generations, that is sort of impossible to keep up with,” he said. “You can think of these translation programs acting as your parents or grandparents.”
Rachel Thomas, director of the Center for Applied Data Ethics at the University of San Francisco, said that while machine-translation capabilities are improving, anyone depending on algorithms or computers should think carefully about the recourse for people wronged by those systems’ mistakes.
Refugees rejected for admission can request a decision review, but advocates say they are typically given little detail as to why they were rejected.
Efforts to scrutinize social media posts of some people trying to enter the United States began under the Obama administration, and they were encouraged by Democrats and Republicans in Congress. USCIS launched a social media division within its Fraud Detection and National Security division in July 2016, building on pilot programs operating since 2015.
The Trump administration has dramatically increased social media collection as part of a push for “extreme vetting” of people entering the country. In May, the State Department updated its visa forms to request social media identifiers from most U.S. visa applicants worldwide.
In September, the Department of Homeland Security published a notice stating it intended to request social media information from a broad swath of applicants, including people seeking U.S. citizenship or permanent residence, refugees and asylees.
Jeff Kao contributed to this story.