Data Services Analytics unit
A data analytics team close to the heart of government has collected data on more than 650 million people, including children under the age of 13, according to newly unearthed documents.

The Data Services & Analytics unit is described as "one of the most advanced data analytics centres in government" and forms part of the Home Office's Digital, Data and Technology (DDaT) department. It builds decision-making tools and provides data-driven insights to the rest of the Home Office - although details of exactly what it does remain tightly guarded.

The huge amount of data being analysed and the Home Office's lack of transparency have prompted accusations from privacy campaigners that the unit could be creating a "super database" that risks exacerbating racial biases among law enforcement agencies.

On top of transparency concerns, two of the unit's projects are currently being reviewed by the Biometrics and Forensics Ethics Group, a government advisory body investigating "ethical issues in the use of complex datasets". When asked what these projects were and on what basis they were being looked at, a Home Office spokesperson declined to comment.

Freedom of Information requests sent by charity Privacy International and shared with WIRED reveal the data unit has information about people's ethnicity, immigration status, nationality, criminal record history, and biometrics. The data could be used to build up a detailed picture of the millions of people who are included in the databases.

But little is known about where the data comes from. While a government procurement notice published in January 2020 says the unit has access to commercial databases, data from immigration and border systems, and data from police and intelligence agencies as sources of information, almost all of the specifics were redacted in the Data Privacy Impact Assessment documents made available by the Home Office.

In total, more than 30 data providers are listed in the documents. Only two of these, fraud prevention company GB Group and data analytics firm Dun & Bradstreet, were not redacted. GB Group acknowledged it provided data to the unit but declined to provide any further details, citing "confidentiality obligations". Dun & Bradstreet says it is against its policy to comment on its work with clients.

"The potential scope of this secret mass data gathering is truly frightening," says Edin Omanovic, the advocacy director of Privacy International. "Unfortunately, this is the kind of thing you would expect from an intelligence agency, not a little-known department in the Home Office."

The Home Office stressed that all data is held securely and processed in line with relevant human rights and privacy legislation - including data protection laws and the Human Rights Act 1998. "As expected, the Home Office holds a large amount of data to carry out essential operations and deliver on the people's priorities," a Home Office spokesperson says. The government department oversees work on everything from policing and immigration and border control to alcohol strategy and the threat of terrorism.

While the Home Office declined to provide further information on the unit's activity, a recent industry event indicates the unit is involved in at least two Home Office projects: the warnings index and the status checking project.

The warnings index is the UK's immigration watchlist database. It provides members of law enforcement agencies, such as Border Force, with the names of individuals "with previous immigration history, those of interest to detection staff, police or matters of national security", according to a report published by the Independent Chief Inspector of Borders and Immigration. The system was developed in 1995 and has been regularly criticised in the past decade. It has been described as "unfit for purpose" and in 2019 a whistleblower told the Guardian that employees lacking the relevant security clearance had been accessing the system.

The status checking project seeks to document and share live immigration status information across government and law enforcement agencies. It can be used to provide "proof of entitlement to a range of public and private services, such as work, rented accommodation, healthcare and benefits," according to a government report. Liberty, the human rights advocacy group, sounded the alarm over the project in 2019, saying the secrecy surrounding it was "deeply sinister".

"The fact that [the Home Office] is now trying to build what is effectively a massive migrant database to make it easier to deny people access to essential goods and services shows that it has learned absolutely none of the lessons of the Windrush Scandal," Gracie Bradley, Liberty's policy and campaigns manager, told the Guardian at the time.

Last year, the independent advocacy organisation Foxglove and the Joint Council for the Welfare of Immigrants (JCWI) mounted a legal challenge in response to the Home Office's use of a visa streaming algorithm which they claim "entrenched racism and bias into the visa system". The algorithm was shown to automatically give individuals from certain countries a 'red' traffic-light score, making it more likely their visa application would be denied. The Home Office ditched the algorithm ahead of the legal challenge reaching court and said it was "redesigning" its processes.

Although it remains unclear whether the Data Services & Analytics team were involved in the visa streaming algorithm, Chai Patel, the legal policy director of JCWI, claims discriminatory data processing is widespread within the Home Office.

"The datasets the Home Office uses are tainted by decades of institutional racial bias, and this data therefore poses grave risks to both British and migrant ethnic minorities," Patel says. "We need root and branch reform of the Home Office and complete transparency over how they use the personal information entrusted to them."

According to a recent job listing, which has since been removed, the data analytics unit is capable of receiving real-time streams of data and is overseeing the world's largest public sector deployment of IBM's data matching software, IBM Big Match. The software can be used to group records that represent the same person and run probabilistic searches across multiple datasets.

By performing data matching on a large scale, there is a risk that the Data Services & Analytics unit may encourage discriminatory practices and policies, Omanovic warns. "We've seen a whole industry develop which aims to 'predict' things like crime based on gathering huge amounts of data. The idea, however, that more data leads to more accurate conclusions is fundamentally flawed. In reality, what we've seen is that if junk goes in, then junk comes out," he says.

A significant barrier to much of the work done by teams such as the Data Services & Analytics unit is transforming data from multiple sources into a usable form. In an apparent bid to overcome these difficulties, the Home Office awarded almost £20 million in contracts related to the unit in 2020. These included contracts for cloud migration, cloud operations, and data matching services.

Greater efficiency in sharing and analysing data between government departments may undermine important oversight procedures, warns Michael Veale, a lecturer in digital rights and regulation at University College London. "A real concern is that technical efforts to make matching smooth and easy across all parts of the public sector is never replaced by enacting proper, procedural oversight," he says.

While internal oversight of the unit's activity may exist, public oversight and scrutiny remain hampered by a lack of transparency. "Rather than have a meaningful public debate, telling the public what data it will use, show why it is necessary and proportionate, and tell us what safeguards exist, the Home Office has up until now decided to proceed without telling anyone," Omanovic says. "The Home Office now must come clean and reveal the true extent of this secret mass data exploitation programme."