On Tuesday we published data access legislation as part of the Digital Economy Bill. The Bill is an important part of what we are seeking to do in GDS to transform our relationship to data and unleash the next decade of innovation and public service reform. But legislation is only one piece of the jigsaw; this post will place the Bill in the context of our wider data programme.
Our clauses in the Digital Economy Bill are described as being about ‘data sharing’, although our preferred term is ‘data access’, because we think it better reflects the way technology and practices for handling data across government are changing. Data sharing often implies a ‘drag and drop’ copy of a dataset, or of particular fields within it, perhaps through regular batch updates, or as a physical transfer via a disk in the mail (couriered if you were in a hurry). Finding and getting access to data has often been time-consuming, and sometimes legislatively difficult and contractually expensive. Compounded by poor-quality infrastructure, these friction points mean that data has been used too little for public good.
On the flipside, infrastructure-driven friction introduced implicit safeguards which prevented misuse. This means that as we tackle these sources of friction and get data flowing in ways that benefit government, citizens and businesses, it is important that we do so in ways that don’t inadvertently compromise privacy and security.
On the technology front, the growing use of Application Programming Interfaces (APIs) is one of the ways we do that. For one thing, APIs make it easy to query an existing dataset. This removes much of the need for bulk copying, and with that the security, freshness and accuracy problems associated with multiple copies of the same data being held on multiple systems and being used by different services. Just as importantly, especially when it comes to protecting privacy, it means services and the government departments that run them can give access to ‘just enough’ personal data to perform the task at hand or, in the case of simple eligibility assessments, no underlying data at all. For example, if having a valid driving licence is an eligibility criterion for a particular service then a binary yes or no response, rather than a person’s full driving licence record, will suffice. Good API design also provides us with the option of building auditing capability into our systems, by default. That means we know who in government is checking what data and when they are doing so.
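To make the ‘just enough data’ idea concrete, here is a minimal sketch in Python. It is an illustration only, not a real government API: the record store, licence numbers and function names are all hypothetical. The point is that the eligibility check answers yes or no without ever exposing the underlying record, and every query is written to an audit log by default.

```python
# Illustrative sketch of the 'just enough data' pattern described above.
# All data and names here are hypothetical, not a real government system.

import datetime

# Hypothetical in-memory store of full driving licence records.
LICENCE_RECORDS = {
    "AB123456": {"name": "A. Example", "status": "valid", "expiry": "2027-01-01"},
    "CD789012": {"name": "B. Example", "status": "revoked", "expiry": "2024-06-30"},
}

AUDIT_LOG = []  # records who asked, about what, and when


def has_valid_licence(licence_number: str, caller: str) -> bool:
    """Return only a yes/no answer, never the full licence record."""
    # Auditing is built in: every check is logged before it is answered.
    AUDIT_LOG.append({
        "caller": caller,
        "licence_number": licence_number,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    record = LICENCE_RECORDS.get(licence_number)
    return record is not None and record["status"] == "valid"


print(has_valid_licence("AB123456", caller="service-x"))  # True
print(has_valid_licence("CD789012", caller="service-x"))  # False
```

The calling service learns nothing beyond the binary answer, and the audit log gives the data holder a record of who checked what and when.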
To date our work on registers - an ecosystem of interconnected data stores of trustworthy data that are easily accessible - has focused on lists of open data, e.g. lists of ‘things’ like countries. But there are important questions about how government manages the collection and storage of, and access to, personal data. An exploration of what we are calling ‘private registers’ is how we are grappling with these questions. That work is currently in discovery and we’ll be looking to publish our early findings and Alpha plans later in the year.
Separately, government's commitment to enabling a digital state that has privacy at its heart can be seen in the design of GOV.UK Verify. This platform is a new way to safely and straightforwardly prove who you are online when accessing services like filing your tax return, viewing your driving licence or applying for Universal Credit. Besides being quick and simple to use it enhances privacy because information is not stored centrally, and there’s no unnecessary sharing of information. The company you choose to verify your identity doesn’t know which service you’re trying to access, and the government department doesn’t know which company you choose.
The ability to query data through APIs does not, however, deal with every use case in government. For instance, there is considerable value in being able to link de-identified data from individuals across multiple datasets, for example linking longitudinal survey data with employment or income records to understand long-term social mobility across different generations. In doing this, researchers are not interested in each specific person, but rather the aggregate patterns revealed. As the use of machine learning in government expands there will be a growing need to access information like this in an appropriate and secure way in order to train algorithms.
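One common way to do this kind of linkage - sketched below as an assumption, not a description of any specific government method - is to replace the direct identifier in each dataset with a keyed hash (a pseudonym) before the data leaves its source system. The datasets can then be joined on the pseudonym, and only aggregate results are reported. The key, identifiers and figures here are all made up for illustration.

```python
# Illustrative sketch of pseudonymised record linkage: a keyed hash
# (HMAC-SHA256) replaces the raw identifier, the datasets are joined on
# the pseudonym, and only aggregate statistics are produced.
# All keys, identifiers and values are hypothetical.

import hashlib
import hmac

LINKAGE_KEY = b"key-held-by-a-trusted-linkage-service"  # hypothetical


def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible pseudonym."""
    return hmac.new(LINKAGE_KEY, identifier.encode(), hashlib.sha256).hexdigest()


# Two source datasets sharing a common identifier (made-up records).
survey = [{"id": "QQ123456A", "cohort": 1970}, {"id": "QQ654321B", "cohort": 1980}]
income = [{"id": "QQ123456A", "income": 28000}, {"id": "QQ654321B", "income": 35000}]

# De-identify both datasets before linkage.
survey_p = {pseudonymise(r["id"]): r["cohort"] for r in survey}
income_p = {pseudonymise(r["id"]): r["income"] for r in income}

# Join on the pseudonym and report only aggregates per cohort.
incomes_by_cohort = {}
for pid, cohort in survey_p.items():
    if pid in income_p:
        incomes_by_cohort.setdefault(cohort, []).append(income_p[pid])

averages = {c: sum(v) / len(v) for c, v in incomes_by_cohort.items()}
print(averages)  # {1970: 28000.0, 1980: 35000.0}
```

Because the researcher only ever sees pseudonyms and aggregates, the linkage supports the population-level analysis described above without revealing which individual contributed which record.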
Government is also in a moment of technological transformation; our legislation needs to be able to deal with the current technical reality as well as the new environment that we are building - even if for many (but not all) use cases the traditional notion of bulk ‘data sharing’ is on its way out. Take the troubled families example highlighted in the data access legislation, which requires a wide range of local public services in an area to cooperate around a specific set of individuals. For some of these services, improvements in their underlying technology infrastructure will not be imminent.
That is why, alongside the broader changes enabled by the legislation, we will be working with departments in the normal way: assessing user needs and building and iterating products. Ultimately the accompanying technical standards will be one of the things civil servants will need to have regard to when they use the legislation, alongside others specified in the annually updated codes of practice that will accompany the Bill.
Whatever the technical methods employed, legal permission is a fundamental precondition to the better use of data by government. There are good reasons why we don’t have, or desire, a legal free-for-all for how personal data moves around government. And it is also true that legislation is not always a barrier to effective work with data or indeed the biggest barrier we face. But in the specific places where it matters, it matters a great deal.
These are issues that we will return to throughout the life of the data programme, and as ever we will be blogging in the open about our work as it develops.