If Tesco knows, day to day, how healthy the nation is, how can Government access similar insights so it can better plan health services? If Airbnb can give you a tailored service depending on your tastes, how can Government provide people with the right support to help them back into work in a way that is right for them? If companies routinely use social media data to get feedback from their customers and improve their services, how can Government use publicly available data to do the same?
Data science allows us to use new types of data and powerful tools to analyse them more quickly and more objectively than any human could. It can put us in the vanguard of policymaking, revealing new insights that lead to better and more tailored interventions. And it can help reduce costs, freeing up resource to spend on more serious cases.
But some of these data uses and machine-learning techniques are new and still relatively untested in Government. Of course, we operate within legal frameworks such as the Data Protection Act and intellectual property law. These are flexible, but they don't always speak explicitly to the new challenges data science throws up. For example, how are you to explain the decision-making process of a deep-learning 'black box' algorithm? And if you were able to, how would you do so in plain English and not a row of 0s and 1s?
We want data scientists to feel confident to innovate with data, alongside the policy makers and operational staff who make daily decisions on the data that the analysts provide. That's why we are creating an ethical framework which brings together the relevant parts of the law and ethical considerations into a simple document that helps Government officials decide what they can do and what they should do. We have a moral responsibility to maximise the use of data, one that is never more apparent than when incidents of abuse or crime go undetected, as well as to pay heed to the potential risks of these new tools. The guidelines are draft and not formal government policy, but we want to share them more widely in order to help iterate and improve them further.
We’ve taken a user-centred and open approach. Over the last few months we’ve worked with data scientists and policymakers to understand how they approach a new project, and we've developed a set of six principles that follow their project journey. We’ve also had advice from a wide range of external experts from academia, the private sector and civil society groups. There was much discussion about it at today’s workshop on data science ethics at the newly founded Alan Turing Institute, where I spoke. These events are vital to develop the framework further: to keep getting feedback, and to keep at the cutting edge of evolving technology that might pose more challenges or, hopefully, ways of overcoming ethical issues.
So what’s in the framework? There is more detail in the fuller document, but it is based around six key principles:
- Start with a clear user need and public benefit: this will help you justify the level of data sensitivity and method you use
- Use the minimum level of data necessary to fulfil the public benefit: there are many techniques for doing so, such as de-identification, aggregation or querying against data
- Build robust data science models: the model is only as good as the data that goes into it, and while machines are less biased than humans, they can still get it wrong. It’s critical to be clear about the confidence of the model and to think through unintended consequences and biases contained within the data
- Be alert to public perceptions: put simply, what would a normal person on the street think about the project?
- Be as open and accountable as possible: transparency is the antiseptic for unethical behaviour. Aim to be as open as possible (with explanations in plain English), although in certain public protection cases the ability to be transparent will be constrained.
- Keep data safe and secure: this is not restricted to data science projects but we know that the public are most concerned about losing control of their data.
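To make the second principle concrete, the data-minimisation techniques it names can be sketched in a few lines of code. Everything below is illustrative: the function names, the salting scheme and the suppression threshold are assumptions for the sake of example, not government guidance or part of the framework itself.

```python
# Illustrative sketch of two data-minimisation techniques from principle 2:
# de-identification (pseudonymising direct identifiers) and aggregation
# (releasing only group counts, suppressing small groups). All names and
# thresholds here are hypothetical.
import hashlib
from collections import Counter

def de_identify(record_id: str, salt: str) -> str:
    """Replace a direct identifier with a salted hash (pseudonymisation)."""
    return hashlib.sha256((salt + record_id).encode()).hexdigest()[:12]

def aggregate(records, key, min_group_size=5):
    """Count records per group, suppressing any group smaller than the
    threshold so that individuals cannot be singled out."""
    counts = Counter(r[key] for r in records)
    return {k: v for k, v in counts.items() if v >= min_group_size}

# Usage: analysts never see the raw identifiers, only pseudonymised IDs
# and aggregates with small groups suppressed.
records = [{"record_id": str(i), "region": "North" if i < 7 else "South"}
           for i in range(10)]
safe = [{"pid": de_identify(r["record_id"], salt="example-salt"),
         "region": r["region"]} for r in records]
print(aggregate(safe, "region"))  # the small "South" group (3) is suppressed
```

Note that pseudonymisation alone is not full anonymisation; the sketch simply shows how using less-identifying data can still answer the question at hand.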
Principle 4 is about understanding what the public think. Changes in technology will have made some people more relaxed about putting their data online, and some people more concerned. We want to understand what different groups of people think about how Government should be making use of data to make better policy and improve public service efficiency. We’re just starting a public dialogue on the ethics of data science, involving workshops where people can spend two days understanding and debating the issues, a survey to get the reflections of a wider group, and an interactive tool which people can use to find out more about data science. We’ll want to share that insight and ask you what it means for our framework, so watch this space.