https://data.blog.gov.uk/2016/01/29/starting-the-public-debate-on-data-science-ethics/

Starting the public debate on data science ethics

How should machines make decisions about the different types of support people need to get back to work? Should government look at social media to understand what the public need from the justice system so they can design better mediation services? And how can public services predict demand by looking at searches for keywords on Google or GOV.UK?

These are all questions that 30 people spent a day considering in Sheffield on Saturday. They were part of a public dialogue that the Government Data Science Partnership is running with Sciencewise and Ipsos MORI about government use of data science. Data science can be defined as powerful computer techniques that can make sense of huge amounts of new forms of data to improve statistics, policy making and public services.

It provides huge opportunities, but also ethical challenges that we have not had to consider before. Because of this, we are creating an ethical framework which will give confidence to policy makers, data scientists and operational staff to innovate with data. Understanding what the public think about how government uses data science is a crucial part of how we develop this framework.

So what exactly is data science?

Data science is an incredibly difficult and complex topic to explain. I’ve been working in this area for two years and still find it hard to explain in simple terms. It’s different from more traditional statistics for three main reasons:

  1. rather than creating statistics to answer a question, it finds alternative existing data from which we can infer an answer (eg searches for ‘flu symptoms’ in Google might indicate the state of the nation’s health)
  2. rather than a human recording the data, newer forms of data can only be collected by a computer (eg sensors on car park spaces to indicate when they are full)
  3. rather than a human looking for patterns in data, machines can learn the best way to solve a problem by going through millions of rows of data (eg Facebook’s facial recognition system learns how to recognise people’s faces from the way we tag our friends in photos)

The difficulty in explaining this term became clear at the end of our pilot session. We realised that we needed to spend at least half a day getting people to understand what data science is, the data and techniques involved, the opportunities it brings and issues we need to consider.

Data, data everywhere

Part of the introductory half day was exploring what data people generate about themselves. It was interesting to see how much data people did generate, often without being aware that they were doing so or what that data was being used for. Sources included the census, surveys, applications to public services, sign-ups for commercial services, Google searches, smartphones, wearables and the internet of things.

The afternoon session was spent looking at case studies of potential government data science projects. What really struck me was that, despite the complexity of the subject, people were able to engage in a serious and pragmatic debate about the right thing to do.

They were thinking about both the public benefit of what the project could achieve (eg producing better population statistics to improve local service planning, understanding user needs to create a better justice system, prioritising food safety inspections to improve public health) and the ethical issues (eg accuracy and representativeness of the data, privacy, consent) and balancing them against each other. These are all questions that our first iteration of the ethics framework starts to address, and these workshops will really help to shape this.

What next?

Public dialogues love complex issues. The participants from each workshop will meet again at another workshop in a few weeks. This will give people time to properly get to grips with data science so they can make informed comments.

Over the next month, we’ll be going to Taunton, London and Wolverhampton to explore these issues with nearly 100 people. This is my take as a policy maker thinking carefully about ethics. Watch this space and you'll also hear from a policy maker in a department, an academic thinking about the theory of engaging the public in data debates and the moderators running the workshops.

4 comments

  1. Martin Waudby

    I realise you're providing simple examples for the benefit of the reader, but it'd be great to see a blog with more detail on the potential and pitfalls and how data is or can be exploited. Part of the data revolution is that those of us involved are tasked with instigating a cultural change in how we look at data and value data about ourselves. For instance, your three examples have far more depth to them:

    1 - Predicting flu outbreak from searches has been a victim of its own success and one might argue is a victim of "big data hubris". - http://www.theguardian.com/technology/2014/mar/27/google-flu-trends-predicting-flu

    2 – A far more interesting example is sensors in cars to detect potholes. There's the massive ethical issue of tracking car movement, but the benefit is a near real time view of the volume and velocity of traffic in an area over a long period of time. That's a huge benefit for planners

    3 - The Facebook development of DeepFace is fascinating. This is partly because of the massive database of images already linked to the identity of individuals or groups in various poses, with partially obscured images en masse, and it doesn’t really matter if a proportion of the links are wrong. They basically had everything that they needed other than the baseline control group of images, and there are plenty of ways to achieve that. Imagine the cost if you had to do that from scratch. The other interesting angle is that, given the history each identity has, they are probably able to track and match across age.

  2. Steven Tiell

    Madeleine, it's great to see your interest in this issue. I recently wrote about this in the Accenture Technology Vision 2016 -- the "Digital Trust" trend -- and have had a much larger body of work under way for the past year(ish). It would be great to see if there are ways we can collaborate. You can find me at http://j.mp/1ULhECt or http://j.mp/20r4yB4.
