https://data.blog.gov.uk/2016/07/21/how-we-do-a-data-science-project/

How we do a data science project

Categories: Data Analysis, Data Science

Over the past few years, the emergence of data science has allowed organisations across the world to become more effective and efficient by making data-driven, evidence-based decisions and realising the true value of the data they collect. The Data Projects team at GDS looks to apply this same practice to policy, delivery and operational issues across government and the public sector.

We are often asked about how we select projects, how we are commissioned, and how we ensure best ethical practices are followed, so we wrote this blog post to share how we work.

Data science covers a broad range of specialist skills from maths and statistics, to information science, to computer science.

Figure: the data science Venn diagram (source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)

With such a range of techniques, approaches and potential use cases, the team looks to pursue areas with high potential that can credibly result in impactful outcomes. We’ve developed the following approach to help us select and prioritise work.

Stage One - Identifying the Problem

When embarking on a project, a clear objective or ‘problem statement’ is needed - this might seem obvious, but it is often one of the most challenging parts of the process. This statement could come from senior colleagues, arise from persistent working-level problems, or begin as an innovative solution from outside government that has proven successful in tackling a common problem.

Distilling the objective into a clear statement means bringing together policy leads and experts from relevant department(s). This helps us grapple with who the users are, what their needs might be, any additional issues to consider, what data is available, and what timescales and resources are required.  

We point to previously completed prototypes to illustrate what is possible, but we are always guided by our stakeholders in exploring topics, and avoid pre-judging an issue or making immediate conclusions about the right way to fix a perceived problem.

At this point we might agree that there is a potential project for us to work on together; alternatively, we might suggest another approach. Data science is not always the most appropriate solution, and where that’s the case we offer advice on next steps.

Stage Two - Follow-up Scoping

After our initial meetings, we start on two strands of work - one within the team, and one involving our stakeholders.

As a team we assess the project on its:

  • Value to Government as a whole,
  • Value to the sponsor,
  • Value to the Data Projects team,
  • Value to the Government Data Science Partnership (GDS, ONS & GO Science),
  • Inherent project value (e.g. what new tools/techniques could be used).

In line with the ethical guidelines around data science, the project must have a clear user need and public benefit.

We then engage further with prospective users. We sometimes deploy a user researcher at this stage to build personas, conduct interviews, and run focus groups to more closely understand the user group and help us address their needs.

At the end of Stage Two we write up an outline of the project, detailing what datasets are required, the profiles of potential users, and a forward-looking schedule of deliverables (organised into sprints). This document is shared and acts as an agreement between the team and the sponsor going forward.

Stage Three - Discovery Data Dive  

We then obtain and look at the relevant data. Often, we do a preliminary piece of work to communicate what the data can tell us: this could mean aggregation, linking, and/or visualisation. An example of this is a visualisation developed to help understand NEET (Not in Education, Employment or Training) status among people aged 16-24. This kind of work helps people who aren’t used to looking at large data tables and spreadsheets to engage with the data.
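As a rough illustration of what "aggregation" and "linking" can mean in practice, here is a minimal Python sketch. It is not the team's actual code: the records, the area lookup table and the function name `neet_rate_by_area` are all made up for the example, and the data is entirely fictional.

```python
from collections import defaultdict

# Hypothetical, made-up records: one row per fictional person.
people = [
    {"person_id": 1, "area_code": "A", "status": "NEET"},
    {"person_id": 2, "area_code": "A", "status": "employed"},
    {"person_id": 3, "area_code": "B", "status": "NEET"},
    {"person_id": 4, "area_code": "B", "status": "NEET"},
    {"person_id": 5, "area_code": "B", "status": "education"},
]

# Hypothetical lookup table that the person-level records are linked to.
areas = {"A": "Area A (fictional)", "B": "Area B (fictional)"}


def neet_rate_by_area(people, areas):
    """Aggregate person-level rows to a NEET rate per area name."""
    totals = defaultdict(int)
    neet = defaultdict(int)
    for row in people:
        totals[row["area_code"]] += 1
        if row["status"] == "NEET":
            neet[row["area_code"]] += 1
    # Linking step: swap each area code for its name from the lookup table.
    return {areas[code]: neet[code] / totals[code] for code in sorted(totals)}


summary = neet_rate_by_area(people, areas)
for name, rate in summary.items():
    print(f"{name}: {rate:.0%} NEET")
```

A summary like this is the kind of small table that could then feed a chart, which is usually easier to discuss with stakeholders than the row-level data.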

It’s unlikely that data projects will use data granular enough to identify individuals; however, we adhere to the ethical framework by using data and tools that cause minimal intrusion. This could mean aggregating data, querying against datasets, or using synthetic data in place of live data.
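To make the idea of aggregating data and using synthetic data a little more concrete, here is a small Python sketch. It is an illustration under our own assumptions rather than a description of any specific project: the region names, age bands, function names and the suppression threshold of 5 are all hypothetical, and small-count suppression is just one common way of reducing intrusion.

```python
import random
from collections import Counter


def synthetic_records(n, seed=0):
    """Generate n fictional records to use in place of live data."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    regions = ["North", "South", "East"]   # hypothetical categories
    bands = ["16-18", "19-21", "22-24"]    # hypothetical age bands
    return [
        {"region": rng.choice(regions), "age_band": rng.choice(bands)}
        for _ in range(n)
    ]


def aggregate_with_suppression(records, threshold=5):
    """Count records per (region, age band) and hide small cells.

    Cells with fewer than `threshold` records are replaced with None so
    that very small groups are not exposed. The threshold is an assumption.
    """
    counts = Counter((r["region"], r["age_band"]) for r in records)
    return {
        key: (count if count >= threshold else None)
        for key, count in counts.items()
    }


records = synthetic_records(200)
table = aggregate_with_suppression(records, threshold=5)
for key in sorted(table):
    print(key, table[key] if table[key] is not None else "suppressed")
```

Working from a table like this, rather than from individual rows, means analysis and visualisation can proceed without anyone needing to see person-level records.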

Stage Four - Feedback  

We present our initial analysis and user research back to stakeholders to ensure that the project is telling them something useful. This will inform the next phase of work, and help us further plan our timescales and resourcing towards the agreed outcome.

Stage Five - Project

The project happens! We continue to work with policy officials and analysts within departments, defining and refining objectives to make sure the crucial elements are delivered as a priority. We continue to use the ethical framework throughout this stage to ensure project work balances new approaches and techniques with respect for privacy.

The project could be a number of things, including but not limited to:

  • a tool to help policymakers tackle a major issue,
  • a piece of data analysis to enable a new way of working for a team or department, or
  • a new system to be built into the way a team operates.

The project team will include a range of different skillsets to suit the challenge.

Stage Six - Completion/Handover

The final stage is handover. If the project is a tool, this could entail some training to ensure the in-house team can use and develop it further. If the project is a visualisation or a data model, this could be a transfer of code with explanatory notes and documentation. If the project is purely analysis to answer closely defined questions, this could be a formal presentation of the outcomes with an explanation of the practices and data used. The Data Projects team will maintain contact with the sponsoring department to ensure that the project continues to deliver value over time.

As a team we continue to look at how others, both inside and outside government, are using data science. We monitor successful projects internationally, and are always keen to hear about new ideas and applications of data science, so please do get in touch with us!

2 comments

  1. Comment by Sam Smith posted on

    Are these projects published in a register anywhere?