Whether you are dealing with bank transaction data, climate science data or data on hospital performance, the IT infrastructure for managing massive amounts of data is not that different. Tracking and storing data present the same technical challenges for data.gov.uk as they do for a bank’s IT department or researchers using the weather supercomputer at the Met Office.
To pick two of these challenges: today’s data are dynamic. Tracking their modifications and updates requires very different kinds of systems and processes from those of a traditional static database. Today’s data are also complex, far from the simple spreadsheets of values that most of us have in mind. Most data can only be understood with substantial metadata that gives meaning and context to the numbers.
The Royal Society’s ‘Science as an Open Enterprise’ report, published last week, responds to these challenges on behalf of the science community. How should researchers manage the volumes of data they produce? And what can be done to help others make the most of them too?
These are tricky questions to answer. The supply of and demand for scientific data vary widely, and operate at very different scales in different areas of research. There is great variety in data sharing projects and in the types of institutions built to support them. The intricacies of research data mean that the value of sharing them often comes only from an honest negotiation between the potential recipient and the data provider. That means the best solutions are often bespoke. A single set of standards for all data would never serve these fragmented purposes.
The working group that produced the report, of which I was a member, settled on a solution that focuses on standards for communicating effectively rather than standards for data formats and release schedules. Our criteria define “intelligent openness”: data must be accessible, intelligible, assessable and reusable. These criteria cannot usually be met simply by disclosing information on a large scale. Considerable work goes into preparing data for scrutiny and reuse, and doing this properly costs between 1% and 10% of research funding.
On behalf of the Royal Society, I welcome the Research Sector Transparency Board proposed in today’s White Paper. This new group will support researchers in changing the way they work, moving the default position from closed to open data. But, crucially, it is also a forum in which the community can develop a useful kind of transparency and an intelligently open research sector. Professor Geoffrey Boulton, who chaired the working group behind the Royal Society’s report, has commented: “Openness of itself has little value unless our criteria of intelligent openness are observed, which often requires a non-trivial effort by those releasing data. Choosing which data to treat in this way and understanding the needs of the audience for which it is intended are crucial. I look forward to seeing how the new Board will approach these problems.”
The hope is that public sector data released through data.gov.uk will be picked up and interpreted by small technology companies looking for new opportunities. The lesson of the Royal Society report is that this interpretation is complex and varied, and that there are not enough people who know how to do it. The technical challenges facing data management everywhere are not yet solved, and will not be for some time.
For the UK to make a success of data-driven enterprise, there needs to be a new cohort of data scientists and clearer career paths for experts in informatics. We are a world-leading scientific nation. Our companies and universities make the most of research from around the world. We need to make sure that we can continue to do that in a future where data are the new raw material.
Professor Brian Collins CB, FREng, FBCS, CITP, FIET, C Eng, FIOP, FICE, FRSA, RCDS, MA, DPhil took up the role of Professor of Engineering Policy at UCL on 1 August 2011. Before that, he was Chief Scientific Adviser (CSA) at the Department for Transport (DfT) from October 2006. He was also CSA at the Department for Business, Enterprise and Regulatory Reform (BERR) from May 2008, a period during which energy policy fell within his remit, and then CSA at the Department for Business, Innovation and Skills (BIS) from March 2009. He left both positions at the end of May 2011. He was Professor of Information Systems at Cranfield University from August 2003 until July 2011.
Until the end of March 2012 he chaired the Engineering and Interdependency Expert Group for Infrastructure UK, led by Lord James Sassoon, Commercial Secretary to Her Majesty’s Treasury. He is a member of the Royal Society working group on ‘Science as an Open Enterprise’, which published its final report in June 2012. He was appointed a Companion of the Order of the Bath (CB) by Her Majesty the Queen in the 2011 New Year Honours list. He was elected a Fellow of the Royal Academy of Engineering in 2009. He was Vice President of the BCS responsible for External Affairs and was Chairman of the BCS Information Security Strategic Forum. He has served as Vice President of the then IEE and was Chairman of its Informatics Division. He is, by presidential invitation, a Fellow of the Institution of Civil Engineers. He is a Fellow of the Institute of Physics.
He is an Emeritus Visiting Professor at City University London, holds a visiting professorship at the University of Wollongong, New South Wales, Australia, and holds an Honorary Doctorate from Kingston University. He holds an MA in Physics and a DPhil in Astrophysics, both from Oxford University.