Today marks the launch of http://statistics.data.gov.uk as the landing page for accessing statistical geographies in a linked data format. This linked data site is the culmination of three years' work that started with a request in 2010 by the UK Location Programme to pilot the use of linked data to meet the UK's obligations under INSPIRE, an EU Directive to harmonise how spatial datasets are supplied across Europe.
The pilot ran for eighteen months and provided proof of concept that ONS could develop a set of uniform resource identifiers (URIs) to break open and publish the millions of data cells from its geographic products, and link these attributes together. Access to this machine-readable data was delivered directly or via a slightly clunky API. The pilot also achieved its aim of developing ONS's in-house knowledge of linked data rather than simply contracting the project out to an external supplier. These newly acquired in-house skills allowed ONS to develop the next stage of the project.
Having successfully tested the feasibility of creating and using linked data for statistical geographies, the next phase of the project was to deliver a full system that contained all the ONS core geographic data and could be used by both technical and non-technical users to navigate the data.
Traditionally, ONS geographic data has been published through a series of products that each have their own purpose. The Code History Database allows a user to identify when a geographic boundary was created and which boundary it replaced. The National Statistics Postcode Lookup identifies the relationship between postcodes and a number of geographies. Lookups relate the core statistical geography of Output Area to other output geographies. Digital boundary files allow users to perform spatial analysis on data and to visualise statistics. The problem with having all these data sources as separate products and formats is that the data for a single geographic instance - for example the Westminster parliamentary constituency of Fareham - is spread across a number of different products that often require different software to access them. The postcodes in Fareham are in one product, its digitised boundary in another, and so on, making it difficult to get an overall picture of the geography for which the statistics are provided.
Taking a product-based approach to geographic data also makes it difficult for machines and systems to consume the data, and ONS often needs to produce a number of different versions of the same product, each tailored to the requirements of the system using it.
For this reason, and based on the results of the pilot, ONS decided to invest in delivering all of its geographic data in a linked data format. All the products are referenced using a single set of codes - the nine-character GSS codes for statistical geographies. Each instance of each geography can therefore be uniquely referenced, which allows the codes to link its attributes together and consolidate them into a single resource, with the GSS code as the unique identifier.
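As a rough illustration of what consolidating on the GSS code means, the Python sketch below merges records from three separate products into one resource per code. The record layouts, field names, file name, postcodes and the code value itself are all hypothetical placeholders, not the actual structure of the ONS products.

```python
from collections import defaultdict

# Hypothetical extracts from three separate products, each keyed by GSS code.
# The code value, field names and field values are made up for illustration.
code_history = [{"gss_code": "E14000699", "name": "Fareham", "status": "live"}]
postcode_lookup = [
    {"gss_code": "E14000699", "postcode": "PO16 0AA"},
    {"gss_code": "E14000699", "postcode": "PO16 0AB"},
]
boundaries = [{"gss_code": "E14000699", "boundary_file": "fareham.shp"}]


def consolidate(code_history, postcode_lookup, boundaries):
    """Merge per-product records into one resource per GSS code."""
    resources = defaultdict(lambda: {"postcodes": []})
    for row in code_history:
        resources[row["gss_code"]].update(name=row["name"], status=row["status"])
    for row in postcode_lookup:
        resources[row["gss_code"]]["postcodes"].append(row["postcode"])
    for row in boundaries:
        resources[row["gss_code"]]["boundary_file"] = row["boundary_file"]
    return dict(resources)


resources = consolidate(code_history, postcode_lookup, boundaries)
fareham = resources["E14000699"]  # everything held for one geography, in one place
```

The point of the sketch is only that a shared identifier lets attributes from different sources be joined without any product-specific software.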
By encoding that GSS code in an HTTP URI, ONS lets a user type in the URI for any geography and retrieve, in linked data format, all of the consolidated information that ONS holds on that geography.
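A minimal Python sketch of how a GSS code might be turned into such a URI is shown below. The path segment in BASE_URI and the "one letter plus eight digits" shape check are assumptions made for illustration; the article does not state the exact URI pattern, so check the site itself before relying on it.

```python
import re

# Assumed URI layout: the article does not spell out the path pattern.
BASE_URI = "http://statistics.data.gov.uk/id/statistical-geography/"

# Assumed shape of a nine-character GSS code: one letter, then eight digits.
GSS_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def gss_uri(code):
    """Return the (assumed) linked data URI for a GSS code."""
    code = code.strip().upper()
    if not GSS_PATTERN.fullmatch(code):
        raise ValueError(f"not a nine-character GSS code: {code!r}")
    return BASE_URI + code


# "E14000699" is used here purely as an example value.
uri = gss_uri("e14000699")
```

Rejecting malformed codes up front keeps a typing error from turning into a confusing "not found" response further down the line.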
This is what ONS has done at http://statistics.data.gov.uk, and it allows developers to access the data in ways that were not possible before. Already, ONS is working on integrating the geography framework into statistical applications for the ONS Regional and Local Analysis team, the Department for Communities and Local Government, the Department of Energy and Climate Change and the Scottish Government. They are able to access the geographic data they require through the linked data query language SPARQL, and to link this data into their statistics.
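For readers who have not used SPARQL, the Python sketch below builds a simple query that lists every property and value held for one geography, and encodes it as a GET request in the way the SPARQL protocol describes (a `query` parameter). The endpoint address and the geography URI are assumptions for illustration only; the sketch builds the request URL but does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint address; the article does not give one.
SPARQL_ENDPOINT = "http://statistics.data.gov.uk/sparql"

# Hypothetical geography URI, used only as an example subject.
GEOGRAPHY_URI = "http://statistics.data.gov.uk/id/statistical-geography/E14000699"


def describe_query(uri, limit=50):
    """Build a SPARQL query listing the properties and values of one resource."""
    return (
        "SELECT ?property ?value "
        f"WHERE {{ <{uri}> ?property ?value }} "
        f"LIMIT {int(limit)}"
    )


def request_url(query):
    """Encode the query as a SPARQL protocol GET request URL."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


query = describe_query(GEOGRAPHY_URI)
url = request_url(query)  # pass this to any HTTP client to run the query
```

A real application would usually ask for specific properties rather than everything, but the open-ended pattern is a convenient way to see what is attached to a geography.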
As well as this, ONS has provided a non-technical access point to the data through the ‘Explore’ and ‘Locate’ tabs. This human interface allows users to search for geographies using a variety of criteria, such as the GSS code, name or geography type, or to draw a bounding box on a map to get all the geographies contained in it. This gives them direct access to the lowest possible level of the data rather than having to go through the geographic products.
This is just the start of a gradual process of moving to linked data. Work is already underway to publish boundaries in TopoJSON format to support ONS data visualisation, postcode data in RDF format, and the huge range of ONS statistics as RDF, which would then link them to their geographies. We welcome suggestions on what data users would like to see made available and what data they would like to see linked to the existing geographies.