My top ten datasets - a guest post by Simon Rogers

data.gov.uk has become one of the finest national open data initiatives in the world - it now has more data than the mighty data.gov in the US, with 4,223 datasets, compared to 2,876 over the Atlantic. It's not perfect: far too many links take you to front pages on other sites, rather than to the data itself. It could also do with more help for the less experienced user - witness the multitude of downloads of the Treasury's Combined Online Information System (COINS) dataset.

But nevertheless, what a resource. And where it really comes into its own is in the publication of immense datasets previously kept within the confines of the civil service, many of which show highly local data. So, if I had to pick my top ten datasets here is where I would start:

1) National Public Transport Data Repository (NPTDR)

If you want a complete dataset, look no further. The NPTDR is a snapshot of a week of British public transport. So for the week of 5-11 October 2009, every train, coach and bus journey is catalogued and recorded in a huge set of spreadsheets, divided up by every region in the country (excluding Northern Ireland, that is). There is absolutely no help with the data, mind you, but if you have the skills, apparently you can work out routes and timetables. To get the most out of it, you also need the National Public Transport Gazetteer and the National Public Transport Access Nodes. Here's what we did with the data:

2) Combined Online Information System

As an example of how not to release data, Coins is probably unparalleled: complicated, confusing and massive, it is the ultimate repository of public spending information. The government uses it to produce economic reports, and previous governments had always refused to publish it. To its credit, the coalition has done. Civil servants always warned it would be complex and so it proved, between the fact tables and the adjustment tables and the BitTorrent files and the zipped CSVs. But putting the data out there also enabled us to set up a filter on it.
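For anyone tempted to try something similar at home, the basic idea of filtering a file too big to open in a spreadsheet can be sketched in a few lines of Python. This is not the Guardian's filter, just a minimal illustration of reading a CSV one row at a time and keeping only the rows you want. The column names (`department`, `year`, `amount`) and the sample figures are made up for the example and are not the real Coins headers or values.

```python
import csv
import io


def filter_rows(lines, column, wanted):
    """Yield rows (as dicts) whose `column` equals `wanted`, one at a time.

    Because rows are read and yielded one by one, the whole file never
    has to sit in memory, which matters for very large extracts.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if row.get(column) == wanted:
            yield row


# Tiny invented sample standing in for a real extract. The column names
# are hypothetical, NOT the actual Coins headers.
sample = io.StringIO(
    "department,year,amount\n"
    "Health,2008-09,120\n"
    "Transport,2008-09,45\n"
    "Health,2009-10,130\n"
)

health = list(filter_rows(sample, "department", "Health"))
total = sum(int(row["amount"]) for row in health)
print(len(health), total)
```

With a real file you would pass an open file handle instead of the in-memory sample, and swap in whatever column names the download actually uses.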

3) Youth cohort study

The government is good at measuring things (lousy at making them easy to use, mind) and the Youth cohort study is one of those datasets which give you a real snapshot of how people actually live. It brings together detailed breakdowns on family environment, attitudes, education and employment. If you want to know how young people really feel, this is a good place to start. Just needs a proper regional breakdown to make it invaluable.

4) England in dog mess

Really. This dataset covers every time someone is fined for letting their dog foul - or for not having control of their animal. It has a full regional breakdown and tells you useful things like how many times a penalty notice was issued - or even the levels of fines charged. Liverpool is obviously either very hot on this, or has a big problem - it charged over £15,000 in 2008/09. Perhaps more seriously, all fixed penalty data is on data.gov.uk, and there are a surprising number of offences where fixed penalties are issued. With litter, for instance, over £1m of fixed penalty notices were issued in 2008/09.

5) National Insurance Numbers allocated to Adult Overseas Nationals

Immigration is always in the news - but there is little in the way of actual facts. These figures provide key data for the UK and real facts: the numbers of people actually applying for NI numbers so they can work. There's a breakdown by nationality (which shows Poland at the top, being replaced by India, which is traditionally number one) plus a list of applications by local authority and parliamentary constituency.

6) Regional Labour Market Statistics

Unemployment is always a big issue and - despite the fact this page takes you to another front page, as opposed to the data itself - the ONS analysis of employment and unemployment around the country is unparalleled. The figures include a breakdown of benefit claimants by parliamentary constituency and local authority - plus complete analysis of useful things like unemployment by nationality and the breakdown of the UK's inactive population - find out how many people are 'discouraged' here.

7) Provisional Monthly Patient Reported Outcome Measures (PROMs) in England

Want to find out how successful your hospital is at routine operations? This is the place to get it. It includes pre- and post-operative state of health data for common procedures such as hip and knee replacements, varicose vein surgery and groin hernia repair. The figures come out monthly.

8) NHS England - Connecting for Health Organisation Data Service - Data Files of NHS Organisations

One of data.gov.uk's strengths is how it has opened up the government's complex and confusing geographical meta-data. Every government department, it seems, has its own regional breakdown of the UK. There's one set for the ONS, another for transport. One of the most useful, however, is this set for health. This gives you the location of every hospital, primary care trust and health body in England, an invaluable resource for working with health statistics or simply mapping your closest health resources.

9) data.gov.uk's meta-data

These last two are more about data.gov.uk than the issues of the day. Find out exactly what each dataset contains - updated monthly. It's a complete guide to data.gov.uk. Just have to figure out what to do with it now.

10) Directgov article ratings

Want to know the most useful pages on Directgov? Every page has a ranking system, and this dataset gives you a full monthly list of every one of the times the ranking tool has been used. So, if you download the March file, you get some 17,000 rankings, the top ones of which are the myriad Directgov pages on motoring, how to tax your vehicle and one showing the new minimum wage. Now that is useful data.
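If you fancy finding the most-rated pages yourself, a rough sketch in Python might look like the one below. It simply counts how many times each page appears in a file of ratings. The layout assumed here (one row per rating, with `page` and `rating` columns) and the sample rows are invented for illustration; the real monthly file may well be organised differently.

```python
import csv
import io
from collections import Counter


def most_rated(lines, column="page", top=3):
    """Return the `top` pages by number of ratings as (page, count) pairs."""
    reader = csv.DictReader(lines)
    counts = Counter(row[column] for row in reader)
    return counts.most_common(top)


# Invented sample: one row per use of the rating tool. The column names
# and page titles are hypothetical, not taken from the real dataset.
sample = io.StringIO(
    "page,rating\n"
    "Taxing your vehicle,5\n"
    "National minimum wage,4\n"
    "Taxing your vehicle,3\n"
    "Taxing your vehicle,5\n"
    "National minimum wage,5\n"
    "Renewing a passport,2\n"
)

top_pages = most_rated(sample, top=2)
for page, count in top_pages:
    print(page, count)
```

Counting rows answers "which pages get rated most"; if you wanted "which pages are rated best" you would average the rating column per page instead.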

Simon Rogers edits the Guardian's Datablog and Datastore. You can follow him on Twitter @datastore
