https://data.blog.gov.uk/2012/11/16/rating-the-datasets-how-many-stars/

Rating the datasets - how many stars?

Categories: Open Data

data.gov.uk continues to open up large quantities of data that was previously closed. Government data has become 'open by default' and the site has grown from the initial 1000 datasets to an amazing 8000 - one of the most successful open data movements anywhere.

But occasionally we get a message from a member of the public who is surprised to find the data they expected is not at the end of the 'Download' link. Maybe the data has moved, or it is not in the format advertised, or it is not even available any more. Many will be surprised to hear that a small number of datasets require registration before access, or can only be posted to you on CD-ROM, or even require fees.

With 8000 datasets being listed on data.gov.uk by over a thousand different individuals across 700 bodies over the last three years, it's not surprising that there are some issues and variance in quality.

Tim Berners-Lee set the tone for data releases: it started with "Raw Data Now!" (just get it out there, whatever the current quality) but he also set out what it should aspire to - the Five Stars of Openness:

Data gets three stars if it is currently available (not a broken link), openly licensed (no particular legal restrictions on reuse), structured and in an open format. Plenty of datasets on data.gov.uk are like this, such as spreadsheet tables stored in CSV files, or geographical boundaries in KML.

To get all the way to five stars it needs to be linked to other datasets on the internet. To do this, its data points are each made available at separate addresses on the internet, the data properties are expressed in common standards, and links to other datasets are added. For more about Linked Data, see: What is Linked Data.

When data.gov.uk was relaunched in June, every dataset was given a star rating. We've been working to improve the rating algorithm and have this week added the ratings to the search page, so you can see what the distribution of stars is across all the datasets, and compare the quality of data from different departments.

We firmly believe that the way to improve the quality is to make these scores public. It may well influence a small council to publish its spend data as a spreadsheet rather than a PDF, or maybe a spreadsheet of poverty indicators or international aid donations could be upgraded to a format that allows comparisons internationally.

With high quality standards expected, and a rating algorithm that will evolve in sophistication as we iterate further, many of the ratings might appear harsh:

  • If a dataset just has a link to another web page that requires you to hunt around for the actual data, we award it 0 stars.
  • If data is offered not under the Open Government Licence but under its own terms and conditions, we award it 0 stars. (You might not easily know if you can even print off that dataset.)
  • Some PDF files are produced pretty well, containing embedded spreadsheets - but that makes it difficult for a user's automatic tool, so we score that the same as a bad PDF scan - 1 star.
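
As a rough illustration, the rules above could be sketched in Python as follows. This is not the data.gov.uk implementation: the `Resource` fields, the `FORMAT_STARS` table and the `star_rating` function are hypothetical names invented for this sketch. The post only confirms that PDF scores 1 star and that CSV and KML score 3; placing XLS at 2 and RDF at 4 follows the general Five Stars of Openness scheme, and scoring unknown formats as 1 star is an assumption.

```python
from dataclasses import dataclass

# Hypothetical format table. Only PDF (1) and CSV/KML (3) come from the post;
# XLS (2) and RDF (4) follow the general Five Stars of Openness scheme.
FORMAT_STARS = {"PDF": 1, "XLS": 2, "CSV": 3, "KML": 3, "RDF": 4}


@dataclass
class Resource:
    link_ok: bool            # the download link is not broken
    direct_download: bool    # the link leads to the data, not a page to hunt around
    open_licence: bool       # e.g. released under the Open Government Licence
    file_format: str         # e.g. "CSV", "PDF"
    links_to_other_datasets: bool = False


def star_rating(resource: Resource) -> int:
    """Return 0-5 stars for a resource, under the assumptions stated above."""
    # Unavailable, hard-to-find or restrictively licensed data scores 0.
    if not (resource.link_ok and resource.direct_download and resource.open_licence):
        return 0
    # Assumption: an available, openly licensed file in an unknown format gets 1 star.
    stars = FORMAT_STARS.get(resource.file_format.upper(), 1)
    # The fifth star needs a linked-data format plus links to other datasets.
    if stars == 4 and resource.links_to_other_datasets:
        stars = 5
    return stars
```

Under this sketch, an openly licensed CSV file with a working direct link scores 3 stars, while the same file offered under its own terms and conditions scores 0.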

Read more about it here: 5 Stars Rating Algorithm

So when you look today and see that half of the datasets get 0 stars, be proud that this government sets the bar of quality high, and is brave enough to be open about not only what is good, but what is not good enough.

Sharing and comments

3 comments

  1. Comment by DanLear posted on

    The "openness" rating is a great idea and one I haven't considered before, however my first thought when reading the title of this blog post was that it would be discussing rating the "quality" of the data.  This is always highly contentious and subjective, so how could it be achieved? The original data provider can give an assessment, however this must be as transparent as possible, and must also be comparable between datasets and providers, therefore some sort of framework is required.  An indication could be derived from mandatory elements within the accompanying metadata, with an option for additional validation by crowd-sourced ratings. The age, breadth, collection techniques and spatial scale will hav

  2. Comment by Deborah Sacks posted on

    The web page says that this data is not available for free, but it doesn't say how I could apply (or pay) to obtain it.

    It also doesn't give any idea of what is contained in the dataset, should I wish to pursue this.

    Could you let me know how I can obtain this data please?

  3. Comment by Beyond Waste posted on

    I wish to endorse the request for clarification in the comment above.

    This data was previously freely available I believe on the Environment Agency website.

    Alan Potter

    Beyond Waste