In February this year, the Cabinet Office Transparency Team started to tackle the problem of broken links on data.gov.uk. These are links to data files listed on data.gov.uk that return an error when you click to download them. The public body that hosts the data may have made an error when first giving the data's location, may since have changed its location, or may be one of the dozens of bodies moving their whole sites onto gov.uk. The organisation may also have been dissolved or restructured, as happened when the structure of the NHS was overhauled last year. In February, the number of broken links across the site stood at 9,348, around 14% of all data files on data.gov.uk.
The data.gov.uk developers began by looking at batch fixes: updating links where files had been moved, or making use of the National Archives' central government website archive. In addition, tools that automatically check for broken links have been developed. With the help of the Cabinet Office Transparency Team, organisations are working methodically through the lists of broken links and have so far made hundreds of corrections. This work has significantly reduced the number of broken links: around 4,175, or around 6% of files on data.gov.uk, remain broken. However, this figure is still far too high.
The 'broken link checker' is software that automatically tests data links on data.gov.uk. When a link is added to data.gov.uk, it is tested immediately and then re-tested weekly. The results are shown in a report that is assembled nightly and is publicly viewable here: http://data.gov.uk/data/report/broken-links. We also show the problems on the dataset page itself, where broken links are marked with a red exclamation mark. The broken link reports have been running for a year and have been refined over time. We exposed them gradually, first to public bodies and now openly to everyone. We hope this will help get the broken links fixed, and give users more realistic expectations before they click on a bad link.
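To make the idea concrete, here is a minimal sketch in Python of the kind of check described above. It is not the code that runs on data.gov.uk. The function names, the seven-day interval constant, the use of HEAD requests and the rule that any status of 400 or above counts as broken are all assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request
from datetime import datetime, timedelta

# Assumption: "weekly" is modelled as a fixed seven-day interval.
RECHECK_INTERVAL = timedelta(days=7)


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return None


def is_broken(status):
    """Assumption: no response, or any 4xx/5xx status, counts as broken."""
    return status is None or status >= 400


def is_due(last_checked, now):
    """A link is due if it has never been tested or was last tested a week ago or more."""
    return last_checked is None or now - last_checked >= RECHECK_INTERVAL


def check_links(links, now, fetch=fetch_status):
    """Test every link that is due and return the URLs found to be broken.

    links maps each URL to the datetime of its last test (None if never tested).
    """
    broken = []
    for url, last_checked in links.items():
        if is_due(last_checked, now) and is_broken(fetch(url)):
            broken.append(url)
    return broken
```

A real checker needs more care than this. Some servers reject HEAD requests even though the file downloads normally, so a fallback to GET would be needed before declaring a link broken, and a link that fails once may only be temporarily unavailable.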
The broken link checker and reports are developed in open source as extensions to CKAN. The majority of governments around the world are using CKAN for their open data catalogues, collaborating on and benefiting from open source development. There has already been plenty of interest in these particular tools from outside the UK Government. The code is here: https://github.com/datagovuk/ckanext-archiver and https://github.com/datagovuk/ckanext-report