As the excitement about open data has spread around the world, we're often asked what technology lies behind data.gov.uk. The short answer is that we use two open source products: CKAN for the data catalogue, and Drupal for the blogs, forums, comments, etc.
Drupal is a well-known Content Management System with lots of support for normal website structures, so setting up a blog is quick and simple. It also provides plenty of common functionality with little effort, such as the infrastructure for registering user accounts, granting privileges to administrators and dealing with spam.
We could have built a simple data catalogue in Drupal as well, with a page for each dataset, a web form for administrators to edit it and a search box for users. But beyond those basics it starts to get complicated. Add pages for publishers, assign edit rights to users attached to those publishers, make the search faceted, add an API for the catalogue, provide daily dumps of the catalogue, add previews of the data and so on, and you start to realise that this is far beyond what Drupal gives you out of the box: you are creating a whole new web product.
So instead of starting from scratch, data.gov.uk uses CKAN for its data catalogue. CKAN is developed by the Open Knowledge Foundation with support from data.gov.uk, and it is flexible enough to meet the needs of data.gov.uk and of dozens of other open data sites around the world.
Integrating CKAN and Drupal
The question remains of how we combine these two systems into one integrated website. Originally, Drupal was the front-end and CKAN provided the dataset edit forms and the metadata API; the Drupal front-end retrieved the metadata through that API and presented it. This approach involved many integration points that were difficult to develop and maintain.
So it was decided that CKAN should handle all aspects of the data catalogue, acting as its front-end as well as its back-end. We are much more comfortable with this arrangement. When you click the 'Data' tab of the site, Apache sends the request to CKAN; the other tabs are served by Drupal. There is no hard-to-manage database synchronisation between CKAN and Drupal that can easily get out of step, no chunks of HTML passed between the systems with their JS and CSS controlled by only one of them, and far fewer internal requests, which used to cause confusion whenever there was an error. The two systems do, however, share template designs and CSS code so that they look the same.
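As a rough illustration of the split described above, the dispatch rule can be written as a small Python function. On the live site Apache makes this decision; the function below is only a hypothetical sketch, and both the `/data` path prefix and the name `choose_backend` are assumptions for illustration rather than data.gov.uk's actual configuration.

```python
from urllib.parse import urlsplit

# Hypothetical prefix: assumes every URL under the 'Data' tab starts with /data.
CKAN_PREFIXES = ("/data",)


def choose_backend(url: str) -> str:
    """Return 'ckan' for data catalogue URLs and 'drupal' for everything else."""
    path = urlsplit(url).path or "/"
    for prefix in CKAN_PREFIXES:
        # Match whole path segments so that e.g. /database-news stays with Drupal.
        if path == prefix or path.startswith(prefix + "/"):
            return "ckan"
    return "drupal"


if __name__ == "__main__":
    for url in ("/data/search?q=crime", "/blog/open-data", "/database-news"):
        print(url, "->", choose_backend(url))
```

Matching on whole path segments rather than a bare string prefix keeps unrelated Drupal pages whose names merely begin with "data" from being sent to CKAN by mistake.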
Only a few integration points now remain to make the user experience seamless:
- User log-on and credentials are handled by Drupal, which provides an internal API that CKAN uses to confirm a user's details.
- The home page of the site is produced by Drupal, but it displays content from CKAN: the latest datasets. CKAN provides these via its API, and Drupal caches them for a short period in case something goes wrong. This integration is worth the trouble of maintaining since, being the front page, it is extremely well used.
- Comments are handled by Drupal, so that they can all be managed from one place in the back end.
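The home page integration amounts to a short-lived cache with a fallback. The real site does this inside Drupal (PHP); what follows is a minimal Python sketch of the same idea, assuming CKAN's action API endpoint `package_search` sorted by `metadata_modified`. The base URL, the five-minute TTL and the names `fetch_latest_from_ckan` and `LatestDatasetsCache` are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# Hypothetical placeholder; substitute the real CKAN instance's base URL.
CKAN_BASE = "https://ckan.example.org"


def fetch_latest_from_ckan(rows: int = 5) -> List[str]:
    """Ask CKAN's action API for the most recently modified datasets' titles."""
    query = urllib.parse.urlencode({"sort": "metadata_modified desc", "rows": rows})
    url = f"{CKAN_BASE}/api/3/action/package_search?{query}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        payload = json.load(resp)
    return [pkg["title"] for pkg in payload["result"]["results"]]


class LatestDatasetsCache:
    """Short-lived copy of the 'latest datasets' list shown on the home page."""

    def __init__(self, fetch: Callable[[], List[str]], ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get(self) -> List[str]:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value  # still fresh: no request to CKAN
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except (OSError, ValueError, KeyError):
            # CKAN unreachable or returned unexpected data: keep serving the
            # last good copy (if any) and retry on the next call.
            if self._value is None:
                return []
        return self._value
```

A long-lived instance such as `LatestDatasetsCache(fetch_latest_from_ckan)` would be created once and `get()` called on each home page render; a CKAN outage then shows slightly stale datasets instead of an error.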
For more details about CKAN integration, please consult this case study document produced by the Open Knowledge Foundation: