
Health datasets on the NII

Categories: Health

Data from health and social care has been identified by the Open Data User Group (ODUG) as a case study to illustrate how the refreshed National Information Infrastructure (NII) might look. It has also been selected as one of three exemplar areas to prototype the new framework.

Currently, data on health and social care is made available by the Health and Social Care Information Centre (HSCIC), NHS Choices (itself part of HSCIC), NHS England, the Care Quality Commission and Public Health England. There are over 1,600 datasets with a primary theme of health; clearly, these should not all be included within the NII.

Using the high-level categorisation suggested in the ODUG’s NII: Why, What and How paper, we set out to identify some broad inclusion criteria:

  • Core Reference Data should be included – these are datasets which describe actual things, such as NHS Trusts or dentists, in matters of fact. Typically this means their name, address and other attributes.
  • Subject Data should also be included – these are datasets which provide additional information about the core reference data.  Examples include opening times, and details of services offered.

Additionally, core reference data includes key glossaries, dictionaries and terminologies, which provide the means to describe the attributes of key health and care data, such as codes used to classify health conditions.

Of equal importance are the criteria for excluding datasets:

Datasets covering any statistics, activity, transactions or performance are excluded. We publish a vast array of this kind of data, and it becomes impossible to differentiate between those datasets which are “strategically important” and those which are not. However, all of these datasets should be linkable to NII datasets.

As a starting point, this has generated an initial cut of 58 datasets, although we fully expect this to evolve over time. Have we got the right datasets? What else from health and social care should be included within the NII?

The area which provided the greatest challenge was the apparent duplication of data. There are, for example, two datasets which appear to provide a complete list of GP practices, and practices are also included in the CQC directory. A quick scan of these datasets reveals differences between them, most obviously in the number of records.

To explain this, it is first helpful to understand a little about where these data are drawn from, and importantly the purposes for collection and publication.

Datasets provided and maintained by the HSCIC’s Organisation Data Service (ODS) are intended primarily for use within the health and social care sector. They are used as part of the prescription licensing process, for payment purposes, financial management and performance monitoring, and to aid secure electronic communications between NHS organisations. More information about the ODS and its role is available on the HSCIC website.

Datasets provided and maintained by NHS Choices exist primarily to populate the NHS Choices website, and as such are intended for use by patients and the public. Much of the data is originally sourced from the codes and details maintained by ODS. However, to ensure that this information is as up to date as possible and is meaningful to the public, some additional validation and cleansing takes place. The data is sometimes matched with data from other sources to obtain details not held by the ODS. Health and care organisations are also able to edit their own NHS Choices entries, to ensure that they accurately reflect the services they provide.

As a result, there are naturally some differences between data files which at face value would appear to be identical, for example:

  • Names of organisations may differ between datasets – an organisation’s official registered name may be different to how it is known locally.  In these cases, NHS Choices is more likely to include the local name.
  • Organisations which have closed – although an organisation may have closed to patients and the public, it may still be required to be left “open” on IT and finance systems for a short while after closing.  Data from ODS is more likely to include these organisations, whereas NHS Choices is more likely to remove them.
  • Statistical and performance data within the health and care sector is usually published at organisation level – which can differ from places that provide treatment.  For example, data on waiting times is reported for each NHS Trust; however a trust can be made up of several acute hospitals.  Here, ODS will be the best source to establish the organisations being reported on.
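For anyone wanting to check these differences for themselves, the comparison can be sketched in a few lines of Python. This is only an illustration: the records, organisation codes and field names below are made up, and it assumes that both sources can be keyed on a shared organisation code, which may not hold for every real file.

```python
# Hypothetical records for illustration only: the codes, names and field
# names are invented and do not reflect the real ODS or NHS Choices files.
ods_records = [
    {"code": "A001", "name": "Example Medical Centre Ltd"},
    {"code": "A002", "name": "Sample Surgery"},
    {"code": "A003", "name": "Former Practice"},  # closed, still listed
]
choices_records = [
    {"code": "A001", "name": "Example Medical Centre"},  # local name
    {"code": "A002", "name": "Sample Surgery"},
]


def compare_sources(ods, choices):
    """Compare two organisation lists keyed on an assumed shared code."""
    ods_by_code = {record["code"]: record for record in ods}
    choices_by_code = {record["code"]: record for record in choices}

    ods_codes = set(ods_by_code)
    choices_codes = set(choices_by_code)

    # Codes present in both sources whose names differ after trimming
    # whitespace and ignoring case.
    name_differs = sorted(
        code
        for code in ods_codes & choices_codes
        if ods_by_code[code]["name"].strip().lower()
        != choices_by_code[code]["name"].strip().lower()
    )

    return {
        "only_in_ods": sorted(ods_codes - choices_codes),
        "only_in_choices": sorted(choices_codes - ods_codes),
        "name_differs": name_differs,
    }


report = compare_sources(ods_records, choices_records)
for label, codes in report.items():
    print(label, codes)
```

A record appearing only in the ODS list is not necessarily an error; as described above, it may be an organisation which has closed to the public but is still held open on IT and finance systems.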

It is clear that when using health organisation reference data, care should be taken to select the correct source for the purpose. More information is provided alongside each dataset, describing how the data for each is maintained.
