On the registers team, we are building a product that provides teams with data that is good enough to build services on. Registers are defined by certain characteristics such as being live data and being the only authoritative list for a thing. We thought it would be worthwhile exploring how these characteristics are actually implemented.
Technology and process
The operation of registers is underpinned by both technology and process. Not all of the characteristics of a register are linked to technical implementation. Some are purely process-based. For example, characteristic #1 is “Registers are canonical and have a clear reason for their existence”. There is nothing stopping someone from building a system which has all the technical features of a register but which doesn’t have any reason to exist or is not canonical. However, such a system would not satisfy the criteria for the register.gov.uk domain. The requirement that registers have a purpose is upheld by the process for creating new registers.
On the other hand, characteristic #5 is “Registers are able to prove integrity of record”. This characteristic cannot be fulfilled through policy or process alone: the underlying technology of registers must provide features to generate cryptographic proofs.
In this post, we will distinguish between register characteristics which are supported by business processes, and those which are supported by the features of the technical implementation.
Registers are lists of data
Since registers are lists of data, and most data technologies support storing lists of data, it is easy to imagine a register as a table in a relational database, a simple key/value store, or a simple triplestore. Your favourite data storage technology could easily store the contents of a register.
Furthermore, each register has a constrained set of fields that can be used. For example, the country register has fields such as `country`, `name`, and `official-name`.
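As a rough illustration of what a constrained set of fields means in practice, the sketch below rejects an item that uses a field the register does not define. Only the three field names come from this post; the set is deliberately incomplete, and the function name and sample values are made up for the example.

```python
# Assumption: an illustrative subset of fields. The post names these three
# as examples; the real country register may define others.
COUNTRY_FIELDS = frozenset({"country", "name", "official-name"})


def unknown_fields(item, allowed=COUNTRY_FIELDS):
    """Return the field names in `item` that the register does not define."""
    return set(item) - allowed


# Placeholder values, not real register data.
valid_item = {"country": "XX", "name": "Exampleland", "official-name": "Republic of Exampleland"}
invalid_item = {"country": "XX", "name": "Exampleland", "population": "12"}

print(unknown_fields(valid_item))    # empty set: every field is allowed
print(unknown_fields(invalid_item))  # contains "population"
```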
Process: Registers are reliable
Our introductory guidance on registers says that each register is the most reliable list of its kind. Why might data not be reliable? What causes problems with reliability?
One problem is that data might not be authoritative: it might not be published by the entity responsible for maintaining that data. For example, a Land Registry title might refer to a limited company by name, but the body officially responsible for incorporating (and dissolving) limited companies is Companies House. That makes Companies House the authoritative source of this data.
Non-authoritative data is, by definition, duplicated. Whenever data is duplicated, it introduces possibilities for error: typos, transcription errors, and so on. Furthermore, because the non-authoritative publisher is not directly responsible for this data, it is unlikely to notice or fix these errors. Non-authoritative data tends to become unreliable over time.
To fix this, custodians should publish only data that they are directly responsible for: they should publish minimal datasets. Where there is a desire to publish data over which another body has authority, it should be published as a link to the authoritative source rather than as a non-authoritative copy. For example, rather than publishing the name of a limited company, you could publish a link to the Companies House URL for that company. Or you could publish the company number, which the user can use as a unique identifier to look up the company records at Companies House.
Therefore, in order to fulfil the policy that registers are reliable, registers must support linking to data held by other organisations.
Feature: Registers support linking between organisations
Registers must support linking data together. Links are a way of removing duplication. For example, in the relational database world, duplication can be removed from a table by a process of normalisation: breaking the large table up into multiple small tables, linked to each other using foreign key references.
Registers must support linking to data held by another organisation. For example, a register of approved Digital Marketplace suppliers (maintained by the Crown Commercial Service) might link to the register of limited companies (maintained by Companies House). The register of Digital Marketplace suppliers is a list showing which companies may trade on Digital Marketplace, but it defers to the authority of the register of companies to show the official names and officers of a company.
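The Digital Marketplace example can be sketched in a few lines. This is only an illustration of the idea of storing a link instead of a copy: the in-memory dictionary stands in for the register of companies, and the class, field names, company number and company names are all hypothetical.

```python
from dataclasses import dataclass

# Stand-in for the register of companies, keyed by company number.
# The number and names are made up for illustration.
companies = {"01234567": {"name": "Example Widgets Ltd"}}


@dataclass(frozen=True)
class Supplier:
    supplier_id: str
    company: str  # company number: a link, not a copy of the company name


def company_name(supplier, company_register):
    """Follow the link to the authoritative register to get the current name."""
    return company_register[supplier.company]["name"]


supplier = Supplier(supplier_id="S-1", company="01234567")
name_before = company_name(supplier, companies)

# The authoritative source records a new name; the supplier record is untouched.
companies["01234567"]["name"] = "Example Widgets Holdings Ltd"
name_after = company_name(supplier, companies)
```

Because the supplier record holds only the company number, there is no second copy of the name to drift out of date.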
Feature: Registers provide guarantees of integrity
Registers have a cryptographic proof of integrity. If you have a record from a register, you should have some way of verifying that this data comes from a register, and of demonstrating that nobody has tampered with the record, whether through malice or otherwise.
We have previously written about our exploration of integrity guarantees.
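This post does not say which proof scheme registers use, so the following is only a sketch of one common approach: a Merkle hash tree built in the style of RFC 6962, where a single root hash commits to every entry. The function names are our own, and the entries are placeholder bytes.

```python
import hashlib
from typing import Sequence


def leaf_hash(data: bytes) -> bytes:
    # A 0x00 prefix marks a leaf, so a leaf can never be confused with a node.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # A 0x01 prefix marks an interior node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: Sequence[bytes]) -> bytes:
    """Compute a root hash that commits to every entry, in order."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than len(entries).
    k = 1
    while k * 2 < len(entries):
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


entries = [b"entry-1", b"entry-2", b"entry-3"]
root = merkle_root(entries)

# Changing any entry changes the root, which is how tampering is detected.
tampered_root = merkle_root([b"entry-1", b"entry-X", b"entry-3"])
```

Anyone holding a trusted copy of the root can recompute it from the entries and compare; a mismatch shows that something has been altered.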
Feature: Registers are append-only
An item in a register is never modified; instead, a new entry is added to the register marking an update to an existing record.
This also means that a register has all historical data in it. For example, at the time of this blog post, the Gambia has had three different versions in the country register. The country register provides pages for the current version of the record as well as a list of all versions of the record.
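A minimal in-memory sketch of the append-only idea follows. It is not how registers are implemented; the class and method names are invented for illustration, and the sample values are placeholders rather than real country data.

```python
class AppendOnlyRegister:
    """Toy register: entries are only ever appended, never edited or removed."""

    def __init__(self):
        self._entries = []  # list of (key, item) pairs in the order they were added

    def append(self, key, item):
        """Add a new entry and return its entry number (starting at 1)."""
        self._entries.append((key, dict(item)))  # copy so later changes by the caller cannot leak in
        return len(self._entries)

    def current(self, key):
        """Return the most recent item for `key`."""
        for entry_key, item in reversed(self._entries):
            if entry_key == key:
                return item
        raise KeyError(key)

    def history(self, key):
        """Return every version of the record for `key`, oldest first."""
        return [item for entry_key, item in self._entries if entry_key == key]


register = AppendOnlyRegister()
register.append("XX", {"country": "XX", "name": "Exampleland"})
register.append("YY", {"country": "YY", "name": "Otherland"})
# An update is a new entry for the same key, not a change to the old one.
register.append("XX", {"country": "XX", "name": "New Exampleland"})
```

Here `register.current("XX")` gives the latest version, while `register.history("XX")` still returns both versions in order.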
Not a feature: Registers do not provide ad hoc querying
One feature that registers do *not* provide is ad hoc querying. Some database technologies provide rich query languages that allow making domain-specific data queries.
Making a general-purpose query API open to the public internet is fraught with danger. A sufficiently advanced API will allow users, by accident or malice, to make queries that consume a large amount of resources. This can result in denial of service, or very slow response times, impacting the experience of other users. Registers allow simple lookups, but for more complex domain-specific queries, users can download the dataset and put it into their own index.
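To show what "put it into their own index" might look like, the sketch below runs a domain-specific query (names starting with a prefix) over records that have already been downloaded. The download step is left out because this post does not describe it, and the records and function name are invented for the example.

```python
def names_starting_with(records, prefix):
    """Return the sorted names of records whose name starts with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    return sorted(r["name"] for r in records if r["name"].lower().startswith(prefix))


# Assumption: the dataset has already been downloaded and parsed into
# dictionaries. These values are placeholders, not real register data.
records = [
    {"country": "AA", "name": "Alphaland"},
    {"country": "AB", "name": "Albion Example"},
    {"country": "BB", "name": "Betaland"},
]

# A simple lookup by key, the kind of access a register does offer.
by_country = {r["country"]: r for r in records}

# A richer query, run locally against the user's own copy.
matches = names_starting_with(records, "al")
```

The cost of the richer query falls on the user's own machine rather than on a shared public service.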
There are many policies and many technical features which are required to create a register. In this blog post, we’ve described some of the necessary technical features.
However, when we talk about the technical details, we often get asked whether a register is like a particular data technology such as a relational database, a graph database or a distributed ledger. In the next post, we will compare registers with some of these other technologies.