https://data.blog.gov.uk/2010/06/25/new-public-sector-transparency-board-and-public-data-transparency-principles/

New Public Sector Transparency Board and Public Data Transparency Principles

The Public Sector Transparency Board, which was established by the Prime Minister, met yesterday for the first time.

The Board will drive forward the Government’s transparency agenda, making it a core part of all government business and ensuring that all Whitehall departments meet the new tight deadlines set for releasing key public datasets. In addition, it is responsible for setting open data standards across the whole public sector, listening to what the public wants and then driving through the opening up of the most needed data sets.

The Board is chaired by Francis Maude, the Minister for the Cabinet Office. The other members are Sir Tim Berners-Lee, inventor of the World Wide Web; Professor Nigel Shadbolt from Southampton University, an expert on open data; Tom Steinberg, founder of mySociety; and Dr Rufus Pollock from Cambridge University, an economist who helped found the Open Knowledge Foundation.

In the words of Francis Maude:

“In just a few weeks this Government has published a whole range of data sets that have never been available to the public before. But we don’t want this to be about a few releases, we want transparency to become an absolutely core part of every bit of government business. That is why we have asked some of the country’s and the world’s greatest experts in this field to help us take this work forward quickly here in central government and across the whole of the public sector.”

At their first meeting yesterday they discussed some new Public Data Transparency Principles.

Working definition of “Public Data”

"Public Data" is the objective, factual, non-personal data on which public services run and are assessed, and on which policy decisions are based, or which is collected or generated in the course of public service delivery.

Draft Public Data Principles

  • Public data policy and practice will be clearly driven by the public and businesses who want and use the data, including what data is released when and in what form – and in addition to the legal Right To Data itself this overriding principle should apply to the implementation of all the other principles.
  • Public data will be published in reusable, machine-readable form – publication alone is only part of transparency – the data needs to be reusable, and to make it reusable it needs to be machine-readable. At the moment a lot of Government information is locked into PDFs or other unprocessable formats.
  • Public data will be released under the same open licence which enables free reuse, including commercial reuse – all data should be under the same easy to understand licence. Data released under the Freedom of Information Act or the new Right to Data should be automatically released under that licence.
  • Public data will be available and easy to find through a single easy to use online access point (data.gov.uk) – the public sector has a myriad of different websites, and search does not work well across them. It’s important to have a well-known single point where people can find the data.
  • Public data will be published using open standards, and following relevant recommendations of the World Wide Web Consortium. Open, standardised formats are essential. However, increasing reusability and the ability to compare data also requires openness and standardisation of the content as well as the format.
  • Public data underlying the Government’s own websites will be published in reusable form for others to use – anything published on Government websites should be available as data for others to reuse. Public bodies should not require people to come to their websites to obtain information.
  • Public data will be timely and fine grained – Data will be released as quickly as possible after its collection and in as fine a detail as is possible. Speed may mean that the first release may have inaccuracies; more accurate versions will be released when available.
  • Release data quickly, and then re-publish it in linked data form – Linked data standards allow the most powerful and easiest re-use of data. However most existing internal public sector data is not in linked data form. Rather than delay any release of the data, our recommendation is to release it ‘as is’ as soon as possible, and then work to convert it to a better format.
  • Public data will be freely available to use in any lawful way – raw public data should be available without registration, although for API-based services a developer key may be needed. Applications should be able to use the data in any lawful way without having to inform or obtain the permission of the public body concerned.
  • Public bodies should actively encourage the re-use of their public data – in addition to publishing the data itself, public bodies should provide information and support to enable it to be reused easily and effectively. The Government should also encourage and assist those using public data to share knowledge and applications, and should work with business to help grow new, innovative uses of data and to generate economic benefit.
  • Public bodies should maintain and publish inventories of their data holdings – accurate and up-to-date records of data collected and held, including their format, accuracy and availability.
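
As a rough illustration of the "release quickly, then re-publish in linked data form" principle above, the sketch below turns a raw CSV extract into N-Triples, one of the W3C linked data serialisations. It uses only the Python standard library. The base URI, the column names and the sample rows are all invented for the example, and a real conversion would map columns to an agreed vocabulary rather than to ad hoc field URIs.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/spend/"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell, keyed on the row's id column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{quote(row[id_column])}>"
        for column, value in row.items():
            # Skip the id itself, surplus cells, and empty or missing values.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}field/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Invented sample data standing in for an 'as is' CSV release.
sample = "id,supplier,amount\n1,Acme Ltd,1200.50\n2,Widgets plc,\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

The point of the sketch is that the CSV can be published first and the conversion added later without changing the original release.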

They are asking everyone to help shape and define these important principles and have set up a commentable version on our wiki. Please use the talk page to discuss the principles and the wiki page to make any changes needed.

Sharing and comments

31 comments

  1. Comment by Anonymous posted on

    While this is clearly aimed at central government, it'd be good to see a similar or identical document adopted and promoted by CLG for use in local government. For example, most of the council spending data that has been released lacks an explicit open licence, some is aggregated rather than fine-grained and some is only available in PDF format or through a proprietary platform.

    • Replies to Anonymous>

      Comment by Anonymous posted on

      Please can I direct you to http://linked4.org/lsd/explore.html a recently launched site for local government spend analysis in linked data format. Currently visualises Royal Borough of Windsor & Maidenhead linked data but more Local Government linked data sets will be loaded in the near future.

      The data was easily produced using our Linked4 utility which converts data extracted in xls or csv format to a linked (rdf and xml) format. The utility works against any data source and can assist with subjective/objective data analysis and indeed data redaction if required.

      Even though the Linked4 site is focused on Local Government spend analysis, the capability is available for any sector and to any organisation. However, we believe that appropriate 'trusted' organisations need to offer sector data aggregation points for their sector data - http://linked4.org/lsd/explore.html has been created to show that this can be done easily and very cost effectively.

      So, can we help? http://www.unit4software.co.uk/markets/public_sector/linked-open-data 

      Anwen Robinson (MD UNIT4 Business Software Ltd)

      • Replies to Anonymous>

        Comment by Stephan posted on

        The scope of this could be clearer. Is it intended to cover all publicly funded bodies, including those in the Higher Education sector for example? What about research data, especially pre-publication? And will there be FOI-like exemptions for data that is commercially sensitive?

        Shelly Hughes

  2. Comment by Anonymous posted on

    At a high level this is a good start and sets the overall tone and direction.

    However government bodies and departments will need more detailed guidance on what to make available (eg budget data) and will need considerable technical help to convert and publish data in machine readable format.

    Paul Cook
    Director of Finance & ICT
    Surrey and Sussex Probation Trust

  3. Comment by Anonymous posted on

    Thanks for providing this comment box which one can use in privacy without having to remember anything or get involved with any strange rituals, e.g. "logging in / registering".
    Long may comment boxes like this one serve parts of the public which otherwise might not be reached at all.

    But with reference to:

    "Public Data" is the objective, factual, non-personal data on which public services run and are assessed, and on which policy decisions are based, or which is collected or generated in the course of public service delivery."

    One wonders if all our notes and comments, submitted to public and government services in the spirit of voluntarism albeit from the privacy of wherever we happen to be may also be treated as "public data". I hope so.

    • Replies to Anonymous>

      Comment by krisdev posted on

      In my opinion, all Data available with any Public Authority must be deemed as Public Data. Only Data classified and declared as Secret affecting National Interest should be excluded from the purview of Public Data. It is best to put all G2G, G2C, G2B communications on the web on real time basis by introducing a single e-Governance Tool where the entire vertical and horizontal hierarchy of Governance from the national to the last local level is integrated and all citizens / business are also integrated. Biometric Zip Drives with sufficient security built around the inputting, transmission, storage and retrieval of data is important. Any communication not included under the Online Public Data System should not be recognized as official communication.

      Kris Dev
      ICT & e-Gov Consultant
      Life Line to Business
      http://ll2b.blogspot.com.

  4. Comment by Anonymous posted on

    Many thanks for sharing the principles at this stage.

    One small technical issue. I'm logged in and able to edit other Wiki pages - but don't seem to be able to edit or add to the talk page of the Commentable principles. Is edit access restricted? Or is this a glitch with individual user accounts?

    One small process issue. It would be great to know how you are planning to use comments people make, and how and when you will be taking them into account.

    A few substantive issues

    1) Relationship to other principles - many of these seem to map onto the '8 principles of government transparency' derived a number of years back. Would it be better to adopt those principles with clarifications / additions for the data.gov.uk context, rather than producing a completely new list of principles?

    2) Meta-data - It would be good to give more explicit encouragement to the provision of good meta-data, and the progressive improvement of information around the data in the catalogue to help people make sense of datasets.

    For instance, there is now a lot of information scattered across the web in blog posts, note pads, PDFs etc. on how to make sense of COINS. An encouragement that would lead to the Treasury, for example, adding links to this information to the COINS record in data.gov.uk would be a good thing.

    Some explicit (lightweight) base-line standards for meta data quality would be useful I think...

    3) Linked Data. Linked data is not always the most end-user friendly format: and focussing on linked data prioritises the role of developer intermediaries over citizens who just want access to specific elements of data.

    I would encourage a broader principle here, that doesn't undermine the value of linked data, but recognises the broader range of format needs.

    Something along the lines of Release data quickly, and then work to make sure it is available in open standard formats, including Linked Data formats.

    This would also capture the idea that it does not have to be government that does the format conversion. If, as in the COINS case again, bodies such as WDDMG and The Guardian have created interfaces onto the data that output CSV / JSON, then Govt should be encouraged to link to those, rather than try and create its own rendering of those formats; with the govt responsibility being to monitor the continued availability and accuracy of those data sources - not to replicate them.

    If focussing on Linked Data only then there is a strong obligation on the data.gov.uk project to develop and user-test tools that mean individuals with absolute minimal technical knowledge can get hold of data in formats they can use with simple consumer tools (Excel; Google Maps etc.)

    4) Actively encouraging re-use - I would encourage more reflection on whether 'The Government should work with business to help grow new, innovative uses of data and to generate economic benefit' sits best as a principle.

    This seems to me to be a possible 'programme' or 'policy' for government to carry out; but to accord it status as a principle seems to be committing government to a substantive use of resources to subsidise a particular sector of private enterprise...

    5) A minor point - the last principle would probably sit better further up in the list...

  5. Comment by Anonymous posted on

    What is the relationship between this new site, and the existing Information Asset Register at
    http://www.opsi.gov.uk/iar/index

    Will the IAR initiative be shut down under the recent Cabinet Office initiative on web site costs?
    (http://www.cabinetoffice.gov.uk/newsroom/news_releases/2010/100624-websites.aspx)

    What resources will Central Government make available to its departments, agencies and non-departmental public bodies to create and maintain the inventories of data holdings (metadata, last data principle)? Many of the IAR records look pretty unloved and uncared for, i.e. out of date. Creating and maintaining metadata is itself a time-consuming task if its results are to be of any use.

  6. Comment by Anonymous posted on

    Just a minor point, but if you had put named HTML anchors alongside each principle, it would have been easier for commentators on third party sites to post links that point directly back to specific principles allowing you to more easily capture comments appearing on remote sites, and enabling the automated archiving of pointwise micro-discussion on things like Twitter?

    • Replies to Anonymous>

      Comment by TimDavies posted on

      The Write to Reply version just put up has anchors for each point:

      http://writetoreply.org/doodlings/draft-public-data-principles/

      Perhaps the Public Sector Transparency Board could consider taking into account comments left over on Write to Reply - as the Wiki page still seems not to be editable (at least I can't edit it from this log-in)...

  7. Comment by Anonymous posted on

    The scope of this could be clearer. Is it intended to cover all publicly funded bodies, including those in the Higher Education sector for example? What about research data, especially pre-publication? And will there be FOI-like exemptions for data that is commercially sensitive?

  8. Comment by Anonymous posted on

    As a UK tax payer I fully accept that data collected at public expense should be made available for re-use, and free of extraneous licence conditions or fees. However as a UK PSI employee I do not see why UK assets should be free to use by non-UK based organisations in business which do not benefit UK citizens.

    My question to the Board is this: In light of the UK's fiscal situation, why aren't we utilising these incredibly valuable assets to maximise UK licensing revenues based on intended use? Intended use UK = FOC / Intended use anywhere else = Licence fee?

  9. Comment by Anonymous posted on

    One more principle from my side :-). All countries should compulsorily take part in opening their Govt data to the public and this initiative should not restrict itself to U.K., U.S. and Australia.

    • Replies to Anonymous>

      Comment by Anonymous posted on

      Chile is on the same track too.

  10. Comment by Anonymous posted on

    This site aims at data reuse by technical people but the public ask for data in a readable form (in Local Government). Some may be able to understand how to use Excel (csv) but others will need it in PDF format. The standard should be to be able to convert into many forms and not just dictate.
    Secondly much of our data is locked away in older systems that do not allow extracts without additional development costs or a resource to do this manually - something we do not have as budgets are cut. Something to bear in mind please!

  11. Comment by Anonymous posted on

    Just wanted to make sure there's something to make sure that the principle of non-discrimination is embedded in the principles

  12. Comment by Anonymous posted on

    I assume then the government owned OS mastermap will be one of the data sets freely made available for reuse very soon.

  13. Comment by TimDavies posted on

    I've written a few quick notes here about the importance of not ignoring the non-machine readable steps on the way of linked and open data - as for many use cases simply having access to 'facts' from government data will be valuable.

    I've also in that post suggested it might be useful to encourage authorities to publish a list of datasets they have, but don't yet have open, in order to enable users to better drive the prioritisation of data release and reformatting.

  14. Comment by Anonymous posted on

    The fourth principle could be misinterpreted to imply one and only one access point. Surely not what is intended? There must be room for more specialised places that have more detailed and more specialised metadata related to particular domains (e.g. health, statistics, etc.). Perhaps the word "single" is not needed - we do need a place where everything can be found, but it surely should not be the only place.

  15. Comment by Anonymous posted on

    Hi,

    More specific advice please. Going to publish statistical data in a set of tables in .csv format.

    The issue is that there is some commentary and notes included in the tables - what is the best way to tackle this mix of numbers and words to make it easily re-usable?

  16. Comment by Anonymous posted on

    The principles do not address the fundamental issues of what the public sector is...

    + Does it include statutory advisors (e.g. NDPBs)?
    + What about NGOs delivering government business in the big society?
    + Parish Councils?

    Should we use an existing definition such as that in the EIRs?

    Finally not all public sector organisations are Crown organisations so the Data.gov.uk licence can't be used by them as it is a crown licence.

  17. Comment by Anonymous posted on

    The following response is on behalf of the members of the Demographics User Group (see http://www.demographicsusergroup.co.uk )

    DUG has been an active supporter of better access to public sector information for more than a decade, and is delighted with the progress being made with http://www.data.gov.uk

    We welcome the 10 draft Public Data Principles, and are especially pleased to see that the first is: “Public data policy and practice will be clearly driven by the public and businesses who want and use the data, including what data is released when and in what form – and in addition to the legal Right To Data itself this overriding principle should apply to the implementation of all the other principles”.

    In order to achieve this it will be necessary for the Transparency Board to put significant effort into encouraging dialogue with a wide range of existing and potential user communities (the public, business, local government, etc.) and expertise (specialists, mainstream analysts, and occasional / new users) to establish priorities.

    We also recognise that at this stage, the emphasis is on making use of data that are a by-product of public service delivery (rather than seeking a fresh view of information that is needed for the public good). In this context, we urge the Board to be creative in using existing data to create new information. For example, HMRC’s files of individual records should be evaluated to establish whether they can be used to create aggregate statistics on Incomes for small areas. There are many other similar examples of the potential to use personal administrative records to create anonymous statistics of great value to decision-makers and the public.

    Keith Dugmore

    • Replies to Anonymous>

      Comment by Anonymous posted on

      It would be nice to think that in light of the drive to data openness that the likes of John Lewis and the Co-op (along with other members of DUG) would reciprocate in opening up the data that they hold.

  18. Comment by Anonymous posted on

    It is good to set out principles; as others have commented, there will be a need for detailed guidance.

    As a starting point, in the spirit of access to data, please can we be told the salaries of those appointed to the Transparency Board, how many days per week they will be spending on this, whether they are now civil servants and if not where they are counted in official figures, as well as details of the fair and open competitive advertising which took place to appoint these people.

    Please can we have copies of all those offering tender bids or applying for the posts, and details of the process used to decide whom to appoint.

    Please can we have greater clarity on what will be cut in order to provide the additional access proposed - nothing is free, certainly not data provision nor government websites!

  19. Comment by Anonymous posted on

    It is surprising that a new government agency established to set standards for public data fails to recognise that "data" is a plural.

  20. Comment by Anonymous posted on

    At the meeting of APPSI on 22 July 2010 members heard a presentation by The National Archives staff on the Transparency Agenda. It was subsequently agreed that APPSI should express some views to the consultation now underway on the Public Data Transparency Principles and work programme. This note provides those views:

    • APPSI has long argued that the government requires a strategy to prioritise information garnering rather than relying entirely on serendipitous data harvesting of what is readily available. We understand that there is no strategy in place to prioritise datasets for incorporation in data.gov.uk. We regard this as wasteful and unlikely to deliver the maximum benefit in the short or medium term.

    • We welcome the Public Data Transparency Principles. But government’s working definition of ‘public data’ contradicts the ethos of the Principles in that it does not address the issue of public good. The existing definition is almost entirely predicated upon the management and policy needs of government. It also makes clear that the data are those created as a by-product of public service delivery. Taken at face value, all this is a reversion to the Rayner Review of the 1980s. Given the Public Data Principles, the Prime Minister’s letter to departments of 31 May 2010 (see: http://www.number10.gov.uk/news/statements-and-articles/2010/05/letter-to-government-departments-on-opening-up-data-51204) and existing and putative legislation, we suspect this phrasing is an oversight and urge that government should reconsider this definition. A version more in tune with the Principles would be: ‘Public data’ are the objective, factual, non-personal data collected by government at all levels to meet policy, service delivery and public accountability purposes, to enhance the capacity of individuals to be active citizens and to facilitate innovation.

    • The first Public Data Principle (Public data policy and practice will be clearly driven by the public and businesses who want and use the data, including what data is released when and in what formats) cannot be met without effective consultation with users, both current and latent. Such consultation is difficult, as the long experience in the official statistics world makes clear. Without it, however, success will only be by luck. We understand that the Transparency Board will consider user representation. We urge a more purposeful and planned engagement with the user community rather than simply providing data in the hope that this will meet needs.

    • In order for government to make data freely available it is important that the public task, which generates the information, is clearly defined. We are pleased to hear that this matter is under active discussion and look forward to seeing the results.

    • APPSI’s members from the devolved administrations pointed out that the Transparency Agenda is very Whitehall-centric and more needs to be done to establish a relationship with those administrations.

    • One member commented that, based on his experience, data.gov.uk is very confusing as the data is available in formats that can’t easily be re-used and metadata is very limited in explaining the characteristics (hence reliability) of the data. He recognised that this might be transitory given the early stage of development of the web site. Has there been any investigation of the usability of the web site and the active use of the data therein?

    • It was agreed amongst APPSI members that measuring the economic and social value of data.gov.uk would be difficult, not least because of the shift of policy outcomes emphasis between administrations. Given the significance of the whole workstream, the expenditure of public funds and the strong political support, APPSI members nevertheless believe it would be responsible for a benchmark to be established now so that changes wrought by data.gov.uk could be assessed effectively at some stage (e.g. in three years’ time).

    • In addition, APPSI members debated the trade-offs between continuing to publish data in existing, internationally-defined standards specific to a discipline and re-engineering them into the more universal form underpinning data.gov.uk. We concluded that the relative merits of these might be case-specific, that the resources required for any re-engineering were not clear to us and that indeed both approaches might end up running in parallel.

    Advisory Panel on Public Sector Information
    24 August 2010

  21. Comment by Anonymous posted on

    Please be aware that UNIT4 (authors/solution provider of Agresso finance system) are developing a free of charge utility that will enable our customers (including our 90+ local government customers) to produce data from Agresso in a linked format.

    All of our customers can already produce/publish open data from Agresso themselves at no additional cost.

    We are currently working directly with Royal Borough of Windsor & Maidenhead and will be producing further information in the near future on these developments.

    Note this will be for all customers irrespective of sector and will also cover the provision of non finance data.

    If anyone is interested in this work then please contact me for further information

    Anwen Robinson
    MD
    UNIT4 Business Software Ltd
    anwen.robinson@unit4.com

  22. Comment by Anonymous posted on

    The Principles should pay more attention to usage as well as generation:

    TRANSPARENCY AND OPEN DATA: ADDRESSING VALUE ISSUES 

    1. The prospect of increased public data transparency offers rich returns for democracy, accountability and the efficient and effective use of public resources. The transparency initiative is to be welcomed and the transparency board encouraged as to the economic and social value of its work.

    2. However the value and effectiveness of data transparency can only be derived from changes to decision making and behaviour resulting from access and use of the data. Data leads to decision making through their interpretation.

    3. The draft public data principles address the technical and policy issues around the delivery and accessibility of the public data. If value is to be delivered from data transparency then strategies for data release are only part of the picture. Facilitation of understanding, interpretation, decision making and action must be an integral part of the transparency process and the public data principles.

    4. To help the transparency board in its continuing deliberations I would like to offer some suggestions for supporting the process of interpretation by data users and hence enabling the conversion of public data availability into resource savings through improved decisions and actions.

    4. Data and Organisation

    (1) The meaning and value of data should be understood within its organisational context. Since data is generated for a purpose within an organisational context, the process and purpose of that data in the view of the organisation should also be transparent. Data should remain connected with owners who should be accountable for the accuracy and completeness of that data.

    (2) It should be possible to identify the level in the organisational hierarchy which connects with the data. Data which is disconnected from the people and the processes loses its value because it loses meaning. The result is that interpretation of the data is compromised.

    5. Meaning and Detail

    (1) Within an organisational hierarchy, the meaning associated with data changes depending on the level in the organisational hierarchy that the data is associated with. At a higher management level, detail is reduced and deeper meaning may be applied.  As information sources move between higher and lower level, organisational filters will be applied which give rise to different concepts.

    (2) Translation of concepts occurs as data moves up a hierarchy such that the concepts used and the interpretations applied can be very different at higher management levels than lower operational levels. If users are not aware of the conceptual filtering that has occurred in producing the data, they will not be able to draw reasonable conclusions from the data and apply it to decision making and action. Without understanding of filtering, translation and interpretation the data loses its value. 

    (3) For example, data concerning General Register Office performance is presented as monthly summaries of whether targets are met (portrayed as red or green indicators). Such data has been generated at a particular level in the hierarchy. Aggregation has occurred and detail has been replaced by an applied meaning.

    One target is: ‘To produce 90% of vital event certificates required within 10 days.’ There is no indication of how the filtering of this data has occurred. Why 90%? Why 10 days? On what basis is the achievement of this target seen as good and sufficient? What has been lost in terms of regional and particular event detail which may be of much greater significance than the aggregated figure? On what basis was the detail data examined and success or failure in meeting the target determined? How does this help if I’m considering levels of staffing, regional difficulties, or indeed the connection of the administrative processes within the system with law issues, health issues etc? Will this data help me explain why my dad’s death certificate took six months to arrive?

     (4) The concepts and policy understandings that lead to the data generation need to be explained at some level as part of data transparency so that the data itself can be critically reviewed and makes sense. An awareness needs to be developed of the points of view of the generators of data within public sector hierarchies and the potential mismatch with the worldview and context of data consumers.

    6. Chronology, Geography and Scalability

    (1) The time frame and rate of change of aggregated data generated at the top of a hierarchy will be different to that of the detail data generated lower down the hierarchy at an operational level. Changes will occur much more slowly higher up. Hence quite significant trends and phenomena visible at the lower hierarchical levels may be hidden higher up. Lower level trends, of great importance in the detail, may not scale up and may just disappear. In such a hierarchical system, a trend of concern to a data user low down in the hierarchy may be dismissed as not real or present at the higher level where aggregated data might be driving decision making. Additionally, the aggregation of data up the hierarchy may result in emergent phenomena and change which cannot be derived directly at the lower level.

    (2) Similarly in a higher level of the hierarchy, spatial coverage will be much greater, obscuring local effects and giving an impression of stability which may not ring true with data users whose concerns are much more local.

    (3) If awareness of chronology, geography and scalability concerns is not developed by both providers and consumers, then false comparisons will be made leading to decisions concerning phenomena which are only present at a particular level in the hierarchy.
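
    A minimal sketch of how a local trend can vanish on aggregation is given below. The localities and monthly counts are hypothetical, chosen only to make the effect visible; nothing here comes from a real dataset.

```python
# Hypothetical monthly counts for three localities, invented for illustration.
local_series = {
    "Village A": [2, 4, 6, 8, 10, 12],          # steep local rise
    "Town B": [100, 99, 98, 97, 96, 95],        # slow decline
    "City C": [500, 499, 498, 497, 496, 495],   # slow decline
}


def change(series):
    """Relative change between the first and last period."""
    return (series[-1] - series[0]) / series[0]


def aggregate(series_by_area):
    """Sum the local series period by period, as a higher level would see them."""
    return [sum(values) for values in zip(*series_by_area.values())]


total = aggregate(local_series)
for area, series in local_series.items():
    print(f"{area}: {change(series):+.0%}")
print(f"Aggregate: {change(total):+.1%}")
```

    In this constructed case the village's count rises sixfold while the aggregate series does not move at all, so a decision maker working only from the aggregated figure would see stability where a local user sees a striking trend.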

    7. Aligning hierarchies

    (1)  There is an emphasis in data transparency on the development of apps that enable the comparison of datasets and the production of datasets which reveal new phenomena and connections. This has a high potential value in making connections between different datasets which reveal new trends. Such trends may enable different decisions to be made and lead to new action.  But without understanding the derivation of the data sets which are to be joined, the emerging results may be false.

    (2) An understanding of the derivation and the hierarchical position of data sets to be mashed will be critical to drawing valid conclusions. An attempt to mash a dataset from high up in a hierarchy with one from lower down would fail because the conceptual frameworks may be different and it may not be possible to translate between the two levels.

    (3) Hence it is essential that the derivation, context and framework of thinking behind each dataset is transparent.

    8. Identifying Capability Limits.

    (1) A key element in data transparency is that the public user has the knowledge and understanding to use the data to challenge decisions, and to action change through democratic processes. Data transparency offers much increased input and participation in public management. But this requires a set of capabilities in order to achieve effective outcomes. Availability of the datasets is only one part of the capability picture. Without access to the producers and the decision makers within departments, and the appropriate knowledge, environment and motivation, value will not be created from the data.

    (2) Consideration of the generation of value and change from public data must extend beyond the technical availability to the assessment of the capabilities needed for generating change from public sector data. Hence data release should be accompanied by the assessment of what additional capabilities are required for the effective use of that data in generating change.

    9. Educating the Public

    (1) Users of public data must have access to training, tools and conceptual understanding that will enable them to use the data effectively to achieve their goals. Guides and explanations of the data should neither patronise the user nor prejudice interpretation. Rather supporting materials should be designed to facilitate the effective derivation of conclusions about the data. The aim of delivering public data is to empower citizens and support the public involvement in improving society and improving the effectiveness of government.

    (2) Users of public data need to understand the context in which the data is generated and used. This context is complex, changing and organisationally embedded. Clear and easy guidance is critical to effective data use. Facilitation of technical skills in manipulating public data is of less importance than the development of conceptual skills in understanding the context of the data.

    10. Managing the Intermediaries

    (1) A majority of the general public will probably not expend much effort looking at raw data because of a lack of motivation, time and skills. Most will rely on intermediaries to flag up important data and interpret it. These intermediaries will include the press and broadcast media, lobby groups, NGOs, political parties and so on. In an attempt to simplify, make a story or highlight particular agendas, their interpretations may be biased. Most data can be subject to several interpretations. Attempts to present interpretations as solid statistically based facts should be resisted. There is a significant risk of data transparency being rendered opaque by filters and translations applied by the media and other organisations.

    (2) Since much of the interaction with public data will be through the media and other organisations, attention must be given to protecting the public from false interpretations and over-simplifications that produce prejudiced conclusions. Interpreters should be encouraged, if not required (there may be a need for legislation here), to be explicit in identifying data sources and what transformations and translations the intermediary has used to reach their published conclusions. Press statements can be developed as a vehicle for guidance in data interpretation.

    (3) A computer-based expert system which draws conclusions or rules from data to support human decision making is not only explicit in what dataset it has used but will also provide a trace of the exact logic steps by which the conclusions have been reached. Similar reporting should be required of intermediaries.
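
    The kind of trace described above might look something like the following sketch. It is not taken from any particular expert system; the class name, the dataset name and the figures are all hypothetical, and the two transformations are arbitrary examples of steps an intermediary might apply.

```python
from dataclasses import dataclass, field


@dataclass
class TracedResult:
    """A value plus the source and every step applied to reach it."""
    value: object
    source: str
    steps: list = field(default_factory=list)

    def apply(self, description, func):
        """Apply func to the value and record a plain-language description."""
        return TracedResult(func(self.value), self.source, self.steps + [description])

    def report(self):
        """Render the source, the numbered steps and the conclusion as text."""
        lines = [f"Source: {self.source}"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append(f"Conclusion: {self.value}")
        return "\n".join(lines)


# Hypothetical dataset name and figures, for illustration only.
raw = TracedResult([12, 15, 9, 30], source="example-dataset-2010.csv")
result = (
    raw.apply("Dropped values above 20 as outliers", lambda xs: [x for x in xs if x <= 20])
       .apply("Took the mean of the remaining values", lambda xs: sum(xs) / len(xs))
)
print(result.report())
```

    The published conclusion then carries with it the named source and the ordered list of transformations, so a reader can see that an outlier was dropped before the average was taken and can challenge that choice.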

    11. Conclusions

    Any effort to promote data transparency must ensure that organisational context is transparent as well. An awareness of the importance of meaning and context is essential. Departments must consider the processes, translation and filters that have been applied to derive datasets and also consider how their interpretations can be made public along with the data.

     ADDITIONAL DRAFT PUBLIC DATA PRINCIPLES FOR CONSIDERATION BY THE TRANSPARENCY BOARD 

    Public data will be accompanied by a clear indication of their relationship with government departments. Data will indicate source, owner and contacts. It will be crystal clear for anyone to associate a data set with a particular group within national and local government whose brief, hierarchical position and powers are explicitly described.

    Public data will be supported with descriptions of the processes by which they have been derived. Assumptions and definitions will be provided with data. Methods of aggregation, and interpretation will be described.

    Public data will be provided with a framework in which terms and concepts are given clear definitions. The semantics of data will vary according to department and source. The provision of definitions will enable users to discern whether quite different meaning is attached to data items from the meaning that the user might expect. In addition to what the data is and how it was derived, answers to potential why questions should be provided. The form of a statement of context, process and semantics should be standardised and then be expected to accompany all datasets.

    The provision of public data will be accompanied by appropriate educational packages and guides to enable users to interpret the data to meet their needs.

    Issues concerning the timing of data, the geographical coverage and scalability problems will be explicitly identified when a dataset is published.

    Periodic datasets will be accompanied by an assessment of the capabilities needed for their translation, interpretation and use.

    Third party interpretation and dissemination should be accompanied by details of the original dataset and its accompanying guidelines amplified with explanations of any further transformations and the logic by which final conclusions are reached.

    Neil McBride BSc.,PhD, CertEd.

    Reader in Information Management

    Centre for Computing and Social Responsibility

    De Montfort University

     

    • Replies to Anonymous>

      Comment by Anonymous posted on

      The comments made here apply a fortiori when the 'data' is related to processes used to identify people as potential fraud cases using statistical inference and data mining, as in a large number of current National Fraud Initiative exercises. These actually work against 'democracy' as some of them involve suspecting people of not being entitled to register to vote at an address on the basis of falsely interpreted data originating in CT departments. Somebody within government has for a long time been on a fool's errand to 'match up' all sorts of data on 'residence', including the full electoral register, without proper consideration of the very different and legally significant definitions of 'address' and 'residence'. As may be predicted, a great deal of mess and injustice results, including, when the NFI gets in on the scene, hundreds and thousands of abortive fraud investigations based on statistical inference when we are repeatedly told that the trigger is in fact actual contradictions and inconsistencies. A widow putting their oldest disregarded child on the electoral register almost automatically gets falsely suspected of fraud by officials who do not understand the computing output which resulted in them being highlighted as a potential fraud case, with all the distress and potential damage that this utterly unfair and secretive process involves.

      On this basis, the underlying logic and reasoning and full justifications of any meta data linked to individual people should be published in respect of all data mining exercises. One reason for this is that, because we have incorrectly been told that data matching to prevent fraud works by identifying discrepancies and contradictions, some people now work backwards: if the NFI produces a report then, people reason, there must be an inconsistency, and they set about inventing 'facts' to show why there is an inconsistency, even though the NFI in some cases is now actively denying it ever thought that the exercises identified actual as opposed to potential inconsistencies.

       

      One example of this is the invention of a single person council tax discount to which only those literally living alone are entitled. No such thing exists (see Section 11(1) of the Act). On the basis that such a thing does exist, and on the equally false basis that one can check how many people count under CT law (see Section 6 of the Act) as having their sole or main residence at an address by cross-checking against the full electoral register (but see the case law Williams v Horsham District Council, which confirms that this is an unlawful procedure), even the NFA is now asserting that if there is entitlement to a non-existent 'SPD' then the electoral register is wrong and somebody needs kicking off it. But see the Electoral Commission web site entry on, for example, student voters and generally on registering to vote at more than one of your residences.

       

       

  23. Comment by Anonymous posted on

    For some time I have been particularly concerned about the impact of the Data Protection Act on the public's right to know of health anomalies in their locality. As you may know, there was a request for childhood leukaemia figures in south west Scotland which was appealed by NHS Scotland and finally went up to Supreme Court level, where it was decided that full disclosure could not be allowed at ward level. However, the interpretation of the Data Protection Act for the release of health data seems to have reached absurd proportions, exemplified by my correspondence with NHS Scotland, when I asked for the annual figures for myeloid leukaemia in children under the age of one, to see if there was any significant variation at the time of the Chernobyl accident. These figures were not released to me on the grounds that they were less than five per annum.

    I can see no circumstances in which knowing the number of myeloid leukaemias in the whole of Scotland could possibly aid the identification of individual cases. Of course, mine was a research interest, but the problem must be particularly acute for a family in a particular village whose child has leukaemia and they suspect some environmental source such as a nuclear plant, municipal waste incinerator or 'phone mast. After all, the public demands the right to know if a paedophile lives in their midst, why can’t they have the right to know of a facility that may have detrimental health impacts, so that they can pursue their democratic right to campaign against it?

    If some poor woman gets battered nearly to death in a leafy lane in Surrey, her personal details are spread all over the newspapers, but the existence of a childhood cancer cluster is now hidden from public knowledge. If your child has leukaemia, you have a very strong interest in knowing whether other children in your area also have the same disease.

    There is another danger in the lack of transparency for localised health data, namely if the information is incorrect or altered, no one can check the details. This is particularly important where such information is politically sensitive, for example the occurrence of, say, childhood thyroid cancer cases near a nuclear power plant. In 1983, I carried out the statistical research for the YTV film “Windscale, the Nuclear Laundry” which discovered six cases of childhood leukaemia near the Windscale (Sellafield) nuclear reprocessing plant. Three of the cases were known by the health authority, and three by a local doctor. Three cases by themselves would not have been out of the ordinary but the combination of records leading to six cases was exceptional. However, it took the resources of a large TV company to ferret out the truth. Nowadays, we need a more rational approach.

    Yes, privacy for an individual's health record is important, but public interest in this area remains paramount. My suggestion is that all health information should be published about a locality when the number of cases is two or more. In many instances, this will not be significant, but on occasions it may help to lead to important new epidemiological information without pin-pointing any one individual.
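
    The difference between the two disclosure rules discussed above can be sketched as follows. The case counts and locality names are hypothetical, and the function is only an illustration of a threshold rule, not a description of how any health body actually processes its data.

```python
def publish_counts(counts, threshold):
    """Return counts with values below the threshold replaced by None (suppressed)."""
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Hypothetical case counts, invented for illustration.
cases = {"Locality A": 6, "Locality B": 3, "Locality C": 1}

under_five_rule = publish_counts(cases, threshold=5)  # withhold counts under five
two_or_more_rule = publish_counts(cases, threshold=2)  # the proposal above

print(under_five_rule)
print(two_or_more_rule)
```

    Under the first rule only the largest count would be visible; under the proposed rule a locality with three cases would also be published, while a single case would still be withheld.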

    Best wishes,

    John Urquhart

  24. Comment by Anonymous posted on

    I believe that good principles in terms of statistical reporting need to be added into the mix.

    For example, when the NFI reports on cases of council tax discounts found under the NFI to have been 'incorrectly awarded or claimed' it is using the word 'incorrectly' in an esoteric and legally controversial manner.  By 'incorrectly' it appears to mean that certain administrative procedures which it has been advised a court might on balance find to be legal have not been followed, and this is in a context where the decision on what procedures are reasonable rests with local councils subject to the usual public law considerations, and any disagreement should be carried out via a judicial review.

    The figures are often taken to mean that the amounts, often cited in terms of hundreds of millions, equate to discounts deducted when there was no entitlement. This is very far from being the case.   

    What appears to happen is that the NFI insists that particular 'codes' are attached to some council tax accounts, even though these codes are at odds with the black and white letter of council tax law and especially regulations 15 and 20. Despite councils being required to deduct the appropriate amount on the assumption that the same rate and amount will apply on every day of the coming year, the NFI insists that councils code accounts according to the information initially supplied at some previous date in time, information about how many adults had their sole or main residence (as per Section 6 of the Local Government Finance Act) at the address at that time. The NFI expects that every time a new voter appears on the electoral register, the council suspects that that voter may a) have their sole or main residence at the address AND b) not fall to be disregarded OR that the new voter is not a valid voter. On this basis it issues hit lists for councils to investigate. If the new voter is resident and is disregarded, i.e. there was full entitlement, the NFI expects its 'code' to be altered, and on this basis it asserts that there was some incorrectness in the administrative processes of the council. Throwing the word 'claimed' in muddies the water, as it was never the legal position of the taxpayer that they were, at the time the data was supplied to the NFI or subjected to data mining by the NFI, 'claiming' to live alone for council tax purposes.

    In fact the NFI has no idea at all of how many abortive investigations follow from its supply of 'intelligence' to councils that people 'might not be entitled' to their discount saying, bluntly, that it is not interested in how many people were entitled to the discount. It clearly includes within its figures for 'incorrectly awarded or claimed' discounts very large numbers of cases where there was all the time entitlement.

    Worst of all, the National Fraud Authority is now publishing guidance asserting, in the face of the account of the law published on the web site of the Electoral Commission, that where there is entitlement to a discount the second voter should be removed from the electoral register. Therefore, unsupported statistical and verbal data and data mining is being used to threaten the legal rights of people to register to vote at a valid address of their choice, with students choosing to register at both their former parental home and their university address being repeatedly implicated as involved in non-existent council tax discount fraud.