Some links I thought worth reading the past few days

To celebrate the launch of the GDPR last Friday, Jaap-Henk Hoepman released his ‘little blue book’ (PDF) on Privacy Design Strategies (with a CC-BY-NC license). Hoepman is an associate professor with the Digital Security group of the iCIS institute at Radboud University.

I heard him speak a few months ago at a Tech Solidarity meet-up, and enjoyed his insights and pragmatic approaches (PDF slides here).

Data protection by design (together with a ‘state of the art’ requirement) forms the forward-looking part of the GDPR, where the minimum requirements are always evolving. The GDPR is designed to have a rising floor that way.
The little blue book has an easy-to-understand outline, which divides privacy by design into 8 strategies, each accompanied by a number of tactics, all of which can be used in parallel.

Those 8 strategies (shown in the image above) are divided into 2 groups: data-oriented strategies and process-oriented strategies.

Data-oriented strategies:
Minimise (tactics: Select, Exclude, Strip, Destroy)
Separate (tactics: Isolate, Distribute)
Abstract (tactics: Summarise, Group, Perturb)
Hide (tactics: Restrict, Obfuscate, Dissociate, Mix)

Process-oriented strategies:
Inform (tactics: Supply, Explain, Notify)
Control (tactics: Consent, Choose, Update, Retract)
Enforce (tactics: Create, Maintain, Uphold)
Demonstrate (tactics: Record, Audit, Report)
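To make the data-oriented strategies a bit more concrete, here is a small sketch in Python of how the Minimise (Select) and Abstract (Group, Perturb) tactics might look in code. The record and field names are my own illustration, not taken from the book.

```python
import random

# A hypothetical raw record; the fields are illustrative, not from the book.
record = {"name": "Alice", "email": "alice@example.com", "age": 34, "city": "Nijmegen"}

def minimise(rec, needed):
    """Minimise / Select: keep only the fields the stated purpose requires."""
    return {k: v for k, v in rec.items() if k in needed}

def abstract_age(rec, bucket=10):
    """Abstract / Group: replace an exact age with a coarse age band."""
    rec = dict(rec)
    low = (rec["age"] // bucket) * bucket
    rec["age"] = f"{low}-{low + bucket - 1}"
    return rec

def perturb(value, scale=2):
    """Abstract / Perturb: add small random noise so the exact value is never stored."""
    return value + random.randint(-scale, scale)

slim = minimise(record, {"age", "city"})   # name and email never leave the source
print(abstract_age(slim))                  # {'age': '30-39', 'city': 'Nijmegen'}
```

The point of using the tactics in parallel is visible even in this toy version: each one independently reduces what can be learned from the stored data.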

All come with examples, and the final chapters provide suggestions on how to apply them in an organisation.
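For the process-oriented side, a minimal sketch (my own, not from the book) of what the Demonstrate strategy’s Record tactic could amount to: keeping an auditable log of processing activities, their purpose and legal basis.

```python
from datetime import datetime, timezone

# A minimal, illustrative sketch of the Demonstrate / Record tactic:
# log every processing activity with its purpose and legal basis,
# so an auditor or DPO can later report on what was processed and why.
processing_log = []

def record_activity(purpose, data_categories, legal_basis):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "purpose": purpose,
        "data_categories": data_categories,
        "legal_basis": legal_basis,
    }
    processing_log.append(entry)
    return entry

record_activity("newsletter", ["email address"], "consent")
record_activity("payroll", ["name", "bank account"], "legal obligation")

for entry in processing_log:
    print(entry["purpose"], "->", entry["legal_basis"])
```

Such a log is exactly the kind of artefact the Audit and Report tactics then build on.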

Today I was at a session at the Ministry for Interior Affairs in The Hague on the GDPR, organised by the center of expertise on open government.
It made me realise how I actually approach the GDPR, and how I see all the overblown reactions to it, like sending all of us a heap of mail to re-request consent where none is needed, or even taking your website or personal blog offline. I find I approach the GDPR like I approach a quality assurance (QA) system.

One key change with the GDPR is that organisations can now be audited on their preventive data protection measures, which of course already mimics QA. (Beyond that, the GDPR is mostly an incremental change to the previous law, except that the people described by your data now have articulated rights that apply globally, and the law has a new set of teeth in the form of substantial penalties.)

AVG (GDPR) mindmap
My colleague Paul facilitated the session and showed this mindmap of GDPR aspects. I think it misses the more future-oriented parts.

The session today had three brief presentations.

In one, a student presented results from his thesis research on the implementation of the GDPR, for which he had spoken with a lot of data protection officers (DPOs). These are mandatory roles for all public sector bodies, and also for some specific types of data processing companies. One of the surprising outcomes is that some of these DPOs saw themselves, and were seen as, ‘outposts’ of the data protection authority, in other words as enforcers or even potentially as moles. That is not conducive to a DPO fulfilling the part of their role that is about raising awareness of and sensitivity to data protection issues.

This strongly reminded me of when, 20 years ago, I was involved in creating a QA system from scratch for my then employer. Some of my colleagues saw the role of the quality assurance manager as policing their work. It took effort to show that we were not building a straitjacket around them to keep them within strict boundaries, but providing a solid skeleton to grow on and move faster, and that audits were not hunts for breaches of compliance but a way to make emergent changes in the way people worked visible, and to incorporate the professionally justified ones into that skeleton.

In another presentation, a civil servant of the Ministry talked about creating a register of all processing of person-related data. What stood out most for me was the (rightly) pragmatic approach they took in describing current practices and data collections inside the organisation. This is a key element of QA as well: you work from descriptions of what actually happens, not of what ‘should’ or ‘ideally’ happens. QA is a practice rooted in pragmatism, where once a practice is described and agreed upon, it can be audited.
Of course in the case of the Ministry it helps that they only have tasks mandated by law, so the grounds for processing are clear by default, and where they are not, the data should not be collected at all. This reduces the range of potential grey areas. Similarly for security measures: they already need to adhere to national security guidelines (the national baseline information security), which likewise saves them from inventing new measures, demonstrates compliance, and provides an auditable security requirement to go with it. This no doubt made it easier to take that pragmatic approach, which takes its cues from what is really happening in the organisation, from what the professionals are really doing.

A third presentation dealt with open standards for both processes and technologies, by the national Forum for Standardisation. Since 2008 a growing list of currently some 40 standards has been mandatory for Dutch public sector bodies. In this list you find a range of elements that are ready-made to help with GDPR compliance: support for the rights of those described by the data (such as the right to data export and portability), preventive technological security measures, and ‘by design’ data protection measures. Some of these standards are ISO norms themselves or, like the mentioned national baseline information security, compliant derivatives of such norms.

These elements, the ‘police’ versus ‘counsel’ perspective on the role of a DPO, the pragmatism that needs to underpin actions, and the building blocks already based on QA principles that are readily found elsewhere in your own practice, made me realise and better articulate how I’ve been viewing the GDPR all along: as a quality assurance system for data protection.

With a quality assurance system you can still famously produce concrete swimming vests, but at least it will be done consistently. Likewise, under the GDPR you will still be able to do all kinds of things with data. Big Data and developing machine learning systems are hard but hopefully worthwhile to do. With the GDPR it will just be hard in a slightly different way, and it will also be helped by establishing some baselines and testing core assumptions, while making your purposes and ways of working available for scrutiny. Introducing QA does not change the way an organisation works, unless it really doesn’t have its house in order. Likewise the GDPR won’t change your organisation much if you have your house in order.

From the QA perspective on the GDPR, it is perfectly clear why it has a moving baseline (through its ‘by design’ and ‘state of the art’ requirements). From the same perspective it is also clear how it connects to the way Europe is positioning itself geopolitically in the race concerning AI. The policing perspective, after all, only leads to a luddite stance on AI, which is not what the EU is doing, far from it. From that it is clear how the legislator intends the thrust of the GDPR: as QA, really.

Ethics by design means adding ethical choices and values to a design process as non-functional requirements, which are then turned into functional specifications.

E.g. when you want to count the size of a group of people by taking a picture of them, adding the value of safeguarding privacy to the requirements might mean the picture is intentionally made grainy by the camera. A grainier picture still allows you to count the number of people in it, but you never captured or stored their actual faces.
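As a toy sketch of that idea (the grid and all numbers here are entirely made up): coarsening an image before it is stored destroys the fine detail that faces consist of, while coarse shapes remain countable.

```python
# Toy sketch of the grainy-picture idea: the 'image' is just a grid of
# brightness values, and all numbers here are made up for illustration.

def downsample(image, block=4):
    """Average each block x block tile into one pixel: an intentionally grainy copy."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y][x] for y in range(by, by + block)
                            for x in range(bx, bx + block)) / (block * block)
            for bx in range(0, w, block)
        ]
        for by in range(0, h, block)
    ]

# An 8x8 'photo': dark background with two bright 4x4 'people'.
image = [[10] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        image[y][x] = 200          # person one, top left
        image[y + 4][x + 4] = 200  # person two, bottom right

grainy = downsample(image, block=4)  # now only 2x2 pixels: faces are unrecoverable
count = sum(1 for row in grainy for pixel in row if pixel > 100)
print(count)  # 2
```

Only the grainy copy would ever be stored: the count survives the downsampling, the faces do not.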

When it comes to data governance and machine learning, Europe’s stance towards safeguarding civic rights and enlightenment values is a unique perspective to take in a geopolitical context. Data is a very valuable resource. In the US, large corporations and intelligence services have created enormous data lakes without much restraint, resulting in a tremendous power asymmetry and an objectification of the individual. This is surveillance capitalism.
China, and others like Russia, have created or are creating large national data spaces in which the individual is made fully transparent: most if not all data sources are connected and made accessible to government, and the resulting data patterns have direct consequences for citizens. This is data-driven authoritarian rule.
Europe cannot compete with either of those two models, but it can provide a competing perspective on data usage by creating a path of responsible innovation, in which data is combined and connected as much as elsewhere in the world, yet with values and ethical boundaries designed into its core. With the GDPR the EU is already setting a new de facto global standard. Doing more along similar lines, not just in terms of regulation but also in terms of infrastructure (Estonia’s X-Road, for instance), is the opportunity Europe has.

Some pointers:
My blogpost Ethics by Design
A paper (PDF) on Value Sensitive Design
The French report For a Meaningful Artificial Intelligence (PDF), which drives France’s 1.5 billion investment in value-based AI.

Last week I had the pleasure of attending and speaking at the annual FOSS4G conference. This gathering of the community around free and open source software in the geo-sector took place in Bonn, in what used to be the German parliament. I’ve already posted the outline, slides and video of my keynote at my company’s website, but am now also crossposting it here.

Speaking in the former plenary room of the German Parliament. Photo by Bart van den Eijnden

In my talk I outlined that it is often hard to see the real impact of open data, and explored the reasons why. I ended with a call upon the FOSS4G community to be an active force in driving ethics by design in re-using data.

Impact is often hard to see, because measurement takes effort
Firstly, because it takes a lot of effort to map out all the network effects, for instance when doing micro-economic studies like we did for ESA, or when you need to look for many small and varied impacts, both social and economic. This is especially true if you take a ‘publish it and it will happen’ approach. Spotting impact becomes much easier if you already know what type of impact you want to achieve, and then publish the data sets you think may enable other stakeholders to create that impact. Around real issues, in real contexts, it is much easier to spot the real impact of publishing and re-using open data. It does require that the published data is serious, as serious as the issues. It also requires openness: that is what brings new stakeholders into play and creates new perspectives towards agency, so that impact results. Openness needs to be vigorously defended because of it, and the FOSS4G community is well suited to do that, as openness is part of its value set.

Impact is often hard to see, because of fragmentation in availability
Secondly, because impact often results from combinations of data sets, and the current reality is that data provision is mostly much too fragmented to allow interesting combinations. Some of the specific data sets, or the right timeframe or geographic scope might be missing, making interesting re-uses impossible.
Emerging national data infrastructures, such as those the Danish and the Dutch have been creating, are a good fix for this. They combine several core government data sets into one system and open it up as much as possible. Think of cadastral records, maps, persons, companies, addresses and buildings.
Geo-data is at the heart of all this (maps, addresses, buildings, plots, objects), which turns it into the linking pin for many re-uses where otherwise diverse data sets are combined.
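As a small illustration of that linking-pin role (all data here is made up): two otherwise unrelated data sets can be joined via a shared address key.

```python
# Made-up example data: a buildings register and a company register
# that share nothing except an address.
buildings = {
    "Main St 1": {"year_built": 1932, "energy_label": "C"},
    "Main St 2": {"year_built": 2005, "energy_label": "A"},
}
companies = [
    {"name": "Bakery Jansen", "address": "Main St 1"},
    {"name": "GeoSoft BV", "address": "Main St 2"},
]

# The address is the combinator: joining on it links the two data sets.
combined = [
    {**company, **buildings[company["address"]]}
    for company in companies
    if company["address"] in buildings
]
for row in combined:
    print(row["name"], row["year_built"], row["energy_label"])
```

The map itself never appears in the result, yet the address key is what made the combination possible.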

Geo is the linking pin, and its role is shifting: ethics by design needed
Because geo-data is the linking pin, its role is shifting. First of all it puts geo-data at the very heart of every privacy discussion around open data. Combinations of data sets can quickly become privacy issues, with geo-data as the combinator. Privacy and other ethical questions arise even more now that geo-data is no longer about relatively static maps, but sensors are putting many more objects, as well as human beings, on the map in real time.
At the same time geo-data is becoming less visible in these combinations. ‘The map’ is not necessarily a significant part of the result of combining data sets, just a catalyst on the way to getting there. Will geo-data be a neutral ingredient, or an ingredient with a strong attitude? An attitude that aims to actively promulgate ethical choices, not just concerning privacy, but also concerning what are statistically responsible combinations, and which steps towards an in-itself legal result are and are not legal themselves? As with defending openness, the FOSS4G community is in a good position to push these ethical questions forward in the geo community, as well as to find ways of incorporating them directly in the tools they build and use.

The video of the keynote has been published by the FOSS4G conference organisers.
Slides are available from Slideshare and embedded below: