There were several points made in the conversation after my presentation yesterday at Open Belgium 2019. This is a brief overview to capture them here.

1) One remark was about the balance between privacy and openness, and asking about (negative) privacy impacts.

The framework assumes government as the party being interested in measurement (given that that was the assignment for which it was created). Government held open data is by default not personal data as re-use rules are based on access regimes which in turn all exclude personal data (with a few separately regulated exceptions). What I took away from the remark is that, as we know new privacy and other ethical issues may arise from working with data combinations, it might be of interest if we can formulate indicators that try to track negative outcomes or spot unintended consequences, in the same way as we are trying to track positive signals.

2) One question was whether I had included all economic modelling work in academia etc.

I didn’t. This isn’t academic research either. It seeks to apply lessons already learned. What was included were existing documented cases, studies and research papers looking at various aspects of open data impact. Some of those are academic publications, some aren’t. What I took from those studies is two things: what exactly did they look at (and what did they find), and how did they assess a specific impact? The ‘what’ was used as potential indicator, the ‘how’ as the method. It is of interest to keep tracking new research as it gets published, to augment the framework.

3) Is this academic research?

No, its primary aim is to be a practical instrument for data holders as well as national open data policy makers. It’s not meant to establish scientific truth, or to completely quantify impact once and for all. It’s meant to establish if there are signs the right steps are taken, and if that results in visible impact. The aim, and this connects to the previous question as well, is to avoid extensive modelling techniques, and favor indicators we know work, where the methods are straightforward. This is to ensure that government data holders are able to do these measurements themselves, and use the framework actively as an instrument.

4) Does it include citizen science (open data) efforts?

This is an interesting one (asked by Lukas of Luftdaten.info). The framework currently does include, in a way, the existence and emergence of citizen science projects, as that would come up in any stakeholder mapping attempts and in any emerging ecosystem tracking, and as examples of using government open data (as context and background for citizen science measurements). But the framework doesn’t look at the impact of such efforts, not in terms of socio-economic impact and not in terms of government being a potential user of citizen science data. Again, the framework is meant to make visible the impact of government opening up data. But I think it’s not very difficult to adapt the framework to track citizen science projects’ impact. Adding citizen science projects in a more direct way, as indicators for the framework itself, is harder I think, as it needs more clarification of how it ties into the impact of open government data.

5) Is this based only on papers, or also on approaching groups, and people ‘feeling’ the impact?

This was connected to the citizen science bit. Yes, the framework is based on existing documented material only. And although a range of those studies base themselves on interviewing or surveying various stakeholders, that is not a default or deliberate part of how the framework was created. I do however recognise the value of, for instance, participatory narrative inquiry that makes the real experiences of people visible, and the patterns across those experiences. Including that sort of measurement would be useful, especially on the social and societal impacts of open data. But currently none of the studies that were re-used in the framework took that approach. It does make me think about how one could set up something like that to monitor impact, e.g. of local government open data initiatives.

At Open Belgium 2019 today Daniel Leufer gave an interesting session on bringing philosophy and technology closer together. He presented the Open Philosophy Network, as an attempt to bring philosophy questions into tech discussions while avoiding a) the overly abstract work going on in academia, and b) not having all stakeholders at the table in an equal setting. He aims at local gatherings and events, such as a book reading group on Shoshana Zuboff’s The Age of Surveillance Capitalism, or tech-ethics round table discussions where there isn’t a panel of experts that gets interviewed but where philosophers, technologists and people who use the technology are all part of the discussion.

This resonated with me at various levels. One level is that I recognise a strong interest in naive explorations of ethical questions around technology. For instance at our Smart Stuff That Matters unconference last summer, in various conversations ethical discussions emerged naturally from the actual context of the session and the event.
Another is that, unlike some of the academic efforts I know, the step towards practical applicability is expected and needed sooner by many. In the end it all has to inform actions and choices in the here and now, even when nobody expects definitive answers. It is also why I myself dislike how many ethical discussions pretending to be action oriented are primarily connected to future or emergent technologies, not to current technology choices. Then it’s just a fig leaf for inaction, and removing agency. I’m more a pragmatist, and am interested in what achieves actual improvements in the here and now, and what increases agency.
Thirdly I also felt that there are many more connections to make in terms of open session formats, such as Open Space, knowledge cafés, blogwalks, and barcamps, and indeed the living room experience of our birthday unconferences. I’ve organised many of those, and I feel the need to revisit those experiences and think about how to deploy them for something like this. This also applies to formulating a slightly more structured approach to assist groups in organisations with naive ethical explorations.

The point of ethics is not to provide definitive answers, but to prevent us using terrible answers

I hope to interact a bit more with Daniel Leufer in the near future.

Today I gave a brief presentation of the framework for measuring open data impact I created for UNDP Serbia last year, at the Open Belgium 2019 Conference.

The framework is meant to be relatable and usable for individual organisations by themselves, and based on how existing cases, papers and research in the past have tried to establish such impact.

Here are the slides.

This is the full transcript of my presentation:

Last Friday, when Pieter Colpaert tweeted the talks he intended to visit (Hi Pieter!), he said two things. First he said after the coffee it starts to get difficult, and that’s true. Measuring impact is a difficult topic. And he asked about measuring impact: How can you possibly do that? He’s right to be cautious.

Because our everyday perception of impact and how to detect it is often too simplistic. Where’s the next Google, the EC asked years ago, but it’s the wrong question. We will only know in 20 years, when it is the new tech giant. But today it is likely a small start-up of four people with laptops and one idea, in Lithuania or Bulgaria somewhere, and framed this way we are by definition not able to recognize it. Asking for the killer app for open data is a similarly wrong question.

When it comes to impact, we seem to want one straightforward big thing. Hundreds of billions of euro impact in the EU as a whole, made up of a handful of wildly successful things. But what does that actually mean for you, a local government? And while you’re looking for that big impact you are missing all the smaller craters in this same picture, and also the bigger ones if they don’t translate easily into money.

Over the years however, there have been a range of studies, cases and research papers documenting specific impacts and effects. My colleagues and I started collecting those a long time ago. And I used them to help contextualise potential impacts. First for the Flemish government, and last year for the Serbian government. To show what observed impact in for instance a Spanish sector would mean in the corresponding Belgian context. How a global prediction correlates to the Serbian economy and government strategies.

The UNDP in Serbia asked me to extend that with a proposal for indicators to measure impact as they move forward with new open data action plans, as a follow-up to the national readiness assessment I did for them earlier. I took the existing studies and looked at what they had tried to measure, what the common patterns are, and what they had looked at precisely. I turned that into a framework for impact measurement.

In the following minutes I will address three things. First what makes measuring impact so hard. Second what the common patterns are across existing research. Third how, avoiding the pitfalls, and using the commonalities we can build a framework, that then in itself is an indicator.

Let’s first talk about the things that make measuring impact hard.

Judging by the available studies and cases there are several issues that make any easy answers to the question of open data impact impossible. There are a range of reasons measurement is hard. I’ll highlight a few.
Number 3, context is key. If you don’t know what you’re looking at, or why, no measurement makes much sense. And you can only know that in specific contexts. But specifying contexts takes effort. It asks the question: Where do you WANT impact.

Another issue is showing the impact of many small increments. Like how every Dutch person looks at this most used open data app every morning, the rain radar. How often has it changed a decision from taking the car to taking a bike? What does it mean in terms of congestion reduction, or emission reduction? Can you meaningfully quantify that at all?

Also important is who is asking for measurement. In one of my first jobs, my employer didn’t have email for all yet, so I asked for it. In response the MD asked me to put together the business case for email. This is a classic response when you don’t want to change anything. Often asking for measurement is meant to block change. Because they know you cannot predict the future. Motives shape measurements. The contextualisation of impact elsewhere to Flanders and Serbia in part took place because of this. Use existing answers against such a tactic.

Maturity and completeness of both the provision side, government, as well as the demand side, re-users, determine in equal measures what is possible at all, in terms of open data impact. If there is no mature provision side, in the end nothing will happen. If provision is perfect but demand side isn’t mature, it still doesn’t matter. Impact demands similar levels of maturity on both sides. It demands acknowledging interdependencies. And where that maturity is lacking, tracking impact means looking at different sets of indicators.

Measurements often motivate people to game the system. Especially single measurements. When number of datasets was still a metric for national portals the French opened with over 350k datasets. But really it was just a few dozen, which they had split according to departments and municipalities. So a balance is needed, with multiple indicators that point in different directions.

Open data, especially open core government registers, can be seen as infrastructure. But we actually don’t know how infrastructure creates impact. We know that building roads usually has a certain impact (investment correlates to a certain % rise in GDP), but we don’t know how it does so. Seeing open data as infrastructure is a logical approach (the consensus seems that the potential impact is about 2% of GDP), but it doesn’t help us much to measure impact or see how it creates that.

Network effects exist, but they are very costly to track. First order, second order, third order, higher order effects. We’re doing case studies for ESA on how satellite data gets used. We can establish network effects, for instance how ice breakers in the Bothnian Gulf use satellite data in ways that ultimately reduce supermarket prices, but doing 24 such cases is a multi-year effort.

Eppur si muove! Galileo said: yet still it moves. The same is true for open data. Most measurements are proxies. They show something moving, without necessarily showing the thing that is doing the moving. Open data often is a silent actor, or a long range one. Yet still it moves.

Yet still it moves. And if we look at the patterns of established studies, that is what we indeed see. There are commonalities in what movement we see. In the list on the slide the last point, that open data is a policy instrument, is key. We know publishing data enables other stakeholders to act. When you do that on purpose you turn open data into a policy instrument. The cheapest one you have next to regulation and financing.

We all know the story of the drunk that lost his keys. He was searching under the light of a street lamp. Someone else who helped him asked if he lost the keys there. No, the drunk said, but at least there is light here. The same is true for open data. If you know what you published it for, at least you will be able to recognise relevant impact, if not all the impact it creates. Using it as policy instrument is like switching on the lights.

Dealing with lack of maturity means having different indicators for every step of the way. Not just seeing if impact occurs, but also if the right things are being done to make impact possible: lead and lag indicators.

The framework then is built from what has been used to establish impact in the past, and what we see in our projects as useful approaches. The point here is that we are not overly simplifying measurement, but adapt it to whatever is the context of a data provider or user. Also there’s never just one measurement, so a balanced approach is possible. You can’t game the system. It covers various levels of maturity from your first open dataset all the way to network effects. And you see that indicators that by themselves are too simple, still can be used.

Additionally the framework itself is a large scale sensor. If one indicator moves, you should see movement in other indicators over time as well. If you throw a stone in the pond, you should see ripples propagate. This means that if you start with data provision indicators only, you should see other measurements in other phases pick up. This allows you to both use a set of indicators across all phases, as well as move to more progressive ones when you outgrow the initial ones.
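The ripple idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not part of the framework as delivered: the `Indicator` class, the phase labels, the indicator names and all the numbers are hypothetical. The sketch records per indicator the first period in which it moved away from its starting value, so you can see whether movement in a provision indicator is followed by movement in later phases.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Indicator:
    name: str
    phase: str           # hypothetical phase label, e.g. "provision", "use", "impact"
    values: list[float]  # one measurement per period, oldest first (must not be empty)


def first_movement(indicator: Indicator, threshold: float = 0.0) -> Optional[int]:
    """Return the first period in which the indicator exceeds its starting
    value by more than `threshold`, or None if it never does."""
    baseline = indicator.values[0]
    for period, value in enumerate(indicator.values):
        if value - baseline > threshold:
            return period
    return None


def ripple_report(indicators: list[Indicator]) -> dict[str, Optional[int]]:
    """Map each indicator name to the period in which it first moved."""
    return {ind.name: first_movement(ind) for ind in indicators}


def silent_after(indicators: list[Indicator], source_name: str) -> list[str]:
    """Names of indicators that have not moved at all, although the source
    indicator has. These are the places to look for missing ripples."""
    report = ripple_report(indicators)
    if report[source_name] is None:
        return []
    return [name for name, period in report.items() if period is None]


# Made-up example data: three indicators in three phases, five periods each.
INDICATORS = [
    Indicator("datasets published", "provision", [0, 5, 12, 20, 25]),
    Indicator("re-use requests", "use", [0, 0, 1, 4, 9]),
    Indicator("documented use cases", "impact", [0, 0, 0, 0, 0]),
]

print(ripple_report(INDICATORS))
print(silent_after(INDICATORS, "datasets published"))
```

In this made-up data the provision indicator moves first and the use indicator follows one period later, while the impact indicator stays flat. A flat indicator is not proof of absent impact, as most measurements are proxies, but it does tell you where to look next.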

Some final thoughts. If you publish by default as integral part of processes, measuring impact, or building a business case is not needed as such. But measurement is very helpful in the transition to that end game. Core data and core policy elements, and their stakeholders are key. Measurement needs to be designed up front. Using open data as policy instrument lets you define the impact you are looking for at the least. The framework is the measurement: Only micro-economic studies really establish specific economic impact, but they only work in mature situations and cost a lot of effort, so you need to know when you are ready for them. But measurement can start wherever you are, with indicators that reflect the overall open data maturity level you are at, while looking both back and forwards. And because measurement can be done, as a data holder you should be doing it.

This is the presentation I gave at the Open Belgium 2018 Conference in Louvain-la-Neuve this week, titled ‘The role and value of data inventories, a key step towards mature data governance’. The slides are embedded further below, and as PDF download at grnl.eu/in. It’s a long read (some 3000 words), so I’ll start with a summary.

Summary, TL;DR

The quality of information households in local governments is often lacking.
Things like security, openness and privacy are safeguarded by putting separate fences for each around the organisation, but those safeguards lack having detailed insight into data structures and effective corresponding processes. As archiving, security, openness and privacy in a digitised environment are basically inseparable, doing ‘everything by design’ is the only option. The only effective way is doing everything at the level of the data itself. Fences are inefficient, ineffective, and the GDPR due to its obligations will show how the privacy fence fails, forcing organisations to act. Only doing data governance for privacy is senseless, doing it also for openness, security and archiving at the same time is logical. Having good detailed inventories of your data holdings is a useful instrument to start asking the hard questions, and have meaningful conversations. It additionally allows local government to deploy open or shared data as policy instrument, and releasing the inventory itself will help articulate civic demand for data. We’ve done a range of these inventories with local government.

1: High time for mature data governance in local and regional government

High time! (clock in Louvain-la-Neuve)

Digitisation changes how we look at things like openness, privacy, security and archiving, as it creates new affordances now that the content and its medium have become decoupled. It creates new forms of usage, and new needs to manage those. As a result of that e.g. archivists find they now need to be involved at the very start of digital information processes, whereas earlier their work would basically start when the boxes of papers were delivered to them.

The reality is that local and regional governments have barely begun to fully embrace and leverage the affordances that digitisation provides them with. It shows in how most of them deal with information security, openness and privacy: by building three fences.

Security is mostly interpreted as keeping other people out, so a fence is put between the organisation and the outside world. Inside it nothing much is changed. Similarly a second fence is put in place for determining openness. What is open can reach the outside world, and the fence is there to do the filtering. Finally privacy is also dealt with by a fence, either around the entire organisation or a specific system, keeping unwanted eyes out. All fences are a barrier between outside and in, and within the organisation usually no further measures are taken. All three fences exist separately from each other, as stand alone fixes for their singular purpose.

The first fence: security
In the Netherlands for local governments a ‘baseline information security’ standard applies, and it determines what information should be regarded as business critical. Something is business critical if its downtime will stop public service delivery, or if its lack of quality has immediate negative consequences for decision making (e.g. decisions on benefits impacting citizens). Uptime and downtime are mostly about IT infrastructure, dependencies and service level agreements, and those fit the fence tactic quite well. Quality in the context of security is about ensuring data is tamper free, doing audits, input checks, and knowing sources. That requires a data-centric approach, and it doesn’t fit the fence-around-the-organisation tactic.


The second fence: openness
Openness of local government information is mostly at request, or at best as a process separate from regular operational routines. Yet the stated end game is that everything should be actively open by design, meaning everything that can be made public will be published the moment it is publishable. We also see that open data is becoming infrastructure in some domains. The implementation of the digitisation of the law on public spaces, requires all involved stakeholders to have the same (access to) information. Many public sector bodies, both local ones and central ones like the cadastral office, have concluded that doing that through open data is the most viable way. For both the desired end game and using open data as infrastructure the fence tactic is however very inefficient.
At the same time the data sovereignty of local governments is under threat. They increasingly collaborate in networks or outsource part of their processes. In most contracts there is no attention paid to data, other than in generic terms in the general procurement conditions. We’ve come across a variety of examples where this results 1) in governments not being able to provide data to citizens, even though by law they should be able to 2) governments not being able to access their own data, only resulting graphs and reports, or 3) the slowest partner in a network determining the speed of disclosure. In short, the fence tactic is also ineffective. A more data-centric approach is needed.

The third fence: personal data protection
Mostly privacy is being dealt with by identifying privacy sensitive material (but not what, where and when), and locking it down by putting up the third fence. The new EU privacy regulation, the GDPR, which will be enforced from May this year, is seen as a source of uncertainty by local governments. It is also responded to in the accustomed way: reinforcing the fence, by making a ‘better’ list of what personal data is used within the organisation but still not paying much attention to processes, nor the shape and form of the personal data.
However in the case of the GDPR, if it indeed will be really enforced, this will not be enough.

GDPR an opportunity for ‘everything by design’
The GDPR confers rights to the people described by data, like the right to review, to portability, and to be forgotten. It also demands compliance is done ‘by design’, and ‘state of the art’. This can only be done by design if you are able to turn the rights of the GDPR into queries on your data, and have (automated) processes in place to deal with requests. It cannot be done with a ‘better’ fence. In the case of the GDPR, the first data related law that takes the affordances of digitisation as a given, the fence tactic is set to fail spectacularly. This makes the GDPR a great opportunity to move to a data focus not just for privacy by design, but to do openness, archiving and information security (in terms of quality) by design at the same time, as they are converging aspects of the same thing and can no longer be meaningfully separated. Detailed knowledge about your data structures then is needed.

Local governments inadvertently admit fence-tactic is failing
Governments already clearly yet indirectly admit that the fences don’t really work as tactic.
Local governments have been loudly complaining for years about the feared costs of compliance, concerning both openness and privacy. Drilling down into those complaints reveals that the feared costs concern the time and effort involved in e.g. dealing with requests. Because there’s only a fence, and usually no processes or detailed knowledge of the data they hold, every request becomes an expedition for answers. If local governments had detailed insight in the data structures, data content, and systems in use, the cost of compliance would be zero or at least indistinguishable from the rest of operations. Dealing with a request would be nothing more than running a query against their systems.
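To make ‘running a query’ concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption for illustration: the table names, the columns, the `person_id` key and the sample rows are invented, and a real organisation would have many systems rather than one database. The point is only that once an inventory tells you which tables hold person related data and under which key, a request to review or to be forgotten becomes a loop over that list.

```python
import sqlite3

# Hypothetical data holdings of a local government, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parking_permits (person_id TEXT, licence_plate TEXT, valid_until TEXT);
    CREATE TABLE benefit_claims (person_id TEXT, claim_type TEXT, decided_on TEXT);
    INSERT INTO parking_permits VALUES ('p-001', 'AB-123-C', '2019-12-31');
    INSERT INTO parking_permits VALUES ('p-002', 'XY-987-Z', '2019-06-30');
    INSERT INTO benefit_claims VALUES ('p-001', 'housing', '2018-11-05');
    """
)

# What a detailed inventory would tell you: which tables contain person
# related data, and which column identifies the person (assumed names).
PERSONAL_DATA_TABLES = {
    "parking_permits": "person_id",
    "benefit_claims": "person_id",
}


def access_request(connection, person_id):
    """Right to review: collect every record held about one person."""
    found = {}
    for table, key in PERSONAL_DATA_TABLES.items():
        # Table and column names come from our own trusted inventory,
        # the person id is passed as a bound parameter.
        query = f"SELECT * FROM {table} WHERE {key} = ?"
        found[table] = connection.execute(query, (person_id,)).fetchall()
    return found


def erase(connection, person_id):
    """Right to be forgotten: delete those records, return how many were removed.
    A real process would first check legal retention obligations."""
    removed = 0
    for table, key in PERSONAL_DATA_TABLES.items():
        cursor = connection.execute(f"DELETE FROM {table} WHERE {key} = ?", (person_id,))
        removed += cursor.rowcount
    connection.commit()
    return removed


print(access_request(conn, "p-001"))
```

Without the inventory behind `PERSONAL_DATA_TABLES`, the same request means asking around the organisation which systems might hold something, which is exactly the expedition described above.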

Complaints about compliance costs are essentially an admission that governments do not have their house in order when it comes to data.
The interviews I did with various stakeholders as part of the evaluation of the PSI Directive confirm this: the biggest obstacle stakeholders perceive to being more open and to realising impact with open data is the low quality of information systems and processes. It blocks fully leveraging the affordances digitisation brings.

Towards mature data governance, by making inventory
What is needed is a change of tactics: doing away with the three fences, focusing on detailed knowledge of the data, and combining what now are separate and disconnected activities (information security, openness, archiving and personal data protection) into ‘everything by design’. Basically it means turning all you know about your data into metadata that becomes part of your data. So that it will be easy to see which parts of a specific data set contain what type of person related data, which data fields are public, which subset is business critical, the records that have third party rights attached, or which records need to be deleted after a specific amount of time. Don’t man the fences where every check is always extra work, but let the data be able to tell exactly what is or isn’t possible, allowed, meant or needed. Getting there starts with making an inventory of what data a local or regional government currently holds, and describing the data in detailed operational, legal and technological terms.

Mature digital data governance: all aspects about the data are part of the data, allowing all processes and decisions access to all relevant material in determining what’s possible.
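As a rough sketch of what ‘metadata that becomes part of your data’ could look like, the Python below attaches a few properties to each field of a data set and lets simple functions answer the questions mentioned above. The `FieldMeta` class, its attribute names, the example data set and the retention period are all hypothetical choices of mine; an actual inventory would use many more facets.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical per-field metadata, kept together with the data set."""
    name: str
    personal_data: bool = False
    public: bool = False
    business_critical: bool = False
    third_party_rights: bool = False
    retention_years: Optional[int] = None  # None means no deletion deadline


# Invented example: reports about litter in public spaces.
LITTER_REPORTS = [
    FieldMeta("location", public=True),
    FieldMeta("reported_on", public=True),
    FieldMeta("reporter_email", personal_data=True, retention_years=2),
    FieldMeta("contractor_notes", third_party_rights=True),
]


def publishable_fields(fields):
    """Fields that can be opened up as they are."""
    return [
        f.name
        for f in fields
        if f.public and not f.personal_data and not f.third_party_rights
    ]


def personal_fields(fields):
    """Fields that matter for personal data protection."""
    return [f.name for f in fields if f.personal_data]


def due_for_deletion(field, created, today):
    """True if a value created on `created` has passed its retention period."""
    if field.retention_years is None:
        return False
    deadline = (created.year + field.retention_years, created.month, created.day)
    return (today.year, today.month, today.day) >= deadline


print(publishable_fields(LITTER_REPORTS))
print(personal_fields(LITTER_REPORTS))
```

The same description serves openness, privacy and archiving at once, which is the reason to do them together rather than behind three separate fences.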

2: Ways local government data inventories are useful

Inventories are a key first step in doing away with the ineffective fences and towards mature data governance. Inventories are also useful as an instrument for several other purposes.

Local is where you are, but not the data pros
There’s a clear reason why local governments don’t have their house in order when it comes to data.
Most of our lives are local. The streets we live on, the shopping center we frequent, the schools we attend, the spaces we park in, the quality of life in our neighbourhood, the parks we walk our dogs in, the public transport we use for our commutes. All those acts are local.
Local governments have a wide variety of tasks, reflecting the variety of our acts. They hold a corresponding variety of data, connected to all those different tasks. Yet local governments are not data professionals. Unlike singular-task, data heavy national government bodies, like the Cadastre, the Meteo institute or the department for motor vehicles, local governments usually don’t have the capacity or capability. As a result local governments mostly don’t know their own data, and don’t have established effective processes that build on that data knowledge. Inventories are a first step. Inventories point to where contracts, procurement and collaboration lead to loss of needed data sovereignty. Inventories also allow determining what, from a technology perspective, is a smooth transition path to the actively open by design end-game local governments envision.

Open data as a policy instrument
Where local governments want to use the data they have as a way to enable others to act differently or in support of policy goals, they need to know in detail which data they hold and what can be done with it. Using open data as policy instrument means creating new connections between stakeholders around a policy issue, by putting the data into play. To be able to see which data could be published to engage certain stakeholders it takes knowing what you have, what it contains, and in which shape you have it first.

Better articulated citizen demands for data
Making public a list of what you have is also important here, as it invites new demand for your data. It allows people to be aware of what data exists, and contemplate if they have a use case for it. If a data set hasn’t been published yet, its existence is discoverable, so they can request it. It also enables local government to extend the data they publish based on actual demand, not assumed demand or blindly. This increases the likelihood data will be used, and increases the socio-economic impact.

Emerging data
More and more new data is emerging, from sensor networks in public and private spaces. This way new stakeholders and citizens are becoming agents in the public space, where they meet up with local governments. New relationships, and new choices result. For instance the sensor in my garden measuring temperature and humidity is part of the citizen-initiated Measure your city network, but also an element in the local government’s climate change adaptation policies. For local governments as regulators, as guardian of public space, as data collector, and as source of transparency, this is a rebalancing of their position. It again takes knowing what data you own and how it relates to and complements what others collect and own. Only then is a local government able to weave a network with those stakeholders that connects data into valuable agency for all involved. (We’ve built a guidance tool, in Dutch, for the role of local government with regard to sensors in public spaces)

Having detailed data inventories is a way for local governments to start having the right conversations on all these points.

3: Getting to inventories

To create useful and detailed inventories, as I and my colleagues did for half a dozen local governments, some elements are key in my view. We looked at structured data collections only, so disregarded the thousands of individual once-off spreadsheets. They are not irrelevant, but obscure the wood for the trees. Then we scored all those data sets on up to 80(!) different facets, concerning policy domain, internal usage, current availability, technical details, legal aspects, and concerns etc. A key element in doing that is not making any assumptions:

  • don’t assume your list of applications will tell you what data you have. Not all your listed apps will be used, others won’t be on the list, and none of it tells you in detail what data actually is processed in them, just a generic pointer
  • don’t assume information management knows it all, as shadow information processes will exist outside of their view
  • don’t assume people know when you ask them how they do their work, as their description and rationalisation of their acts will not match up with reality; let them also show you
  • don’t assume people know the details of the data they work with, sit down with them and look at it together
  • don’t assume what it says on the tin is correct, as you’ll find things that don’t belong there (we’ve e.g. found domestic abuse data in a data set on litter in public spaces)

Doing an inventory well means

  • diving deeply into which applications are actually used,
  • talking to every unit in the organisation about their actual work and seeing it being done,
  • looking closely at data structures and real data content,
  • looking closely at current metadata and its quality
  • separately looking at large projects and programs as they tend to have their own information systems,
  • going through external communications as it may refer to internally held data not listed elsewhere,
  • looking at (procurement and collaboration) contracts to determine what claims others might have on data,
  • and then cross-referencing it all, and bringing it together in one giant list, scored on up to 80 facets.

Another essential part, especially to ensure the resulting inventory will be used as an instrument, is from the start ensuring the involvement and buy-in of the various parts of local government that usually are islands (IT, IM, legal, policy departments, archivists, domain experts, data experts). So that the inventory is something used to ask a variety of detailed questions of.

Bring the islands together. (photo Dmitry Teslya, CC-BY)

We’ve followed various paths to do inventories, sometimes on our own as external team, sometimes in close cooperation with a client team, sometimes as a guide for a client team while their operational colleagues do the actual work. All three yield very useful results but there’s a balance to strike between consistency and accuracy, the amount of feasible buy-in, and the way the hand-over is planned, so that the inventory becomes an instrument in future data-discussions.

What comes out as raw numbers is itself often counter-intuitive to local government. Some 98% of data typically held by Dutch Provinces can be public, although usually some 20% is made public (15% open data, usually geo-data). At local level the numbers are a bit different, as local governments hold much more person related data (concerning social benefits for instance, chronic care, and the persons register). About 67% of local data could be public, but only some 5% usually is. This means there’s still a huge gap between what can be open, and what is actually open. That gap is basically invisible if a local government deploys the three fences, and as a consequence they run on assumptions and overestimate the amount that needs the heaviest protection. The gap becomes visible from looking in-depth at data on all pertinent aspects by doing an inventory.
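How such a gap falls out of an inventory can be shown with a small Python sketch. The inventory entries below are invented and much simpler than a real inventory scored on up to 80 facets, and the resulting percentages are not the figures quoted above. The facet names `can_be_public` and `is_public` are assumptions for the example.

```python
def openness_gap(inventory):
    """Share of data sets that could be public, that are public,
    and the gap between the two, as fractions of all data sets."""
    total = len(inventory)
    could = sum(1 for entry in inventory if entry["can_be_public"])
    actual = sum(1 for entry in inventory if entry["is_public"])
    return {
        "could_be_public": could / total,
        "is_public": actual / total,
        "gap": (could - actual) / total,
    }


# Hypothetical inventory of ten data sets, scored on two facets only.
INVENTORY = [
    {"name": "street trees", "can_be_public": True, "is_public": True},
    {"name": "waste collection routes", "can_be_public": True, "is_public": False},
    {"name": "public lighting", "can_be_public": True, "is_public": False},
    {"name": "zoning plans", "can_be_public": True, "is_public": False},
    {"name": "playground inspections", "can_be_public": True, "is_public": False},
    {"name": "council decisions", "can_be_public": True, "is_public": False},
    {"name": "road works planning", "can_be_public": True, "is_public": False},
    {"name": "social benefits", "can_be_public": False, "is_public": False},
    {"name": "chronic care", "can_be_public": False, "is_public": False},
    {"name": "persons register", "can_be_public": False, "is_public": False},
]

print(openness_gap(INVENTORY))
```

Counting data sets is itself a simplification, as a single data set can mix public and person related fields, which is why the inventories look at data structures and not just at titles.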

(Interested in doing an inventory of the data your organisation holds? Do get in touch.)

Yesterday all my The Green Land colleagues and I were at the Open Belgium annual conference (of which we were a sponsor) in Louvain-la-Neuve. We made this group picture, together with our cyberman (an artwork normally at home in our office). Seeing us all together made me realise there are 8 of us at the moment. I am surprised by the size of our current team.

On 12 March the 2018 edition of the Open Belgium Conference takes place in Louvain-la-Neuve. With my company, next to sponsoring the event as a partner, we submitted several proposals in the open call for speakers. The program and speakers have now been announced. I’m pleased that we’ve been invited to give two presentations.

My colleagues Frank and Jochem will talk about a project we’re doing with a regional government and local governments, where together with civil servants from the local governments we talked to farmers, citizens, entrepreneurs and businesses and simply asked them: ‘what do you do’ and ‘how can we help (with our data)’? The process and results, and the way this is a novel experience for both civil service and external stakeholders, are a story worth sharing.

I will be presenting at the very end of the day, talking about the need for and use of creating systematic and detailed inventories of data assets in a government entity. Increasingly open data, personal data protection, information security, and data sovereignty are overlapping topics and efforts, where most government organisations will still treat them as islands. My and my company’s experience from creating data inventories for 6 different Dutch government bodies shows how data inventories can support data governance, embracing privacy, security and openness, all by design.

I’m looking forward to the conference, and to meeting up with both familiar faces and new ones, as well as getting a better overview of all that is happening in Belgium concerning open knowledge. If possible I’d like to find some new contacts for collaboration in Belgium, by transplanting some of our methods and processes.