Today I gave a brief presentation of the framework for measuring open data impact I created for UNDP Serbia last year, at the Open Belgium 2019 Conference.

The framework is meant to be relatable and usable for individual organisations by themselves, and based on how existing cases, papers and research in the past have tried to establish such impact.

Here are the slides.

Measuring open data impact by Ton Zijlstra, at Open Belgium 2019

This is the full transcript of my presentation:

Last Friday, when Pieter Colpaert tweeted the talks he intended to visit (Hi Pieter!), he said two things. First he said after the coffee it starts to get difficult, and that’s true. Measuring impact is a difficult topic. And he asked about measuring impact: How can you possibly do that? He’s right to be cautious.

Because our everyday perception of impact and how to detect it is often too simplistic. Where’s the next Google the EC asked years ago. but it’s the wrong question. We will only know in 20 years when it is the new tech giant. But today it is likely a small start-up of four people with laptops and one idea, in Lithuania or Bulgaria somewhere, and we are by definition not be able to recognize it, framed this way. Asking for the killer app for open data is a similarly wrong question.

When it comes to impact, we seem to want one straightforward big thing. Hundreds of billions of euro impact in the EU as a whole, made up of a handful of wildly successful things. But what does that actually mean for you, a local government? And while you’re looking for that big impact you are missing all the smaller craters in this same picture, and also the bigger ones if they don’t translate easily into money.

Over the years however, there have been a range of studies, cases and research papers documenting specific impacts and effects. Me and my colleagues started collecting those a long time ago. And I used them to help contextualise potential impacts. First for the Flemish government, and last year for the Serbian government. To show what observed impact in for instance a Spanish sector would mean in the corresponding Belgian context. How a global prediction correlates to the Serbian economy and government strategies.

The UNDP in Serbia, asked me to extend that with a proposal for indicators to measure impact as they move forward with new open data action plans in follow up of the national readiness assessment I did for them earlier. I took the existing studies and looked at what they had tried to measure, what the common patterns are, and what they had looked at precisely. I turned that into a framework for impact measurement.

In the following minutes I will address three things. First what makes measuring impact so hard. Second what the common patterns are across existing research. Third how, avoiding the pitfalls, and using the commonalities we can build a framework, that then in itself is an indicator.Let’s first talk about the things that make measuring impact hard.

Judging by the available studies and cases there are several issues that make any easy answers to the question of open data impact impossible.There are a range of reasons measurement is hard. I’ll highlight a few.
Number 3, context is key. If you don’t know what you’re looking at, or why, no measurement makes much sense. And you can only know that in specific contexts. But specifying contexts takes effort. It asks the question: Where do you WANT impact.

Another issue is showing the impact of many small increments. Like how every Dutch person looks at this most used open data app every morning, the rain radar. How often has it changed a decision from taking the car to taking a bike? What does it mean in terms of congestion reduction, or emission reduction? Can you meaningfully quantify that at all?

Also important is who is asking for measurement. In one of my first jobs, my employer didn’t have email for all yet, so I asked for it. In response the MD asked me to put together the business case for email. This is a classic response when you don’t want to change anything. Often asking for measurement is meant to block change. Because they know you cannot predict the future. Motives shape measurements. The contextualisation of impact elsewhere to Flanders and Serbia in part took place because of this. Use existing answers against such a tactic.

Maturity and completeness of both the provision side, government, as well as the demand side, re-users, determine in equal measures what is possible at all, in terms of open data impact. If there is no mature provision side, in the end nothing will happen. If provision is perfect but demand side isn’t mature, it still doesn’t matter. Impact demands similar levels of maturity on both sides. It demands acknowledging interdependencies. And where that maturity is lacking, tracking impact means looking at different sets of indicators.

Measurements often motivate people to game the system. Especially single measurements. When number of datasets was still a metric for national portals the French opened with over 350k datasets. But really it was just a few dozen, which they had split according to departments and municipalities. So a balance is needed, with multiple indicators that point in different directions.

Open data, especially open core government registers, can be seen as infrastructure. But we actually don’t know how infrastructure creates impact. We know that building roads usually has a certain impact (investment correlates to a certain % rise in GDP), but we don’t know how it does so. Seeing open data as infrastructure is a logical approach (the consensus seems that the potential impact is about 2% of GDP), but it doesn’t help us much to measure impact or see how it creates that.

Network effects exist, but they are very costly to track. First order, second order, third order, higher order effects. We’re doing case studies for ESA on how satellite data gets used. We can establish network effects for instance how ice breakers in the Botnian gulf use satellite data in ways that ultimately reduce super market prices, but doing 24 such cases is a multi year effort.

E puor si muove! Galileo said Yet still it moves. The same is true for open data. Most measurements are proxies. They show something moving, without necessarily showing the thing that is doing the moving. Open data often is a silent actor, or a long range one. Yet still it moves.

Yet still it moves. And if we look at the patterns of established studies, that is what we indeed see. There are communalities in what movement we see. In the list on the slide the last point, that open data is a policy instrument is key. We know publishing data enables other stakeholders to act. When you do that on purpose you turn open data into a policy instrument. The cheapest one you have next to regulation and financing.

We all know the story of the drunk that lost his keys. He was searching under the light of a street lamp. Someone who helped him else asked if he lost the keys there. No, the drunk said, but at least there is light here. The same is true for open data. If you know what you published it for, at least you will be able to recognise relevant impact, if not all the impact it creates. Using it as policy instrument is like switching on the lights.

Dealing with lack of maturity means having different indicators for every step of the way. Not just seeing if impact occurs, but also if the right things are being done to make impact possible: Lead and lag indicators

The framework then is built from what has been used to establish impact in the past, and what we see in our projects as useful approaches. The point here is that we are not overly simplifying measurement, but adapt it to whatever is the context of a data provider or user. Also there’s never just one measurement, so a balanced approach is possible. You can’t game the system. It covers various levels of maturity from your first open dataset all the way to network effects. And you see that indicators that by themselves are too simple, still can be used.

Additionally the framework itself is a large scale sensor. If one indicator moves, you should see movement in other indicators over time as well. If you throw a stone in the pond, you should see ripples propagate. This means that if you start with data provision indicators only, you should see other measurements in other phases pick up. This allows you to both use a set of indicators across all phases, as well as move to more progressive ones when you outgrow the initial ones.finally some recommendations.

Some final thoughts. If you publish by default as integral part of processes, measuring impact, or building a business case is not needed as such. But measurement is very helpful in the transition to that end game. Core data and core policy elements, and their stakeholders are key. Measurement needs to be designed up front. Using open data as policy instrument lets you define the impact you are looking for at the least. The framework is the measurement: Only micro-economic studies really establish specific economic impact, but they only work in mature situations and cost a lot of effort, so you need to know when you are ready for them. But measurement can start wherever you are, with indicators that reflect the overall open data maturity level you are at, while looking both back and forwards. And because measurement can be done, as a data holder you should be doing it.

For the UNDP in Serbia, I made an overview of existing studies into the impact of open data. I’ve done something similar for the Flemish government a few years ago, so I had a good list of studies to start from. I updated that first list with more recent publications, resulting in a list of 45 studies from the past 10 years. The UNDP also asked me to suggest a measurement framework. Here’s a summary overview of some of the things I formulated in the report. I’ll start with 10 things that make measuring impact hard, and in a later post zoom in on what makes measuring impact doable.

While it is tempting to ask for a ‘killer app’ or ‘the next tech giant’ as proof of impact of open data, establishing the socio-economic impact of open data cannot depend on that. Both because answering such a question is only possible with long term hindsight which doesn’t help make decisions in the here and now, as well as because it would ignore the diversity of types of impacts of varying sizes known to be possible with open data. Judging by the available studies and cases there are several issues that make any easy answers to the question of open data impact impossible.

1 Dealing with variety and aggregating small increments

There are different varieties of impact, in all shapes and sizes. If an individual stakeholder, such as a citizen, does a very small thing based on open data, like making a different decision on some day, how do we express that value? Can it be expressed at all? E.g. in the Netherlands the open data based rain radar is used daily by most cyclists, to see if they can get to the rail way station dry, better wait ten minutes, or rather take the car. The impact of a decision to cycle can mean lower individual costs (no car usage), personal health benefits, economic benefits (lower traffic congestion) environmental benefits (lower emissions) etc., but is nearly impossible to quantify meaningfully in itself as a single act. Only where such decisions are stimulated, e.g. by providing open data that allows much smarter, multi-modal, route planning, aggregate effects may become visible, such as reduction of traffic congestion hours in a year, general health benefits of the population, reduction of traffic fatalities, which can be much better expressed in a monetary value to the economy.

2 Spotting new entrants, and tracking SME’s

The existing research shows that previously inactive stakeholders, and small to medium sized enterprises are better positioned to create benefits with open data. Smaller absolute improvements are of bigger value to them relatively, compared to e.g. larger corporations. Such large corporations usually overcome data access barriers with their size and capital. To them open data may even mean creating new competitive vulnerabilities at the lower end of their markets. (As a result larger corporations are more likely to say they have no problem with paying for data, as that protects market incumbents with the price of data as a barrier to entry.) This also means that establishing impacts requires simultaneously mapping new emerging stakeholders and aggregating that range of smaller impacts, which both can be hard to do (see point 1).

3 Network effects are costly to track

The research shows the presence of network effects, meaning that the impact of open data is not contained or even mostly specific to the first order of re-use of that data. Causal effects as well as second and higher order forms of re-use regularly occur and quickly become, certainly in aggregate, much higher than the value of the original form of re-use. For instance the European Space Agency (ESA) commissioned my company for a study into the impact of open satellite data for ice breakers in the Gulf of Bothnia. The direct impact for ice breakers is saving costs on helicopters and fuel, as the satellite data makes determining where the ice is thinnest much easier. But the aggregate value of the consequences of that is much higher: it creates a much higher predictability of ships and the (food)products they carry arriving in Finnish harbours, which means lower stocks are needed to ensure supply of these goods. This reverberates across the entire supply chain, saving costs in logistics and allowing lower retail prices across Finland. When 
mapping such higher order and network effects, every step further down the chain of causality shows that while the bandwidth of value created increases, at the same time the certainty that open data is the primary contributing factor decreases. Such studies also are time consuming and costly. It is often unlikely and unrealistic to expect data holders to go through such lengths to establish impact. The mentioned ESA example, is part of a series of over 20 such case studies ESA commissioned over the course of 5 years, at considerable cost for instance.

4 Comparison needs context

Without context, of a specific domain or a specific issue, it is hard to asses benefits, and compare their associated costs, which is often the underlying question concerning the impact of open data: does it weigh up against the costs of open data efforts? Even though in general open data efforts shouldn’t be costly, how does some type of open data benefit compare to the costs and benefits of other actions? Such comparisons can be made in a specific context (e.g. comparing the cost and benefit of open data for route planning with other measures to fight traffic congestion, such as increasing the number of lanes on a motor way, or increasing the availability of public transport).

5 Open data maturity determines impact and type of measurement possible

Because open data provisioning is a prerequisite for it having any impact, the availability of data and the maturity of open data efforts determine not only how much impact can be expected, but also determine what can be measured (mature impact might be measured as impact on e.g. traffic congestion hours in a year, but early impact might be measured in how the number of re-users of a data set is still steadily growing year over year)

6 Demand side maturity determines impact and type of measurement possible

Whether open data creates much impact is not only dependent on the availability of open data and the maturity of the supply-side, even if it is as mentioned a prerequisite. Impact, judging by the existing research, is certain to emerge, but the size and timing of such impact depends on a wide range of other factors on the demand-side as well, including things as the skills and capabilities of stakeholders, time to market, location and timing. An idea for open data re-use that may find no traction in France because the initiators can’t bring it to fruition, or because the potential French demand is too low, may well find its way to success in Bulgaria or Spain, because local circumstances and markets differ. In the Serbian national open data readiness assessment performed by me for the World Bank and the UNDP in 2015 this is reflected in the various dimensions assessed, that cover both supply and demand, as well as general aspects of Serbian infrastructure and society.

7 We don’t understand how infrastructure creates impact

The notion of broad open data provision as public infrastructure (such as the UK, Netherlands, Denmark and Belgium are already doing, and Switzerland is starting to do) further underlines the difficulty of establishing the general impact of open data on e.g. growth. The point that infrastructure (such as roads, telecoms, electricity) is important to growth is broadly acknowledged, with the corresponding acceptance of that within policy making. This acceptance of quantity and quality of infrastructure increasing human and physical capital however does not mean that it is clear how much what type of infrastructure contributes at what time to economic production and growth. Public capital is often used as a proxy to ascertain the impact of infrastructure on growth. Consensus is that there is a positive elasticity, meaning that an increase in public capital results in an increase in GDP, averaging at around 0.08, but varying across studies and types of infrastructure. Assuming such positive elasticity extends to open data provision as infrastructure (and we have very good reasons to do so), it will result in GDP growth, but without a clear view overall as to how much.

8 E pur si muove

Most measurements concerning open data impact need to be understood as proxies. They are not measuring how open data is creating impact directly, but from measuring a certain movement it can be surmised that something is doing the moving. Where opening data can be assumed to be doing the moving, and where opening data was a deliberate effort to create such movement, impact can then be assessed. We may not be able to easily see it, but still it moves.

9 Motives often shape measurements

Apart from the difficulty of measuring impact and the effort involved in doing so, there is also the question of why such impact assessments are needed. Is an impact assessment needed to create support for ongoing open data efforts, or to make existing efforts sustainable? Is an impact measurement needed for comparison with specific costs for a specific data holder? Is it to be used for evaluation of open data policies in general? In other words, in whose perception should an impact measurement be meaningful?
The purpose of impact assessments for open data further determines and/or limits the way such assessments can be shaped.

10 Measurements get gamed, become targets

Finally, with any type of measurement, there needs to be awareness that those with a stake of interest into a measurement are likely to try and game the system. Especially so where measurements determine funding for further projects, or the continuation of an effort. This must lead to caution when determining indicators. Measurements easily become a target in themselves. For instance in the early days of national open data portals being launched worldwide, a simple metric often reported was the number of datasets a portal contained. This is an example of a ‘point’ measurement that can be easily gamed for instance by subdividing a dataset into several subsets. The first version of the national portal of a major EU member did precisely that and boasted several hundred thousand data sets at launch, which were mostly small subsets of a bigger whole. It briefly made for good headlines, but did not make for impact.

In a second part I will take a closer look at what these 10 points mean for designing a measurement framework to track open data impact.

Last week the Danish government further extended the data available through their open data distributor, and announced some impressive resulting impact from already available data.

In 2012 the roadmap Good Basic Data for Everyone was launched, which set out to create an open national data infrastructure of the 5 core data sets used by all layers of government (maps, address, buildings, companies, people, see image). I attended the internal launch at the Ministry, and my colleague Marc contributed to the financial reasoning behind it (PDF 1, PDF 2). The roadmap ran until 2016, and a new plan is now in operation that builds on that first roadmap.


An illustration from the Danish 2012 road map showing how the 5 basic data registers correlate, and how maps are at its base.

Steadily data is added to those original 5 data sets, that increases the usability of the data. Last week all administrative geographic divisions were added (these are the geographic boundaries of municipalities, regions, 2200 parishes, jurisdictions, police districts, districts and zip-codes). This comes after last November’s addition of the place name register, and before coming May’s publication of the Danish address book. (The publication of the address database in 2002 was the original experience that ultimately led to the Basic Data program).

The primary goal of the Basic Data program has always been government efficiency, by ensuring all layers of government use the same core data. However the Danish government has also always recognised the societal and economic potential of that same data for citizens and companies, and therefore opening up the Basic Data registers as much as possible was also a key ingredient from the start. Interestingly the business case for the Basic Data program was only built on projected government savings, and those projections erred on the side of caution. Any additional savings realised by government entities would remain with them, so there was a financial incentive for government agencies to find additional uses for the Basic Data registers. External benefits from re-use were not part of the businesscase, as they were rightly seen as hard to predict and hard to measure, but were also estimated (again erring on the side of caution.) The projected savings for government were about 25 million Euro per year, and the project external benefits at some 65 million per year after completion of the system. Two years ago I transposed these Danish (as well as Dutch and other international) experiences with building an open national data infrastructure this way for the Swiss government, as part of a study with the FH Bern (PDF of some first insights presented at the 2016 Swiss open data conference in Lausanne).

Danish media this week reported new impact numbers from the geodata that has been made available. Geodata became freely available early 2013 as part of the Basic Data program. In 2017 the geodata saw over 6 billion requests for data, a 45% increase from 2016. Government research estimates the total gains in efficiency and productivity from using geodata for 2016 at some 470 million Euro (3.5 billion Danish Kroner). This is about 5 times the total of savings and benefits originally projected annually for the entire system back in 2012 (25 million savings, and 65 million in benefits).

It once again shows how there really is no rational case for selling government data, as the benefits that accrue from removing all access barriers will be much larger. This also means that government revenue will actually grow, as increased tax revenue will outstrip both lost revenue from data sales and costs of providing data. A timely and pertinent example from Denmark, now that I am researching the potential impact of open data for the Serbian government.