This week I provided a training session on how open data can play a role in public governance integrity and in fighting corruption. The Hague Academy is hosting a group of 11 participants from a wide variety of countries (Nepal, Uganda, Nigeria, Colombia, Northern Iraq, North Macedonia, Jordan, Indonesia) for a two-week training course. My colleague Paul and I were invited to do a half-day session on open data. While Paul explained the status quo of open data in the Netherlands, I talked about my international experiences, and what they tell me about open data in the fight against corruption.

This is the basic outline of what I talked about:

I started off by noting that data these days is a geopolitical issue, making it a strategic good for any organisation.
Then, after defining what open data is (pro-actively published, with no technical, financial or legal barriers to re-use), I mentioned what it does: allow access to all (the clue is in the word open), bring in new stakeholders, and allow those stakeholders to act differently. These aspects create impact in different areas: economic activity, civic activity, better and cheaper public service, and transparency.

If you know these impacts occur, you can set out to cause them to happen. Around an issue you can aim to activate stakeholders by providing them with data, for instance to stimulate economic activity. This makes open data a policy instrument, and a cheap one compared to regulation and financing.

But in many instances if you set out to achieve one type of impact, you are likely to also see other types of impact.
This is important because it allows you to find the right type of intrinsic motivation for an entity to publish their data, while knowing it allows other types of impact that you’re interested in as well. An example is planning for increased transparency by mapping the government funds flowing into a neighbourhood, and then seeing citizens take over a community center as a non-profit, reducing the strain on the city government’s budget, and creating additional jobs by providing training to other groups to do the same. Or, flipped around, if a government is averse to transparency, they may be tempted by the economic potential of certain data being open, and cause transparency as a side effect.

In terms of integrity and anti-corruption, I find I make a distinction between three types of data.

There’s the basic ‘daylight’ data, that may immediately show misconduct. Think of the UK MPs’ expenses scandal in 2009. Or the current ‘Shell Papers’ project by Dutch media, which is about shedding daylight on ties between the multinational and government.

Then there’s data that in itself doesn’t show misconduct, but in combination with other sources gives people agency. E.g. in researching connections, such as combining procurement data with the ultimate beneficial ownership of companies winning contracts, and reverse searching the data to find what else they won from government tenders. Or opening up court statistics, verdicts, and court performance reports, in order to allow players in the judicial system to reduce differences between courts, thus increasing judicial system quality and reducing uncertainty for businesses.
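To make the procurement example a bit more concrete, here is a minimal sketch of that kind of combination and reverse search. Everything in it is an assumption for illustration: the company names, owners, amounts and field names are invented, and real procurement and ownership registers will need far more cleaning and matching than an exact name lookup.

```python
from collections import defaultdict

# Hypothetical records: all names, owners, values and field names are invented.
contracts = [
    {"contract_id": "C1", "supplier": "Alpha Ltd", "value": 120_000},
    {"contract_id": "C2", "supplier": "Beta BV", "value": 80_000},
    {"contract_id": "C3", "supplier": "Gamma Inc", "value": 50_000},
]
ownership = [
    {"company": "Alpha Ltd", "owner": "Person X"},
    {"company": "Beta BV", "owner": "Person X"},
    {"company": "Gamma Inc", "owner": "Person Y"},
]


def contracts_by_owner(contracts, ownership):
    """Group awarded contracts by the beneficial owner of the winning company."""
    owners_of = defaultdict(set)
    for row in ownership:
        owners_of[row["company"]].add(row["owner"])

    grouped = defaultdict(list)
    for contract in contracts:
        # A supplier without an ownership record is simply skipped here.
        for owner in sorted(owners_of.get(contract["supplier"], ())):
            grouped[owner].append(contract)
    return dict(grouped)


def total_won(owner, grouped):
    """Reverse search: the total contract value won by one owner's companies."""
    return sum(contract["value"] for contract in grouped.get(owner, []))


grouped = contracts_by_owner(contracts, ownership)
print(total_won("Person X", grouped))
```

Neither data set shows anything remarkable on its own. Only the combination shows that two separate suppliers lead back to the same owner.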

The third type of data is data that can be used to spot patterns, or spot (absence of) impact. Think of a situation where the ministry of education allocates budgets to schools, and sends the money through a regional organisation, and where the schools receive their funding from that regional organisation. Now the ministry knows what the budget is and what they sent, but not if that sum arrived. The school knows what it received but not what was budgeted. When both release their data, it shows whether or not there are differences. Of interest here is that it cuts out the ability of a middleman to control communication flows, and a bottleneck becomes visible. Opening up a chain like that makes issues visible not because the actions of a corrupt actor show up in the data, but because an expected thing isn’t happening. This means data that doesn’t directly show corruption can be used to detect it. Comparing budget with delivery and impact is the basic type. None of this is under the direct influence of those involved in corruption, because the data is about steps before and after them in the process. Steps whose actors likely don’t feel threatened by opening that data up.
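The school funding comparison can be sketched in a few lines. This is only an illustration under stated assumptions: the school names and amounts are invented, the function name find_gaps is hypothetical, and it assumes both sides publish one amount per school for the same period.

```python
def find_gaps(budgeted, received, tolerance=0.0):
    """Compare what the ministry budgeted with what each school reports receiving.

    Returns only the schools where the two figures differ by more than
    `tolerance`, or where one side published no figure at all.
    """
    gaps = {}
    for school in sorted(set(budgeted) | set(received)):
        b = budgeted.get(school)
        r = received.get(school)
        if b is None or r is None:
            # Missing on one side: the absence itself is the signal.
            gaps[school] = {"budgeted": b, "received": r, "difference": None}
        elif abs(b - r) > tolerance:
            gaps[school] = {"budgeted": b, "received": r, "difference": b - r}
    return gaps


# Hypothetical figures: published by the ministry and by the schools respectively.
ministry_budget = {"School A": 100_000, "School B": 75_000, "School C": 50_000}
school_receipts = {"School A": 100_000, "School B": 60_000}

for school, gap in find_gaps(ministry_budget, school_receipts).items():
    print(school, gap)
```

Note that nothing in this comparison uses data from the middleman. Both inputs come from the steps before and after it in the chain.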

This last bit is what national governments can use to lower corruption. Not necessarily by catching bad actors and seeking punishment, but by reducing opportunity and making it slowly disappear. Other governments bet heavily on e-government for similar reasons. E-gov measures reduce the number of face-to-face interactions with civil servants, and thus cut out potential points of bribery. This however only addresses low-level corruption, and doesn’t attack systemic corruption.

Then I switched gears a bit and talked about the difference between the perspective from inside a country, and across borders. Corruption and malfeasance cross borders, but governments are bound to their own jurisdiction. Projects that cut across borders and allow governments a more holistic view on a subject can be useful. IATI, bringing transparency to the entire aid sector, is a good example. It is also a good example because all actors have something both to lose and to gain by opening data, and it works because those interests and reservations overlap in a way that makes the benefits for an actor outweigh the risk of sharing. A project like Open Corporates is another example of how aggregating public data across jurisdictions makes an enormous difference. A very different type of cross-border data is earth observation data, which can surface illegal deforestation, human rights abuses, war crimes and more. The EU e.g. releases all their satellite data, which allows a peek into countries that might not want to publish anything about it. Then there are the Panama Papers type of leaks that result in national level stories, and the type of ‘open source intelligence’ projects that Bellingcat does. This is outside normal government capabilities to do, and outside their control to prevent or hinder, yet often results in outcomes that are useful to national governments.

All these things depend on data existing around an issue. This may not be the case. Gathering data as collective action can be an intervention in itself.

In order to obtain data out of government, knowing the applicable regulations and legal framework very well is important. As is having that detailed knowledge spread out to non-expert circles. If an agency says a certain regulation doesn’t apply to them, or that they can’t release something because of privacy or secrecy concerns, it is necessary to be able to know for yourself if that rings true. Often these things are used to stonewall requests. I’ve worked in a country where privacy protection is often cited as a reason to not share data between government entities. This may well be laudable intent, but in that country privacy laws only pertain to companies, not government. Or take an example where the official secrets act is often used to stonewall requests, but it only lists a dozen specific types of data it covers, and leaves all other decisions at the discretion of the data owner.

Knowledge of regulations however can also be used against you. An outdated official secrets act that clashes with more modern rules on information freedom can be used to tackle a political adversary by reporting a breach of one law, though the act in question was motivated by another law. And then there is the sad and disturbing case of the murder of journalist Ján Kuciak and his fiancée Martina Kušnírová, where his colleagues suspect he was killed because his FOIA requests were leaked to people who realised he was investigating them.

Towards the end I looked more at what types of data are of primary interest in the context of integrity and anti-corruption. On that list are things like procurement, tenders and awarded contracts, spending (which also means it needs a list of government entities), ownership (companies, buildings, which means an additional need for addresses and maps), judicial verdicts, consolidated regulations and government decisions.

But in the absence of some of that, other data sets might serve as proxies for it. Data less obviously tied to transparency, that still would have echoes of the impact of mismanagement for instance. If maintenance is budgeted but not properly executed and the data on road works is missing, data on road incidents, traffic jams, traffic intensity and flow might tell the story as well. Proxies often are outside the scope of control of misbehaving actors, which is an added benefit. Data you need may also be available elsewhere. Many countries share data with e.g. the OECD under international obligations, data that isn’t necessarily released inside the country. But it can be obtained from those international organisations. The World Bank similarly publishes a lot that may be less easy to obtain inside the country it describes.

In summary when thinking about using open data for anti-corruption, it is important to think in terms of the three things that make open data a policy instrument: issues, connected stakeholders, and relevant data.
From this you can explore what is needed to make an issue visible, and who is needed to do that. What action is needed to reduce an issue, and by whom. What is needed to measure results, and who would do that.
None of this is a silver bullet for corruption, and it can’t be. Corruption has many causes, and regularly serves a purpose too. But open data does play a role in chipping away at it in different places simultaneously. It also allows you to switch focus from one specific situation to another, and every result may lead to additional ones.

Following this we had a lively discussion, which continued over lunch.

(During the session I mentioned a wide range of specific examples I encountered or am familiar with, which I largely left out here.)

Stop corruption
image by Naberacka, license CC-BY-SA