Dutch provinces publish open data, but it looks like almost all of it is geo-data, with hardly anything else. When talking to provinces I also get the feeling they struggle to think of data that isn’t geographic in nature. That isn’t very surprising: many of the public tasks carried out by provinces have to do with spatial planning, nature and environment, and geographic data is a key tool for them. But now that we are helping several provinces extend their data provision, I wanted to look at this in more detail.

My colleague Niene took the API of the Dutch national open data portal for a spin, and made a list of all datasets listed as stemming from a province.
I took that list and zoomed in on various aspects.

At first glance there are strong differences between the provinces: some publish a lot, others hardly anything. The Province of Utrecht publishes everything twice to the national data portal, once through the national geo-register and once through its own data platform. The graph below has been corrected for this duplication.
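A minimal sketch of how such a correction for double publication could be done in Python. The records, titles and field layout are invented for illustration; this is not the actual portal export, and matching on province plus normalised title is only one possible way to spot duplicates.

```python
from collections import Counter

# Hypothetical records: (province, dataset title, source).
# Titles and sources are invented for illustration only.
records = [
    ("Utrecht", "Fietsroutes", "NGR"),
    ("Utrecht", "Fietsroutes", "regional platform"),  # same dataset, second channel
    ("Utrecht", "Stiltegebieden", "NGR"),
    ("Drenthe", "Bodemkaart 1990", "NGR"),
    ("Drenthe", "Bodemkaart 2010", "NGR"),
]

def count_per_province(rows):
    """Count datasets per province, counting each (province, title) pair once."""
    seen = set()
    counts = Counter()
    for province, title, _source in rows:
        key = (province, title.strip().lower())
        if key in seen:
            continue  # already counted through another publication channel
        seen.add(key)
        counts[province] += 1
    return counts

counts = count_per_province(records)
print(dict(counts))
```

In real data, titles may differ slightly between channels, so a more robust key (an identifier, if one is shared) would probably be needed.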

What explains those differences? And what is the nature of the published datasets?

Geo-data is dominant
First I made a distinction between data that stems from the national geo-register (NGR), to which all provinces publish, and data that stems from another source (either regional data platforms, or for instance direct publication through the national open data portal). The NGR is in theory the place where all provinces share geo-data with other government entities, part of which is then marked as publicly available. In practice the numbers suggest that the amounts provinces publish to the NGR are roughly in the same proportions as in the graph above (meaning that they mark about the same percentage of what they publish in the NGR as open data).

  • Of the over 3000 datasets that are published by provinces as open data in the national open data portal, only 48 don’t come from the national geo-register. This is about 1.5%.
  • Of the 12 provinces, 4 do not publish anything outside the NGR: Noord-Brabant, Zeeland, Flevoland, Overijssel.

Drenthe stands out in terms of the number of geo-datasets published: over 900. A closer look at their list shows that they publish more historic data, and that they seem to be more complete (apparently more of what they share in the NGR is marked as open data). The average is between 200 and 300, with provinces like Zuid-Holland, Noord-Holland, Gelderland, Utrecht, Groningen, and Fryslân in that range. Overijssel also publishes more than average, at about 500, though fewer than Drenthe. This seems to be the result of a direct connection to the NGR from their regional geo-portal, and thus publishing by default. Overijssel deliberately does not publish historic data, which explains some of the difference with Drenthe. (When something is updated in Overijssel the previous version is automatically removed. This clashes with open data good practice, but is currently hard to fix in their processes.)

If it isn’t geo, it hardly exists
Of the mere 48 data sets outside the NGR, just 22 (46%) are not geo-related. Overall this means that less than 1% of all open data provinces publish is not geo-data.
Of those 22, exactly half are published by Zuid-Holland alone. They publish, for instance, several photo archives, a subsidy register, politicians’ expenses, and formal decisions.
Fryslân is the only province publishing an inventory of its data holdings, which is one of just three non-geo datasets it publishes.
Gelderland stands out as the single province that publishes all their geo-data through the NGR, hinting at a neatly organised process. Their non-NGR open data is also all non-geo (as it should be). They publish 27% of all open non-geo data by provinces, and together with Zuid-Holland they account for 77% of it.
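For readers who want to check the percentages, here is a small Python sketch that recomputes them from the counts mentioned above. Two values are assumptions: the total is set to exactly 3000 (the text only says "over 3000", so the first share is an upper bound), and Gelderland's count of 6 is inferred from the 27% figure rather than taken from the list itself.

```python
# Counts taken from the text, except where marked as assumptions.
total_open = 3000             # assumption: "over 3000", so treated as a lower bound
non_ngr = 48                  # datasets not coming from the national geo-register
non_geo = 22                  # non-NGR datasets that are not geo-related
zuid_holland = non_geo // 2   # "exactly half" of the non-geo datasets
gelderland = 6                # assumption: inferred from "27%" of 22

def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)

shares = {
    "non-NGR share of all open data": pct(non_ngr, total_open),
    "non-geo share of non-NGR data": pct(non_geo, non_ngr),
    "non-geo share of all open data": pct(non_geo, total_open),
    "Zuid-Holland share of non-geo": pct(zuid_holland, non_geo),
    "Gelderland share of non-geo": pct(gelderland, non_geo),
    "Zuid-Holland + Gelderland": pct(zuid_holland + gelderland, non_geo),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

With a total somewhat above 3000, the first share drops towards the roughly 1.5% mentioned earlier.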

Taking these numbers and comparing them to inventories like the one Fryslân publishes (which we made for them in 2016), and the one for Noord-Holland (which we did in 2013), the dominance of geo-data is not surprising in itself. Roughly 80% of the data provinces hold is geo-related. On average only about a fifth to a quarter of this geo-data (15%-20% of the total) is currently published, yet it makes up over 99% of all provincial open data published. This lopsidedness means that hardly anything on the inner workings of a province, the effectiveness of policy implementation, etc. is available as open data.

Where the opportunities are
To improve both on the volume and on the breadth of scope of the data provinces publish, two courses of action stand open.
First, extend the availability of the geo-data provinces hold. Most provinces will have a clear process for this, so it should be relatively easy to do, and it should be possible for most provinces to get to where Drenthe currently is.
Second, take a much closer look at the in-house data that is not geo-related. About 20% of data holdings fall in this category, and based on the inventories we did, some 90% of that should be publishable, perhaps after some aggregation or other adaptations.
The lack of an inventory is an obstacle here, but existing inventories should at least be able to point the other provinces in the right direction.
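To make the rough proportions above concrete, this Python sketch expresses them as fractions of total data holdings. All inputs are the approximate figures from the text (80% geo, a fifth to a quarter of that published, 20% non-geo, some 90% of that publishable); the variable names are mine and the outcome is an order-of-magnitude illustration, not a measurement.

```python
from fractions import Fraction

# Approximate figures from the text, expressed as exact fractions.
geo_share = Fraction(80, 100)                 # part of holdings that is geo-related
non_geo_share = 1 - geo_share                 # the remaining ~20%
geo_published_low = Fraction(1, 5)            # "a fifth" of geo-data is published
geo_published_high = Fraction(1, 4)           # "a quarter" of geo-data is published
non_geo_publishable = Fraction(90, 100)       # "some 90%" should be publishable

# Geo-data currently published, as a share of all holdings.
published_low = geo_share * geo_published_low
published_high = geo_share * geo_published_high

# Non-geo data that could be published, as a share of all holdings.
non_geo_potential = non_geo_share * non_geo_publishable

print(f"geo-data published now: {float(published_low):.0%} to {float(published_high):.0%} of holdings")
print(f"non-geo data publishable: {float(non_geo_potential):.0%} of holdings")
```

On these rough numbers the publishable non-geo data is about as large a share of holdings as all the geo-data currently published.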

Make the provision of provincial open geodata complete, embrace its dominance and automate it with proper data governance. Focus your energy on publishing ‘the rest’ where all the data on the inner workings of the province is. Provinces perpetually complain nobody is aware of what they are doing and their role in Dutch governance. Make it visible, publish your data. Stop making yourself invisible behind a stack of maps only.

Last week ten of the twelve Dutch provinces met at the South-Holland provincial government to discuss open data and exchange experiences, seeking to inspire each other to do more on open government data. I participated in my role as open data project lead for both the Province of Overijssel and the Province of Fryslân.

There were several topics of discussion.

  • The National Open Government Action Plan (part of the OGP effort), a new version of which is due next spring, and for which input is currently sought by the Dutch government.
  • A proposal by the team behind the national open data platform to form a ‘high value data list’ for provincial data sets.
  • Several examples were discussed of (open) data being used to enhance public interaction.

I want to briefly show those examples (and might blog about the other two later).

Make it usable, connect to what is really of significance to people
Basically, the three examples presented during the session offer two lessons:

1) Make data usable, by presenting it better and allowing for more interaction. That way you more or less take up a position half-way between what is/was common (presenting only abstracted information) and open data (the raw detailed data): presenting data in a much more detailed way, and making it possible for others to interact with the data and explore it.

2) Connect to what people really care about. It is easy to assume what others would want to know or would need in terms of data; it is less easy to actually go outside and first ask people and entrepreneurs what type of data they need around specific topics. However, doing so provides lots of vital clues as to what data will actually be used, and what type of questions people want to be able to answer for themselves.

That second point is something we always stress in our work with governments, so I was glad to hear it presented at the session.

There were three examples presented.

South-Holland put subsidies on a map
The Province of South-Holland made a map that shows where subsidies are provided and for what. It was made to better present the existing data about subsidies to the public, and also to stimulate people to dive deeper into the data. The map links to where the actual underlying data should be found (but as far as I can tell, the data isn’t actually provided there). A key part of the presentation was about the steps they took to make the data presentable in the first place, and how they created a path for doing so that can be re-used for other types of data they are seeking to house in their newly created data warehouse. This way, presenting other data sources in similar ways will be less work.

The subsidy map

Gelderland provides insight into their audit-work
Provinces have a task in auditing municipal finances. The Province of Gelderland has used an existing tool (normally used for presenting statistical data) to provide more detail about the municipal finances they audited. The key point here again was to show how to present data better to the public, how that plays a role in communicating with municipalities as well, and how it provides stepping stones to entice people to dive deeper. The tool they use provides download links for the underlying data (although the way that is done can still be significantly improved, as it currently only allows downloads of selections you made, so you’d have to stitch them back together to reconstruct the full data set).
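As an illustration of what stitching such selections back together involves, here is a minimal Python sketch. It assumes the downloads are CSV files with identical headers; that format, the column names and the sample values are all assumptions for the example, not a description of what the Gelderland tool actually delivers.

```python
import csv
import io

def stitch(csv_texts):
    """Merge partial CSV downloads that share one header, dropping duplicate rows."""
    header = None
    seen = set()
    rows = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty download, nothing to merge
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("downloads do not share the same columns")
        for row in reader:
            key = tuple(row)
            if key not in seen:  # overlapping selections repeat rows
                seen.add(key)
                rows.append(row)
    return header, rows

# Hypothetical selections with one overlapping row (values invented).
part_a = "municipality,year,value\nA,2015,10\nB,2015,20\n"
part_b = "municipality,year,value\nB,2015,20\nC,2015,30\n"

header, rows = stitch([part_a, part_b])
print(header)
print(rows)
```

Even in this simple form it shows the extra work pushed onto re-users, which a single full download would avoid.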

Screenshot of the Gelderland audit data tool

Flevoland listens first, then publishes data
The last example presented was much less about the data, and much more about the ability to really engage with citizens, civil society and businesses, and to stimulate the usage of open data that way. The Province of Flevoland is planning major renovation work on bridges and water locks in the coming years, and their aim is to reduce hindrance. Therefore they are already, before the work starts, having conversations with various people who live near or regularly pass by the objects that will be renovated, to hear what type of data might help them keep disruption of their normal routines to a minimum. The resulting insights are that, where plans are currently published in a generic way, much more specific localized data is needed, as well as much more detailed data about what is going to happen in a few days’ time. This allows people to be flexible, such as a farmer deciding to harvest a day later, or to move the harvest away over water rather than by road. Detailed data also means communicating small changes and delays in the plans. Choosing the right channels is important too. Currently the Province announces construction works on Twitter, for example, but no local farmer goes there for information. Farmers do use a specific platform where they also get detailed data about weather, water, etc., and distributing localized data on construction works there would be much more useful. So now the Province will collaborate with that platform to reach farmers better. (My company The Green Land is supporting the Province, 2 municipalities and the water board in the province in this project.)

Overview of the 16 bridges and waterlocks that will be renovated in the coming years

Various stakeholders around each bridge or waterlock are being approached