The 2013 Open Data Census of the OKFN, for which I am the lead editor, is currently taking place. It tracks 10 data sets in each country. For the Netherlands, questions were raised by Andrew Stott, among others, about the Dutch postcodes, which are shown as fully open. Specifically, there seemed to be no download link. So, as an experiment, I set out to get my hands on the Dutch postcodes. Follow me down the rabbit hole.
Dutch postcodes, after a court verdict, became open in February 2012, something celebrated by the responsible Ministry for Infrastructure and Environment (the link to the press release in that article is dead). So, where’s that data?
The postcodes, first of all, are not a separate data set but part of the much larger BAG data set which holds all addresses and all buildings for the Netherlands, with their geospatial references.
National Data Portal
First port of call is the Dutch National Data Portal, which indeed holds a page for the BAG data set. It points to three data files, and lists the license as Public Domain. Nice, we’re done! Except that none of the links to the data files work. Further down the hole.
Cadastral Office, the data holder
The next stop is the Cadastral Office, as they are the data holder, and the links in the national data portal all point there. The Cadastre has a page about the BAG listing a number of ‘products’. There are no links to the actual data, but the descriptions make clear that to get access you need to register first and then apply for a subscription to the data, which is not free if you are not in the public sector. There is no indication whatsoever that any data from the BAG is available as Open Data. However, they do mention PDOK (an acronym for “public service on the map”), which offers a WFS and a WMS service as well as a viewer for the data. Further down the hole.
PDOK is the next incarnation of the INSPIRE-based national geodata portal. Looking for the BAG there does not provide any useful links. It does expose a ‘temporary’ WMS and WFS service. WMS provides map images and WFS provides data, but according to the XML feature description access is restricted and fees apply (none of that is stated in human-readable text on the site, though). It probably would not give us the desired postcodes anyway. The links it provides are to nationaalgeoregister.nl, PDOK’s predecessor. Further down the hole.
National Geo Register
At the National Geo Register, I try another search for BAG, which yields nothing new. It appears to conflate my BAG search with ‘baggeren’ (dredging in English), as that word starts the same way. Then I try ‘adressen’ (addresses) with more success, and one of the results reads “INSPIRE Download Service Addresses” (incidentally, a stripped-down version of that search result also comes up at PDOK under addresses). Following that link, it finally gets interesting, as the next page provides a download link, an e-mail address at the Cadastral Office for enquiries, and some information on re-use rights. It says the last metadata edit was last August.
The usage conditions are listed in the National Geo Register as unrestricted, but access is listed as restricted, referencing a license called ‘GEO Gedeeld’. The National Data Portal (all the way up the rabbit hole) promised us Public Domain, so what is this GEO Gedeeld license? The link to the license is dead, however. Looking for the same license title on the Cadastral Office site yields a PDF that says attribution is mandatory and resharing the data set is forbidden. This turns out to be a ‘roll-your-own’ license by the geodata sector, adding restrictions on top of the Creative Commons framework, including confusing pseudo-CC icons. It is, however, entirely unclear whether this is the license that is attached to the data mentioned in the National Geo Register.
Getting to the data, sort of
As said, the National Geo Register lists a download link, and if you open that XML file, the ‘subtitle’ field says this is the actual full BAG data. The same XML file also provides a link to a description in XML, which again points to a Public Domain license for the access restrictions on the file. So the National Geo Register page says a restricted license applies, but the XML metadata that comes with it specifies Public Domain, as does the national data portal. Finally, it also provides a link to the actual data dump.
The data dump is a 1.4 GB zip file that contains 7 other zip files, yielding over 30 GB of data when unpacked, split into 21 MB XML files. The files, however, contain no reference to a license. This is the full BAG data set, containing all addresses and buildings for all of the Netherlands. To get to a postcode list you need to combine several subsets of this data set. The ‘NUM’ files give you address index numbers, their georeferences, the house number and the postcode, and a number that corresponds to a street. The ‘OPR’ files give you the corresponding street name, and the number of the place it is in. The ‘WPL’ files give you the place name, and through a separate table also the municipality it is in. (Do note that it does not include postcodes that are not connected to geolocations, such as PO Boxes. The primary purpose of this database is not postcodes but addresses and buildings.)
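To make that combination step concrete, here is a minimal sketch in Python of how the three subsets could be joined once they have been parsed out of the XML. The record types, field names and sample values below are hypothetical, made up for illustration; they are not the actual element names in the BAG files, and parsing the XML itself is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class Num(NamedTuple):
    """One address record from a 'NUM' file (hypothetical field names)."""
    postcode: str
    house_number: int
    street_id: str  # the number that corresponds to a street in 'OPR'


class Opr(NamedTuple):
    """One street record from an 'OPR' file (hypothetical field names)."""
    street_name: str
    place_id: str  # the number of the place in 'WPL'


def build_postcode_list(
    nums: List[Num],
    oprs: Dict[str, Opr],
    wpls: Dict[str, str],
) -> List[Tuple[str, int, str, str]]:
    """Join NUM -> OPR -> WPL into (postcode, house number, street, place) rows.

    Records whose street or place cannot be resolved are skipped.
    """
    rows = set()
    for num in nums:
        street = oprs.get(num.street_id)
        if street is None:
            continue
        place_name = wpls.get(street.place_id)
        if place_name is None:
            continue
        rows.add((num.postcode, num.house_number, street.street_name, place_name))
    return sorted(rows)


# Invented sample data, only to show the shape of the join.
SAMPLE_NUMS = [
    Num("1234AB", 3, "s1"),
    Num("1234AB", 1, "s1"),
    Num("5678CD", 10, "s2"),
    Num("9999ZZ", 2, "unknown-street"),  # dropped: no matching street
]
SAMPLE_OPRS = {
    "s1": Opr("Voorbeeldstraat", "p1"),
    "s2": Opr("Testlaan", "p2"),
}
SAMPLE_WPLS = {"p1": "Voorbeeldstad", "p2": "Testdorp"}

if __name__ == "__main__":
    for row in build_postcode_list(SAMPLE_NUMS, SAMPLE_OPRS, SAMPLE_WPLS):
        print(row)
```

With over 30 GB of unpacked XML, the street and place lookups would comfortably fit in memory as dictionaries, while the address records are probably better streamed file by file rather than loaded in one go. The municipality lookup through the separate table is not shown here.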
Open or not?
So how open is this much-welcomed open postcode data? It seems the data holder, the Cadastral Office, purposefully obfuscates the existence of the data dump, making no direct reference to it at all and offering their paid-for services instead. There is, however, a full machine-readable data dump that is also provided directly by the Cadastral Office. Whether it is openly licensed is not entirely clear, as the National Geo Register states a restrictive license on the webpage but provides a Public Domain dedication in the XML metadata, and there is the Public Domain dedication in the National Data Portal.
Make it Open for real!
To make this data Open in practice, and not just in theory, the National Data Portal as well as the Cadastral Office should reference the existing data dump directly, and provide the Public Domain license info with the data dump. If no one can find your data, then it is not open. Next to technical openness (machine-readable, online) and legal openness (public domain, or attribution), social openness is also needed (easily findable, contact info readily available).