We’re currently helping the Province of South-Holland extend its open data provision. Besides looking at the data the Province holds that is relevant to key policy domains, we also look at what other data is available elsewhere for those domains, for instance nationwide datasets with a locally granular level of detail. In those cases it can be of interest to take the subset relevant for the Province and republish it through the Province’s own channels.
One of the relevant topics is energy transition (to sustainable energy sources). Current and historic household usage is of interest here. The companies that maintain the grid publish yearly data per postcode, or at least some of them do. There are seven of these companies.
Luckily all three companies active in South-Holland do publish that data.
Having this subset of data is useful for any organisation in the region that wants to limit the amount of data they have to dig through to get what they need, for the provincial organisation itself, and for individual citizens. Households that have digital meters have access to their daily energy usage readings online. The published subset allows them to easily compare their personal usage with that of their neighbours and the wider surrounding area. For instance, I established that our usage is lower than the average in our street for both electricity and gas. The subset is also easier to map, or otherwise visualise, in a meaningful way for the province and relevant regional stakeholders.
Here’s a brief overview of the steps we’re taking to get to a province-wide data set.
Download the data for the years available for Westland, Liander and Stedin (Westland goes back to 2010, the others to 2008)
Check the data formats: Westland and Stedin provide CSV, Liander XLSX
Check data structure: all use the same structure of fields and conventions
To get only the data for South-Holland we use the postcode that is mentioned in the data.
The Dutch postcode zones do not conform to provincial boundaries, however, so we take the list of four-position postcodes and determine which ones fall within South-Holland:
The data contains 6 position postcodes of the structure 1234AB. We need to split them into the four digits and the two letters, to be able to match them with the ranges that fall within the province.
For personal data protection, wherever a 6 position postcode contains fewer than 10 addresses, its data is aggregated with a neighbouring postcode until the number of addresses is higher than 9. It is not certain that those aggregations fall within a single province. Each record provides a ‘from’ 6 position postcode and a ‘to’ 6 position postcode. These hold the same value where the number of addresses in a postcode is high enough, and otherwise span a wider range.
We need to test if the entire postcode range in a single data record falls within one of the ranges of postcodes that belong in South-Holland.
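As a minimal sketch of the split and the range test described above: the function names and the example ranges below are illustrative placeholders, not the actual list of South-Holland postcodes. The sketch also assumes that the province is described by ranges of four-digit postcodes, and that a record's 'from' and 'to' postcodes cover a contiguous run of four-digit zones.

```python
import re

# Matches a 6 position postcode such as "1234AB" (an optional space is tolerated).
POSTCODE_RE = re.compile(r"^\s*(\d{4})\s?([A-Za-z]{2})\s*$")

# Illustrative placeholder ranges only; the real list has to be compiled separately.
EXAMPLE_RANGES = [(2500, 2599), (2600, 2629)]


def split_postcode(postcode):
    """Split '1234AB' into its four digits and two letters: (1234, 'AB')."""
    match = POSTCODE_RE.match(postcode)
    if match is None:
        raise ValueError(f"not a 6 position postcode: {postcode!r}")
    return int(match.group(1)), match.group(2).upper()


def expand_ranges(ranges):
    """Turn (start, end) ranges of four-digit postcodes into a set of codes."""
    return {code for start, end in ranges for code in range(start, end + 1)}


def record_in_province(pc_from, pc_to, province_codes):
    """True only if every four-digit zone from pc_from to pc_to is in the province."""
    low, _ = split_postcode(pc_from)
    high, _ = split_postcode(pc_to)
    if low > high:
        low, high = high, low
    return all(code in province_codes for code in range(low, high + 1))


codes = expand_ranges(EXAMPLE_RANGES)
inside = record_in_province("2599ZZ", "2600AA", codes)    # spans two listed ranges
crossing = record_in_province("2629XA", "2630AB", codes)  # 2630 is not listed
```

Expanding the ranges into a set means a record that spans two adjacent listed ranges still counts as inside, which a check against one range at a time would miss.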
For the small number of aggregates that fall into two provinces we can adopt the average usage number, but need to mark that the number of households in that area is unknown,
or retrieve the actual number of addresses from the national address and building database, and mark that the average energy usage values are from a larger number of addresses.
Alternatively we can keep the entire range, including the part outside the province,
or we exclude the entire range and leave a ‘hole in the map’.
In any case we need to mark in the data what we did, and why.
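The options above could be sketched as a policy switch that also records what was done. The field names (`postcode_from`, `postcode_to`, `num_addresses`) and the policy labels are hypothetical, and the option of looking up actual address counts in the national address and building database is left out, as it needs an external source.

```python
def classify(pc_from, pc_to, province_codes):
    """Return 'inside', 'straddling' or 'outside' for a from/to postcode pair."""
    low, high = sorted((int(pc_from[:4]), int(pc_to[:4])))
    hits = [code in province_codes for code in range(low, high + 1)]
    if all(hits):
        return "inside"
    if any(hits):
        return "straddling"
    return "outside"


def annotate(record, province_codes, policy="keep_average"):
    """Return a marked copy of the record, or None if it is dropped."""
    status = classify(record["postcode_from"], record["postcode_to"], province_codes)
    if status == "outside":
        return None
    out = dict(record, boundary_status=status, boundary_note="")
    if status == "straddling":
        if policy == "exclude":
            return None  # leaves a 'hole in the map'
        if policy == "keep_average":
            out["num_addresses"] = ""  # unknown for the part inside the province
            out["boundary_note"] = "crosses provincial boundary; average kept, address count unknown"
        elif policy == "keep_all":
            out["boundary_note"] = "crosses provincial boundary; full range kept"
        else:
            raise ValueError(f"unknown policy: {policy!r}")
    return out


# Hypothetical example record; the codes are placeholders, not the real list.
codes = set(range(2500, 2600))
record = {"postcode_from": "2599ZX", "postcode_to": "2600AB", "num_addresses": "12"}
marked = annotate(record, codes)
```

Whichever policy is chosen, the `boundary_status` and `boundary_note` columns travel with the data, so a later user can see which records were affected and why.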
The result is then a data set in CSV that consolidates the three sources for all those records that fall within the province.
This dataset can then be mapped, e.g. in QGIS or other tools in use within the Province of South-Holland.
We provide a recipe and/or script based on the above steps that can take the future yearly data sets from the three sources and turn them into a consolidated subset for South-Holland, so that the province can automate keeping the data up to date.
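Such a script might look roughly like the sketch below. It is not the actual script: the column names are hypothetical, the Liander XLSX file is assumed to have been converted to CSV first, and only records that fall entirely within the province are kept. The in-memory files stand in for the real downloads.

```python
import csv
import io


def consolidate(sources, out_file, province_codes,
                from_field="postcode_from", to_field="postcode_to"):
    """Write all in-province rows from the named CSV sources to out_file."""
    writer = None
    kept = 0
    for name, handle in sources.items():
        for row in csv.DictReader(handle):
            low, high = sorted((int(row[from_field][:4]), int(row[to_field][:4])))
            if not all(code in province_codes for code in range(low, high + 1)):
                continue  # partly or wholly outside the province
            row["source"] = name  # keep track of which company supplied the row
            if writer is None:
                writer = csv.DictWriter(out_file, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory stand-ins for two of the yearly files (made-up values).
header = "postcode_from,postcode_to,usage\n"
sources = {
    "stedin": io.StringIO(header + "2500AA,2500AB,3000\n1000AA,1000AB,2800\n"),
    "westland": io.StringIO(header + "2599ZX,2599ZZ,3100\n"),
}
output = io.StringIO()
count = consolidate(sources, output, set(range(2500, 2600)))
```

With real files, each entry in `sources` would be an opened CSV file and `out_file` a file opened for writing with `newline=""`, as the `csv` module recommends.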