Last year I spent a large amount of time participating in the study that provided advice on which government data sets to include in the mandatory list that is part of the Open Data Directive.

The Open Data Directive, which should have been transposed into national law by EU Member States last July but mostly still hasn't been, gives the EC the role of maintaining a list of 'high value data sets' that all EU countries must make freely available for re-use through APIs and bulk download. This is the first time that pro-actively publishing certain data as open government data becomes mandatory. Until now, there were mandatory ways to provide open data upon request, but the pro-active publication of such open data has always been voluntary (with various countries making a wide variety of voluntary efforts, btw). Also, the availability of government data builds on each national freedom of information framework, so whether a certain data set is actually available depends on different legal considerations in different places. The high value data list is the first pan-EU legal requirement that is the same in all EU Member States.

I was part of a team that conducted a study into which data sets should appear on that high value data list. The first iteration of this list (to be extended and amended periodically by the EC) covers six thematic areas: geographic data, statistics, mobility data, company information, earth observation and environment, and meteorology. I was responsible for the sections on earth observation and environment, and on meteorology, and I'm eager to see how it gets translated into the implementing act: for both those thematic areas, adoption of the study results would mean a very significant jump in open data availability. We submitted our final report in September 2020, and in the year since then we've all been waiting to see how the implementing act for the high value data will turn out. Our study is only one part of that process, as it is itself an input for the EC's impact assessment of different choices and options, which in turn forms the basis for a negotiation process that includes all Member States.

Originally the implementing act was expected to be published together with other EC proposals, such as the Data Governance Act last December, as the EU high value data list is part of a wider, newly emerging EU legal framework on digitisation and data. But so far not much has happened. First the expectation was Q1, then by the summer, then shortly after the summer, and now the latest I hear from within the EC is 'hopefully by the end of the year'.

At this stage it seems to all depend on political will to move the dossier forward. The obstacle to getting the implementing act done apparently is what to do with company data (and ultimate beneficial ownership data). Opening up company registers has clear socio-economic benefits, outweighing the costs of opening them up. There are privacy aspects to consider, which I think can be dealt with well enough, but these were not part of our study: it only considered socio-economic impacts and expected transition costs, and the demarcation between the Open Data Directive and the GDPR was placed outside our scope.

There apparently is significant political pressure to limit the openness of such company registers. There must be similarly significant political pressure towards more openness, or the discussion would already have been resolved. It sounds to me like the Netherlands is one of the countries politically blocking progress towards more openness. Even before our study commenced I heard rumours that certain wealthy families had the ear of the Dutch prime minister to prevent general free access to this data, and that the PM seemed to agree, up to the point of being willing to risk infringement proceedings for not transposing the Open Data Directive completely. As it stands, the transposition of the Open Data Directive into national law mostly hasn't happened, and the implementing act for high value data hasn't been proposed at all yet.

Access Info, the Madrid-based European NGO promoting the right of access to information, requested documents concerning the high value data list from the EC in June, including the study report. The report hasn't been published by the EC themselves yet, because it needs to be published together with the EC's impact assessment, for which the study is an input, and alongside the implementing act itself.
Access Info has received documents, and has published the 400+ page study report (PDF) that was submitted a year ago, alongside a critical take on the company register data issue.

I am pleased the study is now out there, informally at least, so I can point people to sections of it in my current work discussions on how to accommodate the new EU legal framework w.r.t. data, of which the high value open data is a part. Previously there were only publicly available slides from the last workshop that was part of the study, which by necessity held only general information about the study results. Next to company registers, which I am assuming is the roadblock, there is much else in the study that is of importance and now equally suffering under the delays. I hope the formal publication of the report will follow soon. The publication of the implementing act is a key step for European open data, given its first ever EU-wide mandates.

Virk Data Dag
A 2014 workshop, Virk Data Dag, at the Danish Business Authority discussing use cases for the open Danish company register, where I presented and participated.

Finally, a declaration of interests is in order for this posting I think:

  • My company was part of the consortium that did the mentioned study. I led the efforts on earth observation and environmental data, and on meteorological data.
  • In my company's current work, the implementing act for high value data and other recent EC legal proposals are of importance, as I am helping translate their impact and potential to the Dutch national (open) data infrastructure and to facilitating data re-use for public issues.
  • I am a voluntary board member of the NGO Open State Foundation. OSF advocates full openness of company registers, and co-signed the critical take Access Info published. The board has no influence on day to day actions, which are the responsibility of the NGO’s director and team.
  • I am personally in favor of opening up company registers as open data.
    • I think that privacy issues can be readily addressed (something that is directly relevant to me as a sole trader business, as co-owner of an incorporated business, and as a board member of an association for which my home address is currently visible in the company register)
    • I think that being visible as a business owner or decision maker is part of the social contract I entered into in exchange for being personally shielded from the business risks I am exposed to. Society is partially shielding me from those risks, as it allows social benefits to emerge (such as creating an income for our team), but in turn people need to be able to see who they’re dealing with.
    • I think there is never a business case where charging fees for access to a monopolistic government database such as company registries makes sense. Such fees merely limit access to those able to afford it, causing inequality of access, and power and information asymmetries. Data collection for public tasks is a sunk cost by definition, and access fees are always less over time than the additional tax revenue and social value resulting from re-use of freely available data. The only relevant financial aspect to address is that provision costs accrue with the data holder while benefits accrue with the treasury, for which general budget financing is the remedy.
    • I think that already open company registers in Europe and elsewhere provide ample evidence that many formulated fears w.r.t. such openness don’t become reality.

In a tweet Mike Haber mentioned Otter.ai, a spoken text transcription tool, in the context of making notes (in Obsidian.md). Taking a look at the Otter.ai website I tried to create an account, only to be told that the email address I entered was already tied to an existing account. Indeed, my 1Password vault contained a login that I created in March 2018 but never used. Despite, or maybe because of, the friction I feel using audio, I decided to try it out now.

I tried three things.
One, where I spoke to my laptop, while seeing the transcription written out live in front of me. This worked well, but creates odd feedback loops of self-consciousness when I read back my own words while speaking them. It’s like using a mirror to guide your hand movements, but then for speech.
Two, where I recorded myself talking using QuickTime and uploaded the resulting sound file. This removed the strange feedback loop of seeing the text emerge while talking, but had me sitting behind my laptop and manually uploading a file afterwards.
Three, where I used the service's Android app to dictate to my phone while walking around the house. This felt the most natural of the three.

Resulting transcripts can be manually exported in various formats from the browser interface, including flat text and to the laptop’s clipboard. An automatic export in txt would be nice to have. Otter.ai only does English (and does it well), which isn’t an issue when I’m in an English language context, but otherwise quickly feels artificial to my own ears.

From my brief tests, three use cases stood out that I could get comfortable with:

  • dictating short ideas or descriptions while on the move around the house
  • stream of consciousness talk, either while walking around the house or stationary
  • describing an object as I handle it; specifically physical books as I first go through them to see what they are about, in preparation for reading.

Otter.ai has a generous free tier of 10 hours per month and three free uploads (I assume the idea behind that is that they get more data to train their algorithms with), but the next tier up ('pro') gives you ten times that per month and unlimited uploads within those 100 hours, for $100 USD/yr. That is, I think, a pretty good deal, especially compared to other services.

Differences between Otter.ai and other services I found online concern 1) real time audio capture and transcription, where others mostly just provide for uploads of audio files, 2) costs, where others charge by the minute and/or generally charge much more, 3) available languages, where Otter.ai only provides English, and others cater to a wide range of languages.
All the services I looked at allow listening to audio while you go through a transcription, e.g. to add corrections.
Two European services I found are Amberscript (a Dutch company), which has a prepaid option of 15 Euro / hour (or 40 Euro per month subscription for 5 hours), and Happyscribe (a French company) which charges by the minute at 12 Euro / hour.
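To put that 'pretty good deal' into perspective, here is a quick back-of-the-envelope comparison of the effective per-hour cost of the options mentioned. The function and structure are mine, the prices are the listed tiers at the time of writing, and the currencies differ (USD vs Euro), so treat this as an order-of-magnitude comparison only:

```javascript
// Rough effective per-hour cost of the transcription options mentioned above.
// Prices are the published tiers at time of writing; check current pricing.
function perHourCost(price, hours) {
  return price / hours;
}

const options = [
  // Otter.ai pro: $100/yr for 100 hours per month = 1200 hours per year
  { name: 'Otter.ai pro', cost: perHourCost(100, 12 * 100), currency: 'USD' },
  // Happyscribe: charged by the minute, at 12 EUR per hour
  { name: 'Happyscribe', cost: perHourCost(12, 1), currency: 'EUR' },
  // Amberscript prepaid: 15 EUR per hour
  { name: 'Amberscript prepaid', cost: perHourCost(15, 1), currency: 'EUR' },
  // Amberscript subscription: 40 EUR per month for 5 hours
  { name: 'Amberscript subscription', cost: perHourCost(40, 5), currency: 'EUR' },
];

for (const o of options) {
  console.log(`${o.name}: ~${o.cost.toFixed(2)} ${o.currency}/hour`);
}
```

Otter.ai's pro tier works out to cents per hour (assuming you'd actually use the full 100 hours a month), while the European per-hour options sit between 8 and 15 Euro per hour.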

There is of course also the dictation built into Microsoft Word, which supports Dutch well. Although I normally work in LibreOffice, I do have Word installed to prevent weird conversion issues when working on documents with clients who run MS products. It does mean being tied to the laptop while dictating, though, and of course, as with any other US company, including Otter.ai, all audio goes to US servers for speech recognition. Also, after the dictation there's no audio file left over; only the document remains. That means odd transcriptions can remain a mystery, because you can't go back to the original, so you should make such corrections immediately. After such a correction phase this is no longer an issue; then it's just a difference with other services, which are designed more towards transcription of e.g. interviews, whereas MS Word is geared towards dictation. The web-based version of Word has a transcription feature separate from the dictation feature, which provides 300 minutes for free per month and does retain the audio file for you.

For now I will aim to experiment with voice dictation some more: probably for the first few days using MS Word on my laptop for dictation, and Otter.ai's mobile app for the same, in the three mentioned use cases. If I find it becomes more useful than strange (as I've found it to be in previous years and attempts), I will likely use Amberscript, as it is EU-based and has a mobile app. Their prepaid option of 15 Euro/hour is probably good for quite some time at first.

For about a year now, the deterioration of the LinkedIn timeline has been very visible to me. Next to an increasing number of people sharing things as if LinkedIn were Facebook, the timeline is not under the control of the user, and presents algorithmically determined items. Sometimes that results in seeing things days or weeks after they were posted, when I would have liked to see them the day they were posted, but instead got someone else's rants. The only way one can shape the LinkedIn timeline is by removing people from it. So I did, and removed all people from it. I came to the conclusion that I'd rather have no LinkedIn timeline, and use it as it was in the past: as a digitised contact list.

Of course that brings my LinkedIn experience back to where it was in April 2005, when Jyri Engestrom predicted its demise if it didn't introduce an object of sociality. I've been using LinkedIn since June 2003 (user nr. 8730), and the barebones 'digital rolodex' actually serves me well, to see the background of someone I meet, and to allow others to see the same about me. From now on I can skip the timeline that LinkedIn serves me as a default, and engage with people in my network, and the things they share, on my own terms and initiative, seeking them out when I want, next to keeping my own notes.

To get to an empty timeline I had to unfollow everyone I'm connected to. That is not a simple thing to do, as LinkedIn provides no easy option to unfollow large numbers of people, and requires you to unfollow everyone one by one. Of course there are workarounds, and that is what I used: a snippet of code in my browser console.
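For illustration, a minimal sketch of the kind of console snippet this involves. This is an assumption-laden reconstruction, not the exact snippet I used: LinkedIn's markup changes regularly, so both the generic button selector and the 'Following' label below are assumptions you would need to verify against the live page before running anything like it.

```javascript
// Hypothetical sketch — adapt selectors to LinkedIn's live DOM before use.
// Intended to be pasted in the browser console on the page listing people
// you follow.

// Collect all buttons whose visible label is "Following"
// (clicking such a button toggles that connection to unfollowed).
function findFollowingButtons(doc) {
  return Array.from(doc.querySelectorAll('button'))
    .filter((btn) => btn.textContent.trim() === 'Following');
}

// Click them one by one, pausing between clicks to avoid rate limiting.
async function unfollowAll(doc, delayMs = 1000) {
  for (const btn of findFollowingButtons(doc)) {
    btn.click();
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Usage in the console: unfollowAll(document);
```

Because the page only loads a batch of people at a time, in practice you also need to scroll and re-run until the list is exhausted.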


LinkedIn can be nice and quiet, with everyone unfollowed

Downloaded the entire legislative package proposed in July as part of the EU Green Deal. Quite a bit of reading to go through 🙂
These proposals are relevant to my current work on keeping track of the emerging EU legal framework on digitisation, AI and data. Within that framework dataspaces (a single market for data) are proposed, with the Green Deal dataspace being the first to take shape. The Green Deal itself depends on data and monitoring to track progress towards goals, but also to be able to create effective measures, and as a consequence forms a key bridge between data and policy goals. Green and digital overlap strongly here.


the list of legislative documents I downloaded for close reading

On 20 October the FOSS4Geo Nederland conference takes place, preceded on 19 October by an afternoon full of workshops to get hands-on with open source geo-tools yourself. The host is the world-renowned ITC in Enschede.

Last year the conference could not go ahead because of the pandemic, but this year it looks like it will happen. So put 19 and 20 October in your calendar to take a dive into open source geo.

Moreover, until 10 September you can sign up if you want to give a presentation or workshop.

Three years ago, in 2018, I had the pleasure of being the opening speaker. Under the title "A map is the greatest of all epic poems" (a quote from Gilbert Grosvenor, first editor of National Geographic), I argued how data has taken on a leading geopolitical role, and how geographic data (often the 'linking pin' between other types of data) is central to that. And how that means you should also consider your own work with geo-data in light of that geopolitical context, with its consequences for the role of geographic data in ethical questions, data governance and data protection. (In 2016 I gave a presentation at the global FOSS4G conference.)

A few years on, the European Commission has proposed a broad package of new laws that give concrete shape to the European geopolitical positioning on digitisation and data. At the same time, policy has been launched, such as the European Green Deal, that explicitly builds on what is happening around digitisation and data, and there are financing instruments to accelerate and propel that. What in my 2018 talk was mainly a call to geo-professionals to put on a certain 'lens' is now taking concrete shape in the requirements that will be placed on those professionals' daily practice. Last year I actively contributed to part of those new rules, by researching for the EC which earth observation, environmental and meteorological data must henceforth be freely re-usable by everyone across Europe. Rules which, all being well, will finally be published later this month.

Time, then, as far as I'm concerned, for an explanation and an update. I have signed up as a speaker to lay out the new European rules, and to go into the opportunities for the FOSS4G community that are wrapped up in them. I do so from my current work for Geonovum, the government foundation that helps the Dutch government work better with geo-information. Within Geonovum, commissioned by the Ministry of the Interior, I actively follow European developments on digitisation and data, and their impact on e.g. (inter)national standards, the further evolution of INSPIRE, and the availability of government data for everyone. I help translate those into opportunities and an actionable perspective for Dutch data holders and data users.

The Irish Data Protection Authority (DPA) has issued a decision on a 2018 investigation into WhatsApp's data processing. At first glance it concerned two aspects: one, the uploading of WhatsApp users' contact lists and the retention of (hashed) phone numbers of non-users; the other, the information exchange between WhatsApp and its parent company Facebook. WhatsApp argued they were not a data controller in this case but that their users were, and that they were merely processing data; that defense failed. (I think the language used by WhatsApp itself, the word 'user', gives away the actual locus of power quite clearly.)

An Coimisiún um Chosaint Sonraí, the Irish Data Protection Commission, issued a fine of 225 million Euros. This seems right up there at the top of the potential fine range of 4% of global turnover in the preceding year (2020).

It is good to see the Irish DPA finally coming down with a decision. With enforcement of the GDPR starting mid 2018, a range of complaints and investigations landed on the Irish DPA's plate, as several large tech companies maintain their EU presence in Ireland. The slow pace of the Irish DPA in handling these complaints has itself been a source of complaints. With this decision on the WhatsApp investigation, there is finally some visible movement.

Also see the earlier announcement concerning Amazon receiving a 746 million Euro fine from the Luxembourg DPA.