This week the draft implementation act (PDF) and the annex listing the first batch of European High Value Data sets (PDF) have finally been published. In the first half of 2020 I was involved in preparatory research to advise on what data, spread across six predetermined themes, should be put on this mandatory list. It’s the first time open data policy makes the publication of certain data through an API mandatory. Until now European open data policy built upon the freedom of information measures of each EU Member State (MS), adding mandatory conditions to what MS published voluntarily and to how they respond to public data re-use requests. This new law arranges for the proactive publication of certain data sets.

In the 2020 research I was responsible for the sections about earth observation, environmental, and meteorological data. We submitted our final report in September 2020, and since then there had been total silence w.r.t. progress in negotiating the list with the MS and putting together the implementation act. Still, I knew that at least the earth observation and environmental data would largely be included the way I suggested, because last summer I got a sneak preview of the adapted INSPIRE portal where such data is made available.

The Implementation Act

In the Open Data Directive there’s a provision that the European Commission can, through a separate implementation act, set mandatory open data requirements for data belonging to themes listed in the Directive’s Annex. At launch in 2019, 6 such themes were listed: Geo-information, statistics, mobility, company information, earth observation / environment, and meteorology.
The list of themes can also be amended, through another separate implementation act, and a process to determine the second set of themes is currently underway.

The draft implementation law (PDF) states that government-held datasets mentioned in its Annex must be published through APIs, under an open license such as Creative Commons Zero (CC0), Creative Commons By Attribution (CC BY), or an equivalent or less restrictive license. Governments must publish the terms of use for such APIs, and these terms may not be used to discourage re-use. APIs must also be fully and publicly documented, and a point of contact must be provided.

MS can temporarily exempt some of the high value datasets, for no longer than two years after entry into force of this implementation act, and such a decision must be made public. Additional usage restrictions are allowed for personal data within the data sets concerned, but only to the extent needed to protect the personal data of individuals (so not as an excuse to block re-use of and access to the data as a whole).

MS must report on their implementation every two years, listing the actual data sets opened, the links to licenses, APIs and documentation, and any exemptions still in place. The implementation act is immediately binding for all MS (it does not first need to be transposed into national law to be enforceable), will apply 20 days after publication in the Official Journal of the EU, and MS have 6 months to comply.

The Data Sets Per Theme

In this first batch of mandatory open data, 6 themes are covered (PDF). Some brief remarks on all of them.

Mobility

This is, contrary to what you’d expect, the smallest of the six themes covered. That is because everything already covered by the Intelligent Transport Systems (ITS) Directive, which is most of land-based mobility, is out of scope. What remains for the High Value Data list is data on transport networks contained in the INSPIRE Annex I theme Transport Networks, static and dynamic data about inland waterways, and the electronic navigational charts (ENC) for inland waterways. This is much in line with the 2020 study report. National hydrographic services had some concern about ENCs for seas being included (which would make it harder to force seagoing vessels to use the latest version), but my reassurance that this was unlikely held true.

Geospatial data

Geospatial data is, I would say, the ‘original’ high value government data, and has been for centuries. The data sets from the four INSPIRE Annex I themes Administrative Units, Geographical Names, Addresses and Cadastral Parcels, plus the Annex III theme Buildings, are within scope. Additionally, reference parcels and agricultural parcels as described in Regulations 1306/2013 and 1307/2013 on the Common Agricultural Policy (CAP) are on the list.

Earth Observation and Environment

This was a theme I was responsible for in the 2020 study. It is an extremely broad category, covering a very wide spectrum of types of data. It was basically impossible to pick individual data sets from it, not least because re-use value usually comes from combinations of data, not from any single source. Therefore my proposed solution was not to choose, and instead to advise treating it as a coherent whole needed to address the EU goals concerning environment/nature, climate adaptation, and pollution. The High Value Data list adopts this approach and puts 19 INSPIRE themes within scope. These are:

  • Annex I: Hydrography, and Protected Sites
  • Annex II in full: Elevation, Geology, Land Cover, and Ortho-imagery
  • Annex III: Area management, Bio-geographical regions, Energy resources, Environmental monitoring facilities, Habitats and biotopes, Land use, Mineral resources, Natural risk zones, Oceanographic geographical features, Production and industrial facilities, Sea regions, Soil, and Species distribution

Additionally, all environmental information covered by Directive 2003/4/EC on public access to such information is added to the list, as is all data originating in the context of a wide range of EU Regulations and Directives on air, climate, emissions, nature preservation and biodiversity, noise, waste and water. I miss soil in this environmental list, but perhaps the Annex III INSPIRE theme Soil is seen as sufficiently covering it. I still need to follow up on the precise formulations w.r.t. data in the 31 additionally referenced regulations and directives.

What surprised me is that earth observation is defined here as including satellite-based data. That is not surprising in terms of earth observation itself, but it is because satellite data was specifically excluded from the scope of our 2020 study. First, because EU-level satellite data is already open. Second, because this list deals with data from MS, and not many MS have their own satellite data. When they do, it is usually the result of public-private collaborative investment, and such private investment may dry up if temporary exclusive access arrangements are no longer possible, which would have led to considerable political objections. Perhaps the inclusion of space-based data collection is, for now, sufficiently watered down by limiting its scope to the INSPIRE themes, while at the same time future-proofing the definition for when satellite data does become part of INSPIRE themes.

Together these first three, mobility, geospatial, and EO/environment, place a full 25 out of 34 INSPIRE themes on the list for mandatory open data. This basically amounts to adding an open data requirement to INSPIRE. It puts MS’ INSPIRE compliance, which is now often limited, firmly in the spotlight, and further positions INSPIRE as a key building block in the coming Green Deal dataspace. It will be very interesting to see what the upcoming new version of the INSPIRE directive, currently under review, makes of all that.

Statistics

This topic is more widely covered in the High Value Data list than it was in the 2020 study, both in the types of statistics included and in the demands made of them. Still, there are lots of statistics MS hold that aren’t included here (while some MS btw already publish most of their statistics): the selection is based on European reporting obligations that follow from a list of various European laws.
Topics for which statistics must be published as open data in a specified way:

  • Industrial production
  • Industrial producer price index, by activity
  • Volume of sales by activity
  • EU international trade in goods
  • Tourism flows in Europe
  • Harmonised indices of consumer prices
  • National accounts: GDP, key indicators on corporations and households
  • Government expenditure and revenue, government gross debt
  • Population, fertility, mortality
  • Current healthcare expenditure
  • Poverty
  • Inequality
  • Employment, unemployment, potential labour force

Data for these reporting obligations should be available from the moment the law creating them entered into force. That means, for instance, that healthcare expenditure should be available from at least 2008, whereas employment statistics must be available from at least 2019, because the laws concerned were enacted in different years.

Company information

Company information has from the start been the most controversial theme of the six covered by this implementation act. I assume this theme has also been the prime political reason for the long delay in publishing the proposal. In my perception that is because this is the only data set that might actually end up challenging the status quo in society (as it involves ownership and power structures, and touches on tax evasion). In the 2020 study four aspects were considered: basic company information, company documents and accounts, ownership information, and insolvency status. Two ended up in the draft law: basic company information and company documents. Opening up ownership information drew vehement objections from the start (including from the Dutch government), and not even the ultimate beneficial ownership (UBO) information made it onto the list. Many stakeholders (including the NGO I chair) are disappointed with the current outcome. (Here’s an old blogpost where I explain UBO, and here’s SF writer Brin on what transparent UBO might mean to our societies.) The data that will become open may still be 2 years away: the Open Data Directive allows a 2-year exemption, and I think this is the data for which that exemption will be used.
That said, mandatory open company data and documents, even with the delay through exemptions, is already a step forward that puts an end to literally decades of court cases, obstruction, and lobbying for more openness. The very first PSI Directive in 2003 was already an expression of a broad demand for this data; now, 20 years on, it finally becomes mandatory across the EU. Some people I know have been after this for their entire professional careers and have already retired. It’s easy to lose sight of that win when we focus only on (ultimate) ownership data not being included.

Meteorological data

This is the other theme I was responsible for in the 2020 study. As with company information, this is an area where the discussion about making data available for re-use is decades old and predates digitisation becoming ubiquitous. When I started my open data work in 2008, most of the existing documentation and argumentation for the value of and need for open data concerned meteorological data. A range of EU countries already publish it as open data, others not at all. While progress has been made in the past decades, the High Value Data list sets a blanket obligation for all EU MS, a result that would otherwise still be a very long way off if left entirely voluntary for the MS involved.
The data included covers all weather observation data, validated observations / climate data, radar data (useful for things like cloud heights, precipitation and wind), and numerical weather prediction data (the outputs of the combined models used for forecasts).

The implementation act is open for public feedback until 21 June, but will likely retain its current form. I think it’s a pretty good result, and I am happy I have been able to contribute to it.

It’s finally here, published today: the proposal for the EU High Value Data list. The list for the first time makes open data publication mandatory for governments concerning (for now) 6 themes (geographic information, meteorology, mobility, statistics, earth observation and environment, and company information). Back in September 2020 an impact assessment and advice on policy options was delivered to the European Commission. I was part of that assessment team, and responsible for the themes Meteorology, and Earth Observation and Environment. Now we get to see what has been proposed for implementation in law. I haven’t read it yet, and will do so first thing tomorrow, but wanted to share the link here. There’s a window for feedback on the proposal open until 21 June 2022.

Public Spaces is an effort to reshape the internet experience towards a much larger emphasis on, well, public spaces. Currently most online public debate takes place in silos provided by monopolistic corporations, where public values will always be trumped by value extraction, regardless of the externalised costs to communities, ethics, and society. Today the Public Spaces 2022 conference took place. I watched the 2021 edition online, but this time decided to be in the room, to have time to interact with other participants and see who considers themselves part of this effort. Public Spaces is supported by 50 or so organisations, including one on whose board I sit. Despite that nominal involvement I am still somewhat unclear about the purpose of Public Spaces as a movement, as opposed to as an intention. This first day of the 2-day conference didn’t make that clearer to me, but the actual sessions and conversations were definitely worthwhile.

Some first observations that I jotted down on the way home, below the photo taken just before the start of the conference.

a) In the audience and on stage there were some known faces, but mostly people unknown to me. That’s a good thing, as it demonstrates how many new entrants there are into these discussions. At the same time some faces were notably absent, e.g. from the organisations that are part of the Public Spaces effort. Maybe they’d rather show up tomorrow, when the deputy minister is also present. As an awareness-raising exercise, despite this still being a rather niche and like-minded audience, this conference is certainly valuable.

b) That value was, I think, mostly expressed in the attention given to explaining some of the newly agreed European laws: the Digital Markets Act, Digital Services Act and Data Governance Act. For most of the audience this looked like their first actual encounter with what those laws say, and one panel moderator, upon hearing their contents, was surprised this was already agreed regulation and not stuck somewhere in a long and slow pipeline of debate and lobbying.

c) It was very good to hear people on stage actually speaking enthusiastically about the things these new laws deliver, despite being cautious about the pace of implementation and when we’ll see the actual impact of these rules. Lotje Beek of Bits of Freedom was enthusiastic about the Digital Services Act and I applaud the work BoF has done in the past years on this. (Disclosure: I’m on the board of an NGO that joins forces with BoF and Waag, organiser of this conference, in the so-called Digital Four, which lobbies the Dutch government on digital affairs.)
Similarly Kim van Sparrentak, MEP for the Greens, talked energetically about the Digital Markets Act. This was very important, I think, as it helps impress on the audience the need to engage with these new laws and the tools they provide.

d) I enjoyed the opening talk by Miriam Rasch a lot. Her earlier book Friction, E felt, seemed to lack a deeper understanding of the technologies involved to build its conclusions and arguments on, so I was interested in hearing her talk in person. The focus today was more on her second book, Autonomy. I bought it and will read it, also to clarify whether some of the things I think I heard are my misunderstanding or part of the ideas expressed in the book. Rasch positions autonomy as the key thing to guard and strengthen. She doesn’t mean autonomy in the sense of being fully disconnected from everyone else in your decisions, but in a more interdependent way: making your own choices within the network of relationships around you. Also as something emotionally rooted, which I thought a useful insight. She does, however, position it as something exclusively individual. At the same time she seems to equate autonomy with agency, and I think agency resides not merely on the individual level but also in groups of relationships in a given context (I call it networked agency). It seemed a very westernised, individualistic viewpoint, one that I think sets you up for less autonomy, because it pits you individually against the much bigger systems and structures that erode your autonomy, dumping you in a very asymmetric power struggle. A second thing that stood out to me is how she frames the me-against-the-system issue as one of autonomy versus automation. It’s a nice alliteration, but I don’t accept that juxtaposition. Automation is definitely frequently used to dehumanise lots of decisions, thus eroding the autonomy of those being decided about. But to me that is not inherent in automation. When the logic of (corporate) bureaucracies does the automation, you’ll end up with automation that mimics that logic. If I do the automation, it will mimic my logic.
I use automation a lot for my own purposes (personal software), and it increases my agency; it’s a direct expression of my autonomy (or that of the groups I’m part of). There’s more to be said in a separate post, among other things about the 3 or 4 thinking exercises she took us through to explore autonomy as a concept for ourselves. After all, it wouldn’t make us more autonomous if she prescribed her definition of autonomy to us, precisely because she underscores that it’s not a purely rational concept but an emotional one as well.

e) Prof. Tamar Sharon of Radboud University spoke about the influence tech companies have in domains other than tech itself, such as health, education, spatial planning and media, because their technology is expanded into or used in those domains. She calls these sphere transgressions. They may bring value, but may also be problematic. She showed a very cool tool that visualises how various tech companies are influential in domains you don’t immediately associate them with. I think it’s also a good thinking aid for the upcoming discussion about sectoral European data spaces, to stay alert to the pitfall of that turning into a tech-dominated discussion rather than one about societal benefit and impact.

f) Kudos to the conference organisers. Every panel was nicely balanced in its composition, which shows good care in curating the program and a high-quality network to tap into. I know from experience that it takes deliberate effort to make it so. Also the catering was fully vegetarian and vegan, with no words wasted on it, just by default. That’s the way to go.

Bookmarked Data altruism: how the EU is screwing up a good idea (by Winfried Veil)

I find this an unconvincing critique of the data altruism concept in the new EU Data Governance Act (caveat: the final consolidated text of the new law has not been published yet).

“If the EU had truly wanted to facilitate processing of personal data for altruistic purposes, it could have lifted the requirements of the GDPR”
GDPR slackened for common good purposes? Let’s loosen citizen rights requirements? It assumes common good purposes can be defined well enough not to endanger citizen rights; turtles all the way down. The GDPR is a foundational building block, and one the author, some googling shows, is disappointed in after first-hand experience of its drafting process. The GDPR is a quality assurance instrument, meaning that, as with ISO-style QA systems, it doesn’t make anything impossible or disallowed per se, but does require you to organise things responsibly upfront. To me, the fact that most organisations have implemented it as a compliance checklist applied post hoc is the primary reason it is perceived as a “straitjacket”, and for the GDPR-related breaches that occur.
It is also worth noting that data altruism covers data outside the scope of the GDPR too. It’s not just about personally identifiable data, but also about otherwise non-public or confidential organisational data.

The article suggests that, by adding even more rules, the DGA makes it harder for data altruistic entities to do something anyone can already do under the GDPR.
The GDPR pertains to the grounds for data collection in the context of a usage specified at the time of collection, whereas data altruism is also aimed at unspecified, not yet known future uses of data collected here and now. As such it covers an element the GDPR leaves unaddressed, and offers a path out of the purpose limitation the GDPR stipulates. It’s no surprise that a data altruism entity needs to comply with both the GDPR and a new set of rules, because those additional rules do not add to the GDPR responsibilities but cover other activities. The type of entity envisioned already exists in the Netherlands: common-good-oriented entities called public benefit organisations, or ANBI’s. That status too does not absolve you from other legal obligations, or loosen the rules for you. On the contrary, ANBI’s too have additional (public) accountability requirements, similar to those described in the DGA (they are centrally registered and must publish annual reports). The DGA creates ANBI’s for data: Data-ANBI’s. I’ve been involved in data projects that could have benefited from that possibility, but in the end never happened because they couldn’t be made to work without this legal instrument.

To me the biggest blind spot in the criticism is that the examples cited as probably more hindered than helped by the new rules are all single projects that set up their own data collection processes. That is what I think data altruism is least useful for. You won’t be setting up a data altruism entity for your project, because by then you already know what you want the data for, and start collecting it after designing the project. It’s useful as a general-purpose data-holding entity, without pre-existing project designs, to which projects like the cited examples can later apply to use the data already collected. A data altruistic entity will not cater to or be created for a single project, but will serve data as a utility service to many projects. I envision that universities, or better yet networks of universities, will set up their own data altruistic entities, to cater to e.g. medical or social research in general. This is useful because there are currently many examples where leaving the handling of data requirements to the research team is the source not just of GDPR breaches but also of other ethical problems with data use. It will save individual projects such as the examples mentioned a lot of time and hassle if there are one or more fitting data altruistic entities for them to go to as a data source. There will then be no need for data collection, no need to obtain consent or other grounds for data collection from each individual respondent yourself, and no need to build sufficient trust in your project. All of that will be reduced to guaranteeing your responsible data use and convincing an ethics board that you have set up your project responsibly, so that you get access to pre-existing data sources with pre-existing trust structures.

It seems to me the sentences cited below require much more thorough argumentation than the article and accompanying PDF try to provide. Ever since I’ve been involved in open data I’ve seen plenty of data innovations, especially if you switch your ‘only unicorns count’ filter off. The barriers that do exist, unintentionally, typically stem more from the lack of a unified market for data in Europe, which is precisely what the DGA (and the GDPR) aims to address.

“So long as the anti-processing straitjacket of the GDPR is not loosened even a little for altruistic purposes, there will be little hope for data innovations from Europe.” “In any case, the EU’s bureaucratic ideas threaten to stifle any altruism.”

Winfried Veil

Three years ago I picked up some books when we were staying a few days in Freiburg, Germany, on our way to Switzerland for New Year’s Eve. I tend to read in bed in the evening before going to sleep, and then prefer using an e-reader. Paper books easily get ignored, as happened to this one for two years. Early last year, however, I had my reading moments earlier in the evening.

I then read Robert Menasse’s Die Hauptstadt, or The Capital. Menasse is an Austrian author, whose work I enjoy. I have most of his novels.

I thoroughly enjoyed this book as well, created as a European novel: it is set in Brussels, although the titular capital refers not merely to the seat of the EU civil service, but also to the forever darkest spot in our European history. A European novel in the sense that it plays with the historic layers and multiple meanings present in every piece of this continent as well as in its individual citizens. These usually take the shape of contradictions, paradoxes and ironic coincidences, but together they form an enormous wealth of humanity, with its abundant variety, interconnectedness and potential serendipity. It’s a 450-page version of what E and I mean when we say Europe works. I read it in the original German, but there’s an English translation.

(I read this book a year ago, right about when I last posted something in my books feed. Having at some point last year moved all never finished draft posts in my site to these notes, and now since this weekend having a way of easily finding any draft posts as well as posting from my notes directly to my site, this draft was an unlikely yet sudden candidate for posting.)

Bookmarked Meta’s failed Giphy deal could end Big Tech’s spending spree (by Ars Technica)

This is indeed a very interesting decision by the UK Competition and Markets Authority. I recognise what Ars Technica writes. It’s not just a relevant decision in its own right, it’s also part of an emergent pattern, the various components of which are zeroing in on large silo’d market players. In the EU the Digital Markets Act was approved in recent weeks by both the council of member state ministers and the European Parliament, with the negotiation of a final shared text to be finished by next spring. The EU ministers also agreed the Digital Services Act between the member states (the EP still needs to vote on it in committee). The DMA and DSA set requirements w.r.t. interoperability, service neutrality and portability, democratic control, and disinformation. On top of the ongoing competition and data protection complaints, this will lead to new investigations of FB et al, if not to immediate changes in the functionality and accessibility of their platforms. And then there’s also the incoming AI Regulation, which classifies manipulation of people’s opinion and sentiment as a high-risk and, to a certain extent, prohibited application. This has meaning for algorithmic timelines and the profile-based sharing of material in those timelines. All of these, the competition issues, GDPR issues, DMA and DSA issues, and AI risk mitigation, will hit FB and other big platforms simultaneously in the near future. They’re interconnected and reinforce each other. That awareness is already shining through in decisions made by competent authorities and judges here and now, not just within the EU but also outside it, as the European GDPR, DMA, DSA and AI acts are also deliberate export vehicles for the norms written down in them.

….the strange position taken by Britain’s competition watchdog in choosing to block Meta’s takeover of GIF repository Giphy. Meta, the UK’s Competition and Markets Authority (CMA) ruled, must now sell all the GIFs—just 19 months after it reportedly paid $400 million for them. It’s a bold move—and a global first. ……regulators everywhere will now be on high alert for what the legal world calls “killer acquisitions”—where an established company buys an innovative startup in an attempt to squash the competition it could pose in the future.

Morgan Meaker, wired.com / Ars Technica