Over at Netzpolitik two leaked draft texts for new EC proposals w.r.t. data and digital legislation have been published. I’ve been reading them the past days, though I haven’t finished yet. In a week the final proposal should be announced by the EC. That they have been leaked beforehand tells you there are some differences of opinion within the EC on this, giving outsiders a way to read ahead and mount criticism in time.

The EC’s goals for digital regulation this period are simplification, consistency and clarity. In consultations for the upcoming European Data Union Strategy, I and others put forward to please not merely interpret ‘simplification’ as rule slashing. Simplification can also mean making it much easier to demonstrate compliance. And it would also help if the EC would come out and say the quiet part out loud: that a lot of what is now presented by third parties as cumbersome regulation is in reality malicious compliance by those third parties. The annoying cookie walls of the past years, for example, are not in any way required by regulation; they are just the single most annoying way for third parties to deal with it, so that you might think the EC is the problem. Tracking is the problem, that adtech is fundamentally in conflict with the rules is the problem. It’s not a ‘compliance burden’ if your actions bump into the law. That’s properly called ‘illegal actions’. Simplification, in short, could also mean much clearer enforcement of existing rules, as most digital regulation now has very little in the way of actual consequences for third parties, and none that rise above the ‘cost of doing business’.

There are two ‘Omnibus’ proposals in the works, meaning a proposal that makes changes to a number of existing laws at the same time.

One deals with data regulations. It amends the Data Act in such a way that the Data Governance Act, the Open Data Directive and the Free Flow of Non-Personal Data Regulation all get repealed, and mostly incorporated into the Data Act. I’m still working my way through the meaning of that; at 90 pages of text it’s not a quick read. But one thing stands out immediately to me: the Open Data rules until now were a Directive, meaning every Member State would create a national law to implement it. The entirety now gets added to a Regulation (Act), meaning it applies directly across the EU. This is something I and others have long (since 2008, more or less) called for, because as a directive there are differences between countries in how open data gets interpreted. What can be open data is currently based on the national information access regimes and not on a unified European notion. I still need to explore how that would play out in the new Omnibus. This first Omnibus also touches the GDPR, and that is something to be careful about too.

The other Omnibus is aimed at the AI Act and the GDPR. I haven’t looked at this one at all yet. But around the web I see fears and first takes that the GDPR will get weakened to feed AI model training, among other things by stretching the notion of ‘legitimate interest’ in ways that make Facebook’s attempted interpretation of the term in the past years seem conservative. It used to be that legitimate should be read as ‘lawful’ (e.g. I need your name if I’m to send you an invoice, because I’m legally obliged to put that on the invoice), but we seem to be shifting to an interpretation of legitimate as ‘justifiable’, and at that in the very generic meaning of ‘well, I have my reasons, ok?’. Another step, judging by what others have posted, seems to do away with the notion that inferred data can constitute collection of personal data (as in: I did not ask you about your religion and store that, but I inferred it from tracking your visits to websites of houses of worship).

In a week we will know what the proposals of the EC really are. Until then I will be reading the leaked drafts, to see what mechanisms are being created, dumped and altered.

Twenty years ago today E and I visited Reboot 7 in Copenhagen. What I wrote a decade ago at the 10th anniversary of that conference still holds true for me.

Over time Reboot 7 became mythical. A myth that can’t return. But one we were part of, participated in and shaped.
Still got the t-shirt.


The yellow t-shirt with red text from the 2005 Reboot 7 conference, on my blue reading chair in my home office 20 years on.

Seventeen years ago today I blogged about a barcamp style event in Amsterdam I co-hosted, called GovCamp_NL. I struck up a conversation there about open government data after having had a similar conversation the week before in Austria. It marked the beginning of my work in this field. We just welcomed the thirteenth team member in the company that over time grew out of that first conversation. Our work at my company is driven by the same thing as the event, something I’ve come to call constructive activism.

These days, the principles and values that drove those events, and have set the tone for the past two decades of everything I’ve done professionally and socially, seem more important than ever. They are elemental in the current geopolitical landscape around everything digital and data. We can look back on our past selves with 20 years hindsight and smile about our one-time optimism, because so much exploitation, abuse and surveillance grew out of the platforms and applications that originate in the early 00’s. But not because that optimism was wrong. Naive yes, in thinking that the tech would all take care of itself, by design and by default, and we just needed to nudge it a bit. That optimism in the potential for (networked) agency, for transparency, for inclusion, for diversity, and for global connectedness is still very much warranted, as a celebration of human creativity, of the sense of wonder that wielding complexity for mutual benefit provides, just not singularly attached to the tech involved.
Anything digital is political. The optimism is highly political too.

The time to shape the open web and digital ethics is now, is every day. Time for a reboot.

This week at the EU Open Data Days in Luxembourg, Davide Taibi, a senior researcher at the Institute for Educational Technology of the National Research Council of Italy, talked about his research into a possible European curriculum for data literacy.

He mentioned how, in the highly multilingual context of Europe, data literacy is an unclear term. In German data literacy translates to data competence, while literacy itself translates to alphabetisation. Other terms like information literacy and data science are used more commonly across countries.

On one of his slides (image) he wrote:

The term data literacy isn’t well known in most of the countries analysed. The most widely used terms are ‘digital literacy’, ‘information literacy’, ‘data competence’, ‘media literacy’, ‘statistical literacy’, ‘computer/IT literacy’, among others. In most countries it is closely related to digital skills.

I usually use Howard Rheingold’s shorthand for literacy as skills plus community. Skills benefit individuals, but when you add the context of a community or network of skilled people in which a skill gets deployed, the value of its use sees a nonlinear jump, basically a kind of network effect. That communal aspect, and the jump in use value, is connected to my notion of networked agency. It works as a multiplier.

Looping back to the lack of clarity around data literacy as a term, I wonder.
Is it because we haven’t yet described clearly enough which _skills_ we mean when talking about data literacy?
Or is it because we don’t really know which communities would see which nonlinear use value, when deploying the data skills concerned?

The period of the European Commission that has just finished delivered an ambitious and coherent legal framework for both the single digital market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, Data Act, High Value Data implementing regulation and the AI Act are all finished and in force (if not always fully in application). This means efforts are now switching to implementation. The detailed programme of the next European Commission, now being formed, isn’t known yet. Big new legislation efforts in this area are however not expected.

This summer Ursula von der Leyen, the incoming President of the Commission, presented the political guidelines. In them you can find what the EC will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. It is noted that outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools to keep a level-headed view of what’s really going on. Data is also crucial in driving several technology developments, such as AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness w.r.t. natural disasters and their impact on infrastructure, energy, food security, water, and land use both in cities and in rural areas, as well as early warning systems. This is sure to contain a large data component, a role for the Green Deal Data Space (for which the implementation phase will start soon, now that the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC has already emphasised the close connection between digitisation and data and the ability to achieve European climate and environmental goals.

AI trained with data
Garbage in, garbage out: access to enough high quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (in industry, public services or healthcare e.g.). The direction here is towards smaller models, squarely aimed at specific questions or tasks, in the context of specific sectors. This requires the availability and responsible access to data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched. This is meant to provide SMEs and newly started companies with access to the computing power of the European supercomputing network, for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data are to be pooled.

Focus on implementing data regulations
To make the above possible, a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale. This within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR until 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context the difference between such SMEs and large data users outside of the EU is also specifically mentioned. This Data Union is a new label and a new emphasis on what the European Data Strategy already seeks to do: the creation of a single market for data, meaning a freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation is informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulations initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. The new instruments allow new agency of different stakeholder groups, and new opportunities for societal impact come from them.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, can help set the tone or even have a normalising effect? I’m certain practice can play a role like this, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum the coming time is aimed at this phase shift. I already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry for the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally I’m also seeking ways to increase the involvement of civil society organisations in this area.

June is a good month for open data this year.

First, last Tuesday 4 June the Eerste Kamer (the Dutch Senate) approved the law that implements the European open data directive in the Dutch Wet Hergebruik Overheidsinformatie. Although the law has not yet been published and is therefore not yet in force, this ends three years of delay: the law should have entered into force by July 2021, as the European directive took effect in July 2019 and gave Member States two years to transpose it into national legislation.

Second, last Sunday 9 June the obligation for governments to actively publish key data on six themes through APIs took effect. That European regulation was adopted at the end of 2022, entered into force in early February 2023, and gave governments 16 months, i.e. until Sunday, to comply. Member States must submit their first implementation report in February 2025, so I assume many countries will use that period to meet their obligations. But a start has been made. In the Netherlands the impact of this High Value Data regulation is relatively small, as most of the data it covers was already open here. In other EU countries that was not always the case. Now you can therefore compile datasets with Europe-wide coverage.

ODRL, Open Digital Rights Language popped up twice this week for me and I don’t think I’ve been aware of it before. Some notes for me to start exploring.

Rights Expression Languages

Rights Expression Languages (RELs) provide a machine readable way to convey or transfer usage conditions, rights and restrictions, granularly w.r.t. both actions and actors. This can then be added as metadata to something. ODRL is a rights expression language, and seems to be a de facto standard.

ODRL has been a W3C recommendation since 2018, and is thus part of the open web standards. ODRL has its roots in the ’00s and Digital Rights Management (DRM): the abhorred protections media companies added to music and movies, and now e-books, in ways that restrain what people can do with media they bought to well below the level of what was possible before and commonly thought part of having bought something.

ODRL can be expressed in JSON, RDF or XML. A basic example from Wikipedia looks like this:


{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}

In this JSON example, a policy describes that example.com grants John Doe permission to play mysong.mp3.
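To make concrete what evaluating such a policy involves, here is a minimal sketch in Python. It is not a full ODRL engine and the function name `is_permitted` is my own illustration, not part of any ODRL library; it only checks the permission rules of a single policy like the one above.

```python
# Minimal sketch of checking an ODRL-style policy (illustrative only):
# does any permission rule grant this assignee this action on this target?

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "uid": "http://example.com/policy:001",
    "permission": [{
        "target": "http://example.com/mysong.mp3",
        "assignee": "John Doe",
        "action": "play",
    }],
}

def is_permitted(policy: dict, assignee: str, action: str, target: str) -> bool:
    """Return True if any permission rule matches the requested use."""
    return any(
        rule.get("assignee") == assignee
        and rule.get("action") == action
        and rule.get("target") == target
        for rule in policy.get("permission", [])
    )

print(is_permitted(policy, "John Doe", "play", "http://example.com/mysong.mp3"))  # True
print(is_permitted(policy, "Jane Doe", "play", "http://example.com/mysong.mp3"))  # False
```

Even this toy version shows the gap between expression and enactment: the policy says who may do what, but something still has to run this check, and nothing here enforces the outcome.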

ODRL in the EU Data Space

In the shaping of the EU common market for data, aka the European common data space, it is important to be able to trace provenance and usage conditions for not just data sets, but singular pieces of data, as it flows through use cases, through applications and their output back into the data space.
This week I participated in a webinar by the EU Data Space Support Center (DSSC) about their first blueprint of data space building blocks, and for federation of such data spaces.

They propose ODRL as the standard to describe usage conditions throughout data spaces.

The question of enactment

It wasn’t the first time I talked about ODRL this week. I had a conversation with Pieter Colpaert. I reached out to get some input on his current view of the landscape of civic organisations active around the EU data spaces. We also touched upon his current work at the University of Gent. His research interest is currently ODRL, specifically enactment. ODRL is a REL, a rights expression language. Describing rights is one thing; enacting them in practice, in technology, processes etc., is a different thing. Next to that, how do you demonstrate that you adhere to the conditions expressed, and that you qualify for using the things described?

For the EU data space(s) this part sounds key to me, as none of the data involved is merely part of a single clear interaction like in the song example above. It’s part of a variety of flows in which actors likely don’t directly interact, where many different data elements come together. This includes flows through applications that tap into a data space for inputs and outputs but are otherwise outside of it. Such applications are also digital twins, even federated systems of digital twins, meaning a confluence of many different data and conditions across multiple domains (and thus data spaces). All this removes a piece of data light-years from the neat situation where two actors share it between them in a clearly described transaction within a single-faceted use case.

Expressing the commons

It’s one thing to express restrictions or usage conditions. The DSSC in their webinar talked a lot about business models around use cases, and ODRL as a means for a data source to stay in control throughout a piece of data’s life cycle. Luckily they stopped using the phrase ‘data ownership’ as they realised it’s not meaningful (and confusing on top of it), and focused on control and maintaining having a say by an actor.
An open question for me is how you would express openness and the commons in ODRL. A shallow search surfaces some examples of trying to express Creative Commons or other licenses this way, but none recent.

Openness can mean an absence of certain conditions, although there may be some (like attaching the same absence of conditions to re-shared material or derivative works), which is not the same as setting explicit permissions. If I e.g. dedicate something to the public domain, an image for instance, then there are no permissions for me to grant, as I’ve removed myself from the role of being able to give permission. Yet you still want to express that, to make clear to all that this is what happened, and especially that it remains that way.
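As an illustration of the difficulty: a naive way to approximate a public-domain dedication in ODRL would be a policy permitting the generic `use` action to anyone, without constraints (the URIs below are made up, and this is my sketch, not an established pattern):

```json
{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "@type": "Set",
  "uid": "http://example.com/policy:pd-image",
  "permission": [{
    "target": "http://example.com/photo.jpg",
    "action": "use"
  }]
}
```

But such a policy still positions someone as the grantor of permission, which is exactly what a public-domain dedication removes. It expresses the practical effect, not the legal situation, and says nothing about irrevocability.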

Part of that question is about the overlap and distinction between rights expressed in ODRL and authorship rights. You can obviously have many conditions outside of copyright, and can have copyright elements that may be outside of what can be expressed in RELs. I wonder how for instance moral authorship rights (which an author in some (all?) European jurisdictions cannot do away with) can be expressed after an author has transferred/sold the copyrights to something. Or maybe expressing authorship rights / copyrights is not what RELs are primarily for, as those are generic and RELs may be meant for expressing conditions around a specific asset in a specific transaction. There have been various attempts to map all kinds of licenses to RELs though, so I need to explore more.

This is relevant for the EU common data spaces as my government clients will be actors in them and bringing in both open data and closed and unsharable but re-usable data, and several different shades in between. A range of new obligations and possibilities w.r.t. data use for government are created in the EU data strategy laws and the data space is where those become actualised. Meaning it should be possible to express the corresponding usage conditions in ODRL.

ODRL gaps?

Are there gaps in the ODRL standard w.r.t. what it can cover? Or things that are hard to express in it?
I came across one paper, ‘A critical reflection on ODRL’ (PDF; Kebede, Sileno, Van Engers 2020), that I have yet to read, which describes some of those potential weaknesses, based on use cases in healthcare and logistics. Looking forward to digging into their specific critique.