A final draft of the European AI Regulation is circulating (here’s an almost 900 page PDF). In the coming days I will read it with curiosity.

With this the ambitious legal framework for everything digital and data that the European Commission set out to create in 2020 has been finished within this Commission period. That’s pretty impressive.
In 2020 there was no Digital Markets Act, Digital Services Act, AI Regulation, Data Governance Act, Data Act, nor an Open Data Directive/High Value Data implementing regulation.
Before the European elections this coming spring, they are all in place. I’ve closely followed the process (and helped create a very small part of it), and I think the result is remarkably consistent and level-headed. DG CNECT has done well here in my opinion. It’s a set of laws that are very useful in themselves and that simultaneously form a geo-political proposition.

The coming years will be dedicated to implementing these novel instruments.

Favorited EDPB Urgent Binding Decision on processing of personal data for behavioural advertising by Meta by EDPB

This is very good news. The European Data Protection Board, at the request of the Norwegian DPA, has issued a binding decision instructing the Irish DPA to ban the processing of personal data for behavioural targeting by Meta. Meta must cease processing data within two weeks. Norway already concluded a few years ago that adtech is mostly illegal, but European cases based on the 2018 GDPR moved through the system at a glacial pace, in part because of a co-opted and dysfunctional Irish Data Protection Commission. Meta’s ‘pay for privacy’ ploy is also torpedoed with this decision. This is grounds for celebration, even if this will likely lead to legal challenges first. And it is grounds for congratulations to NOYB and Max Schrems, whose complaints, filed the first minute GDPR enforcement started in 2018, kicked off the process of which this is a result.

…take, within two weeks, final measures regarding Meta Ireland Limited (Meta IE) and to impose a ban on the processing of personal data for behavioural advertising on the legal bases of contract and legitimate interest across the entire European Economic Area (EEA).

European Data Protection Board

In discussions about data usage and sharing and who has a measure of control over what data gets used and shared how, we easily say ‘my data’ or get told about what you can do with ‘your data’ in a platform.

‘My data’.

While it sounds clear enough, I think it is a very imprecise thing to say. It distracts from a range of issues about control over data, and causes confusion in public discourse and in addressing those issues. Such distraction is often deliberate.

Which one of these is ‘my data’?

  • Data that I purposefully collected (e.g. temperature readings from my garden), but isn’t about me.
  • Data that I purposefully collected (e.g. daily scale readings, quantified self), that is about me.
  • Data that is present on a device I own or external storage service, that isn’t about me but about my work, my learning, my chores, people I know.
  • Data that describes me, but was government created and always rests in government databases (e.g. birth/marriage registry, diplomas, university grades, criminal records, real estate ownership), parts of which I often reproduce/share in other contexts while not being the authoritative source (anniversaries, home address, CV).
  • Data that describes me, but was private sector created and always rests in private sector databases (e.g. credit ratings, mortgage history, insurance and coverage used, pension, phone location and usage, hotel stays, flights boarded)
  • Data that describes me, that I entered into my profiles on online platforms
  • Data that I created, ‘user generated content’, and shared through platforms
  • Data that I caused to be through my behaviour, collected by devices or platforms I use (clicks through sites, time spent on a page, how I drive my car, my e-reading habits, any IoT device I used/interacted with, my social graphs), none of which is ever within my span of control, likely not accessible to me, and I may not even be aware it exists.
  • Data that was inferred about me from patterns in data that I caused to be through my behaviour, none of which is ever within my span of control, and which I mostly don’t know about or even suspect exists. Which may say things I don’t know about myself (moods, mental health) or that I may not have made explicit anywhere (political or religious orientation, sexual orientation, medical conditions, pregnancy etc)

Most of the data that holds details about me wasn’t created by me, and wasn’t within my span of control at any time.
Most of the data I purposefully created or have or had in my span of control, isn’t about me but about my environment, about other people near me, things external and of interest to me.

They’re all ‘my data’. Yet, whenever someone says ‘my data’, and definitely when someone says ‘your data’, that entire scope isn’t what is indicated. My data as a label easily hides the complicated variety of data we are talking about. And regularly, specifically when someone says ‘your data’, hiding parts of the list is deliberate.
The last bullets, data that we created through our behaviour and what is inferred about us, are what the big social media platforms always keep out of sight when they say ‘your data’. Because that’s the data their business models run on. It’s never part of the package when you click ‘export my data’ in a platform.

The core issues aren’t about whether it is ‘my data’ in terms of control or provenance. The core issues are about what others can/cannot and will/won’t do with any data that describes me or is circumstantial to me. Regardless of whose span of control such data resides in, or where it came from.

There are also two problematic suggestions packed into the phrase ‘my data’.
One is that with saying ‘my data’ you are also made individually responsible for the data involved. While this is partly true (mostly in the sense of not carelessly leaving stuff all over webforms and accounts), almost all responsibility for the data about you resides with those using it. It’s others’ actions with data that concern you, that require responsibility and accountability, and should require your voice being taken into account. "Nothing about us, without us" holds true for data too.
The other is that ‘my data’ is easily interpreted and positioned as ownership. That is a sleight of hand. Property claims and citizen rights are very different things and different areas of law. If ‘your data’ is your property, all that is left is to haggle about price, and each context is framed as merely transactional. It’s not in my own interest to see my data or myself as a commodity. It’s not a level playing field when I’m left to negotiate my price with a global online platform. That’s so asymmetric that there’s only one possible outcome. Which is the point of the suggestion of ownership as opposed to the framing as human rights. Contracts are the preferred tool of the biggest party, rights that of the individual.

Saying ‘my data’ and ‘your data’ is too imprecise. Be precise, don’t let others determine the framing.

ODRL, the Open Digital Rights Language, popped up twice this week for me and I don’t think I’ve been aware of it before. Some notes for me to start exploring.

Rights Expression Languages

Rights Expression Languages, RELs, provide a machine readable way to convey or transfer usage conditions, rights, restraints, granularly w.r.t. both actions and actors. This can then be added as metadata to something. ODRL is a rights expression language, and seems to be a de facto standard.

ODRL has been a W3C recommendation since 2018, and is thus part of the open web standards. ODRL has its roots in the ’00s and Digital Rights Management (DRM): the abhorred protections media companies added to music and movies, and now e-books, in ways that restrain what people can do with media they bought to well below the level of what was possible before and commonly thought part of having bought something.

ODRL can be expressed in JSON or RDF and XML. A basic example from Wikipedia looks like this:


{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}

In this JSON example a policy describes that example.com grants John permission to play mysong.
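
To get a feel for what 'machine readable' means here, a minimal Python sketch that loads the policy above and checks whether someone may do something with the song. The helper `is_permitted` is my own hypothetical function, not part of any ODRL library, and it only does literal string matching. A real ODRL evaluator would also need to handle constraints, duties and prohibitions.

```python
import json

# The policy from the example above, as a JSON string.
POLICY_JSON = """
{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}
"""


def is_permitted(policy: dict, assignee: str, action: str, target: str) -> bool:
    """Return True if any permission rule matches all three values exactly.

    Hypothetical helper for illustration only: it ignores constraints,
    duties and prohibitions.
    """
    for rule in policy.get("permission", []):
        if (
            rule.get("target") == target
            and rule.get("assignee") == assignee
            and rule.get("action") == action
        ):
            return True
    return False


policy = json.loads(POLICY_JSON)
song = "http://example.com/mysong.mp3"

print(is_permitted(policy, "John Doe", "play", song))        # True
print(is_permitted(policy, "John Doe", "distribute", song))  # False
print(is_permitted(policy, "Jane Doe", "play", song))        # False
```

Anything not explicitly permitted is treated as not allowed in this sketch. That default is my assumption for the illustration.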

ODRL in the EU Data Space

In the shaping of the EU common market for data, aka the European common data space, it is important to be able to trace provenance and usage conditions for not just data sets, but singular pieces of data, as it flows through use cases, through applications and their output back into the data space.
This week I participated in a webinar by the EU Data Space Support Center (DSSC) about their first blueprint of data space building blocks, and for federation of such data spaces.

They propose ODRL as the standard to describe usage conditions throughout data spaces.

The question of enactment

It wasn’t the first time I talked about ODRL this week. I had a conversation with Pieter Colpaert. I reached out to get some input on his current view of the landscape of civic organisations active around the EU data spaces. We also touched upon his current work at the University of Gent. His current research interest is ODRL, specifically enactment. ODRL is a REL, a rights expression language. Describing rights is one thing; enacting them in practice, in technology, processes etc. is a different thing. Next to that, how do you demonstrate that you adhere to the conditions expressed and that you qualify for using the things described?

For the EU data space(s) this part sounds key to me, as none of the data involved is merely part of a single clear interaction like in the song example above. It’s part of a variety of flows in which actors likely don’t directly interact, where many different data elements come together. This includes flows through applications that tap into a data space for inputs and outputs but are otherwise outside of it. Such applications are also digital twins, federated systems of digital twins even, meaning a confluence of many different data and conditions across multiple domains (and thus data spaces). All this removes a piece of data lightyears from the neat situation where two actors share it between them in a clearly described transaction within a single-faceted use case.
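
To make that confluence a bit more tangible for myself, a rough Python sketch of one conservative way to reason about it: an application that combines several inputs may only do with the combined result what every input policy permits. This intersection rule is purely my own assumption, not something ODRL or the DSSC blueprint prescribes, and the party URL, policy identifiers and helper names are all hypothetical. It also assumes each rule carries a single action as a plain string.

```python
from __future__ import annotations

OPERATOR = "http://example.com/party/twin-operator"  # hypothetical party


def allowed_actions(policy: dict, assignee: str) -> set[str]:
    """Collect the actions a policy permits for one assignee (string match only)."""
    return {
        rule["action"]
        for rule in policy.get("permission", [])
        if rule.get("assignee") == assignee and isinstance(rule.get("action"), str)
    }


def combined_actions(policies: list[dict], assignee: str) -> set[str]:
    """Actions permitted by every input policy; empty when there are no inputs."""
    if not policies:
        return set()
    result = allowed_actions(policies[0], assignee)
    for policy in policies[1:]:
        result &= allowed_actions(policy, assignee)
    return result


def make_policy(uid: str, target: str, actions: list[str]) -> dict:
    """Build a small illustrative policy with one permission rule per action."""
    return {
        "uid": uid,
        "permission": [
            {"target": target, "assignee": OPERATOR, "action": action}
            for action in actions
        ],
    }


# Two hypothetical inputs from different data spaces.
traffic = make_policy(
    "http://example.com/policy:traffic",
    "http://example.com/data/traffic-sensors",
    ["aggregate", "display", "distribute"],
)
energy = make_policy(
    "http://example.com/policy:energy",
    "http://example.com/data/grid-load",
    ["aggregate", "display"],
)

print(sorted(combined_actions([traffic, energy], OPERATOR)))  # ['aggregate', 'display']
```

Even this toy version shows the problem: the more sources flow together, the smaller the set of things you can still safely do, and the harder it gets to show afterwards that you stayed within it.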

Expressing the commons

It’s one thing to express restrictions or usage conditions. The DSSC in their webinar talked a lot about business models around use cases, and ODRL as a means for a data source to stay in control throughout a piece of data’s life cycle. Luckily they stopped using the phrase ‘data ownership’ as they realised it’s not meaningful (and confusing on top of it), and focused on an actor keeping control and continuing to have a say.
An open question for me is how you would express openness and the commons in ODRL. A shallow search surfaces some examples of trying to express Creative Commons or other licenses this way, but none recent.

Openness can mean an absence of certain conditions, although there may be some (like adding the same absence of conditions to re-shared material or derivative works), which is not the same as setting explicit permissions. If I e.g. dedicate something to the public domain, an image for instance, then there are no permissions for me to grant, as I’ve removed myself from that role of being able to give permission. Yet, you still want to express it to ensure that it is clear for all that that is what happened, and especially that it remains that way.
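
As a first attempt at the easier half of this question, a Python sketch of what an attribution-style open grant might look like: a policy of type Set that names no assignee, so the permissions are not tied to one party, with a duty to attribute attached to each permission. I am assuming here that the ODRL vocabulary terms `reproduce`, `distribute`, `derive` and `attribute` are the right ones to use, and the identifiers and the helper `open_set_policy` are hypothetical. It is not a vetted mapping of any Creative Commons license, and it does not solve the public domain case, where there is no one left to grant anything.

```python
from __future__ import annotations

import json


def open_set_policy(uid: str, target: str, actions: list[str]) -> dict:
    """Build a policy that permits the given actions to anyone, with attribution.

    Hypothetical helper: leaving out 'assignee' is my way of expressing
    'not limited to a specific party' in this sketch.
    """
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "uid": uid,
        "permission": [
            {
                "target": target,
                "action": action,
                "duty": [{"action": "attribute"}],
            }
            for action in actions
        ],
    }


policy = open_set_policy(
    "http://example.com/policy:open-001",
    "http://example.com/myimage.jpg",
    ["reproduce", "distribute", "derive"],
)

print(json.dumps(policy, indent=2))
```

Note that this still works by granting explicit permissions, which is exactly the mismatch described above: it expresses 'I allow this', not 'there are no conditions here'.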

Part of that question is about the overlap and distinction between rights expressed in ODRL and authorship rights. You can obviously have many conditions outside of copyright, and can have copyright elements that may be outside of what can be expressed in RELs. I wonder how for instance moral authorship rights (that an author in some (all) European jurisdictions cannot do away with) can be expressed after an author has transferred/sold the copyrights to something? Or maybe, expressing authorship rights / copyrights is not what RELs are primarily for, as those are generic and RELs may be meant for expressing conditions around a specific asset in a specific transaction. There have been various attempts to map all kinds of licenses to RELs though, so I need to explore more.

This is relevant for the EU common data spaces as my government clients will be actors in them and bringing in both open data and closed and unsharable but re-usable data, and several different shades in between. A range of new obligations and possibilities w.r.t. data use for government are created in the EU data strategy laws and the data space is where those become actualised. Meaning it should be possible to express the corresponding usage conditions in ODRL.

ODRL gaps?

Are there gaps in the ODRL standard w.r.t. what it can cover? Or things that are hard to express in it?
I came across one paper ‘A critical reflection on ODRL’ (PDF Kebede, Sileno, Van Engers 2020), that I have yet to read, that describes some of those potential weaknesses, based on use cases in healthcare and logistics. Looking forward to digging out their specific critique.

Oh great, LinkedIn! Of course I want you to ‘suggest’ postings in my timeline concerning conspiracy delusions about the fires in Hawaii, a disfigured street cat ‘nevertheless’ feeding its young and thus commended for its nurturing instincts (is animal ableism a separate category in your data model?), an autoplaying video of a woman removing mobiles from her family’s hands at the dinner table in a very funny (hahaha!) way, and something about a leopard. Enshittification ftw! I unfollowed everyone on my contact list two years ago just for you to have more space to play Facebook and TikTok all by yourself. And I am also very pleased you always make me set the timeline to ‘most recent’ and then put it back to ‘most relevant’ (I do wonder about LinkedIn’s definition of ‘relevant’) so I don’t miss any of your suggestions. I think I need to use a different way of going to LinkedIn to find the details of someone in my network than the default /feed LinkedIn steers you to. I’ll add the direct path to the network search page as a bookmark. And continuously strengthen my personal notes-as-rolodex.

Such a great day for the Digital Services Act to come into effect for ‘VLOPS’ like LinkedIn!

I’ve been involved in open data for about 15 years. Back then we had a vibrant Europe-wide network of activists and civic organisations around open data, partially triggered by the first PSI Directive that was the European legal foundation for our call for more open government data.

Since 2020 a much wider and more fundamental legal framework than the PSI Directive ever was has been taking shape, with the Data Governance Act, Data Act, AI Regulation, Open Data Directive and High Value Data implementing regulation as building blocks. Together they create the EU single market for data, adding data as a fifth element to the free movement of people, goods, services and capital within the EU. This will all take shape as the European common dataspace(s), built from a range of sectoral dataspaces.

In the past years I’ve been actively involved in these developments, currently helping large government data holders in the Netherlands interpret the new obligations and above all new opportunities for public service that result from all this.

Now that the dataspaces are slowly taking shape, what I find missing from most discussions and events is the voice of civic organisations and activists. It’s mostly IT companies and research institutions that are involved. While for the Commission social impact (climate, health, energy and agricultural transitions e.g.) is a key element in why they seek to implement these new laws, for most parties involved in the dataspaces that is less of a consideration, and economic and technological factors are more important. Not even government data holders themselves are represented much in how the European data space will turn out. Even though every single one of us and every public entity is by default a part of this common market.

I would like to strengthen the voice of civil society and activists in this area, to together influence the shape these dataspaces are taking. So that they are of use and value to us too. To use the new (legal) tools to strengthen the commons, to increase our agency.

Most of the old European open data network however has dissolved over time, as we all got involved in national level practical projects and the European network as a source of a sense of belonging and of strengthening each other’s commitment became less important. And we’ve moved on a good number of years, so many new people have come on to the scene, unconnected to that history, with new perspectives and new capabilities.

So the question is: who is active on these topics, from a civil society perspective, as activists? Who should be involved? What are the organisations, the events, that are relevant regionally, nationally, EU wide? Can we connect those existing dots: to share experiences, examples, join our voices, pool our efforts?

Currently I’m doing a first scan of who is involved in which EU country, what type of events are visible, organisations that are active etc. Starting from my old network of a decade ago. I will share lists of what I find at Our Common Data Space.

Let me know if you count yourself as part of this European network. Let me know the relevant efforts you are aware of. Let me know which events you think bring together people likely to want to be involved.

I look forward to finding out about you!


Open Government Data Camp in Warsaw 2011. An example of the vibrancy of the European open data network, I called it the community’s ‘family christmas party’, at the time. Above the schedule of sessions created collectively by the participants, with many local initiatives and examples shared with the EU wide network. Below one of those sessions, on local policy making and open data.