The European Commission period that has just finished delivered an ambitious and coherent legal framework for both the single digital market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, Data Act, High Value Data implementing regulation and the AI Act, are all finished and in force (if not always fully in application). This means efforts are now switching to implementation. The detailed programme of the next European Commission, now being formed, isn’t known yet. Big new legislative efforts in this area are however not expected.

This summer Ursula von der Leyen, the incoming President of the Commission, presented her political guidelines. In them you can find what the EC will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. It is noted that outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools to keep a level headed view of what’s going on for real. Data also is crucial in driving several technology developments, such as in AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness w.r.t. natural disasters and their impact on infrastructure, energy, food security, water, land use both in cities and in rural areas, as well as early warning systems. This is sure to contain a large data component, a role for the Green Deal Data Space (for which the implementation phase will start soon, now the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC had already emphasised the close connection between digitisation and data and the ability to achieve European climate and environmental goals.

AI trained with data
Garbage in, garbage out: access to enough high quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (in industry, public services or healthcare e.g.). The direction here is towards smaller models, squarely aimed at specific questions or tasks, in the context of specific sectors. This requires the availability and responsible access to data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched. This is meant to provide SMEs and start-ups with access to the computing power of the European supercomputing network, for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data are pooled.

Focus on implementing data regulations
To make the above possible a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale. All this within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly being ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR until 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context also the difference between such SMEs and large data users outside of the EU is specifically mentioned. This Data Union is a new label and a new emphasis on what the European Data Strategy already seeks to do, the creation of a single market for data, meaning a freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation is informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulatory initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. The new instruments give different stakeholder groups new agency, and new opportunities for societal impact come from them.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, can help set the tone or even have a normalisation effect? I’m certain practice can play a role like this, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum the coming time is aimed at this phase shift. I already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry for the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally I’m also seeking ways to increase the involvement of civil society organisations in this area.

Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles (including suggesting peer review to be done within 15 days). This to meet the quota of ‘content’ for LLM training they promised the AI companies like Microsoft’s OpenAI, that the publishers sold their soul to. Taylor&Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cuts either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing is not to write down and distribute human knowledge it seems, but to create slop for the machines to make more sloppier slop. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO), garbage in, even more garbage out.

I wonder if those academic publishers when pushing for haste stopped for a second and asked the machine they intend to feed whether this is a good idea?
I did. And despite the machine ignoring its own dependence on quality input, presumably because that causality isn’t expressed in its input text or not statistically relevant in weighing input tokens, it at least does generate a text pointing to potential down sides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs

Amy Castor and David Gerard

Bookmarked Commission opens non-compliance investigations against Alphabet, Apple and Meta under the Digital Markets Act (by European Commission)

With the large horizontal legal framework for the single digital market and the single market for data mostly in force and applicable, the EC is initiating first actions. This announcement focuses on app store aspects, on steering (third parties being able to provide users with other paths of paying for services than e.g. Apple’s app store), on (un-)installing any app and freedom to change settings, as well as providers preferencing own services above those of others. Five investigations for suspected non-compliance involving Google (Alphabet), Apple, and Meta (Facebook) have been announced. Amazon and Microsoft are also being investigated in order to clarify aspects that may lead to suspicions of non-compliance.

The investigation into Facebook is about their ‘pay or consent’ model, which is Facebook’s latest attempt to circumvent their GDPR obligation that consent should be freely given. It was clear that their move, even if it allows them to steer clear of the GDPR (which is still very uncertain), would create issues under the Digital Markets Act (DMA).

In the same press release the EC announces that Facebook Messenger is getting a 6 month extension of the period in which to comply with interoperability demands.

The Commission suspects that the measures put in place by these gatekeepers fall short of effective compliance of their obligations under the DMA. … The Commission has also adopted five retention orders addressed to Alphabet, Amazon, Apple, Meta, and Microsoft, asking them to retain documents which might be used to assess their compliance with the DMA obligations, so as to preserve available evidence and ensure effective enforcement.

European Commission

Bookmarked Internet of Things and Objects of Sociality (by Ton Zijlstra, 2008)

Fifteen years ago today I blogged this brainstorming exercise about how internet-connectivity for objects might make for different and new objects of sociality. A way to interact with our environment differently. Not a whole lot of that has happened, let alone become common. What has happened is IoT being locked up in device and mobile app pairings. Our Hue lights are tied to the Hue app, and if I’d let it collect e.g. behavioural data it would go to Philips first, not to me. A Ring doorbell (now disabled) and our Sonos speakers are the same. Those rigid pairings are a far cry from me seamlessly interacting with my environment. One exception is our Meet Je Stad sensor in the garden, as it runs on LoRaWAN and the local citizen science community has the same access as I do to the data (and I run a LoRa gateway myself, adding another control point for me).

Incoming EU legislation may help to get more agency on this front. First and foremost, the Data Act when it is finished will make it mandatory that I can access the data I generate with my use of devices like those Hue lights and Sonos speakers and any others you and I may have in use (the data from the inverter on your solar panels for instance). And allow third parties to use that data in real time. A second relevant law I think is the Cyber Resilience Act, which regulates the cybersecurity of any ‘product with digital elements’ on the EU market, and makes it mandatory to provide additional (technical) documentation around that topic.

The internet of things, increases the role of physical objects as social objects enormously, because it adds heaps of context that can serve relationships. Physical objects always have been social objects, but only in their immediate physical context. … Making physical objects internet-aware creates a slew of possible new uses for it as social objects. And if you [yourself] add more sensors or actuators to a product (object hacks so to speak), the list grows accordingly.

Ton Zijlstra, 2008

ODRL, Open Digital Rights Language popped up twice this week for me and I don’t think I’ve been aware of it before. Some notes for me to start exploring.

Rights Expression Languages

Rights Expression Languages, RELs, provide a machine readable way to convey or transfer usage conditions, rights, restraints, granularly w.r.t. both actions and actors. This can then be added as metadata to something. ODRL is a rights expression language, and seems to be a de facto standard.

ODRL has been a W3C recommendation since 2018, and is thus part of the open web standards. ODRL has its roots in the ’00s and Digital Rights Management (DRM): the abhorred protections media companies added to music and movies, and now e-books, in ways that restrain what people can do with media they bought to well below the level of what was possible before and commonly thought part of having bought something.

ODRL can be expressed in JSON or RDF and XML. A basic example from Wikipedia looks like this:


{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}

In this JSON example a policy describes that example.com grants John permission to play mysong.
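To get a feel for how such a policy reads as data, here is a minimal Python sketch that loads the example and turns each permission rule into a plain sentence. The helper name `describe_permissions` is my own invention for illustration and not part of any ODRL tooling, and the sketch only looks at the three fields used in the example.

```python
import json

# The example policy from above, as a JSON string.
POLICY_JSON = """
{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}
"""


def describe_permissions(policy: dict) -> list[str]:
    """Turn each permission rule into a plain sentence (hypothetical helper)."""
    sentences = []
    for rule in policy.get("permission", []):
        sentences.append(f'{rule["assignee"]} may {rule["action"]} {rule["target"]}')
    return sentences


policy = json.loads(POLICY_JSON)
for sentence in describe_permissions(policy):
    print(sentence)
```

A real policy can also carry prohibitions, duties and constraints, which this sketch does not touch.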

ODRL in the EU Data Space

In the shaping of the EU common market for data, aka the European common data space, it is important to be able to trace provenance and usage conditions for not just data sets, but singular pieces of data, as they flow through use cases, through applications and their output back into the data space.
This week I participated in a webinar by the EU Data Space Support Center (DSSC) about their first blueprint of data space building blocks, and for federation of such data spaces.

They propose ODRL as the standard to describe usage conditions throughout data spaces.

The question of enactment

It wasn’t the first time I talked about ODRL this week. I had a conversation with Pieter Colpaert. I reached out to get some input on his current view of the landscape of civic organisations active around the EU data spaces. We also touched upon his current work at the University of Gent. His research interest is on ODRL currently, specifically on enactment. ODRL is a REL, a rights expression language. Describing rights is one thing, enacting them in practice, in technology, processes etc. is a different thing. Next to that, how do you demonstrate that you adhere to the conditions expressed and that you qualify for using the things described?
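To make the difference between expressing and enacting a bit more tangible for myself, here is a rough Python sketch. It assumes a system that checks every request against the permission rules of a policy before acting, and that keeps a record of each decision as a crude way of demonstrating adherence afterwards. The names `Request`, `is_permitted` and `enact` are hypothetical, and the exact-match check is a deliberate simplification, not how a real ODRL evaluator works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A request by some actor to do something with an asset (hypothetical type)."""
    assignee: str
    action: str
    target: str


# Same shape as the song policy in the Wikipedia example.
POLICY = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "uid": "http://example.com/policy:001",
    "permission": [{
        "target": "http://example.com/mysong.mp3",
        "assignee": "John Doe",
        "action": "play",
    }],
}


def is_permitted(policy: dict, request: Request) -> bool:
    """Naive check: allow only if a permission rule matches the request exactly.

    Assumption: constraints, duties and prohibitions are ignored here.
    """
    return any(
        rule.get("assignee") == request.assignee
        and rule.get("action") == request.action
        and rule.get("target") == request.target
        for rule in policy.get("permission", [])
    )


def enact(policy: dict, request: Request, audit_log: list) -> bool:
    """Decide on a request and record the decision so it can be shown later."""
    allowed = is_permitted(policy, request)
    audit_log.append({"policy": policy["uid"], "request": request, "allowed": allowed})
    return allowed


audit_log: list = []
song = "http://example.com/mysong.mp3"
print(enact(POLICY, Request("John Doe", "play", song), audit_log))
print(enact(POLICY, Request("John Doe", "distribute", song), audit_log))
```

Even in this toy version the hard parts are visible: the check only works if every application in a flow actually calls it, and the log is only evidence if others have reason to trust it.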

For the EU data space(s) this part sounds key to me, as none of the data involved is merely part of a single clear interaction like in the song example above. It’s part of a variety of flows in which actors likely don’t directly interact, where many different data elements come together. This includes flows through applications that tap into a data space for inputs and outputs but are otherwise outside of it. Such applications are also digital twins, federated systems of digital twins even, meaning a confluence of many different data and conditions across multiple domains (and thus data spaces). All this removes a piece of data lightyears from the neat situation where two actors share it between them in a clearly described transaction within a single-faceted use case.

Expressing the commons

It’s one thing to express restrictions or usage conditions. The DSSC in their webinar talked a lot about business models around use cases, and ODRL as a means for a data source to stay in control throughout a piece of data’s life cycle. Luckily they stopped using the phrase ‘data ownership’ as they realised it’s not meaningful (and confusing on top of it), and focused on control and maintaining having a say by an actor.
An open question for me is how you would express openness and the commons in ODRL. A shallow search surfaces some examples of trying to express Creative Commons or other licenses this way, but none recent.

Openness can mean an absence of certain conditions, although there may be some (like adding the same absence of conditions to re-shared material or derivative works), which is not the same as setting explicit permissions. If I e.g. dedicate something to the public domain, an image for instance, then there are no permissions for me to grant, as I’ve removed myself from that role of being able to give permission. Yet, you still want to express it to ensure that it is clear for all that that is what happened, and especially that it remains that way.
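As a first attempt, and very much a sketch under assumptions, this is how an attribution-style license might be approximated as explicit permissions with a duty attached. I am assuming that `reproduce`, `distribute`, `derive` and `attribute` are valid action names in the ODRL vocabulary, and that leaving out an assignee in a policy of type `Set` can be read as 'applies to anyone'; both need checking against the specification. The function name and the example URLs are made up.

```python
import json


def attribution_policy(asset: str, uid: str) -> dict:
    """Build an ODRL-style policy permitting reuse on condition of attribution."""
    # Assumed action names; verify against the ODRL vocabulary before relying on them.
    reuse_actions = ["reproduce", "distribute", "derive"]
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "uid": uid,
        # No assignee: the intention is that the rules apply to anyone.
        "permission": [
            {"target": asset, "action": action, "duty": [{"action": "attribute"}]}
            for action in reuse_actions
        ],
    }


policy = attribution_policy(
    "http://example.com/photo.jpg",
    "http://example.com/policy:open-001",
)
print(json.dumps(policy, indent=2))
```

Note that this still frames openness as a set of permissions granted by someone. It does not answer the public domain case, where there is nobody left in the role of granting anything.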

Part of that question is about the overlap and distinction between rights expressed in ODRL and authorship rights. You can obviously have many conditions outside of copyright, and can have copyright elements that may be outside of what can be expressed in RELs. I wonder how for instance moral authorship rights (that an author in some (all) European jurisdictions cannot do away with) can be expressed after an author has transferred/sold the copyrights to something? Or maybe, expressing authorship rights / copyrights is not what RELs are primarily for, as those are generic and RELs may be meant for expressing conditions around a specific asset in a specific transaction. There have been various attempts to map all kinds of licenses to RELs though, so I need to explore more.

This is relevant for the EU common data spaces as my government clients will be actors in them and bringing in both open data and closed and unsharable but re-usable data, and several different shades in between. A range of new obligations and possibilities w.r.t. data use for government are created in the EU data strategy laws and the data space is where those become actualised. Meaning it should be possible to express the corresponding usage conditions in ODRL.

ODRL gaps?

Are there gaps in the ODRL standard w.r.t. what it can cover? Or things that are hard to express in it?
I came across one paper ‘A critical reflection on ODRL’ (PDF Kebede, Sileno, Van Engers 2020), that I have yet to read, that describes some of those potential weaknesses, based on use cases in healthcare and logistics. Looking forward to digging out their specific critique.

I’ve been involved in open data for about 15 years. Back then we had a vibrant European wide network of activists and civic organisations around open data, partially triggered by the first PSI Directive that was the European legal fundament for our call for more open government data.

Since 2020 a much wider and fundamental legal framework than the PSI Directive ever was is taking shape, with the Data Governance Act, Data Act, AI Regulation, Open Data Directive, High Value Data implementing regulation as building blocks. Together they create the EU single market for data, adding data as fourth element to the list of freedom of movement for people, products and capital within the EU. This will all take shape as the European common dataspace(s), built from a range of sectoral dataspaces.

In the past years I’ve been actively involved in these developments, currently helping large government data holders in the Netherlands interpret the new obligations and above all new opportunities for public service that result from all this.

Now that the dataspaces are slowly taking shape, what I find missing from most discussions and events is the voice of civic organisations and activists. It’s mostly IT companies and research institutions that are involved. While for the Commission social impact (climate, health, energy and agricultural transitions e.g.) is a key element in why they seek to implement these new laws, for most parties involved in the dataspaces that is less of a consideration, and economic and technological factors are more important. Not even government data holders themselves are represented much in how the European data space will turn out. Even though every single one of us and every public entity by default is a part of this common market.

I would like to strengthen the voice of civil society and activists in this area, to together influence the shape these dataspaces are taking. So that they are of use and value to us too. To use the new (legal) tools to strengthen the commons, to increase our agency.

Most of the old European open data network however has dissolved over time, as we all got involved in national level practical projects and the European network as a source of sense of belonging and strengthening each other’s commitment became less important. And we’ve moved on a good number of years, so many new people have come on to the scene, unconnected to that history, with new perspectives and new capabilities.

So the question is: who is active on these topics, from a civil society perspective, as activists? Who should be involved? What are the organisations, the events, that are relevant regionally, nationally, EU wide? Can we connect those existing dots: to share experiences, examples, join our voices, pool our efforts?

Currently I’m doing a first scan of who is involved in which EU country, what type of events are visible, organisations that are active etc. Starting from my old network of a decade ago. I will share lists of what I find at Our Common Data Space.

Let me know if you count yourself as part of this European network. Let me know the relevant efforts you are aware of. Let me know which events you think bring together people likely to want to be involved.

I look forward to finding out about you!


Open Government Data Camp in Warsaw 2011. An example of the vibrancy of the European open data network, I called it the community’s ‘family christmas party’, at the time. Above the schedule of sessions created collectively by the participants, with many local initiatives and examples shared with the EU wide network. Below one of those sessions, on local policy making and open data.