ODRL, the Open Digital Rights Language, popped up twice for me this week, and I don’t think I was aware of it before. Some notes to start exploring it.

Rights Expression Languages

Rights Expression Languages (RELs) provide a machine-readable way to convey or transfer usage conditions, rights and restrictions, granularly w.r.t. both actions and actors. Such an expression can then be attached as metadata to an asset. ODRL is one such rights expression language, and seems to be the de facto standard.

ODRL has been a W3C Recommendation since 2018, and is thus part of the open web standards. ODRL has its roots in the ’00s and in Digital Rights Management (DRM): the abhorred protections media companies added to music and movies, and now e-books, in ways that restrict what people can do with media they bought to well below what was possible before, and below what was commonly considered part of having bought something.

ODRL can be expressed in JSON (JSON-LD), RDF, or XML. A basic example from Wikipedia looks like this:


{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}

In this JSON example, a policy identified by an example.com URI grants John Doe permission to play mysong.mp3.
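To make the example a bit more tangible, here is a minimal Python sketch that parses the policy above and checks a request against its single permission. It is a toy check, not a conformant ODRL evaluator: it assumes exact string matches, treats anything not explicitly permitted as not permitted, and ignores constraints, duties, prohibitions and IRI resolution. The helper name `is_permitted` is hypothetical.

```python
import json

# The Wikipedia example policy from above, as a JSON string.
POLICY_JSON = """
{
  "@context": "http://www.w3.org/ns/odrl.jsonld",
  "uid": "http://example.com/policy:001",
  "permission": [{
    "target": "http://example.com/mysong.mp3",
    "assignee": "John Doe",
    "action": "play"
  }]
}
"""


def is_permitted(policy: dict, assignee: str, action: str, target: str) -> bool:
    """Return True if a permission rule matches assignee, action and target exactly.

    Hypothetical helper: ignores constraints, duties and prohibitions, and
    treats anything not explicitly permitted as not permitted.
    """
    for rule in policy.get("permission", []):
        if (
            rule.get("assignee") == assignee
            and rule.get("action") == action
            and rule.get("target") == target
        ):
            return True
    return False


policy = json.loads(POLICY_JSON)
song = "http://example.com/mysong.mp3"
print(is_permitted(policy, "John Doe", "play", song))        # True
print(is_permitted(policy, "John Doe", "distribute", song))  # False
print(is_permitted(policy, "Jane Roe", "play", song))        # False
```

Even this trivial check already has to make choices the policy itself doesn’t state, such as what happens to actions that are not mentioned at all.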

ODRL in the EU Data Space

In shaping the EU common market for data, aka the European common data space, it is important to be able to trace provenance and usage conditions not just for data sets but for individual pieces of data, as they flow through use cases and applications, and via those applications’ output back into the data space.
This week I participated in a webinar by the EU Data Spaces Support Centre (DSSC) about their first blueprint of data space building blocks, and of the federation of such data spaces.

They propose ODRL as the standard to describe usage conditions throughout data spaces.

The question of enactment

It wasn’t the first time I talked about ODRL this week. I had a conversation with Pieter Colpaert, whom I had reached out to for his current view of the landscape of civic organisations active around the EU data spaces. We also touched upon his current work at the University of Gent, where his research currently focuses on ODRL, specifically on enactment. Describing rights in a rights expression language is one thing; enacting them in practice, in technology, processes etc., is another. On top of that, how do you demonstrate that you adhere to the conditions expressed, and that you qualify for using the things described?
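The gap between describing and enacting can be sketched as a small enforcement point: a gate that evaluates the policy before an operation runs, refuses it otherwise, and records every decision so adherence can be shown afterwards. This is a hypothetical design in Python, not how ODRL or any data space defines enactment; the `EnforcementPoint` and `Decision` names, the added prohibition, and the rule that a matching prohibition overrides a matching permission are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Decision:
    """One logged access decision, kept as evidence of adherence (hypothetical)."""
    policy_uid: str
    assignee: str
    action: str
    target: str
    permitted: bool
    timestamp: str


def _matches(rule: dict, assignee: str, action: str, target: str) -> bool:
    return (
        rule.get("assignee") == assignee
        and rule.get("action") == action
        and rule.get("target") == target
    )


@dataclass
class EnforcementPoint:
    """Hypothetical gate that enacts a policy and records every decision."""
    policy: dict
    log: List[Decision] = field(default_factory=list)

    def decide(self, assignee: str, action: str, target: str) -> bool:
        # Assumption: a matching prohibition overrides a matching permission.
        if any(_matches(r, assignee, action, target)
               for r in self.policy.get("prohibition", [])):
            return False
        return any(_matches(r, assignee, action, target)
                   for r in self.policy.get("permission", []))

    def perform(self, assignee: str, action: str, target: str,
                operation: Callable[[], object]) -> object:
        permitted = self.decide(assignee, action, target)
        # Log before acting, so refusals are recorded too.
        self.log.append(Decision(
            policy_uid=self.policy.get("uid", ""),
            assignee=assignee, action=action, target=target,
            permitted=permitted,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
        if not permitted:
            raise PermissionError(f"{assignee} may not {action} {target}")
        return operation()


song = "http://example.com/mysong.mp3"
policy = {
    "uid": "http://example.com/policy:001",
    "permission": [{"target": song, "assignee": "John Doe", "action": "play"}],
    # Hypothetical prohibition, added for illustration.
    "prohibition": [{"target": song, "assignee": "John Doe", "action": "distribute"}],
}
gate = EnforcementPoint(policy)
print(gate.perform("John Doe", "play", song, lambda: "playing"))
try:
    gate.perform("John Doe", "distribute", song, lambda: "sent")
except PermissionError as err:
    print("refused:", err)
print([(d.action, d.permitted) for d in gate.log])
```

The log only shows what this one gate decided; demonstrating to others that every party in a flow actually used such a gate is the harder, open part of the question.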

For the EU data space(s) this part sounds key to me, as none of the data involved is merely part of a single clear interaction like in the song example above. It’s part of a variety of flows in which actors likely don’t directly interact, and where many different data elements come together. This includes flows through applications that tap into a data space for inputs and outputs but are otherwise outside of it. Such applications include digital twins, even federated systems of digital twins, meaning a confluence of many different data and conditions across multiple domains (and thus data spaces). All this removes a piece of data light-years from the neat situation where two actors share it between them in a clearly described transaction within a single-faceted use case.
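One way to picture that confluence: if a derived data set (say, the output of a digital twin) combines inputs that each carry their own policy, a cautious reading is that the output may only be used for actions every input permits. The Python sketch below computes that intersection and keeps the contributing policy uids as provenance. The ‘most restrictive wins’ rule, the policies and their uids are hypothetical assumptions, not ODRL semantics; real composition would also have to handle constraints, duties and targets, which this ignores.

```python
from typing import Dict, List, Set


def permitted_actions(policy: Dict, assignee: str) -> Set[str]:
    """Actions a policy explicitly permits for an assignee (targets ignored)."""
    return {
        rule["action"]
        for rule in policy.get("permission", [])
        if rule.get("assignee") == assignee
    }


def combine(policies: List[Dict], assignee: str) -> Dict:
    """Most-restrictive-wins composition: keep only actions all inputs permit."""
    action_sets = [permitted_actions(p, assignee) for p in policies]
    allowed = set.intersection(*action_sets) if action_sets else set()
    return {
        "assignee": assignee,
        "actions": sorted(allowed),
        # Provenance: which input policies constrained the result.
        "derived_from": [p["uid"] for p in policies],
    }


# Hypothetical input policies for two data sources feeding one application.
sensor_policy = {
    "uid": "http://example.com/policy:sensors",
    "permission": [
        {"assignee": "TwinOperator", "action": a}
        for a in ("use", "derive", "distribute")
    ],
}
registry_policy = {
    "uid": "http://example.com/policy:registry",
    "permission": [
        {"assignee": "TwinOperator", "action": a} for a in ("use", "derive")
    ],
}

print(combine([sensor_policy, registry_policy], "TwinOperator"))
```

With every extra input the set of allowed actions can only shrink, which hints at why long chains of flows through applications and federated twins are so hard to keep usable under such a rule.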

Expressing the commons

It’s one thing to express restrictions or usage conditions. The DSSC in their webinar talked a lot about business models around use cases, and about ODRL as a means for a data source to stay in control throughout a piece of data’s life cycle. Luckily they stopped using the phrase ‘data ownership’, as they realised it’s not meaningful (and confusing on top of it), and instead focused on control: an actor continuing to have a say.
An open question for me is how you would express openness and the commons in ODRL. A shallow search surfaces some examples of trying to express Creative Commons or other licenses this way, but none of them recent.

Openness can mean an absence of certain conditions (although there may be some, like requiring re-shared material or derivative works to carry the same absence of conditions), which is not the same as setting explicit permissions. If I e.g. dedicate something to the public domain, an image for instance, there are no permissions left for me to grant, as I’ve removed myself from the role of being able to give permission. Yet you still want to express that, to make clear to all that this is what happened, and especially that it remains that way.

Part of that question is about the overlap and distinction between rights expressed in ODRL and authorship rights. You can obviously have many conditions outside of copyright, and can have copyright elements that may be outside of what can be expressed in RELs. I wonder, for instance, how moral rights of authorship (which in some, perhaps all, European jurisdictions an author cannot waive) can be expressed after an author has transferred or sold the copyrights to something. Or maybe expressing authorship rights / copyrights is not what RELs are primarily for, as those are generic, while RELs may be meant for expressing conditions around a specific asset in a specific transaction. There have been various attempts to map all kinds of licenses to RELs though, so I need to explore more.

This is relevant for the EU common data spaces, as my government clients will be actors in them, bringing in open data, closed and unsharable but re-usable data, and several shades in between. The EU data strategy laws create a range of new obligations and possibilities w.r.t. data use for government, and the data space is where those become actualised. That means it should be possible to express the corresponding usage conditions in ODRL.

ODRL gaps?

Are there gaps in the ODRL standard w.r.t. what it can cover? Or things that are hard to express in it?
I came across a paper, ‘A critical reflection on ODRL’ (PDF, Kebede, Sileno, Van Engers 2020), which I have yet to read, that describes some of those potential weaknesses based on use cases in healthcare and logistics. I look forward to digging into their specific critique.

Some good movement on EU data legislation this month! I’ve been keeping track of EU data and digital legislation for the past three years. In 2020 I helped determine the content of what has become the High Value Data implementing regulation (my focus was on earth observation, environmental and meteorological data), and since then I’ve been involved, for the Dutch government, in translating the incoming legislation into implementation steps and opportunities for Dutch government geo-data holders.

AI Act

The AI Act stipulates which types of algorithmic applications are allowed on the European market, and under which conditions. A few things are banned; the rest of the provisions are tied to a risk assessment. Higher risk applications carry heavier responsibilities and obligations for market entry. It’s a CE marking for these applications, with responsibilities for producers, distributors, users, and users of the output of such applications.
The Commission proposed the AI Act in April 2021; the Council responded with its version in December 2022.

Two weeks ago the European Parliament approved its version of the AI Act in plenary.
In my reading the EP both strengthens and weakens the original proposal. It strengthens it by restricting certain types of use further than the original proposal did, and by adding foundation models to its scope.
It also adds a definition of what is considered AI in the context of this law. This in itself is logical, as the original proposal did not try to define that other than by listing, in an annex, the technologies deemed in scope. However, while adding that definition, they removed the annex. That, I think, weakens the AI Act and will make future enforcement much slower and harder, because now everything will depend on the interpretation of the definition, making it a key point of contention before the courts (‘my product is out of scope!’). Having both the definition and the annex, by contrast, means the legislator explicitly states which things it considers, at the very least, to be in scope of the definition. And as the annex would be periodically updated, it would also remain future proof.

With the positions of the Council and Parliament stated, the trilogue can now start negotiating the final text, which then needs to be approved by both Council and Parliament again.

All in all it looks like the AI Act will be finished and in force before the end of the year, and will apply by 2025.

Data Act

The Data Act is one of the building blocks of the EU Data Strategy (the others being the Data Governance Act, applying from September, the Open Data Directive, in force since mid 2021, and the High Value Data implementing regulation, which the public sector must comply with by spring 2024). The Data Act contains several interesting proposals.

One requires connected devices not only to give users access to the (real-time) data they create (think thermostats, solar panel inverters, sensors etc.), but also to allow users to share that data with third parties. You can think of this as ‘PSD2-for-everything’. PSD2 says that banks must enable you to share your banking data with third parties (meaning you can manage your account at Bank A with the mobile app of Bank B, connect your bookkeeping software, etc.). The Data Act extends this to ‘everything’ that is connected.

Another interesting component is that it allows public sector bodies, in case of emergencies (e.g. floods), to require certain data from private sector parties, across borders. The Dutch government heavily opposed this, so I am interested in seeing the final formulation of this part of the Act.

Other provisions make it easier for people to switch platform services (e.g. cloud providers), and create room for the European Commission to set, have developed, adopt or mandate certain data standards across sectors. That last element is relevant to the shaping of the single market for data, aka the European common data space(s), and here too I look forward to reading the final formulation.

With the Council of the European Union and the European Parliament having reached a common text, what remains is final approval by both bodies. This should be concluded under the Spanish presidency that starts this weekend, after which the Data Act will enter into force sometime this fall, with a grace period of some 18 months, so that it applies from sometime in 2025.

There’s more this month: ITS Directive

The Intelligent Transport Systems Directive (ITS Directive) was originally created in 2010 to ensure the availability of data about traffic conditions etc., e.g. for (multi-modal) planning purposes. In the Netherlands, for instance, real-time information about traffic intensity is available in this context. The Commission proposed to revise the ITS Directive in late 2021, to take into account technological developments and things like automated mobility and on-demand mobility systems. This month the Council and European Parliament agreed on a common text for the new ITS Directive. I look forward to closely reading the final text, also on its connections to the Data Act above, and its potential in the context of the European mobility data space. Between the Data Act and the ITS Directive I’m also interested in the position of in-car data. Our cars are increasingly mobile sensor platforms to which the owner/driver has little to no access, which should change imo.

This looks like a very welcome development: the European Commission (EC) is to ask all Member State Data Protection Authorities (DPAs) for status updates on all international GDPR cases every other month. This is in response to a formal complaint by the Irish Council for Civil Liberties (ICCL), filed in 2021, about the foot-dragging of the Irish DPA in its investigations of BigTech cases (most BigTech companies have their EU activities domiciled in Ireland).

The GDPR, the EU’s data protection regulation, has been in force since mid 2018. Since then many cases have progressed extremely slowly, to a large extent, it seems, because Ireland’s DPA has been subject to regulatory capture by BigTech, to the point where it defies direct instructions from the European Data Protection Board and takes an outlier position relative to all other European DPAs.

With the EC from now on requesting status updates on specific ongoing cases from each Member State every other month, this is a step up from the multi-year self-reporting by Member States (MS) that is usually done to determine potential infringements. This should have an impact on the consistency with which the GDPR gets applied, and above all on ensuring cases are resolved at adequate speed. The glacial pace of bigger cases risks eroding confidence in the GDPR, especially when smaller cases do get dealt with (the local butcher getting fined for sloppy marketing, while Facebook serves billions of person-targeted ads without people’s consent).

So kudos to ICCL for filing the complaint and working with the EU Ombudsman on this, and to the EC for taking it as an opportunity to much more closely monitor GDPR enforcement.

Bookmarked Target_Is_New, Issue 212 by Iskander Smit

Iskander asks: what about users, next to makers, when it comes to responsible AI? For a slightly different type of user at least, such responsibilities are being formulated in the proposed EU AI Regulation, as well as in the connected AI Liability Directive. There, not just the producers and distributors of services or products containing AI have responsibilities, but also those who deploy them in practice, and those who use their outputs. He’s right that most discussions focus on what happens within the established system of making, training and deploying AI, and that we should also look outside that system, where in this case the people using AI, or using its output, reside. That’s why I like the EU’s legislative approach: it doesn’t aim to regulate the system as seen from within, but focuses on the conditions for such products to access the European market, and on the impact they have within society. Of course, these proposals are still under negotiation, and it’s a matter of wait and see what remains at the end of that process.

As I wrote down as thoughts while listening to Dasha Simons; we are all convinced of the importance of explainability, transparency, and even interpretability, all focused on making the system responsible and, with them, the makers of the system. But what about the responsibility of the users? Are they also part of the equation, should they be responsible too? As the AI (or what term we use) is continuous learning and shaping, the prompts we give are more than a means to retrieve the best results; it is also part of the upbringing of the AI. We are, as users, also responsible for good AI as the producers are.

Iskander Smit

Bookmarked AI Liability Directive (PDF) (by the European Commission)

This should be interesting to compare with the proposed AI Regulation. The AI Regulation specifies under which conditions an AI product or service, or the output of one, will or won’t be admitted to the EU market (literally a CE mark for AI), based on a risk assessment that includes risks to civic rights, equipment safety and critical infrastructure. That text is still very much under negotiation, but is a building block of the combined EU digital and data strategies. In parallel the EU is modernising its product liability rules, now bringing damage caused by AI-related products and services within their scope. Both are listed on the EC’s page concerning AI, so some integration is to be expected. Does the proposal already anticipate parts of the AI Regulation, or does it try to replicate some of it within this Directive? Is it fully aligned with the proposed AI Regulation, or are there surprising contrasts? As this proposal is a Directive (which needs to be transposed into national law in each Member State), while the AI Regulation becomes law without such national transposition, that too is a dynamic of interest, especially as this Directive builds on existing national legal instruments. There was a consultation on this Directive in late 2021. Which DG created this proposal, DG Just?

The problems this proposal aims to address, in particular legal uncertainty and legal fragmentation, hinder the development of the internal market and thus amount to significant obstacles to cross-border trade in AI-enabled products and services.

The proposal addresses obstacles stemming from the fact that businesses that want to produce, disseminate and operate AI-enabled products and services across borders are uncertain whether and how existing liability regimes apply to damage caused by AI. … In a cross-border context, the law applicable to a non-contractual liability arising out of a tort or delict is by default the law of the country in which the damage occurs. For these businesses, it is essential to know the relevant liability risks and to be able to insure themselves against them.

In addition, there are concrete signs that a number of Member States are considering unilateral legislative measures to address the specific challenges posed by AI with respect to liability. … Given the large divergence between Member States’ existing civil liability rules, it is likely that any national AI-specific measure on liability would follow existing different national approaches and therefore increase fragmentation. Therefore, adaptations of liability rules taken on a purely national basis would increase the barriers to the rollout of AI-enabled products and services across the internal market and contribute further to fragmentation.

European Commission

This week the draft implementation act (PDF) and the annex listing the first batch of European High Value Data sets (PDF) have finally been published. In the first half of 2020 I was involved in preparatory research to advise on what data, spread across six predetermined themes, should be put on this mandatory list. It’s the first time that open data policy makes the publication of certain data through an API mandatory. Until now European open data policy built upon the freedom of information measures of each EU Member State (MS), and added mandatory conditions to what MS published voluntarily, and to how they respond to requests for re-use of public data. This new law arranges for the pro-active publication of certain data sets.

In the 2020 research I was responsible for the sections about earth observation, environmental, and meteorological data. We submitted our final report in September 2020, and since then there had been total silence w.r.t. progress in negotiating the list with the MS and putting together the implementation act. Since last summer, when I got a sneak preview of the adaptation of the INSPIRE portal where such data is made available, I knew that at least the earth observation and environmental data would largely be included the way I suggested.

The Implementation Act

In the Open Data Directive there’s a provision that the European Commission can, through a separate implementation act, set mandatory open data requirements for data belonging to the themes listed in the Directive’s Annex. At launch in 2019, six such themes were listed: geo-information, statistics, mobility, company information, earth observation / environment, and meteorology.
The list of themes can also be amended, through a separate delegated act, and a process to determine the second set of themes is currently underway.

The draft implementation act (PDF) states that the government-held datasets mentioned in its Annex must be published through APIs, under an open license such as Creative Commons Zero (CC0) or Creative Commons Attribution (CC-BY), or an equivalent or less restrictive license. Governments must publish the terms of use for such APIs, and these terms may not be used to discourage re-use. APIs must also be fully and publicly documented, and a point of contact must be provided.
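A data holder could turn these requirements into a simple self-check per API. The Python sketch below is hypothetical: the `HighValueApi` record, its field names, and the use of SPDX identifiers for the two named licenses are assumptions, not anything the implementation act prescribes. Anything other than the two named licenses is flagged for manual review, because ‘equivalent or less restrictive’ needs human judgement.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: SPDX identifiers for the two licenses the text names explicitly.
NAMED_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class HighValueApi:
    """Hypothetical self-description of one high value data API."""
    dataset: str
    license_id: str
    terms_of_use_url: Optional[str] = None
    documentation_url: Optional[str] = None
    contact_point: Optional[str] = None


def check_api(api: HighValueApi) -> List[str]:
    """Return a list of findings; an empty list means no obvious gaps."""
    findings = []
    if api.license_id not in NAMED_LICENSES:
        # "Equivalent or less restrictive" needs a human judgement.
        findings.append(f"license {api.license_id} needs manual review")
    if not api.terms_of_use_url:
        findings.append("terms of use not published")
    if not api.documentation_url:
        findings.append("API documentation missing")
    if not api.contact_point:
        findings.append("no point of contact")
    return findings


ok = HighValueApi("addresses", "CC0-1.0",
                  "https://data.example.org/terms",
                  "https://data.example.org/docs",
                  "opendata@example.org")
partial = HighValueApi("buildings", "ODbL-1.0")
print(check_api(ok))       # []
print(check_api(partial))
```

Whether published terms of use in effect discourage re-use can’t be checked this way; that remains a matter of reading them.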

MS can temporarily exempt some of the high value datasets, a decision that must be made public, but such exemptions are limited to two years after entry into force of this implementation act. Additional usage restrictions are allowed for personal data within the data sets concerned, but only to the extent needed to protect personal data of individuals (so not as an excuse to disallow re-use of and access to the data as a whole).

MS must report on their implementation every two years, listing the actual data sets opened, the links to licenses, APIs and documentation, and any exemptions still in place. The implementation act is immediately binding for all MS (there is no need to first transpose it into national law for it to be enforceable), will apply 20 days after publication in the Official Journal of the EU, and MS have 6 months to comply.
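The deadlines above are simple date arithmetic. The Python sketch below derives them from a publication date; the date used is hypothetical, and it assumes that ‘apply 20 days after publication’ and ‘entry into force’ refer to the same date, and that the 6 months to comply are counted from that date. The standard library has no month arithmetic, so `add_months` clamps the day to the target month’s length.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def hvd_timeline(publication: date) -> dict:
    """Key dates derived from the rules described above (assumptions noted)."""
    # Assumption: applying 20 days after publication = entry into force.
    entry_into_force = publication + timedelta(days=20)
    return {
        "entry_into_force": entry_into_force,
        # Assumption: the 6 months to comply count from entry into force.
        "compliance_deadline": add_months(entry_into_force, 6),
        # Exemptions may last at most two years after entry into force.
        "latest_exemption_end": add_months(entry_into_force, 24),
    }


# Hypothetical publication date, for illustration only.
for name, when in hvd_timeline(date(2023, 1, 1)).items():
    print(name, when.isoformat())
```

For the hypothetical 1 January publication this gives entry into force on 21 January, compliance by 21 July, and the latest end of any exemption on 21 January two years later.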

The Data Sets Per Theme

In this first batch of mandatory open data, 6 themes are covered (PDF). Some brief remarks on each of them.

Mobility

This is, contrary to what you’d expect, the smallest of the six themes covered, because everything already covered by the Intelligent Transport Systems (ITS) Directive is out of scope, and that is most of everything concerning land-based mobility. What remains for the High Value Data list is data on transport networks contained in the INSPIRE Annex I theme Transport Networks, static and dynamic data about inland waterways, and the electronic navigational charts (ENC) for inland waterways. This is very much in line with the 2020 study report. National hydrographic services had some concern about ENCs for seas being included (which would make it harder to require sea-going vessels to use the latest version), but my reassurance that this was unlikely held true.

Geospatial data

Geospatial data is, I would say, the ‘original’ high value government data, and has been for centuries. The data sets from the four INSPIRE Annex I themes Administrative Units, Geographical Names, Addresses and Cadastral Parcels, and from the Annex III theme Buildings, are within scope. Additionally, reference parcels and agricultural parcels as described in the 1306/2013 and 1307/2013 Regulations on the Common Agricultural Policy (CAP) are on the list.

Earth Observation and Environment

This was a theme I was responsible for in the 2020 study. It is an extremely broad category, covering a very wide spectrum of types of data. It was basically impossible to select individual data sets from it, not least because re-use value usually comes from combinations of data, not from any single source. Therefore my proposed solution was not to choose, but to advise treating it as a coherent whole needed to address the EU goals concerning environment/nature, climate adaptation, and pollution. The High Value Data list adopts this approach and puts 19 INSPIRE themes within scope. These are:

  • Annex I: Hydrography, and Protected Sites
  • Annex II in full: Elevation, Geology, Land Cover, and Ortho-imagery
  • Annex III: Area management, Bio-geographical regions, Energy resources, Environmental monitoring facilities, Habitats and biotopes, Land use, Mineral resources, Natural risk zones, Oceanographic geographical features, Production and industrial facilities, Sea regions, Soil, and Species distribution

Additionally, all environmental information covered by Directive 2003/4/EC on public access to environmental information is added to the list, as is all data originating from a wide range of EU Regulations and Directives on air, climate, emissions, nature preservation and biodiversity, noise, waste and water. I miss soil in this environmental list, but perhaps the Annex III INSPIRE theme Soil is seen as covering it sufficiently. I still need to follow up on the precise formulations w.r.t. data in the 31 additionally referenced regulations and directives.

What surprised me is that earth observation is defined here as including satellite-based data. That is not surprising in terms of earth observation itself, but satellite data was specifically excluded from the scope of our 2020 study. First, because EU-level satellite data is already open. Second, because this list deals with data from MS, and not many MS have their own satellite data. Where they do, it is usually the result of public-private collaborative investment, and such private investment may dry up if temporary exclusive access arrangements were no longer possible, which would have resulted in considerable political objections. Perhaps including space-based data collection is sufficiently watered down for now by defining the INSPIRE themes as its scope, while at the same time future-proofing the definition for when satellite data does become part of INSPIRE themes.

Together these first three, mobility, geospatial, and EO/environment, place a full 25 out of 34 INSPIRE themes on the list for mandatory open data. This basically amounts to adding an open data requirement to INSPIRE. It puts MS’ INSPIRE compliance, which is currently often limited, firmly in the spotlight, and further positions INSPIRE as a key building block in the coming Green Deal data space. It will be of high interest to see what the coming new version of the INSPIRE directive, currently under review, makes of all that.
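As a check on that arithmetic, a small Python tally of the themes named in the mobility, geospatial and EO/environment sections against the full list of 34 INSPIRE themes. Theme names follow the annexes of the INSPIRE Directive, lightly normalised (e.g. ‘Orthoimagery’); the grouping into the three High Value Data sections mirrors the text above.

```python
# The 34 INSPIRE themes, grouped by annex of the INSPIRE Directive.
INSPIRE = {
    "I": {
        "Coordinate reference systems", "Geographical grid systems",
        "Geographical names", "Administrative units", "Addresses",
        "Cadastral parcels", "Transport networks", "Hydrography",
        "Protected sites",
    },
    "II": {"Elevation", "Land cover", "Orthoimagery", "Geology"},
    "III": {
        "Statistical units", "Buildings", "Soil", "Land use",
        "Human health and safety", "Utility and governmental services",
        "Environmental monitoring facilities",
        "Production and industrial facilities",
        "Agricultural and aquacultural facilities",
        "Population distribution (demography)",
        "Area management/restriction/regulation zones and reporting units",
        "Natural risk zones", "Atmospheric conditions",
        "Meteorological geographical features",
        "Oceanographic geographical features", "Sea regions",
        "Bio-geographical regions", "Habitats and biotopes",
        "Species distribution", "Energy resources", "Mineral resources",
    },
}

# INSPIRE themes named in the three High Value Data sections above.
HVD_SELECTION = {
    "mobility": {"Transport networks"},
    "geospatial": {
        "Administrative units", "Geographical names", "Addresses",
        "Buildings", "Cadastral parcels",
    },
    "earth observation and environment": {
        "Hydrography", "Protected sites",
        "Elevation", "Geology", "Land cover", "Orthoimagery",
        "Area management/restriction/regulation zones and reporting units",
        "Bio-geographical regions", "Energy resources",
        "Environmental monitoring facilities", "Habitats and biotopes",
        "Land use", "Mineral resources", "Natural risk zones",
        "Oceanographic geographical features",
        "Production and industrial facilities", "Sea regions", "Soil",
        "Species distribution",
    },
}

all_themes = set().union(*INSPIRE.values())
selected = set().union(*HVD_SELECTION.values())
assert selected <= all_themes, selected - all_themes  # catch typos in names

for section, themes in HVD_SELECTION.items():
    print(f"{section}: {len(themes)}")
print(f"{len(selected)} of {len(all_themes)} INSPIRE themes")
```

Tallied this way the three sections contribute 1, 5 and 19 themes, with Buildings, an Annex III theme, counted as part of the geospatial selection.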

Statistics

This topic is more widely covered in the High Value Data list than it was in the 2020 study, both in the types of statistics included and in the demands made of those types of statistics. Still, there are lots of statistics MS hold that aren’t included here (although some MS already publish most of their statistics anyway, btw): the selection is based on European reporting obligations that follow from various European laws.
Topics for which statistics must be published as open data in a specified way:

  • Industrial production
  • Industrial producer price index, by activity
  • Volume of sales by activity
  • EU international trade in goods
  • Tourism flows in Europe
  • Harmonised consumer prices indices
  • National accounts: GDP, key indicators on corporations and households
  • Government expenditure and revenue, government gross debt
  • Population, fertility, mortality
  • Current healthcare expenditure
  • Poverty
  • Inequality
  • Employment, unemployment, potential labour force

Data for these reporting obligations should be available from the moment the law creating them came into force. That means, for instance, that healthcare expenditure should be available from at least 2008, whereas employment statistics must be available from at least 2019, because of the different years in which these laws were enacted.

Company information

Company information has from the start been the most controversial theme of the six covered by this implementation act. I assume this theme has also been the prime political reason for the long delay in publishing the proposal. In my perception that is because this is the only data set that might actually end up challenging the status quo in society (as it involves ownership and power structures, and touches on tax evasion). In the 2020 study four aspects were considered: basic company information, company documents and accounts, ownership information, and insolvency status. Two ended up in the draft law: basic company information and company documents. Opening up ownership information, let alone ultimate beneficial ownership (UBO) information, drew vehement objections from the start (including from the Dutch government). Many stakeholders (including the NGO I chair) are disappointed with the current outcome. (Here’s an old blogpost where I explain UBO, and here’s SF writer Brin on what transparent UBO might mean to our societies.) Even the data that will become open may still be 2 years away: the Open Data Directive allows a 2-year exemption, and I think this is the data for which that exemption will be used.
That said, mandatory open company data and documents, even with the delay through exemptions, is already a step forward that puts an end to literally decades of court cases, obstruction, and lobbying for more openness. The very first PSI Directive in 2003 was already an expression of broad demand for this data; now, 20 years on, it finally becomes mandatory across the EU. Some people I know have been after this for their entire professional careers and have already retired. It’s easy to lose sight of that win when we focus only on (ultimate) ownership data not being included.

Meteorological data

This is the other theme I was responsible for in the 2020 study. Like company information, this is an area where the discussion about making data available for re-use is decades old and precedes digitisation becoming ubiquitous. When I started my open data work in 2008, most of the existing documentation and argumentation for the value of and need for open data concerned meteorological data. A range of EU countries already have this as open data, others not at all. While progress has been made in the past decades, the High Value Data list provides a blanket obligation for all EU MS, a result that would otherwise still be a very long way off if left entirely to the MS involved.
This theme includes all weather observation data, validated observations / climate data, radar data (useful for things like cloud heights, precipitation and wind), and numerical weather prediction data (the outputs of the combined models used for predictions).

The implementation act is up for public feedback until 21 June, but will likely retain its current form. I think it’s a pretty good result, and I am happy to have been able to contribute to it.