At State of the Net 2018 in Trieste, Hossein Derakshan (h0d3r on Twitter) talked about journalism and its future. Some of his statements have stuck with me over the past weeks, so yesterday I took the time to watch the video of his presentation again.

In his talk he discussed the end of news. He says that discussions about the erosion of business models in the news business, the quality of news, trust in sources and ethics are all sideshows to a deeper shift. A shift that is both cultural and social. News is a two-century-old format, representative of the globalisation of communications that came with the birth of the telegraph. All of a sudden events from around the globe were within your perspective, and being informed made you “a man of the world”. News also served as a source of drama in our lives. “Did you hear,…”. These days those aspects of globalisation, time and drama have shifted.
Local, even hyperlocal, has become more important again at the cost of global perspectives, which Hossein sees taking place in things like buying local, but also in using Facebook to keep up with the lives of those around you. Similarly identity politics reduces the interest in other events to those pertaining to your group. Drama shifted away from news to performances and other media (Trump’s tweets, memes, our representation on social media platforms). News and time got disentangled. Notifications and updates come at any time from any source, and deeper-digging content is no longer tied to the news cycle. Journalism like the Panama Papers takes a long time to produce, but can also be published at any time without that having an impact on its value or reception.

News and journalism have become decoupled. News has become a much less compelling format, and in the words of Derakshan is dying if not dead already. With the demise of text and reason, the rise of imagery and emotions, and the mess that journalism is in, what formats can journalism take to be all it can be?

Derakshan points to James Carey, who said democracy and journalism are the same thing, as both are defined as public conversation. Hossein sees two formats in which journalism can continue. One is literature, long-form non-fiction. This can survive away from newspapers and magazines, both online and in the form of e.g. books. The other is cinema. There’s a rise in documentaries as a way to bring more complex stories to audiences, which also allows for conveying drama. It’s the notion of journalism as literature that stuck with me most at State of the Net.

For a number of years I’ve said that I don’t want to pay for news, but do want to pay for (investigative) journalism, and often people would respond that news and journalism are the same thing. Maybe I now finally have the vocabulary to better explain the difference I perceive.

I agree that the notion of public conversation is of prime importance. Not the screaming at each other on forums, Twitter or Facebook, but the way that distributed conversations can create learning, development and action, as a democratic act. Distributed conversations, like the salons of old, as a source of momentum, of emergent collective action (2013). Similarly, I position Networked Agency as a path away from the despair of being powerless in the face of change, and therefore as an alternative to falling for populist oversimplification. Networked agency in that sense is very much a democratising thing.

Today I contributed to a session of the open data research groups at Delft University. They do this a few times per year to discuss ongoing research and explore emerging questions that can lead to new research. I’ve taken part a few times in the past, and this time they asked me to provide an overview of what I see as current developments.

Some of the things I touched upon are similar to the remarks I made in Serbia during Open Data Week in Belgrade. The new PSI Directive proposal also was on the menu. I ended with the questions I think deserve attention. They are either about how to make sure that abstract norms get translated to the very practical, and to the local level inside government, or about how to ensure that critical elements get connected and visibly stay that way (such as links between regular policy goals / teams and information management).

The slides are embedded below.

Iryna Susha and Bastiaan van Loenen in the second part of our afternoon took us through their research into the data protection steps that are in play in data collaboratives. This I found very worthwhile, as data governance issues of collaborative groups (e.g. public and private entities around the energy transition) are regularly surfacing in my work. Both where such arrangements threaten data sovereignty, for instance, and where collaboratively pooled data can hardly be shared because it has become impossible to navigate the contractual obligations connected to the data that was pooled.

Yesterday at State of the Net I showed some of the work I did with the great Frysklab team, letting a school class find power in creating their own solutions. We had what I think was a very nicely working triad of talks in our session: Hossein Derakshan first, me in the middle, followed by Dave Snowden. In his talk, Dave referenced my preceding one, saying it needed scaling for the projects I showed to alter anything. Although I know Dave Snowden didn’t mean his call for scale that way, often when I hear it, it is rooted in the demand-for-ever-more-growth type of systems we know cannot be sustained in a closed world system like earth’s. The small world syndrome, as I named it at Shift 2010, will come back to bite.

It so often also assumes there needs to be one person or entity doing the scaling, a scaler. Distributed networks don’t need a scaler per se.
The internet was not created that way, nor was the Web. Who scaled RSS? Some people moved it forwards more than others, for certain, but it was unconnected people, just people recognising a possibility to fruitfully build on the work of others for something they personally felt was needed. Dave Winer spread it with Userland, made it more useful, and added the possibility of having the payload be something other than just text, having it be podcasts. We owe him a lot for the actual existence of this basic piece of web plumbing. Matt Mullenweg of WordPress and Ben and Mena Trott of Movable Type helped it forward by adding RSS to their blogging tools, so people like me could use it ‘out of the box’. But it actually scaled because bloggers like me wanted to connect. We recognised the value of making it easy for others to follow us, and for us to follow the writings of others. So I and others created our own templates, starting from copying something someone else already made and figuring out how to use RSS. It is still how I adopt most of my tools.

Every node in a network is a scaler: by doing something because it is of value to them in the moment, which changes them, and by extension adds them to the growing number of nodes doing it. Some nodes may take a stronger interest in spreading something, convincing others to adopt something, but that’s about it. You might say the source of scaling is the invisible hand of networks.

That’s why I fully agree with Chris Hardie that in the open web, all the tools you create need to have the potential for the network effect built in. Of course, when something is too difficult for most to copy or adapt, then there won’t be this network effect. Which is why most of the services we see currently dominating online experiences, the ones that shocked Hossein upon returning from his awful forced absence, are centralised services made very easy to use, where someone was purposefully aiming for scale, because their business depended on it once they recognised their service had the potential to scale.

Dave Winer yesterday suggested the blogosphere is likely bigger now than when it was so dominantly visible in the ‘00s, when your blog post of today could be Google’s top hit for a specific topic, when I could be found just by my first name. But it is so much less visible than before, precisely because it is not centralised, and the extravagant centralised silos stand out so much. The blogosphere diminished itself as well, however, Dave Winer said in response to Hossein Derakshan’s talk yesterday.

People still blog, more people blog than before, but we no longer build the same amount of connections across blogs. Connections we were so in awe of when our writing first proved to have the power to create them. Me and many others, bloggers all, suckered ourselves into feeling blog posts needed to be more like reporting, essays, and took our conversations to the comments on Facebook. Facebook, which, as Hossein Derakshan pointed out, makes such a travesty of what web links are by allowing them only as separate from the text you write on Facebook. It treats all links as references to articles, not allowing them to be embedded in the text, or more than one link to be presented meaningfully. That further reinforced the blog-posts-as-articles notion. That further killed the link as weaving a web of distributed conversations, a potential source of meaning. It turned the web, turned your timeline, into TV, as Hossein phrased it.

Hoder on ‘book-internet’ (blogs) and ‘tv-internet’ (FB et al). Tweet by Anna Masera.

I switched off my tv ages ago. And switched off my FB tv-reincarnate nine months ago. In favour of allowing myself more time to write as thinking out loud, to have conversations.

After the conference, as we sat there enjoying an Italian late Friday afternoon over drinks, Adriana Lukas and I talked about the salons of old. How we both have created settings like that through the years, Quantified Self meetings, BlogWalks, Birthday Unconferences, and how we approached online sharing like that too. To just add some of my and your ramblings to the mix. Starting somewhere in the middle, following a few threads of thought and intuitions, adding a few links (as ambient humanity), and ending without conclusions. Open ended. Just leaving it here.

At State of the Net yesterday I used the concept of macroscopes. I talked about how many people don’t really feel where their place is in the face of global changes, like climate change, ageing, the pressures on rules and institutions, the apparent precarity of global financial systems. That many feel whatever their actions, they will not have influence on those changes. That many feel so much of the change around them is being done to them, merely happens to them, like the weather.
Macroscopes provide a perspective that may address such feelings of powerlessness, and help us in the search for meaning.

Macroscopes, being the opposite of microscopes, allow us to see how our personal situation fits in a wider global whole. The term comes from John Thackara, in the context of social and ecological design. He says a macroscope “allows us to see what the aggregation of many small interactions looks like when added together”. It makes the processes and systems that surround us visible and knowable.

I first encountered the term macroscope at the 2009 Reboot conference in Copenhagen where Matt Webb in his opening keynote invoked Thackara.
Matt Webb also rephrased what a macroscope is, and said “a macroscope shows you where you are, and where within something much bigger, simultaneously. To understand something much bigger than you in a human way, at human scale, in your heart.” His way of phrasing it has stayed with me over the past years. I like it very much because it adds human emotion to the concept of macroscopes. It gives us a sense of having a place, a sense of meaning. As meaning is deeply emotional.

Seeing the small …

… and the bigger picture simultaneously. (Chuck Close self portrait, 1995, at the Drents Museum)

Later in his on-stage conversation at State of the Net, Dave Winer remarked that for Donald Trump’s base MAGA is such a source of meaning, and I think he’s right. Even though it’s mostly an expression of hope of the kind I typified in my talk as salvationism. (Someone will come along and make everything better, a populist, an authoritarian, a deity, or speakers pontificating on stage.) I’ve encountered macroscopes that worked for people in organisations. But sometimes they can appear very contrived viewed from the outside. The man who cleans the urinals at an airport and says he’s ensuring 40 million people per year have a pleasant and safe trip clearly is using a macroscope effectively. It’s one I can empathise with as aiming for great hospitality, but it also feels a bit contrived, as many other things at an airport, such as the cattle prodding at security and the leg room on your plane, so clearly don’t chime with it.

In the Netherlands I encountered two examples of working macroscopes. Everyone I encountered at the Court of Audit reflexively compares every idea and proposal to the way their institution’s role is described in the constitution. Not out of caution, but out of feeling a real sense of purpose, as working on behalf of the people to check how government spends its money. The other one was the motto of the government engineering department responsible for water works and coastal defences, “Keeping our feet dry”. With so much of our country below sea level, and the catastrophic floods of 1953 seared into our collective memory, it’s a highly evocative macroscope that draws an immediate emotional response. They have since watered it down, and now it’s back to something bloodless and bland, likely the result of a dreary mission statement workshop.

In my talk I positioned networked agency as a macroscope. Globe-spanning digital networks and our human networks in my mind are very similar in the way they behave, and hugely overlapping. So much so that they can be treated as one: we should think in terms of human digital networks. There is meaning, the deeply felt kind of meaning, to be found in doing something together with a group. There’s also a tremendous sense of power to be felt from the ability to solve something for yourself as a group. Seeing your group as part, as a distinctive node or local manifestation, of the earth-wide human digital network allows you to act in your own way as part of global changes, and to see the interdependencies. That also lets you see how to build upon the opportunities that emerge from the global network, while being able to disconnect or shield yourself from negative things propagating over the network. Hence my call to build tools (technologies and methods) that are useful on their own within a group, as a singular instance, but more useful when federated with other instances across the global network. Tools shaped like that mean no-one but the group using them can switch them off, and the group can afford to disconnect from the wider whole on occasion.

Today I am enjoying the 2018 edition of the State of the Net conference, in Italy. Organised by Beniamino Pagliaro, Paolo Valdemarin and Sergio Maistrello.

Beniamino Pagliaro opening the conference this morning

This morning I provided a keynote on Networked Agency, in which I talked about rediscovering our ability to act, with networked groups in real and meaningful contexts as the unit of agency. For that to be possible our tools, both technologies and methods, need to work for groups and be much easier to access. They also need to work both as a local instance and federated across contexts. From that flows striking power (classic agency), agility to use and leverage the useful things coming at us over the networks, and resilience to mitigate the negative consequences that come at us over those same networks.

The slides are below.

[UPDATE]
The videos of State of the Net are online, including the video of my talk.


[/UPDATE]

(Disclosure) Paolo is a long-time friend and I had the privilege of contributing to previous editions in 2012 (Trieste) and 2015 (Milano). I’m also a member of the conference’s steering committee.

Earlier this month I asked Frank Meeuwsen a question about his RSS feeds, and he responded (in Dutch). He did so as a direct response (hey Ton!). He referenced a posting by James Shelley, who suggests writing postings in the second person, as open correspondence really.

I definitely see blogs as distributed conversations. You write something; I may respond on my own blog, such as now. That response may directly engage your post, go off on a tangent, or weave it into a broader conversation by pointing to other blog posts from other authors. It’s not the original use case for which I started blogging, that was ‘thinking out loud’, but conversations are definitely the reason I have kept doing it for over 15 years now.

I also always treat blogs as the personal voices of their authors. Unless it’s an online magazine format, like Ars Technica for instance. In my feed reader I therefore always add the name of the author first. I’m not following publications or channels, I am reading what individual people write. Over time, interaction and then connection may well flow from that reading. That’s also why I order my feeds roughly by social distance. Those closest to me I will check daily; those further away I may check less often, depending on time or on having a specific question where I’m curious what others may have written about it.


part of my reading lists: persons not publications

James’ suggestion I both like and feel slightly uncomfortable with. Like, because it is aimed at making blogs distributed conversations, which is a core purpose of my blog. Getting away from feeling like you’re writing a news article, striking a more informal tone, definitely helps. It likely is also a good way to blog more. A while ago when I asked my network what to write about more, one of the suggestions (by Georges Labreche) was to write an epistolary travel log novella. This, as my blog would actually provide all the material, with all the links to other blogs and authors in my postings. In James’ suggestion the blog itself would already be that epistolary travel log. My blog in that sense is too, just the form of address is different.

The discomfort is probably caused by also wanting to maintain a permanent open invitation to others to join in. To not exclusively address something to someone, and not discourage others lurking from contributing. Usually I am aware of others that are likely to have a perspective to add.
Another factor is supplying enough context. I agree that the first paragraph, which allows you to follow the context of this posting as part of a conversation, feels contrived and is ‘dry’ to read. Yet I feel it is necessary to convey in some form. In a second person form this context would likely be left out completely, as the counterpart already knows it. That makes it harder to follow for those who have just one side of the conversation as their window on it.

But yes, let’s build more conversations, James and Frank. I’m definitely with you both on that.

Some links I thought worth reading the past few days

A good quote from Thomas Madsen Mygdal

4 billion dollar ico yesterday.
Seen a generation and tech with big potential end up in ipo games, greed, speculation and short term thinking – “saw the beast of capitalism directly in it’s eyes” is my mental image of the dotcom bubble.
A natural consequence of any technology cycle I rationally know, but just sad to see generations repeating previous mistakes over and over.

In the comments Martin von Haller Grønbaek points to what happened after the dotcom bubble burst, a tremendous wave of innovation. So blockchain tech is set to blossom after the impending ICO crash.

To celebrate the launch of the GDPR last week Friday, Jaap-Henk Hoepman released his ‘little blue book’ (pdf) on Privacy Design Strategies (with a CC-BY-NC license). Hoepman is an associate professor with the Digital Security group of the ICS department at Radboud University.

I heard him speak a few months ago at a Tech Solidarity meet-up, and enjoyed his insights and pragmatic approaches (PDF slides here).

Data protection by design (together with a ‘state of the art’ requirement) forms the forward-looking part of the GDPR, where the minimum requirements are always evolving. The GDPR is designed to have a rising floor that way.
The little blue book has an easy-to-understand outline, which breaks privacy by design down into 8 strategies, each accompanied by a number of tactics, that can all be used in parallel.

Those 8 strategies (shown in the image above) are divided into 2 groups: data-oriented strategies and process-oriented strategies.

Data-oriented strategies:
  • Minimise (tactics: Select, Exclude, Strip, Destroy)
  • Separate (tactics: Isolate, Distribute)
  • Abstract (tactics: Summarise, Group, Perturb)
  • Hide (tactics: Restrict, Obfuscate, Dissociate, Mix)

Process-oriented strategies:
  • Inform (tactics: Supply, Explain, Notify)
  • Control (tactics: Consent, Choose, Update, Retract)
  • Enforce (tactics: Create, Maintain, Uphold)
  • Demonstrate (tactics: Record, Audit, Report)

All come with examples, and the final chapters provide suggestions on how to apply them in an organisation.
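As a quick illustration of how two of the data-oriented strategies might translate into practice, here is a minimal sketch in Python. The visitor record, field names and helper functions are my own invention for illustration, not examples taken from Hoepman’s book.

    import hashlib

    # Hypothetical raw record, e.g. from a newsletter sign-up form.
    raw_record = {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "ip_address": "203.0.113.7",
        "birthdate": "1980-04-12",
        "newsletter_opt_in": True,
    }

    def minimise(record, needed_fields):
        """Minimise / Strip: keep only the fields needed for the stated purpose."""
        return {k: v for k, v in record.items() if k in needed_fields}

    def hide(record, salt="replace-with-a-secret-salt"):
        """Hide / Obfuscate, Dissociate: replace the direct identifier with a
        salted hash, so records can still be counted or deduplicated without
        revealing who they are about."""
        hidden = dict(record)
        if "email" in hidden:
            hidden["email"] = hashlib.sha256((salt + hidden["email"]).encode()).hexdigest()
        return hidden

    # Store only what the newsletter process needs, with the identifier obfuscated.
    stored = hide(minimise(raw_record, {"email", "newsletter_opt_in"}))
    print(stored)

The process-oriented strategies (Inform, Control, Enforce, Demonstrate) are organisational rather than code-level measures, so they don’t lend themselves to a snippet like this.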

Some links I thought worth reading the past few days

Today I was at a session at the Ministry for Interior Affairs in The Hague on the GDPR, organised by the center of expertise on open government.
It made me realise how I actually approach the GDPR, and how I see all the overblown reactions to it, like sending all of us a heap of mail to re-request consent where none is needed, or even taking your website or personal blog offline. I find I approach the GDPR like I approach a quality assurance (QA) system.

One key change with the GDPR is that organisations can now be audited on their preventive data protection measures, which of course already mimics QA. (Beyond that, the GDPR is mostly an incremental change to the previous law, except for the people described by your data having articulated rights that apply globally, and for it having a new set of teeth in the form of substantial penalties.)

AVG mindmap
My colleague Paul facilitated the session and showed this mindmap of GDPR aspects. I think it misses the more future-oriented parts.

The session today had three brief presentations.

In one, a student showed some results from his thesis research on the implementation of the GDPR, for which he had spoken with a lot of data protection officers, or DPOs. These are mandatory roles for all public sector bodies, and also mandatory for some specific types of data processing companies. One of the surprising outcomes is that some of these DPOs saw themselves, and were seen, as ‘outposts’ of the data protection authority, in other words as enforcers or even potentially as moles. This is not conducive to a DPO fulfilling the part of their role that is about raising awareness of and sensitivity to data protection issues. It strongly reminded me of when, 20 years ago, I was involved in creating a QA system from scratch for my then employer. Some of my colleagues saw the role of the quality assurance manager as policing their work. It took effort to show that we were not building a straitjacket around them that kept them within strict boundaries, but providing a solid skeleton to grow on, and move faster with. Where audits are not hunts for breaches of compliance but a way to make emergent changes in the way people work visible, and to incorporate professionally justified ones into that skeleton.

In another presentation a civil servant of the Ministry talked about creating a register of all person-related data being processed. What stood out most for me was the (rightly) pragmatic approach they took in describing current practices and data collections inside the organisation. This is a key element of QA as well. You work from descriptions of what happens, not of what ‘should’ happen or ‘ideally’ happens. QA is a practice rooted in pragmatism, where once that practice is described and agreed it will be audited.
Of course in the case of the Ministry it helps that they only have tasks mandated by law, and therefore the grounds for processing are clear by default, and if they are not, the data should not be collected at all. This reduces the range of potential grey areas. Similarly for security measures: they already need to adhere to national security guidelines (the national baseline information security), which likewise helps with avoiding new measures, proves compliance for them, and provides an auditable security requirement to go with it. This no doubt helped them take that pragmatic approach. Pragmatism is at the core of QA as well: it takes its cues from what is really happening in the organisation, what the professionals are really doing.

A third one, by the national Forum for Standardisation, dealt with open standards for both processes and technologies. Since 2008 a growing list, currently of some 40 standards, is mandatory for Dutch public sector bodies. In this list of standards you find a range of elements that are ready-made to help with GDPR compliance: in terms of support for the rights of those described by the data, such as the right to export and portability for instance, in terms of preventive technological security measures, and in terms of ‘by design’ data protection measures. Some of these are ISO norms themselves, or, like the mentioned national baseline information security, a compliant derivative of such ISO norms.

These elements, the ‘police’ vs ‘counsel’ perspective on the role of a DPO, the pragmatism that needs to underpin actions, and the building blocks readily found elsewhere in your own practice already based on QA principles, made me realise and better articulate how I’ve been viewing the GDPR all along: as a quality assurance system for data protection.

With a quality assurance system you can still famously produce concrete swimming vests, but at least it will be done consistently. Likewise with the GDPR you will still be able to do all kinds of things with data. Big Data and developing machine learning systems are hard but hopefully worthwhile to do. With the GDPR it will just be hard in a slightly different way, but it will also be helped by establishing some baselines and testing core assumptions, while making your purposes and ways of working available for scrutiny. Introducing QA does not change the way an organisation works, unless it really doesn’t have its house in order. Likewise the GDPR won’t change your organisation much if you have your house in order either.

From the QA perspective on the GDPR, it is perfectly clear why it has a moving baseline (through its ‘by design’ and ‘state of the art’ requirements). From the QA perspective on the GDPR, it is perfectly clear what the connection is to how Europe is positioning itself geopolitically in the race concerning AI. The policing perspective after all only leads to a Luddite stance concerning AI, which is not what the EU is doing, far from it. From that it is clear how the legislator intends the thrust of the GDPR. As QA really.

Today is the day that enforcement of the GDPR, the new European data protection regulation, starts. A novel part of the GDPR is that the rights of the individual described by the data follow the data. So if a US company collects my data, it is subject to the GDPR.

Compliance with the GDPR is pretty common sense, and not all that far from the data protection regulations that went before. You need to know which data you collect, have a proper reason why you collect it, have determined how long you keep data, and have protections in place to mitigate the risks of data exposure. On top of that you need to be able to demonstrate those points, and people described by your data have rights (to see what you know about them, to correct things or have data deleted, to export their data).
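To make that concrete, this is roughly what a single entry in a register of processing activities could look like as a data structure. A minimal sketch with field names of my own choosing; it is not an official GDPR template.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ProcessingActivity:
        """One illustrative entry in a register of processing activities."""
        name: str                     # what the processing is
        data_collected: List[str]     # which data you collect
        purpose: str                  # the proper reason why you collect it
        legal_basis: str              # e.g. consent, contract, legal obligation
        retention: str                # how long you keep the data
        protection_measures: List[str] = field(default_factory=list)  # risk mitigation

    newsletter = ProcessingActivity(
        name="Newsletter subscriptions",
        data_collected=["email address"],
        purpose="Sending a monthly newsletter to subscribers",
        legal_basis="Consent",
        retention="Until unsubscribe, then deleted within 30 days",
        protection_measures=["access limited to two staff members", "TLS in transit"],
    )

Being able to produce and maintain entries like this for each processing activity goes a long way towards being able to demonstrate those points.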

Compliance can be complicated if you don’t have your house fully in order, and need to take a lot of corrective steps to figure out what data you have, why you have it, whether it should be deleted, and whether your protection measures are adequate.

That is why, when the law entered into force on May 4th 2016, 2 years ago, a transition period was created in which no enforcement would take place. Those 2 years gave companies ample time to reach compliance, if they weren’t compliant already.

The GDPR sets a de facto global norm and standard, as EU citizens’ data always falls under the GDPR, regardless of where the data is located. US companies therefore need to comply as well when they have data about European people.

Today at the start of GDPR enforcement it turns out many US press outlets have not put the transition period to good use, although they have reported on the GDPR. They now block European IP addresses, while they ‘look at options’ to be available again to EU audiences.

From the east coast

to the west coast

In both cases the problem likely is how to deal with the 15 or so trackers those sites have that collect visitor data.

The LA Times for instance have previously reported on the GDPR, so they knew it existed.

A few days ago they asked their readers “Is your company ready?”, and last month they asked if the GDPR will help US citizens with their own privacy.

The LA Times own answers to that at the moment are “No” and “Not if you’re reading our newspaper”.

TL;DR

The European Commission has proposed a new PSI Directive, which describes when and how publicly held data can be re-used by anyone (aka open government data). The proposal contains several highly interesting elements: it extends the scope to public undertakings (utilities and transport mostly) and research data, it limits the ways in which government can charge for data, introduces a high value data list which must be freely and openly available, mandates APIs, and makes de-facto exclusive arrangements transparent. It also calls for delegated powers for the EC to change practical details of the Directive in future, which opens interesting possibilities. In the coming months (years) it remains to be seen what the Member States and the European Parliament will do to weaken or strengthen this proposal.

Changes in the PSI Directive announced

On 25 April, the European Commission announced new measures to stimulate the European data economy, said to be building on the GDPR, as well as detailing the European framework for the free flow of non-personal data. The EC announced new guidelines for the sharing of scientific data, and for how businesses exchange data. It announced an action plan that increases safeguards on personal data related to health care and seeks to stimulate European cooperation on using this data. The EC also proposes to change the PSI Directive, which governs the re-use of public sector information, commonly known as Open Government Data. In previous months the PSI Directive was evaluated (see an evaluation report here, in which my colleague Marc and I were involved).

This post takes a closer look at what the EC proposes for the PSI Directive. (I did the same thing when the last version was published in 2013.)
This is of course a first proposal from the EC, and it may change significantly as a result of discussions with Member States and the European Parliament before it becomes finalised and enters into law. Taking a look at the proposed new directive is of interest to see what’s new, what from an open data perspective is missing, and where debate with Member States is most likely. Square bullets indicate the more interesting changes.

The Open Data yardstick

The original PSI Directive was adopted in 2003 and a revised version was implemented in 2015. Where the original PSI Directive stems from well before the emergence of the Open Data movement, and was written with mostly ‘traditional’ and existing re-users of government information in mind, the 2015 revision already adopted some elements bringing it closer to the Open Definition. With this new proposal, the yardstick is again how it increases openness and sets minimum requirements that align with the open definition, and how much of it will be mandatory for Member States. So scope and access rights, redress, charging and licensing, and standards and formats are important. There are also some general context elements that stand out from the proposal.

A floor for the data-based society

In the recital for the proposal what jumps out is a small change in wording concerning the necessity of the PSI Directive. Where it used to say “information and knowledge” it now says “the evolution towards a data-based society influences the life of every citizen”. Towards the end of the proposal it describes the Directive as a means to improve the proper functioning of the European data economy, where it used to read ‘content industry’. The proposed directive lists minimum requirements for governments to provide data in ways that enable citizens and economic activity, but suggests Member States can and should do more, and not just stick with the floor this proposal puts in place.

Novel elements: delegated acts, public undertakings, dynamic data, high value data

There are a few novel elements spread out through the proposal that are of interest, because they seem intended to make the PSI Directive more flexible with an eye to the future.

  • The EC proposal adds the ability to create delegated acts. This would allow practical changes without the need to revise the PSI Directive and have it transposed into national law by each Member State. While this delegated power cannot be used to change the principles in the directive, it can be used to tweak it. Concerning charging, scope, licenses and formats this would provide the EC with more elbow room than the existing ability to merely provide guidance. The article is added to make it possible to maintain a list of ‘high value data sets’, see below.
  • Public undertakings are defined and mentioned in parallel to public sector bodies in each provision. Public undertakings are all those that are (in)directly owned by government bodies, significantly financed by them, or controlled by them through regulation or decision making powers. The directive used to apply to the public sector only, basically allowing governments to withdraw data from its scope by putting the data at a distance in a private entity under government control. While the scope is enlarged to include public undertakings in specific sectors only, the rest of the proposal refers to public undertakings in general. This is significant I think, given the delegated powers the EC also seeks.
  • Dynamic and real-time data is brought firmly in scope of the Directive. There have been court cases where data provision was refused on the grounds that the data did not exist when the request was made. That will no longer be possible with this proposal.
  • The EC wants to make a list of ‘high value datasets’ for which more things are mandatory (machine readable, API, free of charge, open standard license). It will create the list through the mentioned delegated powers. In my experience deciding on high value data sets is problematic (What value? How high? To whom?) and reinforces a supply-side perspective over a demand-driven approach. The Commission defines high value as “being associated with important socio-economic benefits” due to their suitability for creating services, and “the number of potential beneficiaries” of those services based on these data sets.

Access rights and scope

  • Public undertakings in specific sectors are declared within scope. These sectors are water, gas/heat, electricity, ports and airports, postal services, water transport and air transport. These public undertakings are only within scope in the sense that requests for re-use can be submitted to them. They are under no obligation to release data.
  • Research data from publicly funded research that are already made available e.g. through institution repositories are within scope. Member States shall adopt national policies to make more research data available.
  • A previous scope extension (museums, archives, libraries and university libraries) is maintained. For educational institutions a clarification is added that it only concerns tertiary education.
  • The proposed directive builds as before on existing access regimes, and only deals with the re-use of accessible data. This maintains existing differences between Member States concerning the right to information.
  • Public sector bodies, although they retain any database rights they may have, cannot use those database rights to prevent or limit re-use.

Asking for documents to re-use, and redress mechanisms if denied

  • The way in which citizens can ask for data, and the way government bodies can respond, has not changed.
  • The redress mechanisms haven’t changed, and public undertakings, educational institutes, research organisations and research funding organisations do not need to provide one.

Charging practices

  • The proposal now explicitly mentions free of charge data provision as the first option. Fees are otherwise limited to at most ‘marginal costs’.
  • The marginal costs are redefined to include the costs of anonymising data and protecting commercially confidential material. The full definition now reads “marginal costs incurred for their reproduction, provision and dissemination and where applicable anonymisation of personal data and measures to protect commercially confidential information.” While this likely helps in making more data available, in contrast to a blanket refusal, it also looks like externalising onto the re-user the costs of what is essentially badly implemented internal data governance. Data holders should already be able to do this quickly and effectively for internal reporting and democratic control. Marginal costing is an important principle, as in the case of digital material it would normally mean no charges apply, but this addition seems to open up the definition to much wider interpretation.
  • The ‘marginal costs at most’ principle only applies to the public sector. Public undertakings and museums, archives etc. are excepted.
  • As before, public sector bodies that are required (by law) to generate revenue to cover the costs of their public task performance are excepted from the marginal costs principle. However, a previous exception for other public sector bodies having requirements to charge for the re-use of specific documents is deleted.
  • The total revenue from allowed charges may not exceed the total actual cost of producing and disseminating the data plus a reasonable return on investment. This is unchanged, but the ‘reasonable return on investment’ is now defined as at most 5 percentage points above the ECB fixed interest rate (see the illustrative calculation after this list).
  • Re-use of research data and the high value data-sets must be free of charge. In practice various data sets that are currently charged for are also likely high value datasets (cadastral records, business registers for instance). Here the views of Member States are most likely to clash with those of the EC.
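To illustrate how that charging ceiling would work, here is a rough back-of-the-envelope calculation with made-up numbers, assuming the return is calculated over the total actual cost and that the ECB fixed interest rate is 0%:

    # Illustrative figures only; actual costs and the ECB rate would differ.
    total_actual_cost = 100_000              # producing and disseminating the data, in EUR
    ecb_fixed_interest_rate = 0.00           # assumed ECB fixed rate
    max_roi_rate = ecb_fixed_interest_rate + 0.05   # at most 5 percentage points above it

    # Ceiling on the total revenue from charges for this data.
    max_total_revenue = total_actual_cost * (1 + max_roi_rate)
    print(max_total_revenue)                 # 105000.0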

Licensing

  • The proposal contains no explicit move towards open licenses, and retains the existing rules that standard licenses should be available, and that those should not unnecessarily restrict re-use, nor restrict competition. The only addition is that Member States shall encourage not only public sector bodies but all data holders to use such standard licenses.
  • High value data sets must have a license compatible with open standard licenses.

Non-discrimination and Exclusive agreements

  • Non-discrimination rules in how conditions for re-use are applied, including for commercial activities by the public sector itself, are continued.
  • Exclusive arrangements are not allowed for public undertakings, as before for the public sector, with the same existing exceptions.
  • Where new exclusive rights are granted, the arrangements now need to be made public at least two months before coming into force, and the final terms of the arrangement need to be transparent and public as well.
  • Important is that any agreement or practical arrangement with third parties that in practice results in restricted availability for re-use of data other than for those third parties must also be published two months in advance, with the final terms also made transparent and public. This concerns data sharing agreements and other collaborations where a few third parties have de facto exclusive access to data. With all the developments around smart cities, where companies e.g. have access to sensor data others don’t, this is a very welcome step.

Formats and standards

  • Public undertakings will need to adhere to the same rules as the public sector already does: open standards and machine readable formats should be used for both documents and their metadata where easily possible, but otherwise any pre-existing format and language is acceptable.
  • Both public sector bodies and public undertakings should provide APIs for dynamic data, either in real time or, if that is too costly, within a timeframe that does not unduly impair its re-use potential (a minimal sketch of such an API follows below this list).
  • High value data sets must be machine readable and available through an API.
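What such an API for dynamic data could look like in its most minimal form is sketched below, using only Python’s standard library. The endpoint path, the sensor example and the JSON shape are purely my own illustration of ‘machine readable and available through an API’; the proposal itself does not prescribe any of this.

    import json
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def latest_measurements():
        """Stand-in for a live data source, e.g. water level sensors."""
        return [{"sensor": "NL-001", "water_level_cm": 112, "timestamp": time.time()}]

    class OpenDataHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/v1/measurements":
                body = json.dumps(latest_measurements()).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        # Serve the dynamic data on http://localhost:8000/v1/measurements
        HTTPServer(("localhost", 8000), OpenDataHandler).serve_forever()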

Let’s see how the EC takes this proposal forward, and what the reactions of the Member States and the European Parliament will be.

Some links I thought worth reading the past few days