US Press Admits Incompetence

Today is the day that enforcement of the GDPR, the new European data protection regulation, starts. A novel part of the GDPR is that the rights of the individual described by the data follow the data. So if a US company collects my data, they are subject to the GDPR.

Compliance with the GDPR is pretty much common sense, and not all that far from the data protection regulations that went before. You need to know which data you collect, have a proper reason why you collect it, have determined how long you keep it, and have protections in place to mitigate the risks of data exposure. On top of that you need to be able to demonstrate those points, and people described by your data have rights (to see what you know about them, to correct things or have data deleted, and to export their data).
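As an illustration of that bookkeeping, here is a minimal sketch in Python of the kind of record this implies. All field names and the example entry are my own assumptions, purely for illustration, and certainly not legal advice:

```python
# Sketch: a record of processing, capturing what you hold, why, for how
# long, and how it is protected. Fields and values are illustrative only.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ProcessingRecord:
    data_category: str    # which data you collect
    purpose: str          # why you collect it
    legal_basis: str      # e.g. consent, contract, legitimate interest
    retention: timedelta  # how long you keep it
    safeguards: str       # protections against data exposure

    def is_overdue(self, collected_on: date) -> bool:
        """True if data collected on this date should have been deleted."""
        return date.today() > collected_on + self.retention

record = ProcessingRecord(
    data_category="newsletter e-mail addresses",
    purpose="sending the weekly newsletter",
    legal_basis="consent",
    retention=timedelta(days=365),
    safeguards="encrypted at rest, access limited to two staff members",
)
print(record.is_overdue(date(2016, 5, 1)))  # True: past the retention period
```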

Compliance can be complicated if you don’t have your house fully in order, and need to take a lot of corrective steps to figure out what data you have, why you have it, whether it should be deleted, and whether your protection measures are adequate.

That is why, when the law entered into force on 24 May 2016, two years ago, a transition period was created in which no enforcement would take place. Those two years gave companies ample time to reach compliance, if they weren’t compliant already.

The GDPR sets a de facto global norm and standard, as EU citizens’ data always falls under the GDPR, regardless of where the data is located. US companies therefore need to comply as well when they have data about European people.

Today, at the start of GDPR enforcement, it turns out many US press outlets have not put the transition period to good use, although they have reported on the GDPR. They now block European IP addresses, while they ‘look at options’ to be available again to EU audiences.

From the east coast to the west coast.
In both cases the problem is likely how to deal with the 15 or so trackers those sites have that collect visitor data.

The LA Times, for instance, has previously reported on the GDPR, so they knew it existed.

A few days ago they asked their readers “Is your company ready?”, and last month they asked if the GDPR will help US citizens with their own privacy.

The LA Times’ own answers to those questions at the moment are “No” and “Not if you’re reading our newspaper”.

New PSI Directive Proposal: Overview and Comments

TL;DR

The European Commission has proposed a new PSI Directive, which describes when and how publicly held data can be re-used by anyone (aka open government data). The proposal contains several highly interesting elements: it extends the scope to public undertakings (mostly utilities and transport) and research data, it limits the ways in which government can charge for data, introduces a high value data list which must be freely and openly available, mandates APIs, and makes de facto exclusive arrangements transparent. It also calls for delegated powers for the EC to change practical details of the Directive in future, which opens interesting possibilities. In the coming months (years) it remains to be seen what the Member States and the European Parliament will do to weaken or strengthen this proposal.

Changes in the PSI Directive announced

On 25 April, the European Commission announced new measures to stimulate the European data economy, said to build on the GDPR, as well as detailing the European framework for the free flow of non-personal data. The EC announced new guidelines for the sharing of scientific data, and for how businesses exchange data. It announced an action plan that increases safeguards on personal data related to health care and seeks to stimulate European cooperation on using this data. The EC also proposes to change the PSI Directive, which governs the re-use of public sector information, commonly known as Open Government Data. In previous months the PSI Directive was evaluated (see the evaluation report here, in which my colleague Marc and I were involved).

This post takes a closer look at what the EC proposes for the PSI Directive. (I did the same thing when the last version was published in 2013.)
This is of course a first proposal from the EC, and it may significantly change as a result of discussions with Member States and the European Parliament before it becomes finalised and enters into law. Taking a look at the proposed new directive is of interest to see what’s new, what is missing from an open data perspective, and where debate with Member States is most likely.

The Open Data yardstick

The original PSI Directive was adopted in 2003 and revised in 2013 (with implementation by 2015). Where the original PSI Directive stems from well before the emergence of the Open Data movement, and was written with mostly ‘traditional’ and existing re-users of government information in mind, the 2013 revision already adopted some elements bringing it closer to the Open Definition. With this new proposal, the yardstick is again how it increases openness and sets minimum requirements that align with the Open Definition, and how much of it will be mandatory for Member States. So scope and access rights, redress, charging and licensing, and standards and formats are important. There are also some general context elements that stand out from the proposal.

A floor for the data-based society

In the recitals for the proposal, what jumps out is a small change in wording concerning the necessity of the PSI Directive. Where it used to say “information and knowledge”, it now says “the evolution towards a data-based society influences the life of every citizen”. Towards the end, the proposal describes the Directive as a means to improve the proper functioning of the European data economy, where it used to read ‘content industry’. The proposed directive lists minimum requirements for governments to provide data in ways that enable citizens and economic activity, but suggests Member States can and should do more, and not just stick with the floor this proposal puts in place.

Novel elements: delegated acts, public undertakings, dynamic data, high value data

There are a few novel elements spread out through the proposal that are of interest, because they seem intended to make the PSI Directive more flexible with an eye to the future.

  • The EC proposal adds the ability to create delegated acts. This would allow practical changes without the need to revise the PSI Directive and have it transposed into national law by each Member State. While this delegated power cannot be used to change the principles in the directive, it can be used to tweak it. Concerning charging, scope, licenses and formats, this would provide the EC with more elbow room than the existing ability to merely provide guidance. The article is added to make it possible to maintain a list of ‘high value data sets’; see below.
  • Public undertakings are defined and mentioned in parallel to public sector bodies in each provision. Public undertakings are all those that are (in)directly owned by government bodies, significantly financed by them, or controlled by them through regulation or decision making powers. It used to say only public sector, basically allowing governments to withdraw data from the scope of the Directive by putting it at a distance in a private entity under government control. While the scope is enlarged to include public undertakings in specific sectors only, the rest of the proposal refers to public undertakings in general. This is significant, I think, given the delegated powers the EC also seeks.
  • Dynamic and real-time data is brought firmly in scope of the Directive. There have been court cases where data provision was refused on the grounds that the data did not exist when the request was made. That will no longer be possible with this proposal.
  • The EC wants to make a list of ‘high value datasets’ for which more things are mandatory (machine readable, API, free of charge, open standard license). It will create the list through the mentioned delegated powers. In my experience deciding on high value data sets is problematic (What value? How high? To whom?) and reinforces a supply-side perspective over a demand-driven approach. The Commission defines high value as “being associated with important socio-economic benefits” due to the data’s suitability for creating services, and “the number of potential beneficiaries” of those services based on these data sets.

Access rights and scope

  • Public undertakings in specific sectors are declared within scope. These sectors are water, gas/heat, electricity, ports and airports, postal services, water transport and air transport. These public undertakings are only within scope in the sense that requests for re-use can be submitted to them. They are under no obligation to release data.
  • Research data from publicly funded research that are already made available, e.g. through institutional repositories, are within scope. Member States shall adopt national policies to make more research data available.
  • A previous scope extension (museums, archives, libraries and university libraries) is maintained. For educational institutions a clarification is added that it only concerns tertiary education.
  • The proposed directive builds as before on existing access regimes, and only deals with the re-use of accessible data. This maintains existing differences between Member States concerning the right to information.
  • Public sector bodies, although they retain any database rights they may have, cannot use those database rights to prevent or limit re-use.

Asking for documents to re-use, and redress mechanisms if denied

  • The way in which citizens can ask for data, and the way government bodies can respond, has not changed.
  • The redress mechanisms haven’t changed, and public undertakings, educational institutes, research organisations and research funding organisations do not need to provide one.

Charging practices

  • The proposal now explicitly mentions free of charge data provision as the first option. Fees are otherwise limited to at most ‘marginal costs’.
  • The marginal costs are redefined to include the costs of anonymising data and protecting commercially confidential material. The full definition now reads “marginal costs incurred for their reproduction, provision and dissemination and where applicable anonymisation of personal data and measures to protect commercially confidential information.” While this likely helps in making more data available, in contrast to a blanket refusal, it also looks like externalising onto the re-user the costs of what is essentially badly implemented internal data governance. Data holders should already be able to do this quickly and effectively for internal reporting and democratic control. Marginal costing is an important principle, as for digital material it would normally mean no charges apply, but this addition seems to open up the definition to much wider interpretation.
  • The ‘marginal costs at most’ principle only applies to the public sector. Public undertakings and museums, archives, etc. are excepted.
  • As before, public sector bodies that are required (by law) to generate revenue to cover the costs of their public task performance are excepted from the marginal costs principle. However, a previous exception for other public sector bodies having requirements to charge for the re-use of specific documents is deleted.
  • The total revenue from allowed charges may not exceed the total actual cost of producing and disseminating the data plus a reasonable return on investment. This is unchanged, but the ‘reasonable return on investment’ is now defined as at most 5 percentage points above the ECB fixed interest rate.
  • Re-use of research data and the high value data sets must be free of charge. In practice, various data sets that are currently charged for are also likely high value datasets (cadastral records and business registers, for instance). Here the views of Member States are most likely to clash with those of the EC.

Licensing

  • The proposal contains no explicit move towards open licenses, and retains the existing rules that standard licenses should be available, and that those should not unnecessarily restrict re-use, nor restrict competition. The only addition is that Member States shall encourage not only public sector bodies but all data holders to use such standard licenses.
  • High value data sets must have a license compatible with open standard licenses.

Non-discrimination and Exclusive agreements

  • Non-discrimination rules in how conditions for re-use are applied, including for commercial activities by the public sector itself, are continued.
  • Exclusive arrangements are not allowed for public undertakings, as before for the public sector, with the same existing exceptions.
  • Where new exclusive rights are granted, the arrangements now need to be made public at least two months before coming into force, and the final terms of the arrangement need to be transparent and public as well.
  • Importantly, any agreement or practical arrangement with third parties that in practice results in restricted availability for re-use of data for anyone other than those third parties must also be published two months in advance, with the final terms also made transparent and public. This concerns data sharing agreements and other collaborations where a few third parties have de facto exclusive access to data. With all the developments around smart cities, where companies e.g. have access to sensor data others don’t, this is a very welcome step.

Formats and standards

  • Public undertakings will need to adhere to the same rules as the public sector already does: open standards and machine readable formats should be used for both documents and their metadata, where easily possible, but otherwise any pre-existing format and language is acceptable.
  • Both public sector bodies and public undertakings should provide APIs to dynamic data, either in real time or, if that is too costly, within a timeframe that does not unduly impair the re-use potential (a minimal sketch of what such an API could look like follows after this list).
  • High value data sets must be machine readable and available through an API.
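The directive does not prescribe any particular technology for this. Purely as an illustration, here is a minimal sketch of what ‘an API to dynamic data’ could look like; the endpoint, dataset and field names are hypothetical assumptions of mine, not anything from the proposal.

```python
# Sketch: a tiny API serving dynamic (e.g. sensor) data as machine-readable
# JSON. Endpoint and data are hypothetical; any comparable setup would do.
from datetime import datetime, timezone
from flask import Flask, jsonify  # pip install flask

app = Flask(__name__)

def latest_readings():
    # Stand-in for a live data source, e.g. a city's air quality sensors.
    return [{"sensor": "NO2-station-01", "value_ugm3": 24.1}]

@app.route("/api/air-quality/latest")
def air_quality():
    return jsonify({
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "readings": latest_readings(),
    })

if __name__ == "__main__":
    app.run()
```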

Let’s see how the EC takes this proposal forward, and what the reactions of the Member States and the European Parliament will be.

Suggested Reading: Barcelona, LETS, Freedom of Speech and more

Some links I thought worth reading the past few days

This Blog Is Now GDPR Compliant

At least I think it is… Personal blogs don’t need to comply with the new European personal data protection regulations (already in force, but enforceable from next week, May 25th), says Article 2.2.c. However, my blog does have a link with my professional activities, as I blog here about professional interests. One of those interests is data protection (the more you’re active in transparency and open data, the more you also start caring about data protection).

In the past few weeks Frank Meeuwsen has been writing about how to get his blog GDPR compliant (GDPR and the IndieWeb 1, 2 and 3, all in Dutch), and Peter Rukavina has been following suit. Like yours, my e-mail inbox is overflowing with GDPR-related messages and requests from all the various web services and mailing lists I use. I had been thinking about adding a GDPR statement to this blog, but clearly needed a final nudge.

That nudge came this morning as I updated the Jetpack plugin of my WordPress blog. WordPress is the software I use to create this website, and Jetpack is a module for it, made by Automattic, the same company that makes WordPress itself. After the update, I got a pop-up stating that a new option called “Privacy Policy” now exists in my settings, which comes with a guide and suggested texts to be GDPR compliant. I was pleasantly surprised by this step by Automattic.

So I used that to write a data protection policy for this site. It is rather trivial in the sense that this website doesn’t do much, yet it is also surprisingly complicated, as there are many potential rabbit holes to go down: not just comments and webmentions, but also the server logs my web host keeps, statistics tools (some of which I don’t use but cannot switch off either), third party plugins for WordPress, embedded material from data hungry platforms like YouTube, and so on. I have a relatively bare bones blog (over the years I made it ever more minimalistic, most recently stripping out things like sharing buttons), and still, asking myself questions that normally only legal departments would ask, there are many aspects to consider. That is of course the whole point: that we ask these types of questions more often, not just of ourselves, but of every service provider we engage with.

The resulting Data Protection Policy is now available from the menu above.

It’s on! Smart Stuff That Matters Unconference

Elmine and I are happy to ‘officially’ announce the Smart Stuff That Matters (STM18) unconference!
Friday August 31st (conference), and Saturday September 1st (BBQ party) are the dates. Our home in Amersfoort is the location.

This 4th ‘Stuff That Matters’ conference will be in honor of Elmine’s 40th birthday. Let’s throw her and yourself a party to remember. It’s the smart thing to do 😉

Smart Stuff That Matters will be about us, the things we care about, and the tools and behaviour we think we need to shape our lives in a complex world and to respond locally to global challenges.

Smartness isn’t limited to technology, or to your ‘smart home’ filled with gadgets. What is smart in the context of your community, your family, and how you relate to your city or the country you live in? What is the smartest way to tap into the global networks and knowledge we now have access to, yet shield yourself against some of the cascading problems too?

What provides you and the people around you with meaningful ways to decide, learn, act and organise together (the thing I call networked agency)? What skills and digital literacies are needed for you to consider yourself a ‘smart citizen’?

How do we need to (re-)shape tools so they become active extensions of ourselves, within our own scope of control?
Some of the smartest technologies are actually ‘dumb’ in the sense that they are passive technologies. Other technologies billed as smart aren’t so smart in practice, such as the eternal internet-connected fridge, or sticking Amazon Dash buttons all over your house.

The stuff that matters is not just technology but how we ourselves take action, as part of our communities and networks. Technology and different ways of doing things can help us and make us smarter.

Invitations will be coming soon
Smart Stuff That Matters is a by invitation only event. There is no attendance fee, but a donation box will be present. We will start sending out invitations in the coming week, so watch your inboxes! If you’d like to receive an invitation feel free to get in touch and let me know.

Find more info in the menu above under STM18.

Stay tuned!

#stm18

Although objectively speaking we were just in an overcrowded family home,
it felt like we were in a huge and spacious conference centre. …

The buzz of all those exciting and excited people
expressing and comparing their multitude of opinions,
made us literally forget where we were.
(Aldo about the 2010 event)

Suggested Reading: DNA, Reboot, Decentralisation and more

Some links I thought worth reading the past few days

Sentence Gradients with Neural Networks

Peter in his blog pointed to a fascinating posting by Robin Sloan about ‘sentence gradients’. His posting describes how he created a tool that makes gradients out of text, much like the color gradients we know. It uses neural networks (neuronal networks, we called them when I was at university), in other words machine learning, to represent texts as numbers. (A color gradient can be expressed as numbers along a single dimension; if you keep adding dimensions you can represent things that branch off in multiple directions as numbers too.) Sentences are more complex to represent numerically, but once you can, it is possible, just like with colors, to find sentences that lie numerically between a starting sentence and an ending sentence. Robin Sloan demonstrates the code for it in his blog (go there and try it!), and it creates fascinating results.
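To make the mechanics a bit more concrete: below is a minimal sketch of the interpolation idea. It is not Robin Sloan’s actual code (that is linked in his post); I’m assuming the sentence-transformers library as a stand-in for the embedding model he used, and the tiny candidate corpus is just a placeholder.

```python
# Sketch of a 'sentence gradient': embed a start and an end sentence,
# interpolate between the two vectors, and at each step print the corpus
# sentence whose embedding lies closest to the interpolated point.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# Candidate sentences; in practice this would be a large pool of text.
corpus = [
    "It was your way, my dear, to vanish without a word.",
    "So, now that you disappear for ever in that swift style.",
    "Your meaning seems to me just as it used to be.",
]
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

# The two ends of the gradient.
start_vec, end_vec = model.encode(
    ["It was your way, my dear.", "Good-bye is not worth while."],
    normalize_embeddings=True,
)

# Walk from start to end, printing the nearest corpus sentence at each step.
for t in np.linspace(0.0, 1.0, num=7):
    point = (1.0 - t) * start_vec + t * end_vec
    sims = corpus_vecs @ (point / np.linalg.norm(point))  # cosine similarity
    print(f"{t:.2f}  {corpus[int(np.argmax(sims))]}")
```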

Mostly the results are fascinating, I think, because our minds are hardwired to determine meaning. So when we see a list of sentences we want, we need, we very much need, to find the intended meaning that turns that list into a text.

I immediately thought of other texts that are sometimes harder to fully grasp, but where you know or assume there must be deeper meaning: poems.

So I took a random poem from one of Elmine’s books, and entered the first and last sentence into the tool to make a sentence gradient.

The result was:

I think it is a marvellous coincidence that the word Ceremony comes up.
The original poem is by Thomas Hardy, and titled ‘Without Ceremony’. (Hardy died in 1928, so the poem is in the public domain and can be shown below.)

Without Ceremony

It was your way, my dear,
To vanish without a word
When callers, friends, or kin
Had left, and I hastened in
To rejoin you, as I inferred.

And when you’d a mind to career
Off anywhere – say to town –
You were all on a sudden gone
Before I had thought thereon
Or noticed your trunks were down.

So, now that you disappear
For ever in that swift style,
Your meaning seems to me
Just as it used to be:
‘Good-bye is not worth while!’

Searching Source of Canonical Landscape Reference

I remember once reading an article saying that if you ask people across cultures to draw their ideal landscape, they all prefer the same elements: a woodland bordering on a grassland, in which some large animal is visible. Water flowing. And a man-made structure.

Based on conversations earlier this week I am trying to find a reference for it, but I can’t find it. After initial searches, I think the right search term is ‘canonical landscapes’.

Do you have some notion as to where I should look?

Suggested Reading: Censorship, Fake News, Signal and more

Some links I thought worth reading the past few days

What Do You Automate?

Over the years there have been several things I’ve automated in my workflow. This week it was posting from Evernote to WordPress, saving me over 60 minutes per week. Years ago I automated starting a project, which saves me about 20 minutes each time I start a new project (of whatever type), by populating my various workflow tools with the right things for it. I use Android on my phone, and my to-do application Things is Mac only, so at some point I wrote a little script that allows me to jot down tasks on my phone that then get sent to Things. As Things can now process email, that has become obsolete. I have also written tiny scripts that allow me to link to Evernote notes and Things items from inside other applications.

I’m still working to create a chat-based script in my terminal that takes me through my daily starting routine, as well as my daily closing routine. This is to take away the ‘bookkeeping’ character, and to make it easier for me to, for instance, track a range of lead indicators.
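For illustration, a bare-bones version of such a chat-style routine script could look like the sketch below; the routine steps, the indicators and the log file name are made-up placeholders, not my actual routine.

```python
# Sketch: a chat-style terminal script for a daily starting routine that
# also logs a few lead indicators to a CSV file as a side effect.
import csv
from datetime import date
from pathlib import Path

ROUTINE = [
    "Review calendar for today",
    "Pick the three most important tasks",
    "Process inbox down to zero",
]
INDICATORS = ["hours_slept", "planned_deep_work_blocks"]
LOGFILE = Path("lead_indicators.csv")

def main() -> None:
    print(f"Good morning! Starting routine for {date.today()}.\n")
    for step in ROUTINE:
        input(f"- {step} ... press Enter when done")

    # Ask for each lead indicator and append the answers to the log.
    row = {"date": date.today().isoformat()}
    for name in INDICATORS:
        row[name] = input(f"{name.replace('_', ' ').capitalize()}? ")

    is_new = not LOGFILE.exists()
    with LOGFILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", *INDICATORS])
        if is_new:
            writer.writeheader()
        writer.writerow(row)

    print("\nDone. Have a good day!")

if __name__ == "__main__":
    main()
```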

I know many others, like Peter Rukavina or Frank Meeuwsen, also automate stuff for themselves, and if you search online, the sheer range of examples you can find is enormous. Yet I find there is much to learn from hearing directly from others what they automate, how, and why it is important to them, as the context of where something fits in their workflow is crucial information.

What are the things you automate? Apart from the full-on techie things, like starting a new virtual server on Amazon, I mean. The more mundane day-to-day things in your workflow, beyond keyboard shortcuts? And have you published how you do that somewhere online?