Amazon has been fined €746 million by the Luxembourg DPA (where Amazon’s EU activities reside). In its response Amazon shows it isn’t willing to publicly acknowledge even understanding the EU data protection rules.

“There has been no data breach, and no customer data has been exposed to any third party. These facts are undisputed,” said an Amazon spokesperson, according to TechCrunch.

Those facts are of course undisputed, because a data breach or exposure of data to third parties is not a prerequisite for being in breach of the GDPR. Using the data yourself in ways that aren’t allowed is reason enough in itself for fines of up to a few percentage points of your global yearly turnover. In Amazon’s case the fine isn’t even a third of a percentage point of their turnover, so about a day’s worth of turnover for them: compared to what is possible under the GDPR they’re actually being let off pretty lightly.

How Amazon uses the data it collects, not any breach or the like, is the actual reason for the complaint by La Quadrature du Net (PDF) filed with the Luxembourg DPA: the complaint “alleges that Amazon manipulates customers for commercial means by choosing what advertising and information they receive.” (emphasis mine)

The complaint and the ruling lay bare the key fact that Amazon and other tech companies aren’t willing to comment upon publicly: adtech in general is in breach of the GDPR.

There are a range of other complaints along these lines being processed by various DPAs in the EU, though for some of those it will be a long wait, as e.g. the Irish DPA is working at a snail’s pace on complaints against Apple and Facebook. (The slow pace of the Irish DPA is itself now the subject of a complaint.)

Meanwhile two new European laws have been proposed that don’t chime with the current modus operandi of Amazon et al: the Digital Markets Act and the Digital Services Act, which both carry even bigger potential fines than the GDPR for non-compliance with e.g. interoperability, service-neutrality, and transparency and accountability measures. And of course there are the European anti-trust charges against Amazon as well.

Amazon will of course appeal, but it can only ever be an attempt to gaslight and gloss over the fundamental conflict between adtech and GDPR. Let’s hope the Luxembourg DPA continues to see through that.

Since the start of this year I have been actively tracking the suite of new European laws being proposed on digitisation and data. Together they are the expression into law of the geopolitical position the EU is taking on everything digital and data, and all the proposed laws follow the same logic and reasoning. Taken together they shape how Europe wants to use the potential and benefits of digitisation and data use, including specifically for a range of societal challenges, while defending and strengthening citizen rights. Of course other EU legal initiatives sometimes point in different directions in parallel (e.g. EU copyright regulations leading to upload filters, and the attempts at backdooring end-to-end encryption in messaging apps for mass surveillance), but that is precisely why this suite of regulations stands out to me. Where other legal initiatives often seem to stand on their own, and bear the marks of lobbying and singular industry interests, this group of measures all builds on the same logic and reads as internally consistent as well as an expression of an actual vision.

My work is to help translate the proposed legal framework into how it will impact, and provide opportunity to, large Dutch government data holders and policy departments, and to build connections and networks between all kinds of stakeholders around relevant societal issues and related use cases. This is to shape the transition from the data-provision-oriented INSPIRE program (sharing and harmonising geo-data across the EU) to a use-needs-and-benefits-oriented approach (reasoning from a societal issue to be solved, with a network of relevant parties, towards the data that can provide agency for reaching a solution). My work follows directly from the research I did last year to establish a list of EU-wide high value data sets to be opened, for which I dove deeply into all government data and its governance concerning earth observation, environment and meteorology, while other team members did the same for geo-data, statistics, company registers, and mobility.

All the elements in the proposed legal framework will be decided upon in the coming year or so, and will enter into force probably after a two-year grace period. So by 2025 this should be in place. In the meantime many organisations, as well as public funding, will focus on already implementing elements of it even while nothing is mandatory yet. As with the GDPR, the legal framework, once in place, will also be an export mechanism for the notions and values expressed in it to the rest of the world, because compliance is tied to EU market access and to having EU citizens as clients wherever they are.

One element of the framework is already in place: the GDPR. The newly proposed elements mimic the GDPR’s structure of fines for non-compliance.
The new elements take the EU Digital Compass and the EU Digital Rights and Principles (for which a public consultation is open until 2 September) as their starting point.

The new proposed laws are:

Digital Markets Act (download), which applies to all dominant market parties, platform providers as well as physical network providers, that de facto act as gatekeepers to access for both citizens and market entities. It aims for a digital unified market, and sets requirements for interoperability, ‘service neutrality’ of platforms, and the prevention of lock-in. Proposed in November 2020.

Digital Services Act (download), applies to both gatekeepers (see previous point) and other digital service providers that act as intermediaries. Aims for a level playing field and diversity of service providers, protection of citizen rights, and requires transparency and accountability mechanisms. Proposed in November 2020.

AI Regulatory Proposal (download), which does not regulate AI technology as such, but the EU market access of AI applications and usage. Market access is based on an assessment of risk to citizen rights and to safety (think of use in vehicles etc.). It’s a CE mark for AI. It periodically updates a list of technologies considered within scope, and a list of areas that count as high risk. With increasing risk, more stringent requirements on transparency, accountability and explainability are set. It creates GDPR-style national and European authorities for complaints and enforcement. Responsibilities are given to the producer of an application, to distributors, and to users of such an application. It’s the world’s first attempt at regulating AI and I think it is rather elegant in tying market access to citizen rights. Proposed in April 2021.

Data Governance Act (download), makes government held data that isn’t available under open data regulations available for use (but not for sharing), introduces the European dataspace (created from multiple sectoral data spaces), mandates EU wide interoperable infrastructure around which data governance and standardisation practices are positioned, and coins the concept of data altruism (meaning you can securely share your personal data or company confidential data for specific temporary use cases). This law aims at making more data available for usage, if not for (public) sharing. Proposed November 2020.

Data Act, currently open for public consultation until 2 September 2021. Will introduce rules around the possibilities the Data Governance Act creates, will set conditions and requirements for B2B cross-border and cross-sectoral data sharing, for B2G data sharing in the context of societal challenges, and will set transparency and accountability requirements for them. To be proposed towards the end of 2021.

Open Data Directive, which sets the conditions and requirements for open government data (these build on the national access to information regulations in the member states; hence also the Data Governance Act, which does not build on national access regimes). The Open Data Directive was proposed in 2018 and decided in 2019, as the new iteration of the preceding Public Sector Information Directives. It should have been transposed into national law by 1 July 2021, but not all member states have done so (in fact the Netherlands has only recently started the work). An important element in this Directive is the EU High Value Data list, which will make publication of open data through APIs and machine-readable bulk downloads mandatory for all EU member states for the data listed. As mentioned above, last year I was part of the research team that did the impact assessments and proposed the policy options for that list (I led the research for earth observation, environment and meteorology). The implementing act for the EU High Value Data list will be published in September, and I expect it to e.g. add an open data requirement to most of the INSPIRE themes.

Most of the elements in this list are proposed as Acts, meaning they will have power of law across the EU as soon as they are agreed between the European Parliament, the Council of the EU and the European Commission, and don’t require transposition into national law first. Also of note is that currently ongoing revisions and evaluations of connected EU Directives (INSPIRE, ITS etc.) are being shaped along the lines of the Acts mentioned above. This means that more specific data-oriented regulations closer to specific policy domains are already being changed in this direction. Similarly, policy proposals such as the European Green Deal very clearly build on the EU digital and data strategies to achieve and monitor those policy ambitions. All in all it will be a very interesting few years in which this legal framework develops and gets applied, as it is a new fundamental wave of change after the role the initial PSI Directive and the INSPIRE Directive played 15 to 20 years ago, with a much wider scope and much more at stake.

The geopolitics of digitisation and data. Image ‘Risk Board Game’ by Rob Bertholf, license CC BY

This week our team is staying in a vacation park in the south of the Netherlands. Everyone has their own cabin, except me: family logistics mean I am spending most of my time at home and commuting to the holiday park.

This afternoon we discussed our office. What to do with it, how to make it more useful to us.

We opened an office exactly two years ago, and for more than half of that time we didn’t use it much because of the pandemic. We opened the office because some of us need a place away from home to work. I am used to working either at home, en route, or at a client’s, and have been doing so for 17 years. Still, having an office, especially a centrally located one like ours in Utrecht, in a building with other facilities available to us (meeting rooms, restaurant/catering services, event spaces, a roof terrace), is very useful to me as a meeting place and for hosting groups. During the pandemic some of our team used it to escape the four walls of their limited living spaces in the inner city of Utrecht or Amsterdam. I handed my office keys to a new hire early on in the pandemic.

The central question today was, moving forward, given our pandemic experiences, and the likelihood of at least some measures being in place on and off, how do we want to use our office? And given that use, how do we want it to look / feel?

We split into three groups of three. That in itself was already an important first realisation for me: we can actually split into three groups of three. And the office should work well for all nine of us, as well as for a handful of frequent collaborators.
In our little groups we discussed our ideal office, and shaped it with the material at hand. One group got to paint the office, another got to build with Lego (‘serious play’ is the applicable term, I think), and the group I was in used clay.


Patterns in the results: while a few desks are still needed, most of them can be removed; we want to make the office much greener with plants and more colourful in general; and shaping it as a social place is important, as well as a place where things can be created. A few immediate actions (such as removing two thirds of the desks, doing some painting, and adding plants) were decided upon for the summer. Another conclusion was that we simply cannot know yet how office use post-pandemic will really be, meaning plenty of flexibility is key. Think furniture, devices, or dividers that can be very easily rearranged at will by those present. Think not investing in a ‘perfect’ design, but doing it as we go along.

Could one, for that matter, redo any useful app that now fills the start-up cemetery?

I was reminded of this when Peter mentioned Dopplr, a useful and beautifully designed service in the years 2007-2010. The Dopplr service died because it was acquired by Nokia and left to rot. Its demise had nothing to do with the use value of the service, and everything to do with it being a VC-funded start-up that exited to a big corporation in an identity crisis, which proved unequipped to do something useful with it.

Some years ago I kept track of hundreds of examples of open data re-use in applications, websites and services. These included many that at some point ceased to exist. I had them categorised by the phase in which they stalled. This was because it was not just of interest which examples were brought to market, but also to keep track of the ideas that materialised in the many hackathons yet never turned into an app or service: things that stalled at some stage between idea and market. An idea that came up in France but found no traction might prove to be the right idea for someone in Lithuania a year later. An app that failed to get to market because it had a one-sided, tech-oriented team might have succeeded with another team, meaning the original idea and application still had intrinsic use value.

Similarly Dopplr did not cease to exist because its intrinsic value as a service was lost, but because everything around it was hollowed out. Hollowed out on purpose, as a consequence of its funding model.

I bet many such now-lost valuable services could lead a healthy life if not tied to the ‘exit-or-bust’ cycle. If they can be ‘big enough’, in the words of Lee LeFever; if they can be a Zebra, not aiming to become a unicorn.

So, what are the actual impediments to bringing a service like Dopplr back? IP? If you tried to replicate it, perhaps, or if you used technology originally created for the service you’re emulating. But not the ideas, which aren’t protected. In the case of Dopplr there seems to have been an attempt at resurrection in 2018 (but it looked like a copy, not a redo of the underlying idea).

Of course you would have to rethink such a service-redo for a changed world, with new realities concerning platforms and commonly used hardware. But are there actual barriers preventing you from repeating something or creating variations?

Or is it that we silently assume that if a single thing has failed at some point, there’s no point in trying something similar in new circumstances? Or that there can ever only be one of something?

Repetitions and Variations, a beautiful Matisse exhibit we saw in 2012 in the Danish national art gallery in Copenhagen. Image by Ton Zijlstra, license CC BY-NC-SA

12 stages, 1 painting. I’m thinking the reverse: 1 sketch, 12 paintings. Image by Ton Zijlstra, license CC BY-NC-SA

Normandy Cliff with fish, times 3. Matisse ‘Repetitions and Variations’ exhibit. Image by Ton Zijlstra, license CC BY-NC-SA

I’ve been using Zotero for over a year now. It is one of the elements that allowed me to leave Evernote, as it can automagically fetch scientific papers and their metadata for me, store web pages, clip PDFs from my browser etc.

Thanks to Nick Milo and Eleanor Konik discussing Eleanor’s Zotero/Obsidian workflow on YouTube, I found Bryan Jenks’ video on the same topic. In it, Jenks nicely explains something I had seen other people reference.

First he discusses two Zotero plugins that are very useful to me:

  • Zotfile, which allows me to annotate and comment on an article, and then extract and store those annotations inside Zotero, with links back to the paper and to the locations in the paper the annotations and comments belong to.
  • MDnotes, which allows you to export material from Zotero as markdown.

Together they allow me to highlight and annotate an article, and export that as notes into my Obsidian notes. Even better, those notes still contain the links to the paper and to the page of each annotation. Clicking them opens Zotero at the right article, in the right spot. This way context is maintained while I further process my notes, and the actual reference is just a single click away.
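To give a sense of what lands in Obsidian (a hedged sketch: the exact layout depends on your Zotfile and MDnotes settings, and the item key here is made up; the zotero:// link scheme is what makes the round-trip work):

```markdown
## Extracted Annotations

> "Highlighted sentence from the paper goes here."
> ([Go to annotation](zotero://open-pdf/library/items/ABCD1234?page=7))

My own comment on that highlight, now ready to be linked
to other notes in the vault.

[Open item in Zotero](zotero://select/library/items/ABCD1234)
```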

This is already very nice and smooth.

Then towards the end he mentions another very useful thing: Dean Jackson’s Alfred workflow for Zotero, ZotHero, which among other things allows fancy search of my Zotero database right from my main screen.

Half an hour very well spent, thanks to Bryan Jenks.

Writing it down may help in getting out of the loop…

I’m continuing my tinkering with federated bookshelves, for which I made an OPML-based way of publishing lists of books, as well as pointers to other people’s lists and to RSS feeds of content about books. I have now changed the XSL style sheet that parses my OPML files so it also renders those mentions of RSS feeds.
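To illustrate what that looks like (a hedged sketch rather than my exact production format; the non-standard attributes and the URLs are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>An example federated booklist</title>
  </head>
  <body>
    <!-- a book entry; attributes beyond 'text' are my own extension -->
    <outline text="The Age of Surveillance Capitalism, Shoshana Zuboff"
             type="book" status="read"/>
    <!-- a pointer to someone else's OPML booklist -->
    <outline text="Alice's booklist" type="include"
             url="https://example.org/alice/books.opml"/>
    <!-- an RSS feed of postings about books -->
    <outline text="Bob's book postings" type="rss"
             xmlUrl="https://example.net/bob/tag/books/feed/"/>
  </body>
</opml>
```

The type="include" and type="rss" conventions already exist in common OPML usage, which is what makes pointing at both other lists and at feeds from a single outline straightforward.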

Meanwhile I read Matt Webb’s posting on using RSS (and OPML) a few more times, and I keep thinking, “yes, but where do you keep the actual data?”
Then I read Stephen Downes’ recent posting on distributing reading material and entire books for courses through RSS, and realised it gave me the same sense of not quite sounding right as Matt’s posting did. That feeling probably means I’m not fully understanding their argument.

RSS is a by-design simple XML format for syndicating web content, including videos and podcasts. Content is an important word here, as is syndication: if you have something to which new material gets added regularly, an RSS feed is a good way to push it out to those interested.
OPML is another by-design simple XML format, for sharing outlines. Outlines are content themselves, and outlines can contain links to other content (including further outlines). One common use of OPML is to share a list of RSS feeds: ‘these are the blogs I follow’.

In Matt’s and Stephen’s posts I think there are examples that fail to satisfy either the content part of RSS, or the syndication-of-new-content part. Matt talks about feeds of postings about books, like the book category on this site, which is fine, but also about lists of books, which is where I struggle: a list doesn’t necessarily list pieces of content, let alone pieces of web content, which RSS seems to require. More likely it is just a list. At the same time he mentions OPML as a ‘library’ to point to such lists of books. Why would you use OPML for the list of lists, but not for the lists themselves, when those book lists hold no content per book, only a number of data attributes that describe items rather than being the items? And when the whole point of OPML outlines is branching lists? And when a library isn’t any different from a list, other than maybe in size? Again, it is different for actual postings about books, but you can already subscribe to those feeds as existing rivers of content, and point to those feeds (in the same OPML, as I do in my experimental set-up now as well).
In Stephen’s posting he talks about providing the content of educational resources through RSS. He suggests it for the distribution of complete books and of course material. I do like the idea of providing the material for a course as a ‘blob’. But we’re talking about static material here; a book is a finished artefact. Where then is the point of syndication through RSS (other than maybe if the book is a PDF or EPUB or something else that might be an enclosure in an RSS feed)? Why not provide the material from its original web source, with its original (semantic) mark-up? Is it in any way likely that such content will be read in the same tool the RSS feed is loaded into? And what is the ‘change’ the RSS feed is supposed to convey, when it’s a one-off distribution and no further change beyond that moment of distribution is expected?

OPML outlines can have additions and deletions, though at a slower pace than e.g. blogs. You could have an RSS feed for additions to an OPML outline (although OPML isn’t web content). But you could also monitor OPML outlines themselves for changes (both additions and deletions) over time. Or reload and use the current version as is, without caring about the specific changes in them.

The plus side of OPML and RSS is that there are many different pieces of code around that can deal with these formats. But most won’t be able to deal as-is with the additional data attributes we need to describe books as data, attributes that aren’t among the few basic mandatory ones RSS and OPML are expected to contain. Both RSS and OPML do allow the extension of attributes, if you follow existing namespaces, such as schema.org’s for creative works, which seems applicable here (both for collections of books, i.e. a shelf or a library, and for books themselves); see the sketch below. If the use of RSS (and OPML for lists of RSS feeds) is suggested because there’s an existing ecosystem, but we need to change the formats in a way that ensures the existing ecosystem won’t be able to use them, then where’s the benefit? To build readers and OPML/RSS creators, it is useful to be able to re-use existing bits and pieces of code. But is that really different from creating one’s own XML spec? At what point do our adaptations to overcome the purposeful simplicity of OPML and RSS destroy the ease of use we hoped to gain from that simplicity?
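To make that concrete (a hedged sketch: the schema: prefix and the selection of attributes are my own illustration of the idea, not an agreed convention; the ISBN is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0" xmlns:schema="https://schema.org/">
  <head>
    <title>Booklist with schema.org-derived attributes</title>
  </head>
  <body>
    <!-- plain OPML only requires 'text'; tools that don't know the
         schema: namespace should simply ignore those attributes -->
    <outline text="Example Book Title, Example Author"
             schema:name="Example Book Title"
             schema:author="Example Author"
             schema:isbn="0000000000"
             schema:datePublished="2021"/>
  </body>
</opml>
```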

Another thing I keep thinking about is that book lists (shelves, libraries) and book data, basically anything other than web-published reviews of books, don’t necessarily get created or live on the web. I can see how I could easily use my website to create OPML and RSS feeds for a variety of book lists. But it would require me to have those books and lists as content in my website first, which isn’t a given. Keeping reading lists, and writing reading notes, are part of my personal knowledge management workflow, and it all lives in markdown text files on my local hard drive. I have a database of e-books I own, which is in Calibre. I have an old database of book descriptions/data of physical books I owned and did away with in 2012, which is in Delicious Library. None of that lives on the web, or online in any form. If I am going to consistently share bookshelves/lists, then I probably need to create them from where I already use that information. I think Calibre has the ability to work with OPML, and it has an API I could use to create lists.
Putting that stuff into my website first, in order to generate one, some, or all of XML/OPML/RSS/JSON from it there, is work and friction I don’t want. If it is possible to automatically put it in my website from my own local notes and databases, that is fine, but then it is just as possible to automatically create all the XML/OPML/RSS/JSON directly from those local notes and databases as well (see the sketch below). Even if I would use my website to generate sharable bookshelves, I wouldn’t work with other people’s lists there.
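As a sketch of that ‘generate it directly at the source’ route (assuming Calibre’s calibredb command line tool, whose list command can emit machine-readable JSON; the OPML attributes are again my own illustration):

```python
import json
import subprocess
import xml.etree.ElementTree as ET

# Ask the local Calibre library for its metadata as JSON.
raw = subprocess.run(
    ["calibredb", "list", "--fields", "title,authors", "--for-machine"],
    capture_output=True, text=True, check=True,
).stdout
books = json.loads(raw)

# Build a minimal OPML booklist from it.
opml = ET.Element("opml", version="2.0")
head = ET.SubElement(opml, "head")
ET.SubElement(head, "title").text = "My Calibre books"
body = ET.SubElement(opml, "body")
for book in books:
    ET.SubElement(
        body,
        "outline",
        text=f'{book["title"]}, {book["authors"]}',
        type="book",  # illustrative attribute, not part of OPML itself
    )

ET.ElementTree(opml).write("books.opml", encoding="UTF-8", xml_declaration=True)
```

The same pattern would work for markdown reading lists: parse the local files, then emit OPML, RSS or JSON from the one authoritative source.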

I also think it is very unlikely that a ‘standard’ emerges. There will always be differences in how people share data about books, because of the different things they care about when it comes to reading and books. Having namespaces like schema.org is useful of course, but I don’t expect everyone to use them. And even if a standard emerges, I am certain there will be many different interpretations of it in practice. It is key to me that discoverability, both of people sharing book lists and of new-to-me books, exists regardless. That is why I think that, in order to read/consume other people’s lists (other than through the human-readable versions in a browser/reader) and to tie them into my information filtering and internal tools/processes, I likely need a way to flexibly map someone else’s shared list to what I use internally; a sketch of such a mapping follows below.
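A minimal sketch of such a mapping (the foreign attribute names are whatever a particular source happens to use; the internal field names are my own; the file name is hypothetical):

```python
import xml.etree.ElementTree as ET

# Per-source mapping: foreign attribute name -> my internal field name.
# ElementTree expands namespace prefixes, hence the {https://schema.org/} form.
ALICE_MAPPING = {
    "text": "display",
    "{https://schema.org/}name": "title",
    "{https://schema.org/}author": "creator",
    "status": "reading_status",
}

def map_outline(outline: ET.Element, mapping: dict) -> dict:
    """Translate one <outline> element's attributes to my own field names,
    keeping anything unmapped under its original name."""
    return {mapping.get(attr, attr): value
            for attr, value in outline.attrib.items()}

tree = ET.parse("alice-books.opml")  # a downloaded list, hypothetical
books = [map_outline(node, ALICE_MAPPING)
         for node in tree.findall(".//outline[@type='book']")]
```

Each source I follow would get its own little dictionary like this, which keeps the flexibility where the variation actually is: in other people’s attribute choices.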

I’m not sure where that leaves me. I think somewhere along these lines:

  • Discovery, of books and people reading them, is my core aim for federation
  • OPML seems useful for lists (of lists)
  • RSS seems useful for content about books
  • Both depend on using specific book-related data attributes which will see limited standardisation, even if they follow existing namespaces. It is impossible to depend on or assume standardisation; something more flexible is needed
  • My current OPML list points to other lists by me and others, and to RSS feeds by me and others
  • I’m willing to generate OPML, RSS and JSON versions of the same lists and content if useful for others, other than templating there’s no key difference after all
  • Probably my website is not the core element in creating or maintaining lists. It is for publishing things about books.
  • I’m interested in other people’s RSS feeds about books, and will share my list of feeds I follow as OPML
  • I need to figure out ways to create OPML/RSS/JSON etc. directly from where that information now lives in my workflow and toolset
  • I need to figure out ways to incorporate what others share with me into my workflow and toolset. Whatever is shared through RSS already fits existing information strategies.
  • For a limited number of sources shared with me by others, it might make sense to create mappings of their content to my own content structures, so I can import/integrate them more fully.

Related postings:
Federated Bookshelves (April 2020)
Federated Bookshelves Revisited (April 2021)
Federated Bookshelves Proof of Concept (May 2021)
Booklist OPML Data Structure (May 2021)