What would you like me to write more about?

Design Museum
Something to aspire to

A few years ago Elmine and I wrote a short e-book on how to organize an unconference as a birthday party (PDF linked on the right). Since then I’ve regularly entertained the idea of writing another e-book, but that never really happened. While I do have some topics I’d like to write about, I find my knowledge of those topics still too limited to be able to come up with a narrative to share anything worthwhile. There are also doubts (fears?) about what type of things would have a potential readership.

So this week I decided to ask:

What would you like to see me write more or more extensively about?

Already I got a range of responses, and it is an intriguing list. Some suggestions are about aspects of my own journey, others are about topics that I don’t know much (or anything) about, but where apparently there’s interest in my take on it. Some come close to topics I already want to write more about, but feel I haven’t found an angle for yet.

Here’s the list until now. More suggestions and thoughts are welcome.

  • Optimal unfamiliarity (a phrase I coined in 2004, initially to describe what mix of people makes a great event audience to be part of, but which has become a design principle in how I try to collect information and learn), suggested by Piers Young
  • An epistolary travel log novella (something that could arise from my 14 years of blogging about my travels and work), suggested by Georges Labreche
  • Open currencies (which Google tells me they have no meaningful results for, but which connects to my experience with LETS, and chimes with free currencies in p2p networks), suggested by Pedro Custodio
  • Moderating sessions with a mix of analog and digital tools (closely connected to my thoughts about fruitful information strategies in social contexts), suggested by Oliver Gassner
  • Fatherhood (as I became one 9 weeks ago, but I don’t think 9 weeks counts as experience), suggested by Dries Krens
  • Motivating others to act on open data (a large chunk of my work), suggested by Gerrit Eicker
  • Being a European in the digital age (which I strongly claim to be), suggested by Alipasha Foroughi
  • Convincing profit oriented organisations of the value of open access and responsible research (comes close to Gerrit’s point), suggested by Johnny Søraker
  • How and why I left my job (being employed by Dries mentioned above), suggested by Rob Paterson
  • The journey from my involvement in knowledge management and early blogging, to where I am now, and how it impacted the way Elmine and I arrange our lives (lots to unpack here!), suggested by Jon Husband (who, like Rob Paterson, has been part and witness of that journey over many years)
  • The proliferation of means of communication versus the quality of communication (for me this points to information strategies on focus, filtering etc.), suggested by Jos Eikhout
  • Personal information strategies and processes using open source tools (something I blogged often about in various shapes and forms), suggested by Terry Frazier, a fellow blogger on knowledge management back when I started blogging in 2002

Looking at who responded is already in a way a manifestation of some of the suggested topics (the journey, the information strategies, the optimal unfamiliarity, facilitating communities).

I can’t promise I’ll write about all of the things suggested, but I appreciate the breadth and scope of this list and the feedback I can unpack from it. More suggestions are very welcome.

Archiving Mail in MySQL with MAMP and Mailsteward

As I am moving out of Gmail, I had to find a way to deal with the 21GB mail archive from the past 12 years.

Google lets you export all your data from its various services, including email. After a day or so you get a download link that contains all your mail in one single file in MBOX format.

MBOX is a text format, so it can be searched directly, but that would only tell you that what you are looking for is somewhere in that 21GB file.
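To illustrate the difference between searching the raw text and searching per message, here is a minimal sketch using the `mailbox` module from Python's standard library. It is not part of the set-up described in this posting; the function name `find_messages` and the choice to match on the subject line only are assumptions made for the example.

```python
import mailbox


def find_messages(mbox_path, needle):
    """Return (index, sender, subject) for each message whose subject contains needle."""
    needle = needle.lower()
    hits = []
    # create=False: fail instead of silently creating an empty file on a wrong path
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, message in enumerate(box):
            subject = str(message.get("Subject", ""))
            if needle in subject.lower():
                hits.append((index, str(message.get("From", "")), subject))
    finally:
        box.close()
    return hits
```

Calling `find_messages("archive.mbox", "slides")` would list the matching messages rather than just byte positions in one large file. On an archive of many gigabytes the initial scan is likely to be slow.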

I could also import it into my mail client as a local archive, by dropping the MBOX file in the Local Folder of Thunderbird with Finder. That provides me with similar access and search capability as I had for all that mail in Gmail. However, if I would like to do more with my archive, mine it for things, and re-use stuff by piping it into other workflows, having it in Thunderbird would not be enough.

Mailsteward puts MBOX into MySQL
So I searched for a way to more radically open my archive up to search. I came across DevonThink, but that seemed a bit overkill as it does so much more than merely digesting a mail archive, and as such provides way too much overlap with my Evernote. (Although I may rethink that in the future, if I decide to also move out of Evernote, as after Gmail it is my biggest third party service that contains lots of valuable information.) I looked for something simpler, that just does what I need, putting e-mail into sql, and that is how I found Mailsteward Pro.

There are three versions of Mailsteward, and I needed the Pro version, as it is the one that works with MySQL and thus can handle the volume of mail in my archive. It costs $99 one time, not cheap, but as I was paying for storage with Google as well, over time it pays for itself.

Installing Mailsteward
When installing Mailsteward it assumes you already have a MySQL server running on your system. I use MAMP Pro on my laptop as a local web and mysql server, on which I run different things locally, like a blog based journal and a self-assessment survey tool. MAMP Pro is very easy to install.

You need to take the following steps to allow Mailsteward access to MySQL. In MAMP Pro you need to allow external access to MySQL, but only from within your own system (this basically means applications other than MAMP can access the MySQL server).

Screenshot 2016-07-19 at 16.37.07

Then you create a new database via the phpMyAdmin that comes with MAMP. Mailsteward will populate it with the right tables. In my case I aptly named it mailarchives.

Screenshot 2016-07-19 at 10.48.16

Within Mailsteward you then add a connection, listing the database you created, and adding the right ports etc. Note that the socket it requests isn’t an actual file on your system, but does need to point to the right folder within the MAMP installation, which is the /Applications/MAMP/tmp/mysql folder.

Screenshot 2016-07-19 at 08.41.51

Importing MBOX files
I first tested Mailsteward with my parents’ e-mail archive that I kept after they passed away last year, to be able to find contact details of their friends. It imported fine. Then I tried to import my Gmail MBOX file. It turns out 21GB is too large for Mailsteward to handle in one go, as it eats away all memory on your Mac. I concluded that I needed to split my Gmail MBOX file into multiple smaller ones.

Luckily there is a working script on GitHub that chops MBOX files up in smaller ones, and that allows you to set the filesize you want. I chopped the Gmail MBOX into 21 smaller files of 1GB each. These imported ok into MailSteward. Mailsteward maintains tags and conversation threads.

To run the script, first open it in a text editor and change the filesize limit to what you want (default is 40MB, I changed it to 1GB). Then open Terminal and run the script by typing the following command, where the destination folder does not need to exist:

sudo php mbox_splitter.php yourarchivename.mbox yourdestinationfolder


That way you end up with a folder that contains all the smaller MBOX files:

Screenshot 2016-07-22 at 16.06.53
Using Mailsteward’s import feature you then add each of those files, by hand (but luckily you only need to do that once).
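The splitting step above can be sketched in a few lines of Python. This is not the PHP script from GitHub mentioned earlier, just an illustration of the idea. It assumes that every line starting with "From " begins a new message, which holds for MBOX files where such lines inside message bodies are escaped. The function name `split_mbox` and the `part_001.mbox` naming are made up for the example.

```python
import os


def split_mbox(source_path, dest_dir, max_bytes):
    """Split an MBOX file into parts of roughly max_bytes, never cutting a message in two."""
    os.makedirs(dest_dir, exist_ok=True)  # the destination folder does not need to exist
    part_paths = []
    out = None
    written = 0
    try:
        with open(source_path, "rb") as source:
            for line in source:  # read line by line, so memory use stays small
                starts_message = line.startswith(b"From ")
                # Open a new part at the first line, or at a message boundary
                # once the current part has reached the size limit.
                if out is None or (starts_message and written >= max_bytes):
                    if out is not None:
                        out.close()
                    name = "part_%03d.mbox" % (len(part_paths) + 1)
                    part_paths.append(os.path.join(dest_dir, name))
                    out = open(part_paths[-1], "wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return part_paths
```

With `max_bytes=1024 ** 3` this would produce parts of about 1GB each. A part can exceed the limit by at most one message, because the split only happens at the start of the next message.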

Using the archive
Mailsteward allows you to search the archive through its rather simple and bland interface, but you can also tweak the MySQL queries it creates yourself. The additional advantage of having it in MySQL is that I can also access the archive with other tools to search it.

Schermafbeelding_mailsteward
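What query-based access to a mail archive can look like is sketched below. Loud caveats: this uses SQLite from Python's standard library as a stand-in for MySQL so that it runs without a server, and the `mail` table with its four columns is a hypothetical layout, not the schema Mailsteward creates. The function names are likewise invented for the example.

```python
import mailbox
import sqlite3


def load_mbox(connection, mbox_path):
    """Copy sender, subject, date and plain body of each message into a 'mail' table."""
    # Hypothetical table layout, for illustration only.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS mail (sender TEXT, subject TEXT, sent TEXT, body TEXT)"
    )
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # Multipart messages are stored with an empty body in this sketch.
            payload = b"" if message.is_multipart() else message.get_payload(decode=True)
            body = (payload or b"").decode("utf-8", errors="replace")
            connection.execute(
                "INSERT INTO mail VALUES (?, ?, ?, ?)",
                (
                    str(message.get("From", "")),
                    str(message.get("Subject", "")),
                    str(message.get("Date", "")),
                    body,
                ),
            )
    finally:
        box.close()
    connection.commit()


def senders_by_volume(connection):
    """Return (sender, message count) pairs, busiest correspondents first."""
    return connection.execute(
        "SELECT sender, COUNT(*) AS n FROM mail GROUP BY sender ORDER BY n DESC, sender"
    ).fetchall()
```

Once mail sits in a table like this, any tool that speaks SQL can query it, which is the advantage described above.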

Adding newer mail to the archive
Thunderbird allows me to export e-mail as MBOX files via the Import/Export add-on, which can then be added to the archive by Mailsteward. So that’s a straightforward operation. Likely I can automate it and schedule it to run every month.
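One part of such a monthly routine could be selecting only the mail newer than the previous run. The sketch below assumes the exported MBOX file already exists; it does not drive Thunderbird or Mailsteward itself. The function name `export_newer_than` is hypothetical, and treating dates without a timezone as UTC is an assumption of the example.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def export_newer_than(source_path, dest_path, cutoff):
    """Copy messages dated on or after cutoff (a timezone-aware datetime) to a new MBOX."""
    copied = 0
    source = mailbox.mbox(source_path, create=False)
    dest = mailbox.mbox(dest_path, create=True)
    try:
        for message in source:
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
            except (TypeError, ValueError):
                continue  # skip messages without a parseable Date header
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
            if sent >= cutoff:
                dest.add(message)
                copied += 1
        dest.flush()
    finally:
        source.close()
        dest.close()
    return copied
```

The resulting file could then be imported into the archive by hand, or the script could be scheduled to run before a monthly import.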

How to leave Gmail

Leaving Gmail, a tough question
In the past two years I have been slowly reconfiguring my online routines to increase privacy safeguards, and bring more of my data under my own control, while avoiding making my work routines more difficult and thus less routine. How to create an e-mail workflow that does not rely on Gmail has been the hardest part of this effort. I think I now finally have figured out how to do it without loss of convenience, and hope to have made the switch after I finish exporting all e-mail data Google has from me.

After 12 years this will no longer be a familiar sight for me

Previous steps I took
Some things I already did to increase my control over my own data are:

Not that I use nothing but my own stuff now. I am still a heavy user of various services, like Evernote for instance, or my Android phone. But my usage of third party services has become more varied and spread out, reducing the impact of losing any one of them.

Why I want to leave Gmail
The net is a distributed place, and our information strategies and routines should embrace that distributedness. In practice however we often end up in various silos and walled gardens, because they are so very convenient to use, although they actually decrease our own control and/or introduce single points of failure. If your Facebook account gets suspended can you still interact with others? If your Google account gets suspended, do you still know how to reach people? Using Gmail also means all of my stuff resides on servers falling under the not very privacy sensitive US laws.

Since July 2004 I have however completely relied on Gmail. It is an easy way to combine the various e-mail addresses I use into one single inbox (or rather multiple inboxes on the basis of follow-up actions), and it has great tagging, search and filtering so that you never need to file anything or sort into folders. I have used Gmail as my central inbox for everything. Since 2004 I have accumulated about 770.000 emails in 249.000 conversations, for a total of 21GB. Gmail is therefore the largest potential single point of failure in my information processing.

The issues to solve
To wean myself off Gmail there were several things for which I needed a similarly smooth working alternative:

  • All the mail addresses I use need to come together into a single mailbox, and conversations need to be threaded
  • Availability across devices, and via webmail. Especially on the road I use my phone for quick e-mail triage, and as alternative for phone calls. Webmail is my general purpose access point on my laptop while traveling
  • Having access to my full mail archive for search and retrieval
  • Excellent tagging and filtering possibilities

The steps I took to leave Gmail
Finding a path away from Gmail took two realisations, one about process and one about technology.

Changing my process
Concerning process I realized that Gmail allows me, or even invites me, to be very lazy in my e-mail processing routines. Because of the limitless storage I merely needed to be able to find things again (through the use of tags for instance), and never needed to really decide what to do with an e-mail.

This means for instance that lots of attachments only live on in my mailbox, without me adding them to relevant project documentation etc. Likely I spent hours in the past years searching for slide decks in my mountain of e-mail, instead of spending half a minute once to store and archive an attachment in a more logical place where I’m more likely to find it with desktop search, or serendipitously bump into it, and then throw the mail message out. So mail processing has to become a much less lazy process with a few more active decisions in handling messages. E.g. attachments into a project folder, contact info into contacts, bookkeeping related messages to bookkeeping (and no longer going through all mail tagged bookkeeping every quarter to do my taxes), tasks and actions to my Things todo application. I already wrote several AppleScripts to let my todo app and Evernote talk to various other software packages (like Tinderbox), but it is now likely I will write a few more to automate mail message processing further (because I prefer to still keep my process as lazy as possible).
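As an illustration of one such active decision, moving attachments into a project folder, here is a minimal Python sketch. It is not one of the AppleScripts mentioned above; the function name `save_attachments` and the idea of passing in an already parsed message are assumptions for the example.

```python
import os


def save_attachments(message, project_dir):
    """Write every named attachment of an email message into project_dir."""
    os.makedirs(project_dir, exist_ok=True)
    saved = []
    for part in message.walk():
        filename = part.get_filename()
        if not filename:
            continue  # body parts and containers have no filename
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # basename() guards against path components smuggled into the filename
        target = os.path.join(project_dir, os.path.basename(filename))
        with open(target, "wb") as handle:
            handle.write(payload)
        saved.append(target)
    return saved
```

The `message` argument can be any message object from Python's `email` or `mailbox` modules. Note that this sketch overwrites an existing file with the same name.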

Changing my tools
A second key realization was that my original reasons for staying within webmail had meanwhile been solved with better technology: it used to be that only Gmail provided the cross-device access to all my mail accounts simultaneously, something I could not easily do in 2004 with a desk/laptop mail client in combination with a mobile mail client. Now, with much broader IMAP support (not just by my software tools, but also by hosting companies) this is much easier, increasing the range of possible alternatives. Threading mail conversations is now also a more universal feature.

This now allowed me to start using Thunderbird mail client, including PGP encryption, on my laptop (I never intensively used a mail client before on my laptop), in combination with the open source K9 Android mail app (replacing the Gmail app for me), also with encryption options. Both allow tagging of messages, and Thunderbird allows filtering for not just incoming mail but also when sending and when archiving, which is really useful.

As an alternative to piping all my mail accounts into Gmail, I now use all the real inboxes of those mail accounts where they’re originally hosted, and use IMAP to combine into one user interface on my laptop and mobile. Those separate mailboxes do have lower storage limits (usually 500MB), so it is more likely I bump into limits, and that is the reason I need a much less lazy mail processing routine (especially concerning larger attachments), in which I can regularly archive older mail.

Separately I also now use a different webmail provider, Protonmail in Switzerland, that comes with default encryption. I’ve attached a domain name to it (zylstra.eu).

The archiving issue
The above shows how I can leave Gmail from here on, by changing process and tools to solve the one-inbox and multiple-device issues. That leaves the question of how to deal with the 21GB of mail archive from the past 12 years. Leaving it all in Gmail, and using that as an archive, might be a work-around for old mail, but doesn’t help me for future mail. I could add it as a local folder to the Thunderbird mail client, but that thought did not appeal to me and feels clunky. I find that I never use my mail archive from my mobile, so the archive does not need to be cloud based per se. So I opted to keep my mail archive local, by storing it in a MySQL database. This allows for query based searches, and even text mining, without it clogging up my mail client itself. Gmail can export your archive in a single MBOX file, and I used Mailsteward Pro to transform it into a MySQL database. (More on that set-up in the next posting, Archiving Mail in MySQL with MAMP and Mailsteward.) With the archive now locally stored, the database is backed up to both my NAS drive and my VPS.

What remains
With the basic set-up for leaving Gmail now in place, there is still work to be done over the coming months. Clearing out the archive at Gmail is one step, once I feel comfortable with searching my new MySQL archive. Creating more filters in my mail client, and writing a few scripts to integrate my mail processing with the other tools I use is another. There are also likely a whole bunch of things (accounts, subscriptions etc.) that use my Gmail address, which I will change as I go along.

My longtime blogging friend Roland Tanglao suggested mining my mail archive for things that could be published, for contact data, and for old ideas that can feed into my work now. This sounds appealing but needs some contemplation and then a plan. Having the archive in MySQL makes it a lot easier to come up with a plan though.

Beyond mail, there are of course more Google services I use heavily, especially Calendar, which are tied to my gmail address. I could move that to my Owncloud as well. I will keep my Google account, as this isn’t about ditching Google but about reducing risks and taking more control. Apart from Calendar there are no other single points of failure in the way I use my Google account. Beyond Google, Evernote is another silo I’m heavily invested in, and the content I keep there is arguably more valuable to me than my Gmail. So that is a future change to think about and seek alternatives for.

Inbox 0 is for Losers
I reached Inbox -1 on Gmail once in 2009 🙂

[Find the outline and slides of my Koppelting session on leaving Gmail in the follow-up posting at https://tzyl.eu/leavegarden. You can use the shortlink https://tzyl.eu/gmail to refer to this posting.]

Near Future SF Reading List: Explore Emerging Future Together

Gogbot 2015: Google's AI Dreams
The dreams of Google’s artificial intelligence

I read lots of science fiction, because it allows exploring the impact of science and technology on our society, and the impact of our societies on technology development in ways and forms that philosophy of technology usually doesn’t. Or rather SF (when the SF is not just the backdrop for some other story) is a more entertaining and accessible form of hermeneutic exercise, that weaves rich tapestries that include emotions, psychology and social complexity. Reading SF wasn’t always more than entertainment like that for me, but at some point I caught up with SF, or it caught up with me, when SF started to be about technologies I have some working knowledge of.

Bryan Alexander, an online peer and friend for well over a decade, likewise sees SF, especially near future SF, as a good way to explore emerging futures that already seem almost possible. He writes “In a recent talk at the New Media Consortium’s 2016 conference, I recommended that education and technology professionals pay strong attention to science fiction, and folks got excited, wanting recommendations. So I’ve assembled some (below)”. His list contains a group-sourced overview of recent near future SF books, with some 25 titles.

I know and have read half of the books on the list, and last night loaded up my e-reader with the other half.

If you want to discuss those books keep an eye on Bryan’s blog, as you’re sure to get some good conversations around these books there.

Gogbot 2015: Google's AI Dreams
The dreams of Google’s artificial intelligence

(photos made during the 2015 Gogbot Festival, the yearly mash up of art, music and technology into a cyberpunk festival in my home town Enschede.)

Related: Enjoying Indie SF, March 2016

Original social media needs still unmet

My friend Peter Rukavina blogged how he will no longer push his blogpostings to Facebook and Twitter. The key reason is that he no longer wants to feed the commercial data-addicts that they are, and really wants to be in control of his own online representation: his website is where we can find him in the various facets he likes to share with us.

Climbing the Wall
Attempting to scale the walls of the gardens like FB that we lock ourselves into

This is something I often think about, without coming to a real conclusion or course of action. Yes, I share Peter’s sentiments concerning Facebook and Twitter, and how everything we do there just feeds their marketing engines. And yes, in the past two years I purposefully have taken various steps to increase my own control over my data, as well as build new and stronger privacy safeguards. Yet, my FB usage has not yet been impacted by that. In fact, I know I use it more intensively than a few years ago.

Peter uses his blog differently from me, in that he posts much more about all the various facets of himself in the same spot. In fact that is what makes his blog so worthwhile to follow, the mixture of technology how-to’s and philosophical musings very much integrated with the daily routines of getting coffee, or helping out a local retailer, or buying a window ventilator. It makes the technology applicable, and turns his daily routines into a testing ground for them. I love that, and the authentic and real impact that creates where he lives. I find that with my blog I’ve always more or less only published things of professional interest, which, because I don’t talk about clients or my own personal life per se, always remain abstract thinking-out-loud pieces that likely provide little direct applicability. I use Twitter to broadcast what I write. In contrast I use FB to also post the smaller things, more personal things etc. If you follow me on Facebook you get a more complete picture of my everyday activities, and random samplings of what I read, like and care about beyond my work.

To me FB, while certainly exploiting my data, is a ‘safer’ space for that (or at least succeeds in pretending to be), to the extent it allows me to limit the visibility of my postings. The ability to determine who can see my FB postings (friends, friends of friends, public) is something I intensively use (although I don’t have my FB contacts grouped into different layers, as I could do). Now I could post Tumblr-like on my own blog, but would not be able to limit visibility of that material (other than by virtue of no-one bothering to visit my site). That my own blog content is often abstract is partly because it is all publicly available. To share other things I do, I would want to be able to determine their initial social distribution.

That, I think, is the thing I would like to solve: can I shape my publications / sharings in much the same way I shape my feedreading habits, in circles of increasing social distance? This is the original need I have for social media, and which I have had for a very long time, basically since social media were still just blogs and wikis. Already in 2006 (building on postings about my information strategies in 2005) I did a session on putting the social in social media front and center, together with Boris Mann at Brussels Barcamp, where I listed the following needs, all centered around the need to let social distance and quality of relationships play a role in publishing and sharing material:

  • tools that put people at the center (make social software even more social)
  • tools that let me do social network analysis and navigate based on that (as I already called for at GOR 2006)
  • tools that use the principles of community building as principles of tool design (an idea I had writing my contribution to BlogTalk Reloaded)
  • tools that look at relationships in terms of social distance (far, close, layers in between) and not in terms of communication channels (broadcasting, 1 to 1, and many to many)
  • tools that allow me to shield or disclose information based on the depth of a relationship, relative to the current content
  • tools that let me flow easily from one to another, because the tools are the channels of communication. Human relationships don’t stick to channels, they flow through multiple ones simultaneously and they change channels over time.

All of these are as yet unsolved in a distributed way, with the only option currently being getting myself locked into some walled garden and running up the cost of moving outside those walls with every single thing I post there. Despite the promise of the distributed net, we still end up in centralized silos, until the day that our social needs are finally met in distributed ways in our social media tools.

On the need for distributedness and self-reliance

I came across this Guardian article describing how an American author and artist found his Google account deleted, including his 14 year old blog hosted with Google’s Blogger platform.

Screenshot of removed blog message

To me this incident is notable in a few ways.

  • The author concerned had his blog up for 14 years, and even used it to write and keep manuscripts, so clearly it was of key importance to him as an online asset.
  • For such a key asset, using a free service is a risk, as that doesn’t provide any certainty concerning uptime.
  • Blogger, as a free service, comes with a TOS, allowing Google to withdraw service at any moment. You don’t have a ‘right’ to this service.
  • After the account was closed, it was impossible to actually contact Google to ask about the why and how, or if it can be reinstated
  • The author concerned feels he’s being censored (which in a literal sense is impossible, as only governments can censor), although it is likely the account was closed because of a breach of the terms of service (which are notoriously unevenly enforced in every platform)
  • The author didn’t keep back-ups.

All of this once again highlights the importance of embracing the distributedness of the internet. You have to make sure that you are not just a passive and consuming part of it, but that for things that are important to you, you are also willing to make sure those things are under as much of your own control as possible. Your blog is only yours if you have control over the infrastructure it runs on. The same is true for e-mail, which in the case mentioned above was also lost: you have to make sure you have full control over at least one domain name, at which you can also receive and send e-mail (you@yourdomain.tld).

This in short means you need to make sure you have a claim to the service you actually need. Blogger offers free hosting but can take it away. If you want your blog to exist, make sure you pay for hosting, and make sure you run it on a domain you control. I used Blogger when I started blogging in November 2002 (around the same time in short, as the artist’s blog that was deleted), but once I realized I was likely to continue writing, after a few months, I moved it to a paid hosting package I could more fully control, and on a URL I acquired separately from the hosting, also under my full control. It doesn’t mean nothing can happen (my blog was hacked once), but it does mean I can recover from it.

The web was built in distributed fashion. If you use it in a centralized way, by making use of large centralized services, you expose yourself to vulnerabilities. That is true for centralized free blogging platforms, like Blogger.com or WordPress.com, and all those other services such as Facebook, Flickr and whatnot. Don’t make yourself dependent, don’t put yourself in a position that has a single point of failure.

Arsonists Walk Among Us

Playing politically on base emotions has consequences. Choice of words has consequences. It does not make the fear mongers and populists directly or criminally responsible, but it does come with moral responsibilities. If you consistently fan emotional flames you do bear moral responsibility for the resulting sparks and ‘singular unconnected’ fires. What British radio host James O’Brien says in the fragment embedded above about the UK is just as true in Germany, France, the Netherlands, Belgium, Hungary, Poland, Austria etc. I share his deep frustration.

The arsonists walk among us pretending to bring common sense and empathy, because “one should be allowed to say this after all, and high time too”. They don’t go by the names of Schmitz or Eisenring, but it doesn’t take Max Frisch to point them out. The arsonists walk among us pretending it is some mythical Other that will take “what is Ours” and who will burn our house and institutions down. The arsonists walk among us, luring us with reactionary nostalgia for a country and a time that has never existed. It will be those arsonists however that end up setting things alight, not any ‘Other’.

The question is how much of a Herr Biedermann I will be, you will be, we will be, before we learn to send the arsonists packing.

Do we even know anymore how to do that?

The Burning of the Houses of Parliament, Oct 16 1834, by J M W Turner. Image by Pete Jelliffe, CC-BY-SA

Data Sovereignty as Prerequisite for Open Data Agency

As we are living in a networked world, government bodies increasingly execute their tasks while collaborating in networks of various other stakeholders. This also happens when it comes to collecting, providing or working with data as part of public tasks. One of the potential detrimental side effects is that it quickly becomes unclear who can decide to open such data up, or whether a government entity that wants to publish data as part of a policy intervention still feels able to do so. This ability to decide over your own data, I call data sovereignty. I think that without proper attention, the data sovereignty of public institutions comes under pressure in collaborative situations, and that this threatens the freedom of public entities to decide and act on their own open data efforts. This is especially problematic where the lack of data sovereignty hinders public entities in deploying open data as a policy instrument.

I have just completed an inventory of the data sets that a Dutch province holds and the visible erosion of data sovereignty was the main unexpected outcome for me.
This erosion takes different shapes. Here are a few examples of it, encountered in the Province I mentioned:

  • Data on business locations and the number of people they employ (to track employment per municipality per sector) is being pooled by all provinces (as a national level data set is more useful). The pooling takes place in a separate legal entity. It is unclear if this entity still falls under FOIA and re-use regulations. This entity also exploits the data by selling it. Logical at the organisational level perhaps, but illogical in comparison with the provincial public task (and maybe not even legal under the Re-Use law). Opening up the data needs to be done through that new entity, meaning convincing not just yourself, but all other provinces as well as the entity that has a commercial interest in not being convinced. The slowest will thus set the speed.
  • Data on traffic flows, collected by the Province, is stored directly in a national data warehouse (NDW). Again pooling data makes it more useful, but the Province cannot store cleaned data there (anomalies filtered out, pattern changes explained etc.), so it always needs to redo that cleaning and filtering whenever it wants to work with or access its own data. Although the publicly owned NDW now publishes open data, until recently they saw themselves as a commercial outfit, averse to the notion of open data.
  • Data collection on bicycle traffic, done by the Province, is stored in the online database of a French service provider active in the entire EU. Ownership of the data is unclear. The Province only accesses the data through the French website. If a FOIA request came, it would be unclear if providing the data runs counter to any rights the service provider is claiming.
  • Data on the prevalence of bird species is being collected in collaboration with nature preservation groups and large numbers of volunteers. The Province pays for the data collection, but the nature preservation groups claim their volunteers (by virtue of their voluntary efforts) are the rightful owners of the data. Without seeking internal legal advice, the discussion remains unsolved and stalls.

None of these situations are unsolvable, all of them can get a definitive answer. The issue however is that nobody is clearly in a position, or has the explicit role, to make sure such a definitive answer gets formulated. Because of that, uncertainties remain, which easily leads to inaction. If and when the Province wants to act to open data up, it therefore easily runs into all kinds of questions that will slow action down, or ensure action does not get taken.

It is entirely logical that public entities are collaborating in networks with other public entities and domain-specific stakeholders for the collection, dissemination and use of data. It is also certain, given our networked society and the drive for efficiency, that the number of situations where such collaboration takes place will only rise. However, for the drive towards more openness it is detrimental when ownership of public data becomes unclear, gets transferred to an entity that potentially falls outside the scope of FOIA, or falls under the rights of a private entity, just because nobody sought to clarify such matters at the outset.

Public entities should learn to strongly guard their data sovereignty if they want to maintain their own agency in using the opening up of data as a policy instrument. Moving to open by design as a default for the public sector requires stopping the erosion of data sovereignty.

On Open Data and the Panama Papers

Two questions I was asked

In the past few days people asked me questions about the Panama Papers and how they are connected to open data. Is a leak like the Panama Papers helpful or not to the cause of open data? Is it reasonable that the journalists don’t plan to publish all leaked files?
Before answering those questions, I will explore aspects of the data we’re talking about, the content of the leak, and the legality and morality of it all.

The core concept at stake: beneficial ownership

First let’s look at the data that we are interested in here, and why that data needs to be fully transparent.
There are two elements of importance. One is that in a transaction you need to be able to verify that you are dealing with the right person: can your counterpart deliver, and is your counterpart legally able to enter into a transaction? If you buy my house you need to verify it is mine to sell, and therefore cadastral ownership information is a public register. Similarly if you deal with my company, you need to verify who is allowed to enter into a contract on behalf of that company, and who ultimately owns it. That last bit is called beneficial ownership: cui bono? This information is registered in public company registers.

This means that for my company (The Green Land), you can find out through the Dutch company register (searching by name, or by the company number we provide on our website and letterhead) that there are 4 owners with power of signature. Those four are all other companies. One of those owning companies is Interdependent Holding, and if you check that one, you’ll find out that I’m the sole owner (the other three are owned by my partners). This way you can trace that I am the ultimate owner of part of The Green Land. This is relevant information if you do business with my company, and it is important information for the tax office, which wants to know when to tax me for what. That these sorts of checks should be possible means beneficial ownership should be completely transparent. You are thus able to find out personal information about me through the company register.
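The tracing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register is a toy dictionary and not the Dutch company register's actual data model or API, and apart from The Green Land and Interdependent Holding every name in it ("Holding B", "Partner B" and so on) is a hypothetical placeholder.

```python
# Toy register: each company maps to its registered owners.
# Only "The Green Land" and "Interdependent Holding" come from the text;
# all other names are hypothetical placeholders.
REGISTER = {
    "The Green Land": ["Interdependent Holding", "Holding B", "Holding C", "Holding D"],
    "Interdependent Holding": ["the author"],
    "Holding B": ["Partner B"],
    "Holding C": ["Partner C"],
    "Holding D": ["Partner D"],
}


def ultimate_owners(company, register, _seen=None):
    """Follow ownership records until reaching names that are not companies."""
    seen = set() if _seen is None else _seen
    if company in seen:  # guard against circular ownership records
        return set()
    seen.add(company)
    owners = set()
    for owner in register.get(company, []):
        if owner in register:  # the owner is itself a registered company
            owners |= ultimate_owners(owner, register, seen)
        else:  # the owner is a person (or simply not in this register)
            owners.add(owner)
    return owners


print(sorted(ultimate_owners("The Green Land", REGISTER)))
```

The point of the sketch is that each step is a simple lookup as long as every link in the chain is recorded. A shell company in a jurisdiction that does not record ownership is a company with no entry to look up, and that is where the trail stops.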

That is the trade-off I make as an entrepreneur with you and the rest of society. You all allow me, by creating a company, to shield myself personally from several risks: a bankruptcy of my company will mostly not touch me personally (unless it is due to my negligence or misconduct). The overall benefit to society is that more people will feel able to start something new that way. In exchange I need to give up some of my anonymity, so that it is always clear who ultimately owns something.

Shell companies break that trade-off when beneficial ownership is purposefully obscured, especially when it is the primary reason that company was created in the first place. It allows me to make myself invisible to you in a deal, and it allows me to evade taxes without much chance of that becoming easy to spot.

What is in the Panama Papers?

The Panama Papers are a collection of over 11 million documents and some structured data, about the creation of a wide range of shell companies (210.000!) in the past 40 years. All the documents come from one law firm in Panama that has assisted in creating companies in jurisdictions where beneficial ownership is not fully recorded. Those jurisdictions don’t record that information because they don’t need it for taxation. The leaked documents contain the correspondence and other material, such as copies of passports, that was used to keep client records at the law firm, and to establish firms. So the leak is not a list of companies and beneficial ownership like you could get from e.g. the Dutch company register. But from the leaked documents that information about beneficial ownership can be derived, even if it is ultimately not recorded in the company registers of the jurisdictions these companies are established in (such as the British Virgin Islands). And that is what some 400 journalists in 80 countries did this past year: derive the beneficial ownership information from the leaked documents, and then write stories about it. The law firm in Panama involved is just one of many law firms offering these services. It is not the biggest either, though it is in the top 5. It just is a law firm that seems to have had crappy data security.

On legality and morality

It is perfectly legal to create a company in the British Virgin Islands or any other similar jurisdiction, and to use a Panama law firm to help you do that. It is also normal in those jurisdictions that beneficial ownership is not always recorded, simply because it is not needed by the local tax office and they therefore have no reason to collect it.
It is however illegal to not disclose such ownership when asked to do so by the tax office in your country of residence.
However, when beneficial ownership is hidden, and when the law firm you asked to help you do that is also somewhere hard to get at, it becomes easy to not disclose such ownership to your local tax authority without being found out.

This leak now ends that purposefully created obscurity for a large number of companies and the people who have the beneficial ownership of those companies.
Some of that will until now have been undisclosed to tax authorities elsewhere, and thus illegal. This is what is now prompting government investigations in Australia, Peru, the Netherlands etc.

Next to legality, there are also issues of morality at play. And this mostly is where the journalistic interest is.
The morality of having a company in a jurisdiction where you do not do any business, but where you happen to be able to obscure your beneficial ownership can be called into question.
Why would a board member of a Chilean transparency organisation need one? Why would a prime minister who demands austerity from all citizens have one, when those citizens cannot use the options the prime minister apparently does have access to (or draw immediate attention if they do, such as the shop owners of a Welsh village)?
Why would an advisory board member of a Dutch bank, that is currently government owned after a bail-out have one? Why would an NGO have one and send public money and donations to them? Why would family and friends of heads of state need them, coinciding with their rise to power? Why would such obscurity be important to art dealers?

The people it concerns apparently feel those morality issues themselves as well when confronted, though some have above board explanations, such as the NGO mentioned. Why else would a prime minister walk out of an interview about it and later resign? Why else would another prime minister provide 4 different explanations in 4 different days before coming clean?
Why else would a banker give up his position when challenged? Why else would a country whose powerful figures are named have reporting on it censored and firewalled? Why else would a government denounce it as a smear campaign even before the Panama Papers were first published?
Why else would a leading transparency activist immediately resign? A FIFA official resign from his organisation’s Ethics Panel? And a minister from another austerity-focussed government? Why else try damage control, while pleading innocence without denying the facts presented, but because of being caught with your hands in the cookie jar?

The full list will come out

One of the defenses out there for those finding themselves cornered in this moral quandary is that only they are targeted. They try to raise suspicion about the fact that not all companies and their owners have been published yet. That is attacking the messenger when you don’t really have anything to counter the message. Of course the journalists involved lead with the juiciest stories, so heads of state and prime ministers find themselves first exposed.
The list of shell companies and their owners is of course much longer, 210.000 companies long, and as I said beneficial ownership is the key concept here. So that list needs to come out in full. It will. The ICIJ has announced that the full list will be published early May (last sentence at that link), after the stories prepared in the past year have been published. So we will soon be able to see for ourselves who else we are dealing with.

Getting back to the questions

So should the entire leak be made public? All those 11 million documents or so? I don’t think so. The beneficial ownership information definitely should be. This is the stuff you can get from most company registers around the world. Beneficial ownership being public is part of the deal a company owner makes with society. That is however only the information derived from the leak and not the content of most of the documents leaked. Copies of passports used to register the company are not ours to see. You don’t post yours either, nor is it public through company registers normally. So no, the entire leak does not need to be public I think.
Once you see the full list of 210.000 shell companies and their owners, you will realize how many people not of public interest are on it. And you’ll realize that disclosing material other than beneficial ownership is a breach of privacy that doesn’t add anything to further challenge the legality or morality of the situation.

Does this help open data? That is uncertain to me. Maybe it helps to put beneficial ownership information at the heart of current discussions of opening up company registers in Europe further. Many of these European registers are public (you can check the records, for a fee), but not open. Only Denmark has a fully open company register. In other words: for you as an individual, Panama isn’t very far away. In a certain sense, Panama is right in your own capital. Maybe it helps European governments to understand that they should lead by example in opening up beneficial ownership to the public pro-actively, and that their tax offices have something to gain by it (because then data across multiple European jurisdictions will be routinely available).
Maybe it shows owners of shell companies that geographical distance and obscurity are much less of a protection than before digitisation, and leads at least some to make different choices. Or maybe it shows that full global transparency of company registers is unavoidable over time: if not voluntarily then forced.

On Agency Pt. 1: Embracing Distributedness to Increase Agency

As individuals, as groups, as organisations, as societies, we are far from leveraging the full power of distributed networks. It is in distributed approaches to many of our current issues that we will find much needed agency for ourselves.

Paul Baran distributed networks

The internet is meant to be distributed
The internet was conceived as a distributed network (item c in the pic above). Physically, in terms of cables, this is true. Practically in terms of functionality, and socially, in terms of people, it mostly is not.

…but that is not what we get provided with
Especially for individuals, as the end points of whatever gets provided or done over the internet, the experience is much more like a centralized network (a in the pic), or at best a decentralized one (b in the pic) when I use various services next to each other. Yet tremendous affordance and power lie in being a real node of agency myself in a distributed network. Facebook is to users a centralized hub that we interact with, a walled garden. Google, Apple, Amazon all behave in similar ways. Most Internet of Things devices and other connected products require you to stay within one silo. My Sonos speakers can’t play nicely with other wireless solutions. A ‘smart’ thermostat or a smart meter only communicates with its own components, or with the energy company instead of me. Most of it is not open and requires us to pick one specific ecosystem to be part of, and to voluntarily lock ourselves in. That is because for these products and services the easiest path to viable business is in aggregation, providing a measure of control to the business as they scale.

Truly distributed solutions are of course available online. Diaspora does what Facebook does, but your data is under your control. The blockchain behind Bitcoin distributes transparency and accountability, and combines it with anonymity, even though current implementations end up almost centralized as well. Maker machines leverage the distributed network: the machines allow me to use and produce locally what I find through the networks. With Arduino and Raspberry Pi we have open computing power at our hands that allows us to do much of what centralised and siloed IoT systems do.

We need to make distributedness way easier
The threshold to use truly distributed technology yourself remains extremely high, however.
If something is difficult, even if it’s ultimately better, I won’t do it if there are easy alternatives. I will not go through the trouble of administering a UNIX server to run my own Diaspora pod if I can join Facebook within a single minute. I will not repair my own fridge door handle if getting my 3D printer to work and drawing a thing takes ages, but I can order a new fridge in 2 minutes and have it delivered tomorrow.

…also when it comes to human interaction
I see similar patterns on the social side, when it comes to learning, organizing, collaborating, creating. Recently I was in another meeting where the participants only knew one mode: sit down to talk with an agenda and discuss things to death, even though what was needed was to just get to work together and do something. But they did not know any of the work forms, methods or instruments you could use for that. Again there is a myriad of proven work methods when it comes to organizing, collaborative structures, decision making, ideation and designing. Methods that build on the diversity and connectedness of groups, on distributedness, on peer to peer.

Reducing adoption thresholds of truly distributed technology and peer to peer methods allows an increase in agency. What agency looks like in a connected and distributed world (On Agency pt. 2), and what reducing thresholds to adoption (On Agency pt. 3) looks like in my eyes is for following postings.