This week marks 15 years since I became involved in open government data. In this post I look back on how my open data work evolved, and whether it brought any lasting results.

At a BarCamp on political communication in Graz in the last days of May 2008 I ended up in a conversation with Keith Andrews, in a session about his wish for more government-held data to use for his data visualisation research. I continued that conversation a week later with others at NL GovCamp on 7 June 2008 in Amsterdam, an event I helped organise with James Burke and Peter Robinnet. There, on the rotting carpets of the derelict office building that had been the Volkskrant offices until 2007, several of us discussed how to bring about open data in the Netherlands:

My major take-away … was that a small group found itself around the task of making inventory of what datasets are actually held within Dutch government agencies. … I think this is an important thing to do, and am curious how it will develop and what I can contribute.
Me, 10 June 2008

Fifteen years on, what came of that ‘important thing to do’ and seeing ‘what I can contribute’?

At first it was mostly talk, ‘wouldn’t it be nice if …’, but importantly part of that talk was with the Ministry responsible for government transparency, whose people were present at NL GovCamp. Initially we weren’t allowed to meet at the Ministry itself, as inviting ‘hackers’ in was seen as too sensitive, so over the course of six months several conversations with civil servants took place in a pub in Utrecht before we were formally invited to come talk. Those conversations resulted in a first assignment in January 2009, which I did with James and with Alper (who had also participated in NL GovCamp).

With some tangible results in hand from that project, I hosted a conversation about open data at Reboot 11 in Copenhagen in 2009, which extended my European network on the topic. There I also encountered the Danish IT/open government team. Cathrine of that team invited me to host a panel at an event in early 2010 where the European Commission official responsible for open data was also presenting. He invited me to Luxembourg in June 2010 to meet the PSI Group of national representatives, and that landed me an invitation as a guest blogger that same month at an open data event hosted by the Spanish government and the ePSIplatform team, a European website on re-using government information.

There I also met Marc, a Dutch lawyer working on open government. Having met various European data portal teams in Madrid, I then did some research for the Dutch government in the summer of 2010 on the governance and costs of a Dutch open data portal, through which I met Paul, who took on a role in further shaping the Dutch portal. Encouraged by the Commission, Marc and I submitted a proposal to run the ePSIplatform, a public tender we won. The launching workshop of our work on the ePSIplatform in January 2011 in Berlin is where I met Frank. In the fall of 2011 I attended the open government data camp in Warsaw, where Marc, Frank, Paul and I all had roles. I also met Oleg from the World Bank there. In November 2011 Frank, Paul, Marc and I founded The Green Land, and I have worked on over 40 open data projects under that label since. Early 2012 I was invited to the World Bank in the US to provide some training, and later that year worked for them in Moldova. From 2014 until 2019 I worked for the World Bank in Kazakhstan, Kyrgyzstan, Serbia and Malaysia, before the pandemic ended it for now.

What stands out to me in this history of a decade and a half is:

  • How crucial chance encounters were and are, and how those occurred around small tangible things to do. From those encounters the bigger things grew. Those chance encounters could happen because I helped organise small events, went to events by others, and, even if those were nominally about something else, had conversations there about open data with like-minded people. Being in it for real, and spending effort to strengthen the community of practitioners around this topic, created a track record quickly. This is something I recently mentioned when speaking about my work to students as well: making time for side interests is important, and I’ve come to trust it as a source of new activities.
  • The small practical steps I took (a first exploratory project, creating a small collection of open data examples out of my own interest, writing the first version of an open data handbook with four others during a weekend in Berlin) served as material for those conversations and were the scaffolding for bigger things.
  • I was there at the right time, not too early, not too late. There was already a certain general conversation on open data going on. In 2003 the EC had legislated for government data re-use, legislation which had entered into force in May 2008, just three weeks before I picked the topic up. Thus there was an implemented legal basis for open data in place in the EU, which however hadn’t yet been used by anyone as a new instrument. In late 2008 Barack Obama was elected to the US presidency on a platform that included government transparency, which on the day after his inauguration in January 2009 resulted in a Memorandum kick-starting open government plans across the public sector. This meant there was global attention to the topic. So the circumstances were right, there was general momentum, just not very many people yet trying to do something practical.
  • Open data took several years to really materialise as a professional activity for me. During those years most time was spent on explaining the topic and weaving the network of people involved across Europe and beyond. I have so many open data slide decks from 2009 and 2010 in my archive. In 2008, 2009 and 2010 I was active in the field, but my main professional activities were still elsewhere. In 2009, after my first open data project, I wondered out loud whether this was a topic I could and wanted to continue in professionally. From early 2011 most of my income came from open data, while the need for building out the network of people involved was still strong. Later, from 2014 or so, open data became more local, more regular, and shifted to being part of data governance, and now data ethics. The pan-European network evaporated. Nevertheless, helping improve European open data legislation has remained a crucial element until now, to keep providing a foundation beneath the work.

From those 15 years, what stands out as meaningful results? What did it bring?
This is a hard and an easy question at the same time. Hard because ‘meaningful’ can have many definitions. If we take achieving permanent or even institutionalised results as the yardstick, two things stand out: one at the beginning and one at the end of the 15 years.

  • My 2010 report for the Ministry of the Interior on the governance and financing of a national open data portal, together with facilitating a public consultation on what it would need to do, helped launch the Dutch open government data portal data.overheid.nl in 2011. A dozen years on, it is a key building block of the Dutch government’s public data infrastructure, and on the verge of taking on a bigger role with the implementation of the European data strategy.
  • At the other end of the timeline is the publication of the EU Implementing Regulation on High Value Data last December, for which I did preparatory research (PDF report), and which compels the entire public sector in Europe to publish a growing list of datasets through APIs for free re-use. Things I wrote about earth observation, environmental and meteorological data are in the law’s Annexes, with which every public body must comply by next spring. What’s in that law about geographic data, company data and meteorological data ends more than three decades’ worth of discussion and court proceedings with respect to access to such data.

At the same time it is an easy question, especially when not looking for institutional change:

  • Practically, it means that I and my now 10 colleagues have an income, which is meaningful within the scope of our personal everyday lives. The director of a company I worked at 25 years ago, when I remarked on the company’s low profits that year, once said to me: ‘well, over 40 families had an income meanwhile, so that’s something.’ I never forgot it. That’s certainly something.
  • There’s the NGO Open State Foundation, which directly emerged from the event James, Peter and I organised in 2008. The next event, in 2009, was named ‘Hack the Government’ and organised by James and several others who had attended in 2008. That initiative was registered as a non-profit and from 2011 became the Open State Foundation, now a team of eight people still doing impactful work on making Dutch government more transparent. I’ve been the chair of their board for the last five years, which is a privilege.
  • Yet the most meaningful results concern people, the changes they’ve made, and the shift in attitude they bring to public sector organisations. When you see a light go on in someone’s eyes during a presentation or conversation. Mostly you never learn what happens next. Sometimes you do. Handing out a few free beers (‘Data Drinks’) in Copenhagen led someone to say ‘you’re doing more for Danish open data in a month by bringing everyone together than we did in the past years’. An Eastern European national expert seconded to the EC on open data told me he ultimately came to that job because as a student he heard me speak once at his university and decided he wanted to be involved in the topic. An Irish civil servant asked me in 2012 about examples I presented of collaboratively making public services with citizens, and at the end of 2019 messaged me that it had led to five years of crowd-sourced mapping of Lesotho in OpenStreetMap to assist the Lesotho Land Registry and Planning Authority in getting good quality maps (embed of paywalled paper on LinkedIn). Someone picking up the phone in support, because I similarly picked up the phone 9 years earlier. None of that is directly a result of my work, it is fully the result of the work of those people themselves. Nothing is ever just one person, it’s always a network. One’s influence lies in sustaining and sharing with that network. I happened to be there at some point, in a conversation, in a chance encounter, from which someone took some inspiration. Just as I took inspiration from a chance encounter in 2008 myself. To me that is the very best kind of impact when it comes to achieving change.

I’ve plotted most of the things mentioned above in the image below, as part of trying to map the evolution of my work, inspired by a different kind of chance encounter: a mind map on the wall of a museum.


The evolution of my open data (net)work. Click for larger version.

Bookmarked a message on Mastodon by David Speier

David Speier is a freelance journalist who researches the German far right. In this thread on Mastodon he describes the work they’ve done to check statements from interviews with a former far-right member, and to connect them to other source material (photos from events, other people, reports etc.). Of interest to me here is that they used Obsidian to map out people, groups, places, events and occurrences, to verify statements, see overlaps and spot blind spots. It’s a nice example of taking something that is inherently text and image based and using Obsidian to ferret out the connections and patterns. There are some topics that currently pop up in my work in very different projects, and more purposefully teasing out the connections, like in this example, seems a useful notion.

In an #Obsidian database we have brought together contact persons, groups, places and events. More than 70 extensive supporting documents underpin the individual statements by „Michael“

David Speier
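As a rough illustration of what such purposeful teasing out could look like, here is a minimal sketch (in Python, with a placeholder vault path of my own choosing) that scans Obsidian notes for [[wikilinks]] and lists which people, groups or events co-occur in the same notes. Obsidian’s graph view shows this visually; this just makes the overlaps easy to scan as a list.

```python
# Minimal sketch: count which [[wikilinked]] entities co-occur across notes in an Obsidian vault.
# The vault location is a placeholder assumption, not a real path.
import re
from collections import defaultdict
from itertools import combinations
from pathlib import Path

VAULT = Path("~/obsidian-vault").expanduser()  # placeholder: location of the vault
WIKILINK = re.compile(r"\[\[([^\]|#]+)")       # captures the target of [[Entity]] or [[Entity|alias]]

cooccurrence = defaultdict(int)
for note in VAULT.rglob("*.md"):
    text = note.read_text(encoding="utf-8", errors="ignore")
    links = sorted({match.strip() for match in WIKILINK.findall(text)})
    for a, b in combinations(links, 2):
        cooccurrence[(a, b)] += 1

# Entity pairs that keep turning up in the same notes hint at connections worth a closer look.
for (a, b), count in sorted(cooccurrence.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{count:3d}  {a}  <->  {b}")
```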

During our visit to the Neues Museum in Nuremberg last week, this mind map stood out to me. Art collector, dealer and curator René Block (1942) made it as a sort of work autobiography for the period 1964-2014.

It stood out to me because it shows the evolution of his work, the connections between major phases and individual projects.

I have a list of a few ‘big themes’ I’ve been interested in and have worked on, in that order, as most often my work came out of a side interest during a previous phase (also when I was employed). Over time I’ve recognised the overall topic that carries them all: a fascination with the affordances of digital technology for our agency, and how it impacts how we live, learn, work and organise.

At any given moment I may think that most of my activities are a coincidence, that I happened across them on my generally undirected path, but my blog archive has often shown me that I mentioned topics and ideas much earlier.
There’s an evolution to them, and since I’ve spotted the ‘carrier theme’ I trust that evolution.

I’m sure I can make a mind map like the one above with the different client projects, activities and key events of the past 26 years. Maybe everyone should make such a map for themselves at times, if only to spot the adjacent paths within one’s reach in the evolutionary plane of possibilities. It seems Block made this at the end of his working life when he was 72. What might it have told him if he had drawn or redrawn it at earlier times?

John Caswell writes about the role of conversation, saying "conversation is an art form we’re mostly pretty rubbish at". New tools that employ LLMs, such as GPT-3, can only be used well by those who learn to prompt them effectively. Essentially we’re learning to have a conversation with LLMs so that their outputs are usable for the prompter. (As I’m writing this my feed reader updates to show a follow-up post about prompting by John.)

Last August I wrote about articles by Henrik Olaf Karlsson and Matt Webb that discuss prompting as a skill with newly increasing importance.

Prompting to get a certain type of output instrumentalises a conversation partner, which is fine for using LLMs, but not for conversations with people. In human conversation prompting serves less to ensure output that is useful to the prompter and more to assist the other in expressing themselves as well as they can (meaning usefulness will be a guaranteed side effect if you are genuinely interested in your conversational counterpart). In human conversation the other is another conscious actor in the same social system (the conversation) as you are.

John takes the need for us to learn to prompt LLMs better and asks whether we’ll also learn how to better prompt conversations with other people. That would be great. Many conversations take the form of the listener listening less to the content of what others say and more for the right moment to jump in with what they themselves want to say. Broadcast driven versus curiosity driven. You and I, we all do this. Getting consciously better at avoiding that common pattern is a win for all.

In parallel Donald Clark wrote that the race to innovate services on top of LLMs is on, spurred by OpenAI’s public release of ChatGPT in November. The race is indeed on, although I wonder whether those joining the race all have an actual sense of what they’re racing and what they’re racing towards. The generic use of LLMs currently at the centre of public discussion might, I think, be less promising than gearing them towards specific contexts. Back in August I mentioned Elicit, which helps you kick off a literature search based on a research question, for instance. Other niche applications are sure to be interesting too.

The generic models are definitely capable of hallucinating in ways that reinforce our tendency towards anthropomorphism (which needs little reinforcement as it is). Very, very ELIZA. Even if on occasion it creeps you out, when Bing’s implementation of GPT declares its love for you and starts suggesting you don’t really love your life partner.

I associated what Karlsson wrote with the way one can interact with one’s personal knowledge management system, the way Luhmann described his note cards as a communication partner. Luhmann talks about the value of being surprised by whatever person or system you’re communicating with. (The anthropomorphism kicks in if, based on that surprise, we then ascribe intention to the system we’re communicating with.)

Being good at prompting is relevant in my work where change in complex environments is often the focus. Getting better at prompting machines may lift all boats.

I wonder if, as part of the race that Donald Clark mentions, we will see LLMs applied as personal tools. Where I feed a more open LLM like BLOOM my blog archive and my notes, running it as a personal instance (for which the full BLOOM model is too big, I know), and then use it to have conversations with myself. Prompting that system to have exchanges about the things I previously wrote down in my own words, with results that phrase things in my own idiom and style. Now that would be very interesting to experiment with. What valuable results and insight progression would it yield? Can I have a salon with myself and my system, and/or with perhaps a few others and their systems? What pathways into the uncanny valley will it open up? For instance, is there a way to radicalise yourself (like social media can) through the feedback loops of association between your various notes, notions and follow-up questions/prompts?
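A first, very rough sketch of that experiment could look like the following: load a much smaller publicly available BLOOM checkpoint locally and stuff a few of my own notes into the prompt as context. The model choice, notes folder and question are assumptions for illustration, and prompt-stuffing like this is only a crude stand-in for properly indexing or fine-tuning on a whole archive.

```python
# Rough sketch of 'a conversation with my own notes' using a small local BLOOM variant.
# Assumptions: the Hugging Face transformers library, a folder of markdown notes at ~/notes.
from pathlib import Path
from transformers import pipeline

NOTES_DIR = Path("~/notes").expanduser()  # placeholder folder of markdown notes

# bigscience/bloom-560m is a small publicly available BLOOM checkpoint that fits on a laptop,
# unlike the full BLOOM model mentioned above.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

def ask_my_notes(question: str, max_notes: int = 3) -> str:
    """Build a prompt from a handful of my own notes and let the model continue it."""
    notes = sorted(NOTES_DIR.glob("*.md"))[:max_notes]
    context = "\n\n".join(n.read_text(encoding="utf-8")[:1000] for n in notes)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer, in the same voice as the notes:"
    result = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.8)
    # The pipeline returns the prompt plus continuation; keep only the continuation.
    return result[0]["generated_text"][len(prompt):]

print(ask_my_notes("What connects these notes to my earlier writing on agency?"))
```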



An image generated with Stable Diffusion with the prompt “A group of fashionable people having a conversation over coffee in a salon, in the style of an oil on canvas painting”, public domain

Bookmarked The Two Definitions of Zettelkasten by Chris Aldrich

This is a great essay by Chris Aldrich for several reasons. Because it addresses how the current hypelet around recent personal knowledge management tools and note systems like Zettelkasten lacks the realisation that everything in this space has a deeply rooted lineage. In response he writes about the history of commonplacing and of using card collections for creative, academic or professional output. Because the essay itself is the result of the very practice it describes. In the past months I’ve been reading along with Chris’ annotations (the value of which led me to share more of my own annotations too), and reading his essay I can readily recognise things from that stream of raw material. The notes Chris made from those annotations in turn resulted in this essay. Seven thousand words in a half-day effort.

Note to self: I should create an overview for myself and here about my note taking practice through the years and their inspiration. Just to further illustrate the history Chris writes about.

Hopefully those in the space will look more closely at the well-worn cow paths of analog history in deciding how to pave our (digital) futures. [….] The hiding value proposition of the older methods can be contrasted with the incessant drumbeat of the value and productivity inherently “promised” by those describing [only] Niklas Luhmann’s system.

Chris Aldrich

I’ve now added over 100 annotations using Hypothes.is (h.), almost all within the last month. This includes a few non-public ones. Two weeks ago I wrote down some early impressions, to which I’m now adding some additional observations.

  1. 100 annotations (in a month) don’t seem like a lot to me, if h. is a regular tool in one’s browsing habits. H. says they have 1 million users, who have made 40 million annotations to over 2 million articles (their API returns 2.187.262 results as I write this). H. has been in existence for a decade. These numbers average out to 40 annotations and 2 articles per user. This suggests to me that the mode is 1 annotation to 1 article by a user, and then silence. My 100 annotations, spread out over 30 articles and accumulated over a handful of weeks, are then already well above average, even though I am a new and beginning user. My introduction to h. was through Chris Aldrich, whose stream of annotations I follow daily with interest. He recently passed 10.000 annotations! That’s 100 times as many as mine, and apparently also an outlier to the h. team itself: they sent him a congratulatory package. H.’s marketing director has 1348 public annotations over almost 6 years, its founder 1200 in a decade. Remi Kalir, co-author of the (worth reading!) Annotation book, has 800 in six years. That does not seem like much for what I would expect to be power users. My blogging friend Heinz has some 750 annotations in three years. Fellow IndieWeb netizen Maya some 1800 in a year and a half. Those last two numbers, even if they differ by a factor of 5 or so in average annotations per month, feel like what I’d expect as a regular range for routine users.
  2. The book Annotation I mentioned makes much of social annotation, where distributed conversations arise beyond the core interaction of an annotator with an author through an original text. Such social annotation requires sharing. H. provides that sharing functionality and positions itself explicitly as a social tool ("Annotate the web, with anyone, anywhere", "Engage your students with social annotation"). The numbers above show that such social interaction around an annotated text will be very rare in the public-facing part of h.; in the closed (safer) surroundings of classroom use, interaction might be much more prominent. Users like me, or Heinz, Maya and Chris whom I named/linked above, must then be motivated by something other than the social aspects of h. If and when such interaction does happen (as it tends to do if you mutually follow each other’s annotations), it is a pleasant addition, not h.’s central benefit.
  3. What is odd to me is that when you do engage in social interaction on h., that interaction cannot be found through the web interface listing my annotations. Once I comment, it disappears out of sight, unless I remember what I reacted to and go back to that annotation by another user directly, to find my comment underneath. It does show up in the RSS feed of my annotations, and my Hypothes.is-to-Obsidian plugin also captures it through the API. Just not in the web interface.
  4. Despite the social nature of h., discovery is very difficult. Purposefully ‘finding the others’ is mostly impossible. This is both an effect of the web interface’s functionality and, I suspect, of the relatively sparse network of users (see observation 1). There’s no direct way of connecting with or searching for users. The social object is the annotation, and you can find others only through annotations you encounter. I’ve searched for tags and terms I am interested in, but those do not surface regular users easily. I’ve collated a list of a dozen currently active or somewhat active annotators, and half a dozen who used to be or are sporadically active. I’ve also added annotations of my own blog posts to my blog, and I actively follow (through an RSS feed) any new annotation of my blog posts. If you use h., I’d be interested to hear about it.
  5. Annotations are the first step towards getting useful insights into my notes. That makes being able to capture annotations in my note making tool Obsidian a prerequisite, otherwise Hypothes.is is just another silo you’re wasting time on. Luckily h. isn’t meant as a silo and has an API. Using the API and the Hypothes.is-to-Obsidian plugin, all my annotations are available to me locally. However, what I do locally with those notes does not get reflected back to h., meaning that you can’t really work through annotations locally until you’ve annotated an entire article or paper in the browser, otherwise sync issues may occur. I also find that having only the individual annotations (each with the annotated text, in one file) and not the full text (the parts I didn’t annotate) feels impractical at times, as it cuts away a lot of context. That context is easily retrievable by visiting the URL now, but maybe not over time (so I save web archive links too, as an annotation). I also grab a local markdown copy of full articles if they are of higher interest to me. Using h. in the browser creates another inbox in this regard (having to return to a thing to finish annotating or for context), and I obviously don’t need more inboxes to keep track of.
  6. Because I don’t save entire articles in my notes environment, I have started marking online articles I haven’t annotated yet with at least a note containing the motivation and first associations I would normally save with a full article. This goes in the same spot where I add a web archive link, as a page note. I’ve tried that in recent days and it seems to work well. That way I do have a general note in my local system that contains the motivation for looking at an article in more detail.
  7. The API also supports sending annotations and updates to h. from e.g. my local system. Would that potentially be better for my workflow? Firefox and the h. add-on don’t always work flawlessly: not all documents can be opened, or the form stops working until I restart Firefox. This too points in the direction of annotating locally and sending annotations to h. for sharing through the API (a minimal sketch of what that could look like follows after this list). Is there anyone already doing this? Built their own client, or using h. ‘headless’? Is there anyone who runs their own h. instance locally? If I could send things through the API, that might also include the Kindle highlights I pull into my local system.
  8. In the same category of integrating h. into my pkm workflows falls the interaction between h. and Zotero, especially now that Zotero has its own storage of annotations of PDFs in my library. It might be of interest to be able to share those annotations, for a more complete overview of what I’m annotating. Either directly from Zotero, or by way of my notes in Obsidian (Zotero annotations end up there in the end).
  9. These first 100 annotations I made in the browser, using an add-on. Annotating in the browser takes some getting used to, as I usually try to get myself out of the browser more. I don’t always fully realise I can return to an article later to continue annotating. Any time the sense surfaces that I have to finish annotating an article, that is friction I can do without. Apart from that, it is a pleasant experience to annotate like this. And that pleasure is key to keep annotating. Being able to better integrate my h. use with Obsidian and Zotero would likely increase the pleasure of doing it.
  10. Another path of integration to think about is sharing annotated links from h. to my blog, or the other way around. I blog links with a general annotation at times (example). These bloggable links I could grab from h., where I bookmark things in similar ways (example), usually to annotate further later on. I notice myself thinking I should do both, but unless I can do that simultaneously I won’t do such a thing twice.
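As a first step towards the ‘headless’ use wondered about in point 7, here is a minimal sketch against the Hypothes.is API. It assumes the Python requests library, a developer token from one’s h. account, and placeholder values for the token and username; it pulls my recent annotations (the same data the Hypothes.is-to-Obsidian plugin works from) and posts a page note from the local system.

```python
# Minimal sketch of using the Hypothes.is API 'headless' from a local system.
# TOKEN and USER are placeholders; a token comes from https://hypothes.is/account/developer.
import requests

API = "https://api.hypothes.is/api"
TOKEN = "YOUR-HYPOTHESIS-API-TOKEN"      # placeholder
USER = "acct:yourusername@hypothes.is"   # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def fetch_my_annotations(limit: int = 100) -> list[dict]:
    """Pull my most recent annotations, including non-public ones."""
    resp = requests.get(f"{API}/search",
                        params={"user": USER, "limit": limit, "sort": "updated"},
                        headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["rows"]

def post_page_note(uri: str, text: str, tags: list[str] | None = None) -> str:
    """Create a page note on a URL from the local system, e.g. a locally written motivation."""
    payload = {"uri": uri, "text": text, "tags": tags or []}
    resp = requests.post(f"{API}/annotations", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

if __name__ == "__main__":
    for row in fetch_my_annotations(limit=5):
        print(row["uri"], "-", row.get("text", "")[:80])
```

Whether this beats the browser add-on in practice would depend on wrapping it into something that also grabs the annotated text and a web archive link, but it shows the round trip is there.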