I’ve been involved in open data for about 15 years. Back then we had a vibrant Europe-wide network of activists and civic organisations around open data, partially triggered by the first PSI Directive, which was the European legal foundation for our call for more open government data.

Since 2020 a legal framework much wider and more fundamental than the PSI Directive ever was has been taking shape, with the Data Governance Act, Data Act, AI Regulation, Open Data Directive and High Value Data implementing regulation as building blocks. Together they create the EU single market for data, adding data as a fourth element to the freedom of movement for people, products and capital within the EU. This will all take shape as the European common dataspace(s), built from a range of sectoral dataspaces.

In recent years I’ve been actively involved in these developments, and I am currently helping large government data holders in the Netherlands interpret the new obligations and, above all, the new opportunities for public service that result from all this.

Now that the dataspaces are slowly taking shape, what I find missing from most discussions and events is the voice of civic organisations and activists. It’s mostly IT companies and research institutions that are involved. While for the Commission social impact (e.g. the climate, health, energy and agricultural transitions) is a key reason for implementing these new laws, for most parties involved in the dataspaces it is less of a consideration, and economic and technological factors matter more. Not even government data holders themselves have much say in how the European data space will turn out, even though every single one of us and every public entity is by default part of this common market.

I would like to strengthen the voice of civil society and activists in this area, so that together we can influence the shape these dataspaces are taking and make them useful and valuable to us too. To use the new (legal) tools to strengthen the commons and increase our agency.

Over time, however, most of the old European open data network has dissolved, as we all got involved in practical projects at national level, and the European network became less important as a source of belonging and of strengthening each other’s commitment. And a good number of years have passed, so many new people have come onto the scene, unconnected to that history, with new perspectives and new capabilities.

So the question is: who is active on these topics, from a civil society perspective, as activists? Who should be involved? What are the organisations and events that are relevant regionally, nationally, EU-wide? Can we connect those existing dots: to share experiences and examples, join our voices, pool our efforts?

Currently I’m doing a first scan of who is involved in which EU country, what type of events are visible, which organisations are active etc., starting from my old network of a decade ago. I will share lists of what I find at Our Common Data Space.

Let me know if you count yourself as part of this European network. Let me know the relevant efforts you are aware of. Let me know which events you think bring together people likely to want to be involved.

I look forward to finding out about you!

Open Government Data Camp in Warsaw 2011. An example of the vibrancy of the European open data network; at the time I called it the community’s ‘family Christmas party’. Above, the schedule of sessions created collectively by the participants, with many local initiatives and examples shared with the EU-wide network. Below, one of those sessions, on local policy making and open data.

This week it was 15 years ago that I became involved in open government data. In this post I look back on how my open data work evolved, and if it brought any lasting results.

I was at a BarCamp in Graz on political communication in the last days of May 2008 and ended up in a conversation with Keith Andrews in a session about his wish for more government-held data to use for his data visualisation research. I continued that conversation a week later with others at NL GovCamp on 7 June 2008 in Amsterdam, an event that I helped organise with James Burke and Peter Robinnet. There, on the rotting carpets of the derelict office building that had been the Volkskrant offices until 2007, several of us discussed how to bring about open data in the Netherlands:

My major take-away … was that a small group found itself around the task of making inventory of what datasets are actually held within Dutch government agencies. … I think this is an important thing to do, and am curious how it will develop and what I can contribute.
Me, 10 June 2008

Fifteen years on, what came of that ‘important thing to do’ and seeing ‘what I can contribute’?

At first it was mostly talk, ‘wouldn’t it be nice if ..’, but importantly part of that talk was with the Ministry responsible for government transparency, which was present at NL GovCamp. Initially we weren’t allowed to meet at the Ministry itself, as inviting ‘hackers’ in was seen as too sensitive, so over the course of 6 months several conversations with civil servants took place in a pub in Utrecht before we were formally invited to come and talk. That did, however, result in a first assignment from January 2009, which I did with James and with Alper (who had also participated in NL GovCamp).

With some tangible results from that project in hand, I hosted a conversation about open data at Reboot 11 in 2009 in Copenhagen, which extended my European network on the topic. There I also encountered the Danish IT/open government team. Cathrine of that team invited me to host a panel at an event in early 2010 where the official responsible for open data at the European Commission was also presenting. He invited me to Luxembourg to meet the PSI Group of national representatives in June 2010, and that same month I landed an invitation as guest blogger for an open data event hosted by the Spanish government and the ePSIplatform team, a European website on re-using government information.

There I also met Marc, a Dutch lawyer in open government. Having met various European data portal teams in Madrid, I then did some research for the Dutch government on the governance and costs of a Dutch open data portal in the summer of 2010, through which I met Paul, who took on a role in further shaping the Dutch portal. Encouraged by the Commission, Marc and I submitted a proposal to run the ePSIplatform, a public tender we won. The launch workshop of our work on the ePSIplatform in January 2011 in Berlin is where I met Frank. In the fall of 2011 I attended the Warsaw open government data camp, where Marc, Frank, Paul and I all had roles. I also met Oleg from the World Bank there. In November 2011 Frank, Paul, Marc and I founded The Green Land, and I have worked on over 40 open data projects under that label since then. In early 2012 I was invited to the World Bank in the US to provide some training, and later that year I worked in Moldova for them. From 2014 until 2019 I worked in Kazakhstan, Kyrgyzstan, Serbia and Malaysia for the World Bank, until the pandemic ended it for now.

What stands out to me in this history of a decade and a half is:

  • How crucial chance encounters were (and are), and how they occurred around small tangible things to do. From those encounters the bigger things grew. Those chance encounters could happen because I helped organise small events and went to events by others, and there, even if they were nominally about something else, had conversations about open data with like-minded people. Being in it for real, spending effort to strengthen the community of practitioners around this topic, quickly created a track record. This is something I recently mentioned when speaking about my work to students as well: making time for side interests is important, and I’ve come to trust it as a source of new activities.
  • The small practical steps I took (a first exploratory project, creating a small collection of open data examples out of my own interest, writing the first version of an open data handbook with four others during a weekend in Berlin) served as material for those conversations and were the scaffolding for bigger things.
  • I was there at the right time: not too early, not too late. There already was a general conversation on open data going on. In 2003 the EC had legislated for government data re-use, which had entered into force in May 2008, just 3 weeks before I picked the topic up. Thus there was an implemented legal basis for open data in place in the EU, which, however, nobody had yet used as a new instrument. In late 2008 Barack Obama was elected to the US presidency on a platform that included government transparency, which on the day after his inauguration in January 2009 resulted in a Memorandum to kick-start open government plans across the public sector. This meant there was global attention for the topic. So the circumstances were right: there was general momentum, just not very many people yet trying to do something practical.
  • Open data took several years to really materialise as a professional activity for me. During those years most time was spent on explaining the topic and weaving the network of people involved across Europe and beyond. I have so many open data slide decks from 2009 and 2010 in my archive. In 2008, 2009 and 2010 I was active in the field, but my main professional activities were still elsewhere. In 2009, after my first open data project, I wondered out loud whether this was a topic I could and wanted to continue in professionally. From early 2011 most of my income came from open data, while the need to build out the network of people involved was still strong. Later, from 2014 or so, open data became more local and more routine, and shifted to being part of data governance, and now data ethics. The pan-European network evaporated. Nevertheless, helping improve European open data legislation has remained a crucial element to this day, providing a foundation beneath the work.

From those 15 years, what stands out as meaningful results? What did it bring?
This is a hard and an easy question at the same time. Hard because ‘meaningful’ can have many definitions. If we take achieving permanent or even institutionalised results as the yardstick, two things stand out: one at the beginning and one at the end of the 15 years.

  • My 2010 report for the Ministry of the Interior on the governance and financing of a national open data portal, and facilitating a public consultation on what it would need to do, helped launch the Dutch open government data portal data.overheid.nl in 2011. A dozen years on, it is a key building block of the Dutch government’s public data infrastructure, and on the verge of taking on a bigger role with the implementation of the European data strategy.
  • At the other end of the timeline is the publication of the EU Implementing Regulation on High Value Data last December, for which I did preparatory research (PDF report), and which compels the entire public sector in Europe to publish a growing list of datasets through APIs for free re-use. Things I wrote about earth observation, environmental and meteorological data are in the law’s Annexes, which every public body must comply with by next spring. What’s in that law about geographic data, company data and meteorological data ends more than three decades’ worth of discussion and court proceedings regarding access to such data.

It is also an easy question, especially when not looking for institutional change:

  • Practically, it means that my now 10 colleagues and I have an income, which is meaningful within the scope of our personal everyday lives. When I remarked on the company’s low profits that year, the director of a company I worked at 25 years ago once said to me: ‘well, over 40 families had an income meanwhile, so that’s something.’ I never forgot it. That’s certainly something.
  • There’s the NGO Open State Foundation, which directly emerged from the event James, Peter and I organised in 2008. The next event, in 2009, was named ‘Hack the Government’ and was organised by James and several others who had attended in 2008. It was registered as a non-profit and from 2011 became the Open State Foundation, now a team of eight people still doing impactful work on making Dutch government more transparent. I’ve been the chair of their board for the last 5 years, which is a privilege.
  • Yet the most meaningful results concern people, the changes they’ve made, and the shift in attitude they bring to public sector organisations. When you see a light go on in someone’s eyes during a presentation or conversation. Mostly you never learn what happens next. Sometimes you do. Handing out a few free beers (‘Data Drinks’) in Copenhagen, which made someone say ‘you’re doing more for Danish open data in a month by bringing everyone together than we did in the past years’. An Eastern European national expert seconded to the EC on open data telling me he ultimately came to this job because as a student he once heard me speak at his university and decided he wanted to be involved in the topic. An Irish civil servant who asked me in 2012 about examples I presented of collaboratively making public services with citizens, and who at the end of 2019 messaged me that it had led to the crowdsourced mapping of Lesotho in OpenStreetMap over five years, to help the Lesotho Land Registry and Planning Authority get good quality maps (embed of paywalled paper on LinkedIn). Someone picking up the phone in support, because I similarly picked up the phone 9 years earlier. None of that is directly a result of my work; it is fully the result of the work of those people themselves. Nothing is ever just one person, it’s always a network. One’s influence lies in sustaining and sharing with that network. I happened to be there at some point, in a conversation, in a chance encounter, from which someone took some inspiration. Just as I took some inspiration from a chance encounter in 2008 myself. To me that is the very best kind of impact when it comes to achieving change.

I’ve plotted most of the things mentioned above in this image, as part of trying to map the evolution of my work, inspired by another kind of chance encounter: with a mind map on the wall of a museum.

The evolution of my open data (net)work.

Bookmarked The Expanding Dark Forest and Generative AI by Maggie Appleton

I very much enjoyed this talk that Maggie Appleton gave at Causal Islands in Toronto, Canada, 25-27 April 2023. It reminds me of the fun and insightful keynotes at Reboot conferences a long time ago, some of which shifted my perspectives long-term.

This talk is about how we will experience and use the web when generative algorithms create most of its content. Appleton explores the potential effects of that and the futures that might result. She puts human agency at the center when it comes to choosing our path forward in experimenting with and using ‘algogens’ on the web, and to navigating an internet where nobody believes you’re human.

Appleton is a product designer with Ought, on products that use language models to augment and extend human (cognitive) capabilities. Ought makes Elicit, a tool that surfaces (and summarises) potentially useful papers for your research questions. I use Elicit every now and then, and really should use it more often.

An exploration of the problems and possible futures of flooding the web with generative AI content

Maggie Appleton

During our visit to the Neues Museum in Nuremberg last week, this mind map stood out to me. Art collector, dealer and curator René Block (born 1942) made it as a sort of work autobiography for the period 1964-2014.

It stood out to me because it shows the evolution of his work, the connections between major phases and individual projects.

I have a list of a few ‘big themes’ I’ve been interested in and have worked on (in that order, as most often my work came out of a side interest during a previous phase, also when I was employed). Over time I’ve recognised the overall topic that carries them all: a fascination with the affordances of digital technology for our agency, and how it impacts how we live, learn, work and organise.

At any given moment I can think that most of my activities are a coincidence, that I happened across them on my generally undirected path, but my blog archive has often shown me that I already mentioned topics and ideas much earlier.
There’s an evolution to them, and since I’ve spotted the ‘carrier theme’ I trust that evolution.

I’m sure I can make a mind map like the one above with the different client projects, activities and key events of the past 26 years. Maybe everyone should make such a map for themselves at times, if only to spot the adjacent paths within one’s reach in the evolutionary plane of possibilities. It seems Block made this at the end of his working life when he was 72. What might it have told him if he had drawn or redrawn it at earlier times?

I have installed AutoGPT and started playing with it. AutoGPT is a piece of software you install and run locally (in a terminal window); in theory you give it a result to achieve and then let it run to achieve it. It’s experimental, so it is good advice to actually follow its steps and approve the individual actions it suggests.
It interacts with different generative AI tools (through your own API keys) and can initiate different actions, including online searches as well as spawning new interactions with LLMs like GPT-4, and it uses the results in its ongoing process. It chains these prompts and interactions together to get to a result (‘prompt chaining’).
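To make ‘prompt chaining’ with a human approving each step a bit more concrete, here is a minimal sketch. It is not AutoGPT’s code: `fake_llm` is a hypothetical stand-in for a real model API call, and `run_chain`, `approve` and `execute` are illustrative names. Each model reply is treated as a proposed action, a human approves or rejects it, and the action’s result is appended to the prompt for the next step.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real set-up would call a model API here."""
    if "RESULT:" in prompt:
        return "DONE"  # pretend the model considers the goal reached once it has seen a result
    return "search: open data portals in Europe"


def run_chain(goal: str,
              llm: Callable[[str], str],
              approve: Callable[[str], bool],
              execute: Callable[[str], str],
              max_steps: int = 5) -> list[str]:
    """Chain prompts: each model reply becomes an action whose result feeds the next prompt."""
    history: list[str] = []
    prompt = f"GOAL: {goal}\nWhat is the next action?"
    for _ in range(max_steps):
        action = llm(prompt)
        if action == "DONE":
            break
        # Human-in-the-loop: the user approves each proposed action before it runs
        if not approve(action):
            history.append(f"rejected: {action}")
            break
        result = execute(action)
        history.append(f"{action} -> {result}")
        # Appending the result to the next prompt is what makes this a chain
        prompt += f"\nACTION: {action}\nRESULT: {result}\nWhat is the next action?"
    return history


if __name__ == "__main__":
    steps = run_chain("find open data portals", fake_llm,
                      approve=lambda action: True,
                      execute=lambda action: f"3 hits for '{action}'")
    print(steps)
```

The `max_steps` cap is a deliberate choice: it is a simple guard against the kind of endless loop described below.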

I had to tweak some of the script a little bit (it calls python and pip but it needs to call python3 and pip3 on my machine) but then it works.

Initially I have it set up with OpenAI’s API, as the online guides I found were using that. However, in the settings file I noticed I can also choose other LLMs, like the publicly available models through Huggingface, as well as image-generating AIs.

I first attempted to let it write scripts to interact with the hypothes.is API. It ended up in a loop about needing to read the API documentation but not finding it. At that point I had not yet provided any interventions of my own (such as supplying the link to the API documentation). When I did so later, it either couldn’t come up with next steps, or ingested only the first few lines of the API documentation, which also led to empty next steps.
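For comparison, the kind of script asked for can be quite small. The hand-written sketch below assumes the public hypothes.is search endpoint (`https://api.hypothes.is/api/search`) with `user` and `limit` query parameters and a personal API token sent as a Bearer token. The username and token are placeholders, and `build_search_request` and `fetch_annotations` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

API_SEARCH = "https://api.hypothes.is/api/search"


def build_search_request(username: str, token: str, limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) a GET request for one user's annotations."""
    params = urllib.parse.urlencode({
        "user": f"acct:{username}@hypothes.is",  # account id format for hypothes.is users
        "limit": limit,
    })
    req = urllib.request.Request(f"{API_SEARCH}?{params}")
    # Placeholder: a personal API token generated in your own hypothes.is account
    req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_annotations(username: str, token: str, limit: int = 20) -> list[dict]:
    """Send the request and return the list of annotations in the JSON 'rows' field."""
    with urllib.request.urlopen(build_search_request(username, token, limit)) as resp:
        return json.load(resp)["rows"]
```

Calling `fetch_annotations("your_username", "your_token")` would need network access and a valid token. Keeping request building separate from sending makes it possible to inspect the request without making a call.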

Then I tried a simpler thing: give me a list of all email addresses of the people in my company.
It did a Google search for my company’s website, and then looked at it. The site is in Dutch, which it didn’t notice, and it concluded there wasn’t a page listing our team. I then provided it with the link to the team page, which it parsed correctly, ending up with a list of email addresses saved to file, while also neatly summarising what we do and what our expertise is.
While this second experiment was concluded successfully, it did require my own intervention, and the set task was relatively simple (scrape something from this here webpage). This was of limited usefulness, although it did take less time than doing it myself. It points to the need to have a pretty clear picture of what one wants to achieve and how to achieve it, so you can provide feedback and input at the right steps in the process.

As with other generative AI tools, the right prompting is key, and the burden of learning effective prompting lies with the human tool user; the tool itself provides no guidance on this.

I appreciate it’s an early effort, but I can’t reproduce the enthusiastic results others claim. My first impression is that the claims I’ve seen are based on using hypothetical things as prompts and then being enthusiastic about the plausible outcomes. Whereas if you try an actual issue where you know the desired result, it easily falls flat. This is similar to how ChatGPT provides plausible texts, except when the prompter knows what good quality output looks like for a given prompt.

It is tempting to play with this thing nevertheless, because of its positioning as a personal tool, as potential step to what I dubbed narrow band digital personal assistants earlier. I will continue to explore, first by latching onto the APIs of more open models for generative AI than OpenAI’s.

I have a little over 25 years’ worth of various notes and writings, and a little over 20 years of blogposts. A corpus that reflects my life, interests, attitude, thoughts, interactions and work over most of my adult life. Wouldn’t it be interesting to run that personal archive as my own chatbot, to specialise an LLM for my own use?

Generally I’ve been interested in using algorithms as personal or group tools for a number of years.

For algorithms to help, like any tool, they need to be ‘smaller’ than us, as I wrote in my networked agency manifesto. We need to be able to control its settings, tinker with it, deploy it and stop it as we see fit.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Most if not all of our exposure to algorithms online however treats us as a means to manipulate our engagement. I see them as potentially very valuable tools in working with lots of information. But not in their current common incarnations.

Going back to a less algorithmic way of dealing with information isn’t an option, nor something to desire I think. But we do need algorithms that really serve us, perform to our information needs. We need less algorithms that purport to aid us in dealing with the daily river of newsy stuff, but really commoditise us at the back-end.
Me, April 2018, in Algorithms That Work For Me, Not Commoditise Me

Some of the things I’d like my ideal RSS reader to be able to do are along such lines, e.g. to signal new patterns among the people I interact with, or outliers in their writings. Basically to signal social eddies and shifts among my network’s online sharing.

LLMs are highly interesting in that regard too because, in contrast to engagement-optimising social media algorithms, they are focused on large corpora of text and the generation thereof, not on emergent social behaviour around texts. Once trained on a large enough generic corpus, one could potentially tune it with a specific corpus: specific to a certain niche topic, or to the interests of a single person, small group of people or community of practice. Such as all of my own material. Decades’ worth of writings, presentations, notes, e-mails etc. The mirror image of me as expressed in all my archived files.

For me, doing so with a personal corpus has a few prerequisites:

  • It would need to be a separate instance of whatever tech it uses. If possible self-hosted.
  • There should be no feedback to the underlying generic and publicly available model, there should be no bleed-over into other people’s interactions with that model.
  • The separate instance needs an off-switch under my control, where off means none of my inputs are available for use someplace else.

Running your own Stable Diffusion image generator set-up, as E currently does, complies with this, for instance.

Doing so with an LLM text generator would create a way of chatting with my own PKM material, ChatPKM, a way to interact (differently than through search and links, as I do now) with my Avatar (not just my blog though, all my notes). It might adopt my personal style and phrasing in its outputs. When (not if) it hallucinates, it would be my own trip, so to speak. It would be clear what inputs are in play with regard to the specialisation, so verification and references should be easier to follow up on. It would be a personal prompting tool, to communicate with your own pet stochastic parrot.
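One way to keep the inputs visible is to first retrieve the notes that match a question, then hand only those to the model along with their file names. The sketch below swaps in simple TF-IDF retrieval, a different technique from the fine-tuning discussed above. It assumes a local folder of markdown notes and runs entirely on your own machine. `load_notes`, `rank_notes` and the example notes are hypothetical. It covers only the retrieval step and leaves out any LLM call.

```python
import math
import re
from collections import Counter
from pathlib import Path


def load_notes(folder: str) -> dict[str, str]:
    """Read every markdown file under a (hypothetical) notes folder, keyed by relative path."""
    return {str(p.relative_to(folder)): p.read_text(encoding="utf-8", errors="replace")
            for p in Path(folder).rglob("*.md")}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; deliberately simple for illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_notes(question: str, notes: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Score notes against a question with TF-IDF and return (note name, score), best first."""
    docs = {name: Counter(tokenize(text)) for name, text in notes.items()}
    n_docs = len(docs)
    # Document frequency: in how many notes does each term occur?
    df = Counter(term for counts in docs.values() for term in counts)
    ranked = []
    for name, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(question)):
            if term in counts:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed inverse document frequency
                score += (counts[term] / total) * idf
        if score > 0:
            ranked.append((name, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    notes = {
        "2008-06-10-govcamp.md": "Inventory of datasets held by Dutch government agencies.",
        "2023-05-autogpt.md": "Installed AutoGPT, tried the hypothes.is API.",
        "2018-04-algorithms.md": "Algorithms that work for me, not commoditise me.",
    }
    print(rank_notes("which government datasets", notes))
```

Because the note names travel with the scores, any generated answer could cite the files it drew on. That keeps verification as close to hand as the source material.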

Current attempts at chatbots in this style seem to focus on things like customer interaction. Feed it your product manual, have it chat to customers with questions about the product. A fancy version of ‘have you tried switching it off and back on?’ These services allow you to input one or a handful of docs or sources, and then chat about their contents.
One of those is Chatbase, another is ChatThing by Pixelhop. The latter has the option of continuously adding source material to presumably the same chatbot(s), but more or less on a per-file and per-URL basis, and limited in the number of words per month. That’s quite different from starting out with half a GB of markdown text of notes and writings covering several decades, let alone tens of GBs of e-mail interactions, for instance.

Pixelhop is currently working with Dave Winer however to do some of what I mention above: use Dave’s entire blog archives as input. Dave has been blogging since the mid 1990s, so there’s quite a lot of material there.
Looking at ChatThing suggests that it is built on OpenAI’s ChatGPT 3.5 through its API, so it wouldn’t qualify under the prerequisites I mentioned. Yet purposely feeding it a specific online blog archive is less problematic than including my own notes, as all the source material involved is public anyway.
The resulting Scripting News bot is a fascinating experiment, the work around which you can follow on GitHub. (As part of that Dave also shared a markdown version of his complete blog archives (33MB), which for fun I loaded into Obsidian to search through. Also for comparison with the generated outputs from the chatbot, such as the question Dave asked the bot when he first wrote about the iPhone on his blog.)

Looking forward to more experiments by Dave and Pixelhop. Meanwhile I’ve joined Pixelhop’s Discord to follow their developments.