Bryan Alexander provides an overview and interesting analysis of his current social media presences and what they mean and have meant for him, his work and his interactions.
His summing up of the various platforms that used to be, and currently are, the online places he frequents reminded me of how I described my own online presences around 2007: what they did for me, and what I shared through each of them.
I called it the Long List of My Distributed Self back then.

It read:

Blog, what I think about
Jaiku, what I am doing
Twitter, what I say I am doing
Plazes, where I am and where I was
Dopplr, where I will be
Flickr, what I see
delicious, what I read
Wakoopa, what software I use
Slideshare, what I talk about
Upcoming, where I will attend
Last.fm, what I listen to
and then there is my LinkedIn, my Facebook, my Xing, my Hyves, my NING, and my collaborative tools MindMeister, Thinkfold, and Googledocs.

That list these days is much shorter.

The utility of social software and web 2.0, as we called it then (not social media), was that of leaving longer traces. Traces for others to stumble across, so that interaction could happen. A way of ‘finding the others’, of creating conversations and emergent networks of connections.

Bryan Alexander is the only blogger I have never met in person yet count as part of the inner circle of bloggers I’m in touch with through my feed reader. That interaction goes back 20 years. Talk about leaving longer traces.

All that in a context where the number of users on these platforms was smaller and, most importantly, well before those of these platforms that remain turned on their users and started manipulating what everyone saw. That ultimately moved them away from enabling longer traces entirely, and made it harder to find the others. That affordance has been replaced by shoving people and things that are already highly visible into everyone’s face, without anyone seeking them out intentionally. And now the still denser fog of generated slop is being added on top.

The change in those platforms, replacing lengthened human traces with adtech’s engagement optimisation masquerading as such, has shortened that 2007 long list of my distributed self.

A range of services on that list shut down or were acquired and then subsumed: Jaiku, Plazes, Dopplr, delicious, Wakoopa, Hyves, NING, Thinkfold. Some of those I still miss in terms of functionality, especially delicious, Plazes and Dopplr.
Others, such as Upcoming, Last.fm and Xing, proved less capable of or suited for the type of longer traces and finding of others I was interested in.
Those that survived became toxic: Facebook, Twitter, Slideshare, Foursquare. My use of collaboration tools moved to less public environments, albeit open source and self-hosted ones.

The current list of my distributed self is short, much shorter than in 2007.

  • Blog, this place here, still the main element, and across all of these services past and present the most long-lived one and the one under my own full control. It generates conversations, although less so in the comment section. Regularly though people, including first-time commenters, respond by email.
  • Flickr, still in use, for 20 years now too, but it’s not much of a social space these days, more a convenient archive that I automatically add to from my phone. I have removed (almost) all embeds of Flickr photos in this site and replaced them with a local copy of each image and a link to its location on Flickr, preempting any tracking unless one clicks the link (a rough sketch of that approach follows below this list). While I may still decide to do away with Flickr too at some point, its utility as a searchable and chronological archive of 43k of my photos is currently still high for me.
  • Hypothesis, a new entrant in the list, is a very useful annotation tool that functions somewhat as an alternative to delicious, the bookmarking tool of old. It has a social aspect, centered around the annotated text, and while ‘finding the others’ through it doesn’t happen often, it happens often enough to be delightful.
  • Mastodon, which does Twitter like it’s 2006, and which I use from a single-person instance, avoiding the scaling that led Twitter et al astray. I cut the ‘longer traces’ aspect short on Mastodon, deleting entries after a few days. Born out of practicality (Mastodon bloats the needed database volume at astonishing rates), it is also a recognition that those messages are ephemera, conversations in passing. Finding the others is still very possible through it, and messages I don’t want to treat as transient originate on my blog (from which I automatically cross-post to my separate Mastodon profile), with the resulting conversation coming back to my blog as well.
  • LinkedIn, which I can barely tolerate these days, since its timeline degenerated substantially early in the pandemic. I’m mostly still there because I completely ditched that timeline (by unfollowing all contacts) and treat LinkedIn as a self-updating rolodex. It means that I no longer regard or experience it as a social software tool for interaction or for finding the others.
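For illustration, a minimal sketch of how replacing Flickr embeds with local copies could be scripted. It assumes posts are local HTML files with standard <img> embeds pointing at staticflickr.com; the folder paths, the regular expression and the linked Flickr URL are made-up placeholders, not a description of how this site actually does it.

```python
# Minimal sketch: swap Flickr <img> embeds in local HTML posts for a
# downloaded local copy plus a plain link to Flickr.
# Folder paths, the URL pattern and the linked Flickr URL are assumed
# placeholders for illustration only.
import pathlib
import re
import urllib.request

POSTS_DIR = pathlib.Path("posts")           # assumed location of HTML posts
IMAGES_DIR = pathlib.Path("images/flickr")  # assumed local image folder
IMAGES_DIR.mkdir(parents=True, exist_ok=True)

# Matches <img> tags whose src points at Flickr's static image hosts.
FLICKR_IMG = re.compile(r'<img[^>]+src="(https?://[^"]*staticflickr\.com/[^"]+)"[^>]*>')

def localise(match: re.Match) -> str:
    src = match.group(1)
    local_path = IMAGES_DIR / src.rsplit("/", 1)[-1]
    if not local_path.exists():
        urllib.request.urlretrieve(src, local_path)  # fetch the image once
    # Serve the local copy; keep only a plain link to Flickr, so no request
    # reaches Flickr unless the reader clicks it.
    return (f'<a href="https://www.flickr.com/photos/example/">'  # placeholder link
            f'<img src="/{local_path}" alt=""></a>')

for post in POSTS_DIR.glob("*.html"):
    html = post.read_text(encoding="utf-8")
    post.write_text(FLICKR_IMG.sub(localise, html), encoding="utf-8")
```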

I shift my behavior as a given system changes how it operates, Bryan writes. True.
Those system changes have over time tended towards making one’s online traces harder to stumble across (by reducing interoperability, closing off, and eroding the very building block of the web, the link), and towards making finding the others harder (the strongly diminished quality of web search itself is part of that). A likely answer is more distributed approaches, with your self at the core, navigating widening circles of contacts found through other contacts. The triangulation for that still works, but it does take more attention and effort. The trouble is that for most of us it’s not within our agency to do that technologically ourselves. A balance is to be sought between that and the avoidance of centralised silos, old and new. Here be dragons, not unicorns.

Twenty years ago today E and I visited Reboot 7 in Copenhagen. What I wrote a decade ago at the 10th anniversary of that conference still holds true for me.

Over time Reboot 7 became mythical. A myth that can’t return. But one we were part of, participated in and shaped.
Still got the t-shirt.


The yellow t-shirt with red text from the 2005 Reboot 7 conference, on my blue reading chair in my home office 20 years on.

Seventeen years ago today I blogged about a barcamp style event in Amsterdam I co-hosted, called GovCamp_NL. I struck up a conversation there about open government data, after having had a similar conversation the week before in Austria. It marked the beginning of my work in this field. We just welcomed the thirteenth team member in the company that over time grew out of that first conversation. Our work there is driven by the same thing that drove the event, something I’ve come to call constructive activism.

These days, the principles and values that drove those events, and have set the tone for the past two decades of everything I’ve done professionally and socially, seem more important than ever. They are elemental in the current geopolitical landscape around everything digital and data. We can look back on our past selves with 20 years of hindsight and smile about our one-time optimism, because so much exploitation, abuse and surveillance grew out of the platforms and applications that originated in the early 00s. But not because that optimism was wrong. Naive, yes, in thinking that the tech would all take care of itself, by design and by default, and that we just needed to nudge it a bit. That optimism in the potential for (networked) agency, for transparency, for inclusion, for diversity, and for global connectedness is still very much warranted, as a celebration of human creativity, of the sense of wonder that wielding complexity for mutual benefit provides, just not singularly attached to the tech involved.
Anything digital is political. The optimism is highly political too.

The time to shape the open web and digital ethics is now, is every day. Time for a reboot.

Back in 2022 the Belgian and other data protection authorities found that IAB’s ‘Transparency and Consent Framework’ is illegal, because it is neither transparent nor has any meaningful connection with the word consent. IAB is the industry club for adtech users. Yesterday this verdict was upheld on appeal.

You know the kind of consent form, from about 80% of websites: one click to give away everything for the next three generations, and a day of clicking to deny consent. They need to coerce your consent to feed the tracking-based real-time bidding mechanisms for displaying all those ads you see if you don’t use an ad blocker like a sane adult.

It was always clear that this type of behaviour does not result in freely given consent for tracking and is illegal under the GDPR. But it takes time to have such things contested in court and affirmed before adtech corporations will admit it.

The 2022 decision, now upheld on appeal (PDF in Dutch), applies immediately across the EU, and will impact such IAB members as Google, Microsoft, Amazon, X, and Automattic (WordPress) (at least they were members back in 2022). The appeal against the decision was filed in March 2022, and the Belgian court referred several questions for a preliminary ruling to the European Court of Justice; those were answered in spring 2024, and have now led to a decision.

Excellent work by the Irish Council for Civil Liberties and others.

Ceterum censeo AdTech is fundamentally incompatible with the GDPR, and needs to die.

Digital autonomy sounds good and necessary. In practice, product managers and IT managers make choices on grounds other than ‘autonomy’. A classic case of having to weigh something abstract (‘autonomy’) against something much more concrete and familiar to the person making the decision (‘total cost of ownership’, for instance). I see the same thing often with digital ethics, and in weighing decisions against the GDPR. The concrete usually wins out over the abstract or general, because you are comparing different categories of things, and we are bad at that.

How do you make digital autonomy tangible enough to get it discussed at board level? By making the aspects to be compared actually comparable. Recently I got a glimpse of how this was done within a large public sector executive agency.

The starting point was the problem statement: choosing certain digital services purely on financial and technical grounds leads to dependencies. This introduces operational and financial vulnerabilities, because decisions can be taken outside the organisation that can directly bring its own primary operational processes to a halt.

The digital stack model from the Digitale Open Strategische Autonomie strategy (DOSA, 2023, Ministry of Economic Affairs) was used as an aid in the analysis.

The stack model distinguishes layers, from raw materials at the bottom, through infrastructure, up to data and applications.

The analysis looked at two axes for each of those layers: is this part of what the organisation is about, or not? And does the organisation do it itself, or not?

In this case the organisation itself is very active when it comes to data and applications, and those are also core to who they are. But they also outsource a lot. To a lesser extent the same holds for ‘soft infrastructure’. Hard infrastructure, hardware and especially raw materials are not what they are about, and they do almost nothing there themselves.

And what changed in recent years?

In the image above you can see that more has been outsourced in recent years in the areas of data, applications and soft infrastructure. At the same time the organisation kept doing a lot itself with respect to data and applications, but in the area of soft infrastructure it started doing less itself and stopped certain activities.

To reduce dependencies (especially where something is core to who you are but you outsource a lot of it), you can state whether you want to reverse the change, and whether you want to do that together with others. In the image below, for example: fully focus on data yourself; for applications keep doing a lot yourself, outsource less and collaborate more in the supply chain; and for soft infrastructure do less yourself, collaborate more in the chain, and outsource less.

From this, elements of an actionable strategy follow more easily in a discussion, as well as more concrete considerations regarding the procurement requirements you set for others, knowledge development within the organisation, and partnerships with chain partners.
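To make the two-axis assessment per layer a bit more tangible, a minimal sketch in code. The layer names, the 0–2 scale and the flagging rule are my own illustrative assumptions; they are not part of the DOSA stack model or of the organisation’s actual analysis.

```python
# Minimal sketch of scoring each stack layer on the two axes used above.
# Layer names, the 0-2 scale and the flagging rule are illustrative
# assumptions, not part of the DOSA stack model itself.
from dataclasses import dataclass

@dataclass
class LayerAssessment:
    layer: str        # stack layer, e.g. "data" or "soft infrastructure"
    is_core: bool     # axis 1: is this part of what the organisation is about?
    does_itself: int  # axis 2: how much it does itself (0 = nothing, 2 = a lot)
    outsources: int   # how much it outsources (0 = nothing, 2 = a lot)

def flag_dependencies(assessments):
    """Layers that are core to the organisation yet largely outsourced:
    the combination described above as an operational vulnerability."""
    return [a.layer for a in assessments if a.is_core and a.outsources >= 2]

current = [
    LayerAssessment("data", is_core=True, does_itself=2, outsources=2),
    LayerAssessment("applications", is_core=True, does_itself=2, outsources=2),
    LayerAssessment("soft infrastructure", is_core=False, does_itself=1, outsources=2),
    LayerAssessment("hard infrastructure / hardware", is_core=False, does_itself=0, outsources=2),
    LayerAssessment("raw materials", is_core=False, does_itself=0, outsources=0),
]

print(flag_dependencies(current))  # -> ['data', 'applications']
```

Comparing such a table for a few years back with one for today makes the direction of change, and the desired reversal, easier to put on a board-level agenda.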

The period of the European Commission that has just finished delivered an ambitious and coherent legal framework for both the single digital market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, the Data Act, the High Value Data implementing regulation and the AI Act, are all finished and in force (if not always fully in application). This means efforts are now shifting to implementation. The detailed programme of the next European Commission, now being formed, isn’t known yet. Big new legislative efforts in this area are however not expected.

This summer Ursula von der Leyen, the incoming President of the Commission, presented her political guidelines. In them you can find what the EC will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. It is noted that outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools to keep a level-headed view of what is really going on. Data is also crucial in driving several technology developments, such as AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness w.r.t. natural disasters and their impact on infrastructure, energy, food security, water and land use, both in cities and in rural areas, as well as on early warning systems. This is sure to contain a large data component, a role for the Green Deal Data Space (for which the implementation phase will start soon, now that the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC already emphasised the close connection between digitisation and data and the ability to achieve European climate and environmental goals.

AI trained with data
Garbage in, garbage out: access to enough high-quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (in industry, public services or healthcare, for example). The direction here is towards smaller models, squarely aimed at specific questions or tasks in the context of specific sectors. This requires the availability of, and responsible access to, data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched. This is meant to provide SMEs and start-ups with access to the computing power of the European supercomputing network for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data are to be brought together.

Focus on implementing data regulations
To make the above possible, a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale, within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR until 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all at the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context the difference between such SMEs and large data users outside of the EU is specifically mentioned. The Data Union is a new label for, and a new emphasis on, what the European Data Strategy already seeks to do: the creation of a single market for data, meaning freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation are informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulation initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. The new instruments allow new agency for different stakeholder groups, and new opportunities for societal impact come from them.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, can help set the tone or even have a normalising effect? I’m certain practice can play a role like this, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum the coming time is aimed at this phase shift. I already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry for the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally I’m also seeking ways to increase the involvement of civil society organisations in this area.

Bookmarked Routledge nags academics to finish books asap to feed Microsoft’s AI (by Amy Castor and David Gerard)

Academic publishers are pushing authors to speed up delivering manuscripts and articles (including suggesting that peer review be done within 15 days). This is to meet the quota of ‘content’ for LLM training they promised the AI companies, like Microsoft’s OpenAI, that the publishers sold their soul to: Taylor & Francis/Routledge to the tune of 75M USD annually, Wiley for 44M USD. Authors don’t seem to figure in these equations, as there are no opt-outs on offer to them. No cut of the money either, obviously.

Must. Feed. The. Machine. Apparently. Any randomly arranged bunch of words or phrases will do. Apparently.
Academic publishing, it seems, is not there to write down and distribute human knowledge, but to create slop for the machines to make even sloppier slop from. Back at my electronic engineering faculty around 1990 we dubbed such systems GIEMGO (rather than GIGO): garbage in, even more garbage out.

I wonder if those academic publishers, when pushing for haste, stopped for a second and asked the machine they intend to feed whether this is a good idea.
I did. And despite the machine ignoring its own dependence on quality input, presumably because that causality isn’t expressed in its input text or isn’t statistically relevant in weighing input tokens, it at least generates a text pointing at potential downsides for human knowledge. How nice of it.

Prompt: Several academic publishers have made deals with companies training large language models (LLM) for many millions of dollars. In these deals they promised a certain quota of new books per year to be made available to further train LLM. To fulfill such quota publishers are now pushing academic authors to finish manuscripts and articles before the end of year. Including by foregoing proper peer review. Do you think this is a good idea? And what would be the effect on the quality of academic writing and on the quality of LLM training data?
Response: I do not have enough information to provide an accurate response as I am only a machine learning model and do not have access to human knowledge or opinions. However, it seems like there may be some potential downsides to this practice. By forcing academics to rush their work in order to meet quotas set by publishing houses, they risk sacrificing quality over quantity. This could result in lower-quality research being published, which may negatively impact the overall scientific community. Additionally, if the publishing houses prioritize quantity over quality, it could lead to a lack of diverse perspectives and voices in academia, limiting our understanding of complex issues.

Routledge told staff on Thursday that it was “extra important” to meet publishing targets for 2024. Why? Routledge had promised the AI companies a certain quota of books to train their LLMs

Amy Castor and David Gerard