In reply to Kann man die Twitter-Uhr zurückstellen? Zum Bluesky-Hype im österreichischen Journalismus by Heinz Wittenbrink

I think you're right, Heinz, that the move of journalists towards Bluesky is a missed opportunity. But not only for the journalists themselves as individual professionals. I don't understand why newspapers and media outlets don't set up a small Fediverse instance of their own. That lets you prove the authenticity of an account directly and unassailably, because the account is tied to your own internet domain. Just as here in the Netherlands the government's Mastodon server runs under the same domain that is used for all government information. Strategically, I think the missed opportunity is that newspapers ignore the potential for agency on the open web and leave it to individual reporters as a personal choice. This even though they regularly complain that Big Tech takes away their agency (in what they can say online as well as in advertising and in traffic referred by search engines). They apparently don't remember that it was journalists and politicians who made Twitter big as a news source beyond the tech scene, and they now fail to use that (fourth?) power, losing themselves once again in a silo run by billionaires, VCs and crypto bros. Just because free access and hypothetical federation (pinky promise) are written over the entrance. Tech moves ever faster, as they say, and I assume that this acceleration will also mean faster enshittification (Verscheißifikation?). In the Netherlands there's the initiative Public Spaces, started by public media in cooperation with other organisations that want to strengthen an open web and public discourse. They push this forward with practical means, an annual conference, and so on. Perhaps it's possible to get something moving in .at as well, the way you did in 2008 with the Politcamp for online political communication.
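To make the point about domains concrete, here is a minimal sketch of the check a reader could do on a Fediverse handle: does the account live on the organisation's own domain (or a subdomain of it)? The function names and the example domains are hypothetical, invented for illustration. The check only compares strings; the actual guarantee comes from the organisation controlling that domain.

```python
def handle_domain(handle: str) -> str:
    """Return the lower-cased domain part of a handle like '@name@example.org'."""
    parts = handle.strip().lstrip("@").split("@")
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    return parts[1].lower()


def belongs_to(handle: str, org_domain: str) -> bool:
    """True if the handle lives on org_domain itself or on one of its subdomains.

    Assumption: the organisation controls org_domain and everything below it.
    """
    domain = handle_domain(handle)
    org = org_domain.lower().rstrip(".")
    # Match on a full label boundary, so 'notexample-news.at' does not pass.
    return domain == org or domain.endswith("." + org)


# Hypothetical newspaper domain, for illustration only.
print(belongs_to("@reporter@social.example-news.at", "example-news.at"))
print(belongs_to("@reporter@example-news.at.evil.example", "example-news.at"))
```

A reader does not need a blue tick or a third party for this: the domain in the handle is the verification.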

The group that has now switched to Bluesky would certainly be able to organise the setup and upkeep of a small Mastodon server. I know from experience that the effort is manageable. There are organisations in Austria, such as the Presseclub Concordia, that could act as the sponsoring body.

Heinz Wittenbrink

The period of the European Commission that has just finished delivered an ambitious and coherent legal framework for both the single digital market and the single market for data, based on the digital and data strategies the EU formulated. Those laws, such as the Data Governance Act, Data Act, High Value Data implementing regulation and the AI Act, are all finished and in force (if not always fully in application). This means efforts are now switching to implementation. The detailed programme of the next European Commission, now being formed, isn’t known yet. Big new legislative efforts in this area are, however, not expected.

This summer Ursula von der Leyen, the incoming President of the Commission, presented the political guidelines. In them you can find what the EC will pay attention to in the coming years in the field of data and digitisation.

Data and digital are geopolitical in nature
The guidelines underline the geopolitical nature of both digitisation and data. The EU will therefore seek to modernise and strengthen international institutions and processes. It is noted that outside influence in regular policy domains has become a more common instrument in geopolitics. Data and transparency are likely tools for keeping a level-headed view of what is really going on. Data is also crucial in driving several technology developments, such as AI and digital twins.

European Climate Adaptation Plan Built on Data
The EU will increase its focus on mapping risks and preparedness w.r.t. natural disasters and their impact on infrastructure, energy, food security, water and land use, both in cities and in rural areas, as well as on early warning systems. This is sure to contain a large data component, a role for the Green Deal Data Space (for which the implementation phase will start soon, now that the preparatory phase has been completed) and the climate change digital twin of the earth (DestinE, for which the first phase has been delivered). Climate and environment are the areas where the EC had already emphasised the close connection between digitisation and data on the one hand, and the ability to achieve European climate and environmental goals on the other.

AI trained with data
Garbage in, garbage out: access to enough high quality data is crucial to all AI development, and therefore data will play a role in all AI plans from the Commission.

An Apply AI Strategy was announced, aimed at sectoral AI applications (in industry, public services or healthcare e.g.). The direction here is towards smaller models, squarely aimed at specific questions or tasks, in the context of specific sectors. This requires the availability and responsible access to data in these sectors, in which the European common data spaces will play a key role.

In the first half of 2025 an AI Factories Initiative will be launched. This is meant to provide SMEs and start-ups with access to the computing power of the European supercomputing network, for AI applications.

There will also be a European AI Research Council, dubbed a ‘CERN for AI’, in which knowledge, resources, money, people, and data are brought together.

Focus on implementing data regulations
To make the above possible, a coherent and consistent implementation of the existing data rules from the previous Commission period is crucial. Useful explanations and translations of the rules for companies and public sector bodies are needed, to allow for seamless data usage across Europe and at scale. All this within the rules for data protection and information security that equally apply. The directorate within the Commission that is responsible for data, DG Connect, sees its task for the coming years as mainly ensuring the consistent implementation of the new laws from the last few years. The implementation of the GDPR until 2018 is seen as an example where such consistency was lacking.

European Data Union
The political guidelines announce a strategy for a European Data Union. Aimed at better and more detailed explanations of the existing regulations, and above all at the actual availability and usage of data, it reinforces the measure of success the data strategy already used: the socio-economic impact of data usage. This means involving SMEs in much larger numbers, and in this context the difference between such SMEs and large data users outside of the EU is also specifically mentioned. This Data Union is a new label for, and a new emphasis on, what the European Data Strategy already seeks to do: the creation of a single market for data, meaning freedom of movement for people, goods, capital and data. That Data Strategy forms a consistent whole with the digital strategy of which the Digital Markets Act, Digital Services Act and AI Act are part. That coherence will be maintained.

My work: ensuring that implementation and normalisation are informed by good practice
In 2020 I helped write what is now the High Value Data implementing regulation, and in the past years my role has been tracking and explaining the many EU digital and data regulation initiatives on behalf of the main Dutch government holders of geo-data. Not just in terms of new requirements, but with an accent on the new instruments and affordances those rules create. The new instruments give different stakeholder groups new agency, and new opportunities for societal impact come from them.
The phase shift from regulation to implementation provides an opportunity to influence how the new rules get applied in practice, for instance in the common European data spaces. Which compelling cases of data use can have an impact on the implementation process, help set the tone, or even have a normalising effect? I’m certain practice can play a role like this, but it takes bringing those practical experiences to a wider European network. Good examples help keep the actual goal of socio-economic impact in sight, and mean you can argue from tangible experience in your interactions.

My work for Geonovum in the coming period is aimed at this phase shift. I already helped them take on a role in the coming implementation of the Green Deal Data Space, and I’m now exploring other related efforts. I’m also assisting the Ministry for the Interior in formulating guidance for public sector bodies and data users on how to deal with the chapter of the Data Governance Act that allows for the use (but not the sharing) of protected data held by the public sector. Personally I’m also seeking ways to increase the involvement of civil society organisations in this area.

I’ve been involved in open data for about 15 years. Back then we had a vibrant Europe-wide network of activists and civic organisations around open data, partially triggered by the first PSI Directive, which was the European legal foundation for our call for more open government data.

Since 2020 a much wider and more fundamental legal framework than the PSI Directive ever was has been taking shape, with the Data Governance Act, Data Act, AI Regulation, Open Data Directive and High Value Data implementing regulation as building blocks. Together they create the EU single market for data, adding data as a fourth element to the freedoms of movement for people, products and capital within the EU. This will all take shape as the European common dataspace(s), built from a range of sectoral dataspaces.

In the past years I’ve been actively involved in these developments, currently helping large government data holders in the Netherlands interpret the new obligations and above all new opportunities for public service that result from all this.

Now that the dataspaces are slowly taking shape, what I find missing from most discussions and events is the voice of civic organisations and activists. It’s mostly IT companies and research institutions that are involved. While for the Commission social impact (e.g. climate, health, energy and agricultural transitions) is a key element in why they seek to implement these new laws, for most parties involved in the dataspaces that is less of a consideration, and economic and technological factors are more important. Not even government data holders themselves are represented much in how the European data space will turn out. Even though every single one of us and every public entity is by default a part of this common market.

I would like to strengthen the voice of civil society and activists in this area, to together influence the shape these dataspaces are taking. So that they are of use and value to us too. To use the new (legal) tools to strengthen the commons, to increase our agency.

Most of the old European open data network has however dissolved over time, as we all got involved in practical projects at national level, and the European network became less important as a source of a sense of belonging and of strengthening each other’s commitment. And we’ve moved on a good number of years, so many new people have come onto the scene, unconnected to that history, with new perspectives and new capabilities.

So the question is: who is active on these topics, from a civil society perspective, as activists? Who should be involved? What are the organisations and the events that are relevant regionally, nationally, EU-wide? Can we connect those existing dots: to share experiences and examples, join our voices, pool our efforts?

Currently I’m doing a first scan of who is involved in which EU country, what type of events are visible, organisations that are active etc. Starting from my old network of a decade ago. I will share lists of what I find at Our Common Data Space.

Let me know if you count yourself as part of this European network. Let me know the relevant efforts you are aware of. Let me know which events you think bring together people likely to want to be involved.

I look forward to finding out about you!

Open Government Data Camp in Warsaw 2011. An example of the vibrancy of the European open data network, I called it the community’s ‘family christmas party’, at the time. Above the schedule of sessions created collectively by the participants, with many local initiatives and examples shared with the EU wide network. Below one of those sessions, on local policy making and open data.

Bookmarked Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages by Kelechi Ogueji, Yuxin Zhu, Jimmy Lin, 2021

LLMs usually require loads of training data, the bigger the better. This biases such training, as Maggie Appleton also pointed out, towards western and English-dominated resources. This paper describes creating a model for a group of 11 African languages that are underresourced online, and as a result don’t figure significantly in the large models going around (4 of the 11 have never been included in an LLM before). All the material is available on GitHub. They conclude that training an LLM on such lower-resourced languages together with the larger ones is less effective than training on a grouping of underresourced languages by themselves. Less than 1GB of text can provide a competitive model! That sounds highly interesting for the stated reason: it allows models to be created for underresourced languages at relatively little effort. I think that is a fantastic purpose because it may assist in keeping a wide variety of languages more relevant and in bucking the trend towards cultural centralisation (look at me writing here in English for a case in point). It also makes me wonder about a different group of use cases: where you have texts in a language that is well enough represented in the mainstream LLMs, but where the corpus you are specifically or only interested in is much smaller, below that 1GB threshold. For instance all your own written output over the course of your life, or for certain specific civic tech applications.

We show that it is possible to train competitive multilingual language models on less than 1 GB of text. … our model … is very competitive overall. … Results suggest that our “small data” approach based on similar languages may sometimes work better than joint training on large datasets with high-resource languages.

Ogueji et al, 2021
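To get a feel for where a personal corpus sits relative to that 1GB mark, here is a minimal sketch that adds up the size of the text files in a folder. The function names, the choice of file suffixes, and reading the threshold as a decimal gigabyte are my assumptions for illustration, not something taken from the paper.

```python
from pathlib import Path

# Assumption: the paper's "1 GB" read as a decimal gigabyte of raw text.
ONE_GB = 10**9


def corpus_size_bytes(root, suffixes=(".txt", ".md")):
    """Sum the size in bytes of all files under root with a matching suffix."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            total += path.stat().st_size
    return total


def below_threshold(size_bytes, threshold=ONE_GB):
    """True if the corpus is smaller than the threshold."""
    return size_bytes < threshold


if __name__ == "__main__":
    # Hypothetical usage: point it at a folder with your own notes or blog export.
    size = corpus_size_bytes(".")
    print(f"{size / ONE_GB:.4f} GB of text, below 1GB: {below_threshold(size)}")
```

File size is only a rough proxy: markup, duplicates and front matter all count as bytes without adding usable text.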

John Caswell writes about the role of conversation, saying "conversation is an art form we’re mostly pretty rubbish at". New tools that employ LLMs, such as GPT-3, can only be used well by those who learn to prompt them effectively. Essentially we’re learning to have a conversation with LLMs so that their outputs are usable for the prompter. (As I’m writing this my feedreader updates to show a follow-up post about prompting by John.)

Last August I wrote about articles by Henrik Olaf Karlsson and Matt Webb that discuss prompting as a skill with newly increasing importance.

Prompting to get a certain type of output instrumentalises a conversation partner, which is fine for using LLMs, but not for conversations with people. In human conversation the prompting is less about ensuring output that is useful to the prompter and more about assisting the other to express themselves as best they can (meaning usefulness will be a guaranteed side effect if you are interested in your conversational counterparts). In human conversation the other is another conscious actor in the same social system (the conversation) as you are.

John takes the need for us to learn to better prompt LLMs and asks whether we’ll also learn how to better prompt conversations with other people. That would be great. In many conversations the listener listens less to the content of what others say and more for the right time to jump in with what they themselves want to say. Broadcast driven versus curiosity driven. Me and you, we all do this. Getting consciously better at avoiding that common pattern is a win for all.

In parallel Donald Clark wrote that the race to innovate services on top of LLMs is on, spurred by OpenAI’s public release of ChatGPT in November. The race is indeed on, although I wonder whether those getting in the race all have an actual sense of what they’re racing and what they are racing towards. The generic use of LLMs currently in the eye of public discussion might, I think, be less promising than gearing them towards specific contexts. Back in August I mentioned Elicit, which helps you kick off a literature search based on a research question, for instance. And other niche applications are sure to be interesting too.

The generic models are definitely capable of hallucinating in ways that reinforce our tendency towards anthropomorphism (which needs little reinforcement already). Very very ELIZA. Even if on occasion it creeps you out when Bing’s implementation of GPT declares its love for you and starts suggesting you don’t really love your life partner.

I associated what Karlsson wrote with the way one can interact with one’s personal knowledge management system, the way Luhmann described his note cards as a communication partner. Luhmann talks about the value of being surprised by whatever person or system you’re communicating with. (The anthropomorphism kicks in if, based on that surprise, we then ascribe intention to the system we’re communicating with.)

Being good at prompting is relevant in my work where change in complex environments is often the focus. Getting better at prompting machines may lift all boats.

I wonder if, as part of the race that Donald Clark mentions, we will see LLMs applied as personal tools. Where I feed a more open LLM like BLOOM my blog archive and my notes, running it as a personal instance (for which the full BLOOM model is too big, I know), and then use it to have conversations with myself. Prompting that system to have exchanges about the things I previously wrote down in my own words. With results that phrase things in my own idiom and style. Now that would be very interesting to experiment with. What valuable results and insight progression would it yield? Can I have a salon with myself and my system and/or with perhaps a few others and their systems? What pathways into the uncanny valley will it open up? For instance, is there a way to radicalise yourself (like social media can) through the feedback loops of association between your various notes, notions and follow-up questions/prompts?

An image generated with Stable Diffusion with the prompt “A group of fashionable people having a conversation over coffee in a salon, in the style of an oil on canvas painting”, public domain

In the noisy chaotic phase that Twitter Inc. is going through, I downloaded my data from them 2 weeks ago. Meanwhile in the Fediverse newcomers mention they appreciate how nice, pleasant and conversational things are.

It’s good to note that that is how Twitter started out too. In my network I felt I was late joining Twitter, because I was using Jaiku (a similar, and I might add better, service based in Europe). Sixteen years on, that counts as being an early user. My user ID is number 59923, registered on Tuesday December 12th, 2006. Judging by the time, 10:36am, I registered during my regular 10:30 coffee break.

One minute later I posted my first message. It had ID 994313, so my Tweet was just within the first million messages on Twitter (the current rate seems to be over 800 million Tweets per day!). That first message mentioned the tool I was going to benchmark Twitter against: Jaiku.

What followed that first message was much like what the past 4 years of using Mastodon have been like. A bunch of gentle conversations.

Back then everyone was nice, as you tend to be in public e.g. walking through a small village. Over time Twitter conversations tended towards “I need to win this exchange, even if I agree with my counterpart”. Argumentative. Performance above conversation. Performing in front of your own followers by enacting a conversation with someone else. The general tone of voice on Twitter (apart from the actual toxicity) is somewhat like the difference of posture you take in a metropolis versus a village. In a village you greet passersby, project an aura of approachability etc. In an urban environment you tend to pretend to not see others, are pro-active in claiming your physical space, alert that others don’t push you aside or further down the queue etc. Urban behaviour easily looks aggressive, and at the very least unnecessarily rude, in a village.

The past few weeks saw a massive influx of people from Twitter. Which is good. I also noticed that it felt a bit like city folk descending on some backwater. The general tone of voice was more direct and terse in its phrasing, reflecting the character limit on Twitter, in contrast with the wider limits in Mastodon-village, which allow both for more nuance and for, yes, politeness.
The contrast was felt both ways, as newcomers commented on how nice the conversations were, a breath of fresh air etc.

Quantitative changes, like a rising number of people using a specific communication channel, lead to qualitative changes. They did on Twitter. They will on Mastodon, despite the differences. In the fediverse some of that effect will be buffered by the tools individual users have on hand (e.g. blocking people, blocking instances, moving to another instance or running your own, participating from your own website). Meaning one can choose to ‘live’ in the middle of the metropolis, or on its outskirts where not many pass by. But the effect will be there, also because there will be more tools built from other starting principles than the current tree of fediverse applications on top of the underlying ActivityPub protocol. Some will run counter to those that underpin e.g. Mastodon, others will be aligned. But change it will.

It’s nice out here, but do regularly check the back of the package for the best-by date.