Aaron Swartz would have turned 32 on November 8th. He died five years and ten months ago, and since then an annual Aaron Swartz weekend, like this past weekend, has taken place with all kinds of hackathons and events in his memory. At the time of his suicide Swartz was being prosecuted for downloading material in bulk from JSTOR, an archive of scientific papers, even though he had legitimate access to it.

In 2014 Elmine and I visited the Smart New World exhibition at Kunsthalle Düsseldorf. Part of it was the installation “18.591 Articles Sold By JSTOR for $19 = $353.229”, with those 18.591 articles printed out, showing precisely what sits behind the paywall and what Swartz was downloading: articles, like those shown, from the 19th century, long since in the public domain, sold for $19 each. After Swartz’s death JSTOR started making a small percentage of its public domain content freely accessible, limited to a handful of papers per month.

The Düsseldorf exhibit was impressive: it showed the sheer volume of material, but also the triviality of much of it. It’s a long tail of documents in extremely low demand, treated exactly the same as recent papers in high demand.

Images from the Smart New World exhibition at Kunsthalle Düsseldorf

Scientific journal publishers are increasingly a burden on the scientific world: rent-seeking gatekeepers. Their original value-added role, that of multiplication and distribution to increase access, has been completely eroded, if not fully reversed.

Something that strikes me as odd in how fake news is addressed is that the focus is almost exclusively on information production and distribution, not on the skills and strategies of those taking the information in. This is partly understandable, as forcing transparency about how the information you receive may have been shaped is helpful (especially to see whether what you are presented with is the same as what everyone else is presented with). But otherwise it is as if those receiving information are treated as passive consumers, not as agents in their own right.

“Our best defense against hostile influence, whatever its vector, is to invest in critical thinking skills at all levels of the population so that outlandish claims are seen for what they truly are: emotional exploitation for political or monetary gain”, wrote Nina Jankowicz on how Finnish society instills critical thinking skills.

The question of course is whether governments truly want to inoculate society, or merely want to deflect disinformation and manipulation from specific sources. Seen that way, it’s easier to understand where the focus on technology-oriented solutions, or on ones that presume centralised efforts, comes from.

In networks, smartness needs to be at the endpoints, not in the core. There’s a lack of attention to the information strategies, and the filtering and interpreting tactics, of those receiving information. Crap detection skills need to be developed, for instance, and societies have a duty to self-inoculate. I think the obligation to explain* applies here too: showing others what you do and how.

Here’s a list of postings about my information habits. They’re not fixed, and currently I’m in the process of describing them again, and taking a critical look at them. What are your information habits, have you ever put them into words?

*The obligation to explain is something I’ve adopted from my friend Peter Rukavina: “The benefits of a rich, open pool of knowledge are so great that those who have learned have an obligation to share what they’ve learned.”


Last weekend during the Berlin IndieWeb Camp, Aaron Parecki gave a brief overview of where he and others are concerning the ‘social reader’. This is of interest to me because for as long as I have been reading RSS, I have been doing by hand what he described doing more automatically.

These are some notes I made watching the live stream of the event.

Compared to the algorithmic timelines of Facebook, Twitter and Instagram, which show you what they decide to show you, the social reader is about taking back control: follow the things you want, in the order that you want.
RSS readers were and are like that. But RSS reading never went past linear reading of all the posts from all your feeds in reverse chronological order. There was no playing around with how those feeds are presented to you, and no way to take actions from within the reader based on the things you read (sharing, posting, bookmarking, flagging, storing etc.): there are no action buttons in your feed reader other than ‘mark as unread’ or ‘archive’.

In the IndieWeb world, publishing works well, Aaron said, but reading has been an issue (at least once it goes beyond reading a blog and commenting).
That’s why he built Monocle and Aperture. Aperture takes in all kinds of feeds (RSS, JSON, Twitter, and even scripts pushing material to it), which are grouped into channels. Monocle is a reader on top of that, presenting those channels in a nice way. Then he added action buttons to it, like reply etc. Those actions you initiate directly in the reader, and they always post to your own site. The already existing IndieWeb building blocks then send your response to the original source of the item you’re responding to. See Aaron’s posting from last March, “Building an IndieWeb Reader”, with screenshots, to get a feeling for how it all looks in practice.
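To make the moving parts a bit more concrete: Aperture implements the Microsub spec, so a client like Monocle (or anything else) can fetch your channels and their timelines over a small HTTP API, while the action buttons post back to your own site via Micropub. A minimal sketch of that reading side, with a placeholder endpoint URL and access token (these are assumptions for illustration, not my actual setup):

```python
import requests

# Placeholders: in practice these come from your own Microsub endpoint
# (e.g. an Aperture account) and an IndieAuth access token.
MICROSUB_ENDPOINT = "https://aperture.example.com/microsub/1"
HEADERS = {"Authorization": "Bearer ACCESS_TOKEN"}

# List the channels your feeds are grouped into.
channels = requests.get(
    MICROSUB_ENDPOINT, headers=HEADERS, params={"action": "channels"}
).json()["channels"]

# Fetch the timeline of the first channel; a reader like Monocle renders
# these entries and attaches the action buttons (reply, bookmark, etc.)
# that post to your own site via Micropub.
timeline = requests.get(
    MICROSUB_ENDPOINT,
    headers=HEADERS,
    params={"action": "timeline", "channel": channels[0]["uid"]},
).json()

for entry in timeline["items"]:
    print(entry.get("published"), entry.get("name") or entry.get("url"))
```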

The power of this set-up is that it separates the layers: how you collect material, how you work on that material, and how you present content. It looked great when Aaron demoed it when I met him at IWC Nürnberg two weeks earlier.

For me, some of the actions I’d like to take definitely fall outside the scope of my own website, or at the very least outside its public part. See what I wrote about my ideal feed reader. Part of the automation of actions I’d want to point at different workflows on my own laptop, for instance to feed into desk research, material for client updates, and things like that.

I’m interested in running things like Aperture and Monocle locally, but a first step is exploring them in the form Aaron provides for test driving. Aperture works fine, but I can’t yet get Monocle to work for me. This is, I guess, the same issue I ran into two weeks ago, with how my site doesn’t handle authorisation headers.

This is a start to more fully describe and explore a distributed version of digitisation, digitalisation and specifically digital transformation, and state why I think bringing distributed / networked thinking into them matters.

Digitising stuff, digitalising routines, the regular way

Over the past decades many more of the things around us became digitised, and in recent years much of what we do, our daily routines and work processes, has become digitalised. Many of those digitalised processes are merely digitised replicas of their paper predecessors: asking for a government permit, for instance, or online banking. There’s nothing there that wasn’t there in the paper version. Sometimes small steps in those processes even still force you to use paper. At the start of this year I had to apply for a declaration that my company had never been involved in procurement fraud. All the forms I needed for it (30 pages in total!) were digitised and I filled them out online, but when it came to sending it in, I had to print the PDF resulting from those 30 pages and send it through snail mail. I have no doubt that the receiving government office’s first step was to scan it all before processing it.

Online banking similarly is just a digitised paper process. Why don’t all online bank accounts provide nifty visualisation, filtering and financial planning tools (like alerts for dates due, saving towards a goal, maintaining a buffer etc.), now that everything is digital? The reason we laugh at Little Britain’s ‘computer says no’ sketches is that we recognise all too well the frustration of organisations blindly trusting their digitalised processes, never acknowledging or addressing their crappy implementation, or the extra work and route-arounds their indifference inflicts.

Digital transformation, digital societies

Digital transformation is the accumulated societal impact of all those digital artefacts and digitalised processes, even if they’re incomplete or half-baked. Digital transformation is why I have access to all those books in the long tail that never reached the shelves of any of the book shops I visited in decades past, yet now come to my e-reader instantly, resulting in me reading more, and across a wider spectrum, than ever before. Digital transformation is also the impact on elections of data-driven Facebook advertising, targeted almost individually by minutely profiling undecided voters.

Digital transformation is much talked about these days, in my work often also in the context of development and the sustainable development goals.
Yet it often feels to me that for most intents and purposes this digital transformation is done to us and about us, but not of us. It’s a bit like the smart city visions corporations like Siemens and Samsung push(ed), which were basically devoid of life and humanity: quality of life reduced to and equated with security, in sterilised cities, ignoring that people are the key actors, as critiqued by Adam Greenfield in 2013.

Human digital networks: distributed digital transformation

The Internet is a marvellous thing. At least it is when we use it actively, to assist us in our routines and in our efforts to change, learn and reach out. As social animals, our human interaction has always been networked: we fluently switch between contexts, degrees of trust and disclosure, and route around undesired connections. In that sense human interaction and the internet’s original design principle closely match up; they’re both distributed. In contrast, most digitalisation and digital transformation happens from the perspective of organisations and silos: centralised things, where some decide for the many.

To escape that ‘done to us, about us, not of us’, I think we need to approach digitisation, digitalisation and digital transformation from a distributed perspective, matching our inherently networked humanity with our newly (for some 30 years now) networked global digital infrastructure. We need to think in terms of distributed digital transformation (making our own digital societal impact), building on distributed digitisation (making our things digital) and distributed digitalisation (making our routines digital).

Signs of distributed digitisation and digitalisation

Distributed digitisation can already be seen in things like the quantified self movement, where individuals create data around themselves to use for themselves. Or in the sensors I have in the garden. Those garden measurements are part of something you can call distributed digitalisation, where a network of similar sensors creates a map of our city that informs climate adaptation efforts by local government. My evolving information strategies, with a few automated parts, and the interplay of different protocols and self-proposed standards that make up the IndieWeb, are also examples of distributed digitalisation. My Networked Agency framework, where small groups of relationships fix something of value with low-threshold digital technology, and with network/digital based methods and processes, is distributed digitisation and distributed digitalisation combined into a design aid for group action.

Distributed digital transformation needs a macroscope for the new civil society

Distributed digital transformation, distributed societal impact, seems a bit more elusive though.
Civil society is increasingly distributed too, that much is clear to me. New coops, p2p groups and networks of individual actors emerge all over the world. However, they are largely invisible to, for instance, the classic interaction between government and the incumbent civil society, and usually cut off from the scaffolding and support structures that ‘classic’ activities can build on to get started, because they’re not organised ‘the right way’ and not clearly representative of a larger whole. Bootstrapping is their only path. As a result these initiatives are perceived only as single elements, and the scale they actually (can) achieve as a network remains invisible, often even to those single elements themselves.

Our societies, including the nodes that make up the network of this new type of civil society, lack the perception to recognise the ‘invisible hand of networks’. A few years ago I discussed with a few people, directors of entities in that new civil society fabric, how it is that we can’t seem to make our newly arranged collective voices heard, our collective efforts and results seen, and our collective power of agency recognised and sought out for collaboration. We’re too used, it seems, to aggregating all those things, collapsing them into the single voice of a mouthpiece that has the weight of numbers behind it, in order to be heard. We need to learn to see the cumulative impact of a multitude of efforts, while simultaneously keeping all those efforts visible on their own. There are, I think, so many initiatives that are great examples of how distributed digitalisation leads to transformation, but they are largely invisible outside their own context, and not widely networked and connected enough to reach their full potential. They are valuable on their own, but would be even more valuable to themselves and others when federated; the federation part, however, is mostly missing.
We need to find a better way to see the big picture, while also seeing all pixels it consists of. A macroscope, a distributed digital transformation macroscope.

Does the New York Times see the irony? This article talks about how US Congress should look much less at the privacy terms of big tech, and more at the actual business practices.

Yet it calls upon me to disable my ad blocker. The ad blocker that blocks 28 ads in a single article, all served by a Google advertisement tracker, one that one of my browsers flags as working the same way cross-site scripting attacks do.

If, as you say, adverts are at the core of your business model, making journalism possible, why do you outsource them?
I’m ok with advertising, New York Times, but not with adtech. There’s a marked difference between the two. It’s adtech, not advertising, that does the things you write about, like “how companies can use our data to invisibly shunt us in directions” that don’t benefit us. And adtech is the reason that, as you say, the “problem is unfettered data exploitation and its potential deleterious consequences.” I’m ok with a newspaper running its own ads. I’m not ok with the New York Times behaving like a Trojan horse: pretending to be a newspaper while actually being a vehicle for, in your own words, the “surveillance economy”.

Until then my ad blocker stays.


My browser blocking 28 ads (see the address bar) on a single article, all from 1 Google ad tracker.

Last week I presented to a provincial procurement team about how to better support open data efforts. Below is what I presented and discussed.

Open data as a policy instrument, and the legal framework, demand better procurement

Publishing open data creates new activity. It does so in two ways: it allows existing stakeholders to do more themselves or do things differently, and it allows people who could not participate before to become active as well. We’ve seen for instance how opening up provincial and national geographic data increases the independent usage of that data by local governments. We’ve also seen how the Dutch hiking association started using national geographic data to create and better document routes. To the surprise of the Cadastre a whole new area of usage appeared as well, by cultural organisations that had never requested such data before. So open data is an enabler of agency.

If as a government data holder you know this effect takes place, you can also try to achieve it deliberately. For policy domains and groups of stakeholders where you would like to see more activity, publishing data becomes an instrument for achieving your own policy goals. Next to regulation and financing, publishing open data is a new, third policy instrument. It also happens to be the cheapest of the three to deploy.

Open data in the EU has a legal framework in which, over time, more things are mandated. There is a right to re-use. Upon request, data holders must be able to provide data in machine-readable formats. In the Netherlands open standards have been compulsory for government entities since 2008. Exclusive access to government data for re-use is, except for a few very strictly regulated situations, illegal.

To be able to comply with the legal framework, and to be able to actively use open data as a policy instrument, public sector bodies must pay more attention to how they acquire data, and as a consequence to what happens during procurement processes. If you don’t, the government entity’s data sovereignty is strongly diminished, which carries costs.

Procurement awareness needed on multiple levels

The goal is to ensure full data sovereignty. This means paying real attention to various things on different levels of abstraction around procurement.

  • Ensure data is received in open standards and commonly used domain-specific standards
  • Ensure when reports are received that the data used, such as for graphs and tables, are also received
  • Ensure when information products are received (maps, visualisations) the data used for them are also received
  • Ensure procurement and collaboration contracts do not preclude sharing data with third parties, apart from on grounds already mentioned as exceptions in the law on freedom of information and re-use
  • Ensure that when raw data is provided to service providers, that data is still available to the government entity
  • Ensure that when data is collected by external entities who in turn outsource the collection, all parties involved know the data falls under the decision making power of the government entity
  • Ensure that in collaborations you do not sign away decision power over the data you contribute, that you have rights to the data you collectively create, and that there are as few restrictions as possible on the data others contribute.

What could go wrong?

Unless you always pay attention to these points, you run the risk of losing your data sovereignty. This can lead to situations where a government entity is no longer able to comply with its own legal obligations concerning data provision and transparency.

A few existing examples of what can go wrong:

  • A province is counting bicycle traffic through a network of sensors it deployed itself. The data is transmitted directly to a service provider in a different country. The province can see dashboards and download reports, but has no access to the sensor data itself and cannot download it. So while a citizen requesting the data could not be provided with it, the service provider does base commercial services on that and other data it receives, having de facto exclusive access to it.
  • Another province outsources bird inventory counting to nature preservation organisations, who in turn rely on volunteers to do the bird watching. The province pays for the effort. When it comes to sharing the data publicly, the nature preservation organisations say their volunteers actually own the data, so nothing can be publicly shared. This is untrue for multiple reasons (database rights do not apply, and it is a paid-for effort so procurement terms unequivocally transfer any such rights, should they exist, to the province, etc.), but as the province doesn’t want to waste time on this, nor get into a fight, it leaves it be, resulting in the data not being made available.
  • An energy network provider pools many different data sources concerning energy usage in its service area from a network of collaborating entities, both private and public. It also already publishes a lot of open data. As part of the national effort towards energy transition it receives many data requests from local governments, housing associations and other entities. It would like to provide data, seeing this as a way of contributing to an essential public task (energy transition), but still says no to data requests in 60% of all cases, because it can’t figure out which contractual obligations apply to which parts of the data, or cannot reconcile conflicting or ambiguous contract clauses concerning the data.
  • All provinces pool data concerning economic activity and the labour market in a private foundation in which private entities also participate. That foundation sells data subscriptions. Currently it also publishes some open data, but if any of the provinces wanted to do more, they would have to wait for full agreement: the slowest in the group would determine the actual level of transparency.
  • A province has outsourced the creation of a ‘heat transition atlas’, mapping the potential for moving homes away from natural gas heating using various alternatives. The resulting interactive website contains different data layers, but those data layers are themselves unavailable. Although there is a general list of the data sources used, it does not state its sources precisely, nor provide details on how the data was transformed for the website.

In all cases the public sector data holder has put itself in a position that could have been prevented had they paid more attention at the time of procurement or at the time of entering into collaboration. All these situations can be fixed later on, but they require additional effort, time and costs to arrange, which are unnecessary if dealt with during procurement.

But we have procurement regulations already!

What about procurement regulations? We have those, so don’t they cover all this? Mostly not, it turns out.

The terms of procurement talk about transferring the rights to all deliverables, but in many cases the data involved isn’t listed as a deliverable, and so isn’t covered by those terms.
The terms talk about transferring database rights, but those hardly ever apply, as the scale of data collection and structuring into a database is usually limited.
Concerning research there is some language about also transferring the data involved, but a lot of reports aren’t research but consultancy services.

In the general regulations that apply to provincial procurement, the word ‘data’ is used only in the context of personal data protection, as the Dutch plural of ‘date’, and in the context of data carriers (hard drives etc.). The word ‘standards’ never occurs, nor are there any references to data formats (even though legal obligations exist for government entities concerning standards and data formats).

The procurement terms are neither broad enough, nor detailed enough.

How to improve the situation

So what needs to happen to ensure government entities arrange their data needs correctly during procurement? How to plug the holes? A few things at the very least:

  • Likely, when it comes to standards and formats (which may differ per domain), the only viable place is in the mandatory technical requirements in a call for tender / request for proposals.
  • To get the data behind graphs, tables, info products and reports, including a list of resources and transformations applied, it needs to be specified in the list of deliverables.
  • Collaboration contracts entered into should always have articles on sharing the data you contribute, being able to share the data resulting from the collaboration, and rules about data that others contribute.

It is important to realise that you cannot contract away mandatory transparency, open data, or data governance obligations. Any resulting issues will mean time-consuming and likely costly repair activities.

Who needs to be involved

In order to prevent the costs of repair or mitigation of consequences, there are a number of questions concerning who should be doing what, inside a government entity.

  • What needs to be arranged at the point of tender, who will check it?
  • What needs to be part of all project starts (e.g. checklists, data paragraphs), is the project manager aware of this, and who will check it?
  • Who at the writing and signing of any contract will check data aspects?
  • Who at the time of delivery will check if data requirements are met?
  • What part of this is more about awareness and operations, and what needs to be done through regulation?

Our work in the next steps

We intend to assist the province involved in making sure procurement better enables data sharing from now on. Steps we are currently taking to move this forward are:

  • We’ve put data sovereignty into the organisation’s strategy document, and tied it into overall data governance improvement.
  • With the information management department we’ll visit all main procurers to discuss and propose actions
  • We’ll likely build one or more checklists for different aspects
  • We’ll work with a three-person team from the procurement department to more deeply embed data awareness and amend procurement processes

All this is basically a preventative step to ensure the province has its house in order concerning data.

From the recent posting on Mastodon currently lacking a long tail, I want to highlight a specific notion, which is why I am posting it here separately: the notion that a tool’s usage having a long tail is a measure of distribution, and as such a proxy for networked agency. [A long tail is defined as the bottom 80% of certain things making up over 50% of a ‘market’: the 80% least-sold books in the world make up more than 50% of total book sales. The 80% smallest Mastodon instances, on the other hand, account for less than 15% of all Mastodon users, so that is not a long tail.]

To me being able to deploy and control your own tools (both technology and methods), as a small group of connected individuals, is a source of agency, of empowerment. I call this Networked Agency, as opposed to individual agency. Networked also means that running your own tool is useful in itself, and even more useful when connected to other instances of the same tool. It is useful for me to have this blog even if I am its only reader, but my blog is even more useful to me because it creates conversations with other bloggers, it creates relationships. That ‘more useful when connected’ is why distributed technology is important. It allows you to do your own thing while being connected to the wider world, but you’re not dependent on that wider world to be able to do your own thing.

Whether a technology or method supports a distributed mode is, in other words, an important feature to look for when deciding whether to use it. Another aspect is the threshold to adopting such a tool: if it is too high, people are unlikely to use it, and the actual distribution will be very low even if in theory the tool supports it. Looking at the distribution of usage is then a good measure of a tool’s success. Are more people using it individually or in small groups, or are more people using it in a centralised way? That is what a long tail describes: at least 50% of usage takes place in the 80% of smallest occurrences.
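As a rough yardstick, that definition is easy to check against any list of instance or group sizes. A small sketch (illustrative only, with made-up numbers):

```python
def has_long_tail(sizes):
    """True if the smallest 80% of instances hold more than 50% of all users."""
    sizes = sorted(sizes)                  # smallest first
    cutoff = int(len(sizes) * 0.8)         # boundary between bottom 80% and top 20%
    return sum(sizes[:cutoff]) / sum(sizes) > 0.5

# A few huge instances dominating thousands of tiny ones: no long tail.
print(has_long_tail([400_000, 220_000, 210_000] + [8] * 3_000))  # False
```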

In June I spoke at State of the Net in Trieste, where I talked about Networked Agency. One of the issues raised there in response was about scale, as in “what you propose will never scale”. I interpreted that as a ‘centralist’ remark, and not a ‘distributed’ view, as it implied somebody specific would do the scaling. In response I wrote about the ‘invisible hand of networks‘:

“Every node in a network is a scaler, by doing something because it is of value to themselves in the moment, changes them, and by extension adding themselves to the growing number of nodes doing it. Some nodes may take a stronger interest in spreading something, convincing others to adopt something, but that’s about it. You might say the source of scaling is the invisible hand of networks.”

In part it is a pun on the ‘invisible hand of markets’, but it is also a bit of hand-waving, as I didn’t actually have precise notions of how that would need to work at the time of writing. Thinking about the long tail that Mastodon is missing, and thus about Mastodon not yet delivering the distributed social networking experience it is intended for, lets me make the ‘invisible hand of networks’ a bit more visible, I think.

If we want to see distributed tools get more traction, that really should not come from a central entity doing the scaling; it would create counter-productive effects. Most of the Mastodon promotion comes from the first few moderators, who as a consequence now run large de facto centralised services, with 77% of all participants housed on 0.7% (25 of over 3,400) of servers. In networks smartness needs to be at the edges, goes the adage, and that means that promoting adoption needs to come from those edges, not the core, to extend the edges, to expand the frontier. In the case of Mastodon that means the outreach needs to come from the smallest instances towards their immediate environment.

The forming of a long tail as an adoption pattern is then a good way to see whether broad distribution is being achieved.
Likely elements of promoting from the edges, which together form the ‘invisible hand of networks’ doing the scaling, are I suspect:

  • Show and tell: how one instance of a tool has value to you, and how connected instances have more value
  • Being able to explain core concepts (distribution, federation, agency) in contextually meaningful ways
  • Being able to explain how you can discover others using the same tool, that you might want to connect to
  • Lower thresholds of adoption (technically, financially, socially, intellectually)
  • Reach out to groups and people close to you (geographically, socially, intellectually), that you think would derive value from adoption. Your contextual knowledge is key to adoption.
  • Help those you reach out to set up their own tools, or if that is still too hard, ‘take them in’ and allow them the use of your own tools (so they at least can experience if it has value to them, building motivation to do it themselves)
  • Document and share all you do. In Bruce Sterling’s words: it’s not experimenting if you’re not publishing about it.

An adoption-inducing setting: Frank Meeuwsen explaining his steps in leaving online silos like Facebook, Twitter, and doing more on the open web. In our living room, during my wife’s birthday party.

Today the 2.6 version of Mastodon has been released. It now has built-in support for “rel=me”, which allows verification, meaning that I can show on my Mastodon profile a link to my site and prove that both are under my control.

Rel=me is something you add to a link on your own site, to indicate that the page or site you link to also belongs to you. The page you link to needs to link back to your site, making it reciprocal. This is machine readable, and allows others to establish that different pages are under the control of the same person or entity.

On my own site I use ‘rel=me’ in the about section in the right hand column. First, if you check the HTML source of my page, you’ll see that I say that this site (zylstra.org/blog) is my primary site, by making it the only link that has a ‘u-uid’ class (uid is unique id). It also has rel="me", meaning the relationship I have with the linked site is that it is me:

class="u-url url u-uid uid" rel="me" href='https://www.zylstra.org/blog'

Further down in that About segment you find other links, to my Mastodon and Twitter profiles. If you look at those links you will see it says:

rel="me" href="https://m.tzyl.nl/@ton"

saying my Mastodon profile is also me, and similarly to say that a specific Twitter profile is also me (I maintain other Twitter profiles as well, but they’re not me personally; they are my company etc.):

rel="me" href="https://twitter.com/ton_zylstra"

To close the loop that allows verification that this is true, both my Mastodon profile and my Twitter profile need to link back to my site in a way that machines can check. For Twitter that is easiest: it has a specific place in a user profile for a website address, like in the image below.

In Mastodon I could add multiple URLs to my profile, but there was no way to explicitly say that a specific link is the one that represents my online identity. Now I can add a rel=me link to my Mastodon profile, so that my website and my Mastodon profile link to each other in a verifiable way, proving both are under my control. As you can see in the image below it was already available on a single instance for testing purposes (the green mark signifies verification with the linked site), and with today’s release it is available to all Mastodon instances.
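A sketch of how such a reciprocal check can be done mechanically, assuming both pages are publicly fetchable (this illustrates the principle only, it is not how Mastodon implements it internally):

```python
import requests
from bs4 import BeautifulSoup

def rel_me_links(url):
    """Return the hrefs of all links marked rel="me" on a page."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return {
        tag["href"].rstrip("/")
        for tag in soup.find_all(["a", "link"], rel=True, href=True)
        if "me" in tag.get("rel", [])
    }

def verified(site, profile):
    """True if site and profile each carry a rel="me" link to the other."""
    return (profile.rstrip("/") in rel_me_links(site)
            and site.rstrip("/") in rel_me_links(profile))

print(verified("https://www.zylstra.org/blog", "https://m.tzyl.nl/@ton"))
```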

So how is verification of control over different pages by the same person useful? It may be useful to show that another Twitter profile with my name is not me, because there’s no two-way link between that profile and my site. If you have multiple rel=me references it becomes harder for others to fake specific parts of your online identity. Further, it allows additional functionality, like logging in on a different site using credentials from a site you control. It also makes network mapping and discovery possible: site X links with rel=me to profile X on Twitter. There X follows Y, and Y’s profile says that site Y is under her control. Now I know that the authors of site X and site Y are somehow connected, and if I’m following site X, I may find it interesting to also regularly read site Y.

As soon as my Mastodon instance has been updated to the latest version, which will likely be sometime today, I will add the rel=me in my Mastodon profile, making the link between this site and that profile verifiable.

[UPDATE] It now works on my Mastodon instance:

[TL;DR: A long tail is needed for distributed technology to be sustainable, I think; otherwise it’s just centralisation and single points of failure in a different form. A long tail means the bottom 80% take over 50% of a market, and the top 20% under 50%. Mastodon currently has over 85% of its participants in the top 20% of instances, and it’s worse than that, as 77% of participants are on 0.7% of instances. Just 15% are in the bottom 80% of instances. There’s a power law distribution, but it’s not a long tail. What can Mastodon do to get there and to sustainability?]

On 6 October 2016 Mastodon was launched, and its originator Eugen Rochko looks back in a blogpost on the journey of the past two years.

I joined on 7 April 2017, 6 months after its launch, at the Mastodon.cloud instance. I posted some messages for a month, then fell quiet for half a year. A few messages last March, and then I started using it more frequently last month, in the run-up to figuring out how to run Mastodon for myself (which for now means a hosted solution, but still aiming for running it from the home router). It’s now part of my daily information diet, but no guarantee yet it will last, although being certain I have ‘my half’ of the conversation on a domain I own helps a lot towards maintaining worthwhile exchanges.

Eugen’s blogpost is rightfully proud of what has been accomplished. It’s not yet proof of the sustainability of federated solutions, though, as he suggests it is.

He shares a few interesting numbers about the usage of Mastodon. The median of the 3,460 known instances is 8 users. In total there are 1,627,557 registered accounts. The largest instance has 415,941 members, while the top 3 together have 52% of users, meaning that numbers 2 and 3 average 215,194 accounts. The top 25 largest instances have 77%, or 1,253,219 members, meaning that numbers 4-25 average 18,495 users. As the median is 8, the smallest 1,730 instances have at most 8 × 1,730 = 13,840 users. It also means that instances number 26 to 1,730 have at least 360,498 members between them, or an average of 211. This tells us there’s a Pareto power law distribution: the top 20% of instances hold at least 85% of users at the moment. That also means there is no long tail, just a stub that holds at most 15% of Mastodon users. For a long tail to exist, the smallest 80% of instances should account for over 50% of users, or over three times their current share.
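A quick back-of-the-envelope check of that arithmetic, using the reported figures (so the outcomes are approximate):

```python
total_users = 1_627_557     # registered accounts across all known instances
instances = 3_460           # known instances; the median instance has 8 users
largest = 415_941           # members of the largest instance

top3 = 0.52 * total_users                     # top 3 instances hold 52% of users
print((top3 - largest) / 2)                   # ~215,194: average of instances 2 and 3

top25 = 0.77 * total_users                    # top 25 instances hold 77% of users
print((top25 - top3) / 22)                    # ~18,495: average of instances 4-25

smallest_half_max = 8 * (instances // 2)      # the smallest 1,730 instances together
print(smallest_half_max)                      # hold at most 13,840 users

middle_min = total_users - top25 - smallest_half_max
print(middle_min, middle_min / (1_730 - 25))  # instances 26-1730: >=360,498 users, avg ~211
```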

As the purpose of Mastodon is distribution, where federation allows everyone to connect regardless of their instances (sort of like e-mail), I think Mastodon can only be deemed sustainable if there is a true long tail. Meaning, that while the number of users goes up, the number of instances should go up at a faster rate. So that over 50% of all Mastodon users will be on the 80% smallest or even individual instances. In the current numbers we should be most interested in the 50% of instances that now have 8 or less users, and find out what drives those instances, so we may have many many more of them. We should also think about what a bigger-to-smaller-instances funnel for members can look like, not just leave it to chance. I think that the top 25 Mastodon instances, which is just 0.7% of the total, currently having 77% of all users is very problematic from a sustainability perspective. Because that level of concentration is completely at odds with the stated purpose of Mastodon: distribution.

Eugen Rochko in his anniversary posting points at a critical article from April 2017 in Mashable, implying that this criticaster has been proven wrong definitively. I disagree. While many of the ‘predictions’ in that article are indeed silly, it also contains a few hints as to where sustainability may be found. The criticaster doesn’t get federation (yet likely uses e-mail every day), and complains about discovery (yet is likely relieved that not all his personal e-mail addresses are to be found in Google). Yet if we can’t explain distribution and federation, and can’t or don’t communicate how discovery works in such a setting, then we won’t be able to make a long tail grow. For more people to adopt small or individual instances we need to bring the threshold for running your own instance way down, and then way down again: to the level of at most one click installing a script on any regular hosting service, and creating a first account.

Using open protocols, like ActivityPub which Mastodon supports, is key to getting more people out of walled gardens and silos and onto the open web. Tracking its adoption is a useful measure of success, but two years of existence is not a sign of sustainability at all. What Eugen Rochko has kicked off with Mastodon is valuable and very laudable, but we have barely started getting to where we need to be for it to stick.

Some links I thought worth reading the past few days

  • On how blockchain attempts to create fake scarcity in the digital realm. And why banks etc therefore are all over it: On scarcity and the blockchain by Jaap-Henk Hoepman
  • Doc Searls has consistently good blogposts about the adtech business, and how it is detrimental to publishers and citizens alike. In this blogpost he sees hope for publishing. His lists on adverts and adtech should, I think, be on all our minds: Is this a turning point for publishing?
  • Doc Searls wrote this one in 2017: How to plug the publishing revenue drain – The Graph – Medium
  • In my information routines offline figures prominently, but it usually doesn’t in my tools. There is a movement to put offline front and center as design principle it turns out: Designing Offline-First Web Apps
  • Hoodie is a backendless tool for building web apps, with an offline-first starting point: hood.ie intro
  • A Berlin based company putting offline first as foremost design principle: Neighbourhoodie – Offline First
  • And then there are Service Workers, about which Jeremy Keith has just published a book: Going Offline
  • Haven’t tested it yet, but this type of glue we need much more of, to reduce the cost of leaving silos, and to allow people to walk several walled gardens at the same time as a precursor to that: Granary

Just quickly jotting some thoughts down about bookmarking, as part of a more general effort of creating an accurate current overview of my information strategies.

Currently I store all my bookmarks in Evernote, saving the full article or PDF (not just the URL, which removes the risk of it being unavailable later, or behind a paywall). I sometimes add a brief annotation at the start, and may add one or more tags.

I store bookmarks to Evernote from my browser on the laptop, but also frequently from my mobile, where I pick them out of various timelines.
There are several reasons I store bookmarks.

  • I store predictions people make, to be able to revisit them later, and check on whether they came true or not.
  • I store newspaper articles to preserve how certain events were depicted at the time they happened (without the historic reinterpretation that usually follows later)
  • I store pages for later reading (replacing Instapaper)
  • I store bookmarks for sharing in (collated) blogposts, or on Twitter, or to send to a specific person (‘hey, this looks like what you were looking for last week’)
  • I store bookmarks around topics I am currently interested in, as resource for later or current desk research, or for a current project.
  • I store bookmarks as reminders (‘maybe this restaurant is a place to go to sometime when next in Berlin’, ‘possible family trip’, ‘possible interesting conference to attend’)

In the past, when I still used Delicious and it still had a social networking function, I also used bookmarking for discovering other people. Because social tools work in triangles (as I said in 2006), I would check in Delicious who else had bookmarked something, and with which tags they did so. The larger the difference in tags (e.g. I’d tag ‘knowledge management’ and they’d tag ‘medication’) or the difference in jargon (me ‘complexity’, they ‘wicked_problem’, another ‘intractable’), the likelier it was that someone was part of different communities than me while focusing on the same things. Then I’d seek out their blog etc., and start following their RSS feeds. It was a good way to find people based on professional interests and extend my informal learning network, a way to diversify my inputs for various topics.
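That triangle-based discovery can be described as a simple heuristic: for the same bookmark, the less my tags overlap with someone else’s, the more likely they sit in a different community with shared interests. A toy sketch with made-up data:

```python
def tag_divergence(my_tags, their_tags):
    """0.0 means identical tags for the same bookmark, 1.0 means no overlap at all."""
    mine, theirs = set(my_tags), set(their_tags)
    return 1 - len(mine & theirs) / len(mine | theirs)

mine = {"knowledge management", "complexity"}
others = {
    "user_a": {"knowledge management", "km"},      # close to my own framing
    "user_b": {"medication", "wicked_problem"},    # likely a different community
}
for user, tags in others.items():
    print(user, round(tag_divergence(mine, tags), 2))   # user_b scores highest
```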

A visualisation of Kars Alfrink’s Delicious bookmarks, based on usage of tags, 2006, CC-BY

Looking at that list of uses, I notice that it is a mixture of things that can be public, things that can be public to some, and things that are just for my eyes. I also know that I don’t like publishing single bookmarks to my blog, unless I have an extended annotation to publish with it (more a reflection or response to a link, than just bookmarking that link). Single bookmarks posted to a blog I experience as cluttering up the timeline (but they could be on a different page perhaps).
The tagging is key as a filing mechanism, and annotation can be a helpful hint to my future self why I stored it, as much as a thought or an association.

When I think of ‘bringing bookmarking home’ in the sense of using only non-silo tools and owning the data myself, several aspects are important:

  • The elements I need to store: URL, date/time stored, full article/PDF, title, tags, notes. Having a full local copy of a page or PDF is a must-have for me: you can’t rely on something still being there the next time you visit a URL.
  • The things I want to be able to do with it are mostly a filtering on tags I think (connecting it to one or more persons, interests, projects, channels etc.), and then having different actions/processes tied to that filtering.
  • I’d want to have the bookmarks available offline on my laptop, as well as available for sharing across devices.
  • It would be great if there was something that would allow the social networking type of bookmarking I described, or make it possible in decentralised fashion

When I look at the available open source bookmarking tools I could self-host, I notice that the ability to save full pages/documents and the offline functionality are mostly the missing elements. So maybe I should try to glue together something from different building blocks found elsewhere.
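As a starting point for such glue, here is a minimal sketch of the elements listed above as a local, offline-first store (assuming SQLite, a single file that is easy to sync across devices; this is an illustration, not a finished tool):

```python
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS bookmarks (
    id       INTEGER PRIMARY KEY,
    url      TEXT NOT NULL,
    title    TEXT,
    saved_at TEXT NOT NULL,   -- ISO 8601 date/time of saving
    content  BLOB,            -- full page or PDF, so the copy is local
    notes    TEXT             -- annotation: why I stored it
);
CREATE TABLE IF NOT EXISTS tags (
    bookmark_id INTEGER REFERENCES bookmarks(id),
    tag         TEXT NOT NULL
);
"""

con = sqlite3.connect("bookmarks.db")
con.executescript(schema)

# Filtering on a tag and tying an action to the result,
# e.g. collecting candidates for a collated linkpost.
to_share = con.execute(
    "SELECT b.url, b.title FROM bookmarks b "
    "JOIN tags t ON t.bookmark_id = b.id WHERE t.tag = ?",
    ("for-sharing",),
).fetchall()
```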

What do you use for bookmarking? How do you use bookmarks?

Sebastiaan at IWC Nürnberg last weekend did some cool stuff with visualising the feeds he follows, as well as finding a way of surfacing stuff from outside his feeds because those in his feeds talk about it or like it. That is very exciting to me, as it creates a peripheral view and really puts your network to use as a filter. He follows up with a good posting on readers.

Towards the end of that posting there’s some discussion of how to combat feed overwhelm.
That, Sebastiaan, reminds me of what I wrote about my feed reading strategies in 2005 (take a look at the images there, they help in understanding the text that follows).

I think it is useful to think not just of what you yourself consume in terms of feeds, and how to optimise that, but also in terms of the feedback loops you need/want back to the authors of some of your feeds.

Your network is a filter, and a certain level of feedback is needed to be able to spot patterns that lift signals above the noise, the peripheral vision you described, both individually and collectively. But too much feedback creates echo chambers. So the overall quality of your network’s feeds and interaction is part of the equation in thinking about feed overwhelm. It introduces the need for alternating, deliberate phases of divergence and convergence, and for being able to judge the diversity and quality of your network.

It’s in that regard very important to realise that there’s a key factor not present in your feeds that is enormously useful for filtering: your own personal knowledge about the author of a feed. If you can tag feeds with what you know of their authors (e.g. coder, Berlin, Drupal), and with how you perceive the social distance between you and them (from significant other to total stranger), you can do even more visualising, by asking questions like “what are the topics that European front-end developers I know are excited about this week”, or by visualising what communities are talking about. Social distance is also a factor in dealing with overwhelm: I for instance read a handful of people important to me every day when they have posted, while others I don’t read if I don’t have time, and I therefore group my feeds by social distance.
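What that could look like in practice: keep the author metadata with the subscription itself, and filter on it when deciding what to read. A sketch with hypothetical data:

```python
feeds = [
    {"url": "https://example.com/alice.xml", "topics": ["front-end", "Drupal"],
     "location": "Berlin", "distance": 1},   # 1 = close contact, read daily
    {"url": "https://example.org/bob.xml", "topics": ["complexity"],
     "location": "Toronto", "distance": 3},  # 3 = read when time allows
]

def select(feeds, topic=None, max_distance=2):
    """Pick feeds by topic and social distance, e.g. for today's reading round."""
    return [f["url"] for f in feeds
            if f["distance"] <= max_distance
            and (topic is None or topic in f["topics"])]

print(select(feeds, topic="front-end"))   # nearby front-end developers only
```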

Finally, overwhelm is more likely if you approach feeds as drinking from a tap. But again, you know things that are not present in your feeds: current interests you have, questions you have, things you’re working on. A listener is more likely to hear the things that are close to them. This points less to a river-of-news approach, and more to an active interrogation of feeds based on your personal ‘agenda’, at a time of your choosing.

Fear of missing out is not important, especially not when the feedback loops between authors that I mentioned above exist. If something is a signal of some sort, and not noise, it will bounce around your network-as-a-filter for a while, and is likely to still be there in some form when you next take a look. If it is important and you overlooked it, it will come up again when you look another time.

Also see my posting about my ideal feedreader, from a few months ago.

At IndieWeb Camp Nürnberg today I worked on changing the way my site displays webmentions. Like I wrote earlier, I would like for all webmentions to have a snippet of the linking article, so you get some context to decide if you want to go to that article or not.

It used to be that way in the past with pingbacks, but my webmentions get shown as “Peter mentioned this on ruk.ca”.

After hunting down where in my site this gets determined, I ended up in a file of my Semantic Linkbacks plugin, called class-linkbacks-handler.php. In this file I altered the “get_comment_type_excerpts” function (which sets the template for a webmention), and the function “comment_text_excerpt”, where that template gets filled. I also altered the maximum length of webmentions that are shown in their entirety. My solution takes a snippet from the start of the webmention’s source; later I will change it to take a snippet from around the specific place where it links to my site. But at least I succeeded in changing this, and now know where to do it.
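The later change mentioned above, taking the snippet from around the place where the source links to my site rather than from its start, comes down to locating that link in the source HTML and cutting a window of text around it. A rough sketch of that logic (in Python rather than the plugin’s PHP, purely to illustrate the idea):

```python
from bs4 import BeautifulSoup

def excerpt_around_link(source_html, target_url, window=150):
    """Return roughly 2*window characters of text surrounding the link to target_url."""
    soup = BeautifulSoup(source_html, "html.parser")
    text = soup.get_text(" ", strip=True)
    link = soup.find("a", href=lambda h: h and h.startswith(target_url))
    if link is None:
        return text[: 2 * window]            # fall back to the start of the source
    anchor = link.get_text(" ", strip=True)
    pos = text.find(anchor) if anchor else 0
    start = max(0, pos - window)
    return "…" + text[start : pos + len(anchor) + window] + "…"
```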

When the next update of this plugin takes place I will need to take care, as then my changes will get overwritten. But that too is less important for now.


The webmentions for this posting are now shown as a snippet from the source, below the sentence that was previously the only thing shown.

A great effect of spending a day in the same room with 20 or so similarly geek-inclined others is that you hear about a lot of examples, tools and services. And geek is as geek does: I try them out on the spot. Today this helped me become aware that something is wrong on my server with the OAuth authentication I run. I thought it was working fine, as it is no problem to actually use it, for instance to log in with my own domain name at the IndieWeb wiki. But when interacting with my micropub endpoint not everything goes well.

Today I noticed that:

  • When I try to post from Micropublish.net, I can log in at micropublish.net, but when I try to post I get an ‘unauthorized’ error
  • When I try to use the Omnibear Firefox add-on it authorises ok, but then endlessly tries to load the list of syndication targets
  • When I use Quill to post, it posts fine, but does not load the list of syndication targets

Those missing syndication targets (now that I understand from today’s sessions what they are) were what first caught my eye. Testing the micropub endpoint on my server myself I got the correct response, but Quill turned out to get ‘unauthorized’ as the response for that request, just like micropublish.net got for posting.

The endpoint gives a correct response

In WordPress my IndieAuth plugin has a diagnostic tool, and running that, it turns out an authorisation header is not coming through.

Which seems to be causing the problems. Reading the links provided, it seems that, as with XML-RPC, my hoster is actively blocking that header. [UPDATE: It is not; the header is just not available in the way the server currently runs PHP.] This results in exactly the same experience as I had with XML-RPC: it seems to be only half working (the ‘safe’ uses work, while the rest fails). There’s a work-around, renaming the headers that get sent, and implementing that work-around is a thing for me to do tomorrow, to see if I can get around being unauthorised. [UPDATE: That work-around has not worked so far.]
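To tell apart ‘the endpoint rejects my token’ from ‘the Authorization header never arrives’, a small outside-in check helps. The endpoint URL and token below are placeholders; q=config is a standard Micropub query that requires authentication and returns, among other things, the syndication targets:

```python
import requests

ENDPOINT = "https://example.com/micropub/endpoint"   # placeholder, not my real endpoint
TOKEN = "ACCESS_TOKEN"                               # placeholder

with_header = requests.get(ENDPOINT, params={"q": "config"},
                           headers={"Authorization": f"Bearer {TOKEN}"})
without_header = requests.get(ENDPOINT, params={"q": "config"})

# If both requests come back unauthorised, the Authorization header never
# reaches the endpoint (e.g. it is stripped before PHP sees it); if only
# the second one fails, the header arrives fine and the problem is elsewhere.
print(with_header.status_code, without_header.status_code)
```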