Writing it down may help in getting out of the loop…

I’m continuing my tinkering with federated bookshelves, for which I made an OPML-based way of publishing lists of books, as well as pointing to other people’s lists and to RSS feeds of content about books. I have now changed the XSL style sheet that parses my OPML files, so that it can also parse mentions of RSS feeds.

Meanwhile I read Matt Webb’s posting on using RSS (and OPML) a few more times, and I keep thinking, “yes, but where do you leave the actual data?”
Then I read Stephen Downes’ recent posting on distributing reading material and entire books for courses through RSS, and realised it gave me the same sense of not sounding quite right, like Matt’s posting. That feeling probably means I’m not fully understanding their argument.

RSS is a deliberately simple XML format for syndicating web content, including videos and podcasts. Content is an important word here, as is syndication: if you have something where new material gets added regularly, an RSS feed is a good way to push it out to those interested.
OPML is another deliberately simple XML format, meant for sharing outlines. Outlines are content themselves, and outlines can contain links to other content (including further outlines). One of the common uses of OPML is to share a list of RSS feeds through it, ‘these are the blogs I follow’.
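
To make that concrete, here is a minimal Python sketch of an OPML outline that points both to an RSS feed and to another outline, and of reading those links back out. The titles and URLs are made up for illustration, and the sketch assumes the OPML 2.0 conventions of `type="rss"` with `xmlUrl` for feeds and `type="include"` with `url` for other outlines.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML document: titles and URLs are invented for illustration.
OPML = """<opml version="2.0">
  <head><title>Book feeds I follow</title></head>
  <body>
    <outline text="Feeds about books">
      <outline type="rss" text="Example book blog"
               xmlUrl="https://example.com/books/feed/"/>
    </outline>
    <outline text="Other lists">
      <outline type="include" text="A friend's list"
               url="https://example.org/books.opml"/>
    </outline>
  </body>
</opml>"""


def links_by_type(opml_text):
    """Group the URLs found in an OPML outline by each node's type attribute."""
    root = ET.fromstring(opml_text)
    found = {}
    for node in root.iter("outline"):
        node_type = node.get("type")
        url = node.get("xmlUrl") or node.get("url")
        # Nodes without a type or a link are plain headings; skip them.
        if node_type and url:
            found.setdefault(node_type, []).append(url)
    return found


links = links_by_type(OPML)
print(links)
```

The same outline carries both kinds of pointers: feeds to subscribe to, and further outlines to follow.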

In Matt’s and Stephen’s posts I think there are examples that fail to satisfy either the content part of RSS, or the syndication of new content part. In Matt’s case he talks about feeds of postings about books, like my book category in this site, which is fine, but also in terms of lists of books, which is where I struggle: a list doesn’t necessarily list pieces of content, let alone pieces of web content which RSS seems to require. It more likely is just a list. At the same time he mentions OPML as ‘library’, to use to point to such lists of books. Why would you use OPML for the list of lists, but not for the lists themselves, when those book lists themselves have no content per book, only a number of data attributes which aren’t the content items but only descriptions of items? And when the whole point of OPML outlines is branching lists? When a library isn’t any different from a list, other than maybe in size? Again it is different for actual postings about books, but you can already subscribe to those feeds as existing rivers of content, and point to those feeds (in the same OPML, as I do in my experimental set-up now as well).
In Stephen’s posting he talks about providing the content of educational resources through RSS. He suggests it for the distribution of complete books, and for course material. I do like the idea of providing the material for a course as a ‘blob’. We’re talking about static material here: a book is a finished artefact. Where then is the point in syndication through RSS (other than maybe if the book is a PDF or EPUB or something that might be an enclosure in an RSS feed)? Why not provide the material from its original web source, with its original (semantic) mark-up? Is it in any way likely that such content is going to be read in the same tool the RSS feed is loaded into? And what is the ‘change’ the RSS feed is supposed to convey here, when it’s a one-off distribution and no further change beyond that moment of distribution is expected?

OPML outlines can have additions and deletions, though at a slower pace than e.g. blogs. You could have an RSS feed for additions to an OPML outline (although OPML isn’t web content). But you could also monitor OPML outlines themselves for changes (both additions and deletions) over time. Or reload and use the current version as is, without caring about the specific changes in them.
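
Monitoring an outline for changes could be as simple as comparing two snapshots of the same file. The following Python sketch assumes that a node is identified by its `text` attribute plus its link, which is my own simplification and not something OPML prescribes; the two snapshots are hypothetical.

```python
import xml.etree.ElementTree as ET


def outline_keys(opml_text):
    """Return a set of (text, link) pairs, one per outline node."""
    root = ET.fromstring(opml_text)
    keys = set()
    for node in root.iter("outline"):
        link = node.get("xmlUrl") or node.get("url") or ""
        keys.add((node.get("text", ""), link))
    return keys


def diff_outlines(old_text, new_text):
    """Compare two snapshots of an OPML file and list additions and deletions."""
    old, new = outline_keys(old_text), outline_keys(new_text)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Two hypothetical snapshots of the same book list.
OLD = """<opml version="2.0"><head/><body>
  <outline text="Book A"/>
  <outline text="Book B"/>
</body></opml>"""
NEW = """<opml version="2.0"><head/><body>
  <outline text="Book B"/>
  <outline text="Book C"/>
</body></opml>"""

changes = diff_outlines(OLD, NEW)
print(changes)
```

The additions found this way could in turn feed an RSS feed of new items, while the deletions stay visible too, which a feed alone would not show.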

The plus side of OPML and RSS is that there are many different pieces of code around that can deal with these formats. But most won’t be able to deal as-is with the added data attributes that we need to describe books as data, because those aren’t part of the few basic mandatory attributes RSS and OPML are expected to contain. Both RSS and OPML do allow for the extension of attributes, if you follow existing namespaces, such as e.g. schema.org’s for creative works, which seems applicable here (both for collections of books, i.e. a shelf or a library, as well as books themselves). If the use of RSS (and OPML for lists of RSS files) is suggested because there’s an existing ecosystem, but we need to change it in a way that ensures the existing ecosystem won’t be able to use it, then where’s the benefit of doing so? To be able to build readers and to build OPML/RSS creators, it is useful to be able to re-use existing bits and pieces of code. But is that really different from creating one’s own XML spec? At what point are our adaptations to overcome the purposeful simplicity of OPML and RSS destroying the ease of use we hope to gain from using that simplicity?

Another thing that I keep thinking about is that book lists (shelves, libraries) and book data, basically anything other than web published reviews of books, don’t necessarily get created or live on the web. I can see how I could easily use my website to create OPML and RSS feeds for a variety of book lists. But it would require me to have those books and lists as content in my website first, which isn’t a given. Keeping reading lists, and writing reading notes, are part of my personal knowledge management workflow, and it all lives in markdown text files on my local hard drive. I have a database of e-books I own, which is in Calibre. I have an old database of book descriptions/data of physical books I owned and did away with in 2012, which is in Delicious Library. None of that lives on the web, or online in any form. If I am going to consistently share bookshelves/lists, then I probably need to create them from where I use that information already. I think Calibre has the ability to work with OPML, and has an API I could use to create lists.
Putting that stuff first into my website in order to generate one, some or all of XML/OPML/RSS/JSON from it there, is work and friction I don’t want. If it is possible to automatically put it in my website from my own local notes and databases, that is fine, but then it is just as possible to automatically create all the XML/OPML/RSS/JSON stuff directly from those local notes and databases as well. Even if I would use my website to generate sharable bookshelves, I wouldn’t work with other people’s lists there.
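
Generating an OPML list directly from local data does not need much. The Python sketch below builds one from a handful of records; the records and their field names are hypothetical stand-ins for whatever a notes folder or book database would export, and `author` is an extra attribute outside the basic OPML set.

```python
import xml.etree.ElementTree as ET

# Hypothetical records standing in for an export from local notes or a
# book database. The field names are assumptions.
BOOKS = [
    {"title": "Example Title One", "author": "A. Author"},
    {"title": "Example Title Two", "author": "B. Writer"},
]


def books_to_opml(list_title, books):
    """Build an OPML 2.0 document with one outline node per book."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")
    for book in books:
        # 'text' is the attribute OPML expects on every outline node;
        # 'author' is an extra attribute generic OPML tools may ignore.
        ET.SubElement(body, "outline", text=book["title"], author=book["author"])
    return ET.tostring(opml, encoding="unicode")


opml_text = books_to_opml("Currently reading", BOOKS)
print(opml_text)
```

The website plays no role in this: the resulting file only needs to be put somewhere others can fetch it.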

I also think that it is very unlikely that a ‘standard’ emerges. There will always be differences in how people share data about books, because of the different things they care about when it comes to reading and books. Having namespaces like schema.org is useful of course, but I don’t expect everyone will use them. And even if a standard emerges, I am certain there will be many different interpretations thereof in practice. It is key to me that discoverability, of both people sharing book lists and of new to me books, exists regardless. That is why I think, in order to read/consume other people’s lists, other than through the human readable versions in a browser/reader, and to tie them into my information filtering and internal tools/processes, I likely need to have a way to flexibly map someone else’s shared list to what I internally use.
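
Such a flexible mapping could be little more than a lookup table per source. A minimal Python sketch, in which the internal field names, the two sources and their attribute names are all hypothetical:

```python
# Internal field names (assumed) that my own tools would work with.
INTERNAL_FIELDS = ("title", "author", "isbn")

# One mapping per source: internal field -> attribute name that source uses.
# Both sources and their attribute names are hypothetical.
MAPPINGS = {
    "source_a": {"title": "text", "author": "author", "isbn": "isbn"},
    "source_b": {"title": "name", "author": "creator"},
}


def to_internal(source, attributes):
    """Translate one shared book record into the internal structure."""
    mapping = MAPPINGS[source]
    record = {}
    for field in INTERNAL_FIELDS:
        their_name = mapping.get(field)
        # Fields a source does not provide stay None rather than failing.
        record[field] = attributes.get(their_name) if their_name else None
    return record


a = to_internal("source_a", {"text": "Example Title", "author": "A. Author", "isbn": "0000000000"})
b = to_internal("source_b", {"name": "Another Title", "creator": "B. Writer", "rating": "5"})
print(a)
print(b)
```

Attributes I have no use for, like the `rating` in the second record, are simply dropped, and adding a new source means adding one more entry to the table.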

I’m not sure where that leaves me. I think somewhere along these lines:

  • Discovery, of books and people reading them, is my core aim for federation
  • OPML seems useful for lists (of lists)
  • RSS seems useful for content about books
  • Both depend on using specific book related data attributes which will have limited standardisation, even if they follow existing namespaces. It is impossible to depend on or assume standardisation, so something more flexible is needed
  • My current OPML lists point to other lists by me and others, and to RSS feeds by me and others
  • I’m willing to generate OPML, RSS and JSON versions of the same lists and content if useful for others; other than templating there’s no key difference after all
  • Probably my website is not the core element in creating or maintaining lists. It is for publishing things about books.
  • I’m interested in other people’s RSS feeds about books, and will share my list of feeds I follow as OPML
  • I need to figure out ways to create OPML/RSS/JSON etc directly from where that information now lives in my workflow and toolset
  • I need to figure out ways to incorporate what others share with me into my workflow and toolset. Whatever is shared through RSS already fits existing information strategies.
  • For a limited number of sources shared with me by others, it might make sense to create mappings of their content to my own content structures, so I can import/integrate them more fully.

Related postings:
Federated Bookshelves (April 2020)
Federated Bookshelves Revisited (April 2021)
Federated Bookshelves Proof of Concept (May 2021)
Booklist OPML Data Structure (May 2021)

Bookmarked This is Fine: Optimism & Emergency in the P2P Network (newdesigncongress.org)
...driven by the desire for platform commons and community self-determination. These are goals that are fundamentally at odds with – and a response to – the incumbent platforms of social media, music and movie distribution and data storage. As we enter the 2020s, centralised power and decentralised communities are on the verge of outright conflict for the control of the digital public space. The resilience of centralised networks and the political organisation of their owners remains significantly underestimated by protocol activists. At the same time, the decentralised networks and the communities they serve have never been more vulnerable. The peer-to-peer community is dangerously unprepared for a crisis-fuelled future that has very suddenly arrived at their door.

Another good find by Neil Mather for me to read a few times more. A first reaction I have is that in my mind p2p networks weren’t primarily about evading surveillance, evading copyright, or maintaining anonymity, but about network resilience and not having someone with power over the ‘off-switch’ for the entire network. These days surveillance and anonymity are more important, and should gain more attention in the design stage.

I find it slightly odd that the dark web and e.g. TOR aren’t mentioned in any meaningful way in the article.

Another element I find odd is how the author talks about extremists using federated tools: “Can or should a federated network accept ideologies that are antithetical to its organic politics? Regardless of the answer, it is alarming that the community and its protocol leadership could both be motivated by a distrust of centralised social media, and be blindsided by a situation that was inevitable given the common ground found between ideologies that had been forced from popular platforms one way or another.”
It ignores that by going the federated route extremists lose two things they enjoyed on centralised platforms: amplification and being linked to the mainstream. In a federated setting I, with my personal instance, and every other instance decide for ourselves whom to federate with or not. There’s nothing for ‘a federated network to accept’, each instance does its own acceptance. There’s no algorithmic rage-engine to amplify the extreme. There’s no standpoint for ‘the federated network’ to take, just nodes doing their own thing. Power at the edges.

Also I think that some of the vulnerabilities and attack surfaces listed (Napster, Pirate Bay) build on the single aspect in that context that still had a centralised nature. That still held some power in a center.

Otherwise good read, with good points made that I want to revisit and think through more.

[TL;DR: A long tail is needed for distributed technology to be sustainable I think, otherwise it’s just centralisation and single points of failure in a different form. A long tail means the bottom 80% take over 50% of a market, and the top 20% under 50%. Mastodon currently has over 85% of its participants in the top 20% of instances, and it’s worse than that as 77% of participants are in 0.7% of instances. Just 15% are in the bottom 80% of instances. There’s a power law distribution, but it’s not a long tail. What can Mastodon do to get there and to sustainability?]

On 6 October 2016 Mastodon was launched, and its originator Eugen Rochko looks back in a blogpost on the journey of the past two years.

I joined on 7 April 2017, 6 months after its launch, at the Mastodon.cloud instance. I posted some messages for a month, then fell quiet for half a year. A few messages last March, and then I started using it more frequently last month, in the run-up to figuring out how to run Mastodon for myself (which for now means a hosted solution, but still aiming for running it from the home router). It’s now part of my daily information diet, but no guarantee yet it will last, although being certain I have ‘my half’ of the conversation on a domain I own helps a lot towards maintaining worthwhile exchanges.

Eugen’s blogpost is rightfully proud of what has been accomplished. It’s not yet proof of the sustainability of federated solutions, though, as he suggests.

He shares a few interesting numbers about the usage of Mastodon. The median of the 3460 known instances is 8 users. In total there are 1.627.557 registered accounts. The largest instance has 415.941 members, while the top 3 together have 52% of users, meaning the number 2 and 3 average 215.194 accounts. The top 25 largest instances have 77% or 1.253.219 members, meaning that the numbers 4-25 average 18.495 users. As the median is 8 it means the smallest 1730 instances have at most 8*1730 = 13.840 users. It also means that the number 26 to number 1730 instances have at least 360.498 members, or an average of 211. This tells us there’s a Pareto power law distribution: the top 20% of instances hold at least 85% of users at the moment. That also means there is no long tail, just a stub that holds at most 15% of Mastodon users only. For a long tail to exist, the smallest 80% of instances should account for over 50% of users, or over three times more than the current number.
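
The arithmetic behind those averages can be retraced in a few lines of Python. This is only a sketch of the steps that follow directly from the published numbers (3460 instances, 1.627.557 accounts, 415.941 on the largest instance, 52% in the top 3, 77% in the top 25, a median of 8); the variable names are my own.

```python
total_instances = 3460
total_accounts = 1_627_557
largest = 415_941
median = 8

top3 = round(0.52 * total_accounts)    # accounts on the 3 largest instances
top25 = round(0.77 * total_accounts)   # accounts on the 25 largest instances

avg_2_3 = (top3 - largest) / 2         # average for numbers 2 and 3
avg_4_25 = (top25 - top3) / 22         # average for numbers 4 to 25

half = total_instances // 2            # instances at or below the median
bottom_half_max = median * half        # upper bound: 8 users each

# Whatever is not in the top 25 or the bottom half sits in between.
middle_min = total_accounts - top25 - bottom_half_max
avg_middle = middle_min / (half - 25)  # numbers 26 to 1730

print(top3, top25, avg_2_3, avg_4_25)
print(half, bottom_half_max, middle_min, avg_middle)
```

Because the bottom half is an upper bound (at most 8 users each), the figure for the instances in between is a lower bound.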

As the purpose of Mastodon is distribution, where federation allows everyone to connect regardless of their instances (sort of like e-mail), I think Mastodon can only be deemed sustainable if there is a true long tail. That means that while the number of users goes up, the number of instances should go up at a faster rate, so that over 50% of all Mastodon users will be on the 80% smallest or even individual instances. In the current numbers we should be most interested in the 50% of instances that now have 8 or fewer users, and find out what drives those instances, so we may have many many more of them. We should also think about what a bigger-to-smaller-instances funnel for members can look like, not just leave it to chance. I think that the top 25 Mastodon instances, which is just 0.7% of the total, currently having 77% of all users is very problematic from a sustainability perspective, because that level of concentration is completely at odds with the stated purpose of Mastodon: distribution.

Eugen Rochko in his anniversary posting points at a critical article from April 2017 in Mashable, implying that its critic has been proven wrong definitively. I disagree. While many of the ‘predictions’ in that article are indeed silly, it also contains a few hints as to where sustainability may be found. The critic doesn’t get federation (yet likely uses e-mail every day), and complains about discovery (yet likely is relieved not all his personal e-mail addresses are to be found in Google). Yet if we can’t explain distribution and federation, and can’t or don’t communicate how discovery works in such a setting, then we won’t be able to make a long tail grow. For more people to adopt small or individual instances we need to bring the threshold for running your own instance way down, and then way down again. To the level of at most one click installing a script on any regular hosting service, and creating a first account.

Using open protocols, like ActivityPub which Mastodon supports, is key in getting more people out of walled gardens and silos, and on the open web. Tracking its adoption is a useful measure of success, but 2 years of existence is not a sign of sustainability at all. What Eugen Rochko has kicked off with Mastodon is valuable and very laudable, but we have barely started getting to where we need to be for it to stick.

The Twitter-like platform Gab has been forced offline, as their payment providers, hosting provider and domain provider all told them their business was no longer welcome. The platform is home to people with extremist views claiming their freedom of speech is under threat. At issue is of course where that speech becomes calling for violence, such as by the Gab-user who horribly murdered 11 people last weekend in Pittsburgh driven by anti-semitic hate.

Will we see an uptick in the use of federated sites such as Mastodon when platforms like Gab that are much more public disappear?

This I think isn’t about extremists being ‘driven underground’ but about denying calls for violence, such as happened on Gab, a place in public discourse. An uptick in the use of federated sites would be a good development, as federation allows for much smaller groups to get together around something, whatever it is. In reverse that means no-one else needs to be confronted with it either if they don’t want to. Within the federation of Mastodon sites, I regularly come across instances listing other instances they do not connect to, and for which reasons. It puts the power of supporting welcomed behaviour and pushing back on unwelcome behaviour in the hands of more people, meaning every person running a Mastodon instance (and you can have your own instance), rather than just Twitter or Facebook management.


Example of an instance refusing to federate with another instance

That sort of moderation can still be hard, even if the moderator to member ratio is already much better than on the main platforms. But that just points the way to the long tail of much smaller instances, more individual ones even. It means it becomes easier for individuals and small groups to shun small cells, echo-chambers and rage bubbles, and to avoid accidentally ending up in them or being forcefully drawn into them while having other conversations, like what can happen on Twitter. See my earlier posting on the disintegration of discourse. You then can do what networks do well: route around the stuff you perceive as damage or non-functional. It creates a stronger power symmetry and communication symmetry. It also denies extremists a wider platform. Yes, they can still call for violence, which remains just as despicable. Yes, they can still blame Others for anything and be hateful of them. But they will be doing it in their back yard (or Mastodon instance), not in the park where you like to go walk your dog or do your morning run (or Twitter). They will not have a podium bigger than warranted, they will not have visibility beyond their own in-crowd. And they will have to deal with more pushback and reality whenever they step outside such a bubble, without the pleasant illusion ‘everyone on Twitter agrees with me’.

As I didn’t succeed yet in getting Mastodon to run on a Raspberry Pi, nor in running a Gnu Social instance that actually federates on my hosting package, I’ve opted for an intermediate solution to running my own Mastodon instance.

Key in all this is satisfying three dimensions: control, flexibility and ease of use. My earlier attempts satisfy the control and flexibility dimensions, but as I have a hard time getting them to work, do not satisfy the ease of use dimension yet.

At the same time I did not want to keep using Mastodon on a generic server much longer, as it builds up a history there which with every conversation ups the cost of leaving.

The logical end point of the distributed web and federated services is running your own individual instance. Much as in the way I run my own blog, I want my own Mastodon instance.

Such an individual instance needs to be within my own scope of control. This means having it at a domain I own, and being able to move everything to a different server at will.

There is a hoster, Masto.host run by Hugo Gameiro, who provides Mastodon hosting as a monthly subscription. As it allows me to use my own domain name, and provides me with admin privileges of the Mastodon instance, this is a workable solution. When I succeed in getting my own instance of Mastodon running on the Raspberry Pi, I can simply move the entire instance at Masto.host to it.

Working with Hugo at Masto.host was straightforward. After registering for the service, Hugo got in touch with me to ensure the DNS settings on my own domain were correct, and briefly afterwards everything was up and running.
Frank Meeuwsen, who started using Masto.host last month, kindly wrote up a ‘moving your mastodon account’ guide in his blog (in Dutch). I followed most of that, to ensure a smooth transition.

Using Mastodon? Do follow me at https://m.tzyl.nl/@ton.

Screenshots of my old Mastodon.cloud account, and my new one on my own domain. And the goodbye and hello world messages from both.

In the past few days I tried a second experiment to run my own Mastodon instance. Both to actually get a result, but also to learn how easy or hard it is to do. The first round I tried running something on a hosted domain. This second round I tried to get something running on a Raspberry Pi.

The Raspberry Pi is a 35 Euro computer, making it very useful for stand-alone solutions or as a cheap hardware environment to learn things like programming.

Installing Debian Linux on the Raspberry Pi

I found this guide by Wim Vanderbauwhede, which describes installing both Mastodon and Pleroma on a Raspberry Pi 3. I ordered a Raspberry Pi 3 and received it earlier this week. Wim’s guide points to another guide on how to install Ruby on Rails and PostgreSQL on a Raspberry Pi. The link however was dead, and that website offline. Archive.org however had stored several snapshots, which I saved to Evernote.

Installing Ruby on Rails went fine using the guide, as did installing PostgreSQL. Then I returned to Wim’s guide, now pointing to the Mastodon installation guide. This is where the process currently fails for me: I can’t extend the Ubuntu repositories mentioned, nor node.js.

So for now I’m stalled. I’ll try to get back to it later next week.