Having created a working flow to generate OPML booklists directly from the individual book notes in my PKM system, I did the first actual run in production of those scripts today.

It took a few steps to get to using the scripts in production.

  • I have over 300 book note files in my Obsidian vault.
  • Of course most lacked the templated inline data fields that allow me to create lists. For the 67 fiction books I read in 2021 I already had a manual list with links to the individual files. Where needed I added the templated data fields.
  • Having added those inline fields where they were missing I can easily build lists in Obsidian with the Dataview plugin. Using this code


    results in

  • The same inline data fields are used by my scripts to read the individual files and build the same list in OPML
  • That gets automatically posted to my website where the file is both machine and human readable.
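
A minimal sketch of the reading and building steps above, in Python, could look like the following. It is not the script from the GitHub repository. It assumes Dataview-style inline fields written as `key:: value` on their own line, and the field names `title`, `author` and `month` are hypothetical, chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed inline field syntax: "key:: value" on its own line.
FIELD = re.compile(r"^\s*([A-Za-z][\w -]*?)::\s*(.+?)\s*$")

def read_fields(note_text):
    """Return the inline data fields found in one book note as a dict."""
    fields = {}
    for line in note_text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2)
    return fields

def build_opml(list_title, books):
    """Build an OPML document with one outline node per book."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")
    for book in books:
        # OPML requires a 'text' attribute; the others are illustrative extras.
        attrs = {"text": book.get("title", "untitled")}
        for key in ("author", "month"):
            if key in book:
                attrs[key] = book[key]
        ET.SubElement(body, "outline", attrs)
    return ET.tostring(opml, encoding="unicode")

# Hypothetical book note, standing in for one file in the vault.
note = """# Some Novel
title:: Some Novel
author:: A. Writer
month:: 3

My notes on the book go here.
"""

print(build_opml("Fiction 2021", [read_fields(note)]))
```

In practice `read_fields` would be called once per note file, and the resulting list of dicts passed to `build_opml`.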

Doing this in production made me discover a small typo in the script that builds the OPML, now fixed (also in the GitHub repository). It also made me realise I want to add a way of ordering the OPML outline entries by month read.
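
One possible way to get that ordering, sketched in Python under the assumption that each outline entry carries a numeric `month` attribute (a hypothetical name, not necessarily what the OPML uses), is to reorder the outline nodes before publishing:

```python
import xml.etree.ElementTree as ET

def sort_by_month(opml_text, attr="month"):
    """Reorder the top-level outline nodes by a numeric month attribute."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    outlines = list(body.findall("outline"))

    def month_key(node):
        # Entries without a usable month value sort after December.
        value = node.get(attr, "")
        return int(value) if value.isdigit() else 13

    for node in outlines:
        body.remove(node)
    # sorted() is stable, so books read in the same month keep their order.
    body.extend(sorted(outlines, key=month_key))
    return ET.tostring(root, encoding="unicode")

sample = (
    '<opml version="2.0"><head><title>Fiction</title></head><body>'
    '<outline text="B" month="5"/>'
    '<outline text="A" month="1"/>'
    '<outline text="C"/>'
    "</body></opml>"
)
print(sort_by_month(sample))
```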

Lists to take into production next are those for currently reading (done), non-fiction 2021, and the anti-library. That last one will be the most work, as I have a very long list of books to potentially read. I will approach that not as a task of building the list, but as an ongoing effort of evaluating books I have and why they are potentially of interest to me. A way, in short, to extend my learning, with the list as a useful side effect. The one for currently reading is the least work, and from it the lists for fiction 2022 and non-fiction 2022 will automatically follow. The work is in the backlog, in getting history to conform to the convention I came up with, not in moving forward from this point.

In parallel it is great to see that Tom Critchlow is also looking at creating such book lists, in JSON, and at digesting such lists from others. The latter would implement the ‘federated’ part of federated bookshelves. Right now I just point to other people’s lists and RSS feeds in my ‘list of lists’. To me getting to federation doesn’t require a ‘standard’, because JSON, OPML and e.g. schema.org have enough specificity and overlap between them to allow both publishers of lists and parsers of such lists enough freedom to use or discard data fields as they see fit. But there is definitely a discussion to be had on identifying that overlap and how to use it best. Chris Aldrich is planning an IndieWeb event on this and other personal libraries related topics next month. I look forward to participating in that. Quite a number of interesting people have expressed interest, and I hope we’ll get to not just talk but also experiment with book lists.

Writing it down may help in getting out of the loop…

I’m continuing my tinkering with federated bookshelves, for which I made an OPML-based way of publishing lists of books, as well as pointing to other people’s lists and to RSS feeds of content about books. I have now changed the XSL style sheet that parses my OPML files so that it can also parse mentions of RSS feeds.

Meanwhile I read Matt Webb’s posting on using RSS (and OPML) a few more times, and I keep thinking, “yes, but where do you leave the actual data?”
Then I read Stephen Downes’ recent posting on distributing reading material and entire books for courses through RSS, and realised it gave me the same sense of not sounding quite right, like Matt’s posting. That feeling probably means I’m not fully understanding their argument.

RSS is a simple-by-design XML format for syndicating web content, including videos and podcasts. Content is an important word here, as is syndication: if you have something where new material gets added regularly, an RSS feed is a good way to push it out to those interested.
OPML is another simple-by-design XML format, this one for sharing outlines. Outlines are content themselves, and outlines can contain links to other content (including further outlines). One of the common uses of OPML is to share a list of RSS feeds through it, ‘these are the blogs I follow’.
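
To make that common use concrete, here is a small Python sketch that reads such an OPML file and collects the feed addresses. The OPML content and its URLs are invented for illustration; `xmlUrl` is the attribute conventionally used for feed addresses in OPML subscription lists.

```python
import xml.etree.ElementTree as ET

# Invented example of a 'these are the blogs I follow' outline.
OPML = """<opml version="2.0">
  <head><title>Blogs I follow</title></head>
  <body>
    <outline text="Books">
      <outline type="rss" text="Example book blog" xmlUrl="https://example.com/books/feed"/>
    </outline>
    <outline type="rss" text="Another blog" xmlUrl="https://example.org/feed.xml"/>
  </body>
</opml>"""

def feed_urls(opml_text):
    """Collect the feed URLs from all outline nodes, however deeply nested."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, since outlines can branch.
    return [node.get("xmlUrl") for node in root.iter("outline") if node.get("xmlUrl")]

print(feed_urls(OPML))
# ['https://example.com/books/feed', 'https://example.org/feed.xml']
```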

In Matt’s and Stephen’s posts I think there are examples that fail to satisfy either the content part of RSS, or the syndication of new content part. In Matt’s case he talks about feeds of postings about books, like my book category in this site, which is fine, but also in terms of lists of books, which is where I struggle: a list doesn’t necessarily list pieces of content, let alone pieces of web content which RSS seems to require. It more likely is just a list. At the same time he mentions OPML as ‘library’, to use to point to such lists of books. Why would you use OPML for the list of lists, but not for the lists themselves, when those book lists themselves have no content per book, only a number of data attributes which aren’t the content items but only descriptions of items? And when the whole point of OPML outlines is branching lists? When a library isn’t any different from a list, other than maybe in size? Again it is different for actual postings about books, but you can already subscribe to those feeds as existing rivers of content, and point to those feeds (in the same OPML, as I do in my experimental set-up now as well).
In Stephen’s posting he talks about providing the content of educational resources through RSS. He suggests it for the distribution of complete books, and for course material. I do like the idea of providing the material for a course as a ‘blob’. We’re talking about static material here, a book is a finished artefact. Where then is the point in syndication through RSS (other than maybe if the book is a PDF or EPUB or something that might be an enclosure in an RSS feed)? Why not provide the material from its original web source, with its original (semantic) mark-up? Is it in any way likely that such content is going to be read in the same tool the RSS feed is loaded into? And what is the ‘change’ the RSS feed is supposed to convey here, when it’s a one-off distribution and no further change beyond that moment of distribution is expected?

OPML outlines can have additions and deletions, though at a slower pace than e.g. blogs. You could have an RSS feed for additions to an OPML outline (although OPML isn’t web content). But you could also monitor OPML outlines themselves for changes (both additions and deletions) over time. Or reload and use the current version as is, without caring about the specific changes in them.
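
Monitoring an outline for changes could be as simple as comparing two fetched versions. The Python sketch below assumes each outline node can be identified by its `xmlUrl` or, failing that, its `text` attribute; that choice of key is an assumption, and a real list might need something sturdier such as an ISBN.

```python
import xml.etree.ElementTree as ET

def outline_keys(opml_text):
    """Return a set with one identifying key per outline node."""
    root = ET.fromstring(opml_text)
    keys = set()
    for node in root.iter("outline"):
        key = node.get("xmlUrl") or node.get("text")
        if key:
            keys.add(key)
    return keys

def diff_outlines(old_text, new_text):
    """Return (added, removed) keys between two versions of an outline."""
    old, new = outline_keys(old_text), outline_keys(new_text)
    return sorted(new - old), sorted(old - new)

# Two invented versions of the same list, fetched at different times.
old_version = (
    '<opml version="2.0"><body>'
    '<outline text="Book A"/><outline text="Book B"/>'
    "</body></opml>"
)
new_version = (
    '<opml version="2.0"><body>'
    '<outline text="Book B"/><outline text="Book C"/>'
    "</body></opml>"
)

added, removed = diff_outlines(old_version, new_version)
print(added, removed)
# ['Book C'] ['Book A']
```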

The plus side of OPML and RSS is that there are many different pieces of code around that can deal with these formats. But most won’t be able to deal as-is with the added data attributes that we need to describe books as data, but that aren’t part of the few basic mandatory attributes RSS and OPML are expected to contain. Both RSS and OPML do allow for the extension of attributes, if you follow existing namespaces, such as e.g. schema.org’s for creative works, which seems applicable here (both for collections of books, i.e. a shelf or a library, as well as books themselves). If the use of RSS (and OPML for lists of RSS files) is suggested because there’s an existing ecosystem, but we need to change it in a way that ensures the existing ecosystem won’t be able to use it, then where’s the benefit of doing so? To be able to build readers and to build OPML/RSS creators, it is useful to be able to re-use existing bits and pieces of code. But is that really different from creating one’s own XML spec? At what point are our adaptations to overcome the purposeful simplicity of OPML and RSS destroying the ease of use we hope to gain from using that simplicity?

Another thing that I keep thinking about is that book lists (shelves, libraries) and book data, basically anything other than web published reviews of books, don’t necessarily get created or live on the web. I can see how I could easily use my website to create OPML and RSS feeds for a variety of book lists. But it would require me to have those books and lists as content in my website first, which isn’t a given. Keeping reading lists, and writing reading notes, are part of my personal knowledge management workflow, and it all lives in markdown text files on my local hard drive. I have a database of e-books I own, which is in Calibre. I have an old database of book descriptions/data of physical books I owned and did away with in 2012, which is in Delicious Library. None of that lives on the web, or online in any form. If I am going to consistently share bookshelves/lists, then I probably need to create them from where I use that information already. I think Calibre has the ability to work with OPML, and has an API I could use to create lists.
Putting that stuff first into my website in order to generate one, some or all of XML/OPML/RSS/JSON from it there, is work and friction I don’t want. If it is possible to automatically put it in my website from my own local notes and databases, that is fine, but then it is just as possible to automatically create all the XML/OPML/RSS/JSON stuff directly from those local notes and databases as well. Even if I would use my website to generate sharable bookshelves, I wouldn’t work with other people’s lists there.

I also think that it is very unlikely that a ‘standard’ emerges. There will always be differences in how people share data about books, because of the different things they care about when it comes to reading and books. Having namespaces like schema.org is useful of course, but I don’t expect everyone will use them. And even if a standard emerges, I am certain there will be many different interpretations thereof in practice. It is key to me that discoverability, of both people sharing book lists and of new to me books, exists regardless. That is why I think, in order to read/consume other people’s lists, other than through the human readable versions in a browser/reader, and to tie them into my information filtering and internal tools/processes, I likely need to have a way to flexibly map someone else’s shared list to what I internally use.
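
Such a flexible mapping does not need to be complicated. A minimal Python sketch, with entirely hypothetical source names and field names, keeps one small translation table per source and ignores whatever a source doesn’t provide:

```python
# One table per source: my internal field name -> their field name.
# Source names and field names here are made up for illustration.
MAPPINGS = {
    "source_a": {"title": "text", "author": "creator", "isbn": "isbn"},
    "source_b": {"title": "name", "author": "author"},
}

def to_internal(entry, mapping):
    """Translate one shared list entry into my own field names.

    Fields the other person doesn't provide are simply left out, and
    fields I have no mapping for are discarded.
    """
    return {mine: entry[theirs] for mine, theirs in mapping.items() if theirs in entry}

shared = {"text": "Some Novel", "creator": "A. Writer", "rating": "4"}
print(to_internal(shared, MAPPINGS["source_a"]))
# {'title': 'Some Novel', 'author': 'A. Writer'}
```

The point of the design is that adding a new source means adding a table, not changing code, and no source has to follow a standard for this to work.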

I’m not sure where that leaves me. I think somewhere along these lines:

  • Discovery, of books and people reading them, is my core aim for federation
  • OPML seems useful for lists (of lists)
  • RSS seems useful for content about books
  • Both depend on using specific book related data attributes which will have limited standardisation, even if they follow existing namespaces. It is impossible to depend on or assume standardisation, so something more flexible is needed
  • My current OPML list points to other lists by me and others, and to RSS feeds by me and others
  • I’m willing to generate OPML, RSS and JSON versions of the same lists and content if useful for others; other than templating there’s no key difference after all
  • Probably my website is not the core element in creating or maintaining lists. It is for publishing things about books.
  • I’m interested in other people’s RSS feeds about books, and will share my list of feeds I follow as OPML
  • I need to figure out ways to create OPML/RSS/JSON etc directly from where that information now lives in my workflow and toolset
  • I need to figure out ways to incorporate what others share with me into my workflow and toolset. Whatever is shared through RSS already fits existing information strategies.
  • For a limited number of sources shared with me by others, it might make sense to create mappings of their content to my own content structures, so I can import/integrate them more fully.
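
The remark in the list above that OPML, RSS and JSON versions differ only in templating can be illustrated with a short Python sketch. The book records and their field names are invented; the same records are rendered once as JSON and once as OPML.

```python
import json
import xml.etree.ElementTree as ET

# Invented records, standing in for data read from local notes.
BOOKS = [
    {"title": "Some Novel", "author": "A. Writer"},
    {"title": "Another Story", "author": "B. Author"},
]

def as_json(list_title, books):
    """Render the list as a JSON document."""
    return json.dumps({"title": list_title, "books": books}, indent=2)

def as_opml(list_title, books):
    """Render the same list as an OPML outline."""
    opml = ET.Element("opml", version="2.0")
    ET.SubElement(ET.SubElement(opml, "head"), "title").text = list_title
    body = ET.SubElement(opml, "body")
    for book in books:
        ET.SubElement(body, "outline", text=book["title"], author=book["author"])
    return ET.tostring(opml, encoding="unicode")

print(as_json("Fiction 2021", BOOKS))
print(as_opml("Fiction 2021", BOOKS))
```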

Related postings:
Federated Bookshelves (April 2020)
Federated Bookshelves Revisited (April 2021)
Federated Bookshelves Proof of Concept (May 2021)
Booklist OPML Data Structure (May 2021)