After I built a proof of concept for using OPML to share and federate book lists yesterday (UPDATE: description of the data structure for booklists), Tom Critchlow asked me in the comments about subscribing to OPML lists. I also reread Matt Webb’s earlier posting about using OPML and RSS for book lists.
That leads to a few remarks and questions:
- OPML serves 2 purposes
- In the words of Dave Winer, OPML’s creator, OPML is meant as a “transparently simple, self-documenting, extensible and human readable format that’s capable of representing a wide variety of data that’s easily browsed and edited” to create and manipulate outlines, i.e. content structured hierarchically, tree-like.
- the format is a way to exchange such outlines between outliner tools.
- In other words OPML is great for making (nested) lists, and for exchanging them. I use outlines to build my talks and presentations. It could be shopping lists like in Doug Engelbart’s 1968 ‘mother of all demos’. And indeed it can be lists of books.
- A list I regard as an artefact in itself. A list of something is not just iterating the somethings mentioned, the list itself has a purpose and meaning for its creator. It’s a result of some creative act, e.g. curation, planning, writing, or desk research.
- A book list I regard as a library, of any size. The list can be as short as the stack on my night reading table is high, as long as a book shelf in my home is wide, or as enormous as the full catalogue of the Royal Library. Judging by Tom Critchlow’s name for his booklist data, ‘library.json’, he sees that similarly.
- A book list, as I wrote in my posting about the proof of concept, can have books in it, as well as other book lists by myself or others. That is where the potential for federation lies. From a book I can point to Tom’s list as the source of inspiration. I could include one of Tom’s booklists in my own booklists.
- A list of books is different from a group of individual postings about books as also e.g. presented on my blog’s reading category page. I blog about books I read, but not always. In fact I haven’t written any postings at all this year, but have read 25 books or so since January 1st. It is easier to keep a list of books, than to write postings about each of the books listed. This distinction is expressed too in Tom Macwright’s set-up. There’s a list of books he’s read, which points to pages with a posting about an entry in that list, but the list is useful without those postings.
- The difference between booklists as artefacts and groups of postings about books that may also be listed has impact on what it means to ‘subscribe’ to them.
- A book list, though it can change over time, is a steady artefact. Books may get added or removed just like in a library, but those changes are an expression of the will of its maker, not a direct function of time.
- My list of blog posts about books, in contrast, is fully determined by time: new entries get added on top, older ones drop off the list because the list has a fixed length.
- OPML is very suited for my lists as artefacts
- RSS is very suited for lists as expression of time, providing the x most recent posts
- Subscribing to RSS feeds is widely available
- Subscription is not something that has a definition for OPML (that you can use OPML to list RSS subscriptions may be confusing though)
- Inclusion however is a concept in OPML: I can add a list as a new branch in another list. If you do that once, you only clone a list and then go your own separate way again. You could also do it dynamically, where you always re-import the other list into your own. Doing it dynamically is a de facto subscription. In both cases, however, changes in the imported list are non-obvious.
- If you keep a previously seen copy and compare it to the current one, you could monitor for changes over time in an OPML list (Inoreader did that in 2014 so you could see and subscribe to new RSS feeds in other people’s OPML feed lists; also see Marjolein Hoekstra’s posting on the functionality she created).
- I am interested in both book lists, i.e. libraries / bookshelves, the way I am interested in browsing a book case when I visit somebody’s home, and in reading people’s reviews of books in the form of postings. With OPML there is also a middle ground: a book list can for each book include a brief comment, without being a full review or opinion. In the shape of ‘I bought this because….’ this is useful input for social filtering for me.
- While interested in both those types, libraries, and reviews, I think we need to treat them as completely different things, and separate them out. It is fine to have an OPML list of RSS feeds of reviews, but it’s not the same as having an OPML book list, I think.
- I started at the top by quoting Dave Winer about OPML being a “simple, self-documenting, extensible and human readable format that’s capable of representing a wide variety of data that’s easily browsed and edited”. That is true, but needs some qualification:
- While I can indeed add all kinds of data attributes, e.g. using namespaces and standardised vocabularies like schema.org, there’s no guarantee nor expectation that any OPML parser/reader/viewer would do anything with them.
- This is the primary reason I used an XSL template for my OPML book lists, as it allows me to provide a working parser right along with the data itself. Next to looking at the raw file content itself, you can easily view in a browser what data is contained in it.
- In fact I haven’t seen any regular outliner tool that does anything with imported OPML files beyond looking at the must-have ‘text’ attribute for any outline node. Tinderbox, when importing OPML, does also look at URL attributes and a few specific others.
- I know of no OPML viewer that shows you which attributes are available in an OPML list, let alone one that asks you whether to do something with them or not. Yet exploring the data in an OPML file is a key part of discovering other people’s lists, of the aim to federate booklists, and of adopting better or more widely shared conventions over time.
- Are there generic OPML attribute explorers, which let you then configure what to pay attention to? Could you create something like an Airtable on the fly from an OPML list?
- Monitoring changes in OPML lists you’re interested in is possible as such, but if the OPML book lists you follow have different structures it quickly becomes a lot of work. That’s different from the mentioned Inoreader example because OPML lists of RSS feeds have a predefined expected structure and set of attributes right in the OPML specification.
- Should it be the default to provide XSL templates with OPML files, so that parsing a list as intended by the creator of the list is built right into the OPML list itself?
- Should we ‘dumb down’ lists by moving data attributes of an outline node to a sub-node each? You would reduce machine readability in favor of having basic OPML outliners show all information, because there are no machines reading everything yet anyway.
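The snapshot comparison mentioned above (keep a previously seen copy of an OPML list and compare it to the current one) can be sketched roughly as follows. This is only a minimal illustration using Python's standard library: the function names `outline_keys` and `diff_lists`, the choice to identify a node by its `type`, `text` and `url` attributes, and the sample lists are all assumptions made for the example, not part of any existing tool.

```python
import xml.etree.ElementTree as ET


def outline_keys(opml_text):
    """Return a set of (type, text, url) tuples, one per outline node."""
    root = ET.fromstring(opml_text)
    return {
        (node.get("type", ""), node.get("text", ""), node.get("url", ""))
        for node in root.iter("outline")
    }


def diff_lists(previous_opml, current_opml):
    """Compare two snapshots of a list; return (added, removed), both sorted."""
    before = outline_keys(previous_opml)
    after = outline_keys(current_opml)
    return sorted(after - before), sorted(before - after)


# Hypothetical snapshots of the same list, fetched at two moments in time.
PREVIOUS = """<opml version="2.0"><head><title>Example list</title></head><body>
<outline type="collection" text="Example shelf">
  <outline type="book" text="Book A by Author A" name="Book A" author="Author A"/>
</outline>
</body></opml>"""

CURRENT = """<opml version="2.0"><head><title>Example list</title></head><body>
<outline type="collection" text="Example shelf">
  <outline type="book" text="Book B by Author B" name="Book B" author="Author B"/>
</outline>
</body></opml>"""

added, removed = diff_lists(PREVIOUS, CURRENT)
print("added:", added)
print("removed:", removed)
```

Identifying nodes by a fixed set of attributes is exactly where differently structured book lists would make this more work: each list convention would need its own notion of what makes two nodes "the same".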
I think for the coming weeks I’ll be on the lookout for sites that have book lists and book posting feeds, to see what commonalities and differences I find.
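As a starting point for spotting such commonalities and differences, a generic attribute explorer could begin by simply counting which attributes occur on which type of outline node. The sketch below assumes Python's standard library; the function name `attribute_survey` and the sample list are made up for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def attribute_survey(opml_text):
    """Map each outline type to a Counter of the attribute names used on it."""
    root = ET.fromstring(opml_text)
    survey = {}
    for node in root.iter("outline"):
        node_type = node.get("type", "(none)")
        survey.setdefault(node_type, Counter()).update(node.attrib.keys())
    return survey


# Hypothetical book list used only to show the output shape.
SAMPLE = """<opml version="2.0"><head><title>Example list</title></head><body>
<outline type="collection" text="Example shelf">
  <outline type="book" text="Book A by Author A" name="Book A" author="Author A" isbn="0000000000"/>
  <outline type="book" text="Book B by Author B" name="Book B" author="Author B"/>
</outline>
</body></opml>"""

survey = attribute_survey(SAMPLE)
for node_type, counts in survey.items():
    print(node_type, dict(counts))
```

Running the same survey over several people's lists and comparing the results would show which attributes are shared conventions and which are one-offs.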
On OPML and RSS for Federated Bookshelves by Ton Zijlstra (zylstra.org)
I admit it is not wise to read this just before going to sleep. Because the first thing I think of when reading Ton’s experiments with books and OPML is: “Could this work for notes too? Can you tie backlinks between notes together as OPML? Could you build a network of notes that way, not just for yourself but precisely between collections?”
I don’t know yet. The advantage is that it is now published on my blog as a small seed to think about later. And I can go to sleep.
OPML and RSS for Federated Bookshelves. zylstra.org/blog/2021/05/o…
I have a bunch of tools that do stuff with OPML attributes. For example, I use littleoutliner.com to write my blog posts, and Old School, which renders the outline as a blog, does a lot with attributes. That’s just one example.
Thanks to Dave Winer for linking to Ton’s post. I am interested in this idea!
Great ideas, Ton!
Ah nice. I’ll add an OPML feed to boris.libra.re/library/ asap. Perhaps a “Recent” and some form of “favorites” contexts.
Cool! icymi: here’s the data structure I came up with for the OPML file. zylstra.org/blog/booklist-…
OPML is much in the air these days: Ton is experimenting with federated bookshelves, and Paul is using OPML of yesteryear to explore his feed-reading past.
Which got me thinking about blog post archaeology, and using the blogs that I read every day as a corpus to explore in different ways.
My first thought was: export my list of feeds as OPML, then write code to parse the OPML to get the RSS feed for each blog, then write more code to retrieve the archive of each blog, and then write more code to parse the body of each post. In theory that would all be possible, as many languages have plug-and-play libraries to make parsing OPML and RSS relatively easy.
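The first step of that plan, getting the RSS feed for each blog out of an OPML export, might look something like this in Python. It is a rough sketch using only the standard library; the function name `feed_urls` and the sample subscription list are illustrative assumptions, and it relies on the common convention that feed nodes carry an `xmlUrl` attribute.

```python
import xml.etree.ElementTree as ET


def feed_urls(opml_text):
    """Return (title, feed URL) pairs for every outline node with an xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        (node.get("text", ""), node.get("xmlUrl"))
        for node in root.iter("outline")
        if node.get("xmlUrl")
    ]


# Hypothetical export: one feed inside a folder, one at the top level.
SUBSCRIPTIONS = """<opml version="2.0"><head><title>Subscriptions</title></head><body>
<outline text="Blogs">
  <outline type="rss" text="Example Blog" xmlUrl="https://example.com/feed.xml" htmlUrl="https://example.com/"/>
</outline>
<outline type="rss" text="Another Blog" xmlUrl="https://example.org/rss"/>
</body></opml>"""

feeds = feed_urls(SUBSCRIPTIONS)
for title, url in feeds:
    print(title, url)
```

Folder nodes without an `xmlUrl` are skipped, so nesting in the export does not matter.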
But then I realized that my RSS reader, FreshRSS, maintains a long archive of blog posts in its local database. And I thought, as a first experiment, it might be interesting to extract all the quotes from that archive–anything wrapped in “blockquote” in the body of the post–by way of providing an alternate interface for experiencing the posts all over again.
Here’s what I did to make this happen:
I used the command line interface for FreshRSS to export a JSON representation of the archive, one file per blog:
cd freshness
./export-zip-for-user.php --user peter > peter.zip
I copied the resulting peter.zip file to my local machine, unzipped it into a folder called peter, and then used the following PHP, which depends on PHP Simple HTML DOM Parser, to generate an HTML file of the quotes:
<?php
// Requires PHP Simple HTML DOM Parser, which provides str_get_html().
require_once("simplehtmldom/simple_html_dom.php");

$path = "./peter";

// Walk the unzipped export folder: one JSON file per blog.
if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        if ('.' === $file) continue;
        if ('..' === $file) continue;
        parseJSON($path . '/' . $file);
    }
    closedir($handle);
}

// Print a heading per feed, a subheading per post, and every blockquote found in it.
function parseJSON($file) {
    $json = file_get_contents($file);
    $feed = json_decode($json);
    if ($feed) {
        print "<h1>" . str_replace(' articles', '', str_replace('List of ', '', $feed->title)) . "</h1>\n";
        foreach ($feed->items as $item) {
            $html = str_get_html($item->content->content);
            if ($html) {
                if ($html->find('blockquote')) {
                    echo "<h2><a href=\"" . $item->id . "\">" . $item->title . "</a></h2>\n";
                    foreach ($html->find('blockquote') as $element) {
                        echo "<blockquote style='border: 1px solid grey; padding: 20px'>" . $element->innertext . "</blockquote>\n";
                    }
                }
            }
        }
    }
}
I ran the script, dumping the result into an HTML file:
php parse.php > quotes.html
It turns out that the blogs I follow include a lot of quotes, and the file, quotes.html, is, to some degree, impenetrably useless.
Which got me thinking: what if I rejigged this output as an OPML file, which, among other things, I could load into OmniOutliner to browse?
So, I rejigged the code:
<?php
// Requires PHP Simple HTML DOM Parser, which provides str_get_html().
require_once("simplehtmldom/simple_html_dom.php");

$path = "./peter";

// OPML preamble: XML declaration, head with a title, opening body tag.
print '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
print '<opml version="2.0"><head><title>Quotes in Posts</title></head>';
print '<body>' . "\n";

// Walk the unzipped export folder: one JSON file per blog.
if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        if ('.' === $file) continue;
        if ('..' === $file) continue;
        parseJSON($path . '/' . $file);
    }
    closedir($handle);
}

print '</body>';
print '</opml>';

// Emit one outline node per feed, nested nodes per post, and leaf nodes per quote.
function parseJSON($file) {
    $json = file_get_contents($file);
    $feed = json_decode($json);
    if ($feed) {
        print "<outline text=\"" . str_replace(' articles', '', str_replace('List of ', '', htmlspecialchars($feed->title))) . "\">\n";
        foreach ($feed->items as $item) {
            $html = str_get_html($item->content->content);
            if ($html) {
                if ($html->find('blockquote')) {
                    echo "<outline text=\"" . htmlspecialchars($item->title) . "\">\n";
                    foreach ($html->find('blockquote') as $element) {
                        // Strip markup and escape the text so it is safe inside an XML attribute.
                        echo "<outline text=\"" . htmlspecialchars(strip_tags($element->innertext)) . "\"></outline>\n";
                    }
                    print "</outline>\n";
                }
            }
        }
        print "</outline>\n";
    }
}
And, sure enough, the result is somewhat less impenetrable. And kind of cool.
The result also shows one of the limitations of HTML as currently practiced, which generally leaves quotes without machine-readable attribution, something that using more semantic HTML, as illustrated here, would help alleviate:
<figure>
<blockquote cite="https://www.huxley.net/bnw/four.html">
<p>Words can be like X-rays, if you use them properly—they’ll go through anything. You read and you’re pierced.</p>
</blockquote>
<figcaption>—Aldous Huxley, <cite>Brave New World</cite></figcaption>
</figure>
I’ll try to start doing that with my own quotes.
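To illustrate what machine-readable attribution makes possible, here is a small sketch that collects each blockquote together with its `cite` attribute, if it has one. It uses Python's standard-library `html.parser`; the class name `QuoteCollector` and the sample markup are assumptions for the example, and nested blockquotes are deliberately not handled.

```python
from html.parser import HTMLParser


class QuoteCollector(HTMLParser):
    """Collect the text and cite attribute of each (non-nested) blockquote."""

    def __init__(self):
        super().__init__()
        self.quotes = []      # one dict per blockquote: {"cite": ..., "text": ...}
        self._current = None  # the blockquote being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self._current = {"cite": dict(attrs).get("cite"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._current is not None:
            # Collapse runs of whitespace left over from the markup.
            self._current["text"] = " ".join(self._current["text"].split())
            self.quotes.append(self._current)
            self._current = None


# Hypothetical post body: one attributed quote, one without a source.
collector = QuoteCollector()
collector.feed(
    '<figure><blockquote cite="https://example.com/source">'
    "<p>A quoted\n   sentence.</p></blockquote>"
    "<figcaption>Example Author</figcaption></figure>"
    "<blockquote><p>No source given.</p></blockquote>"
)
collector.close()
for quote in collector.quotes:
    print(quote["cite"], "|", quote["text"])
```

Quotes without a `cite` attribute come back with `None` as their source, which is the situation most of the quotes in the archive are in.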
The proof of concept book list I made in OPML (also see these additional remarks) currently has the following structure:
It follows the OPML 2 specification
It uses schema.org specifications w.r.t. ‘thing’, ‘creative work’, ‘collection’ and ‘book’ for outline elements and data attributes within them, with a few exceptions.
The file
A booklist file is in OPML format, and has a .opml file extension.
It opens by declaring XML version 1.0 and UTF-8 encoding.
It declares an XSL stylesheet, for which the URL is specified, which allows HTML rendering of the file. I think it’s important to package an OPML-to-HTML parser with the booklist file, so that regardless of data structure, anyone can see what data is contained within it.
It declares OPML version 2.0
The HEADER section
In the HEADER section of the OPML file the following fields are used:
title: mandatory, the name of this booklist file, or of the owner’s main list of lists if this is a sublist
url: mandatory, the url of the booklist file meant in the title
dateCreated: date created, optional
dateModified: date modified, optional
ownerName: mandatory, name of the list owner
ownerId: the url of the owner, optional
ownerEmail: email address of the owner, optional
the OPML HEADER fields for expansion state, vertical scroll state, and for window location are not used (and ignored by the included XSL parser if present).
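As an illustration of the field list above, a head section could be generated and checked along these lines. This is a minimal sketch with Python's standard library, not the way the proof of concept was built; the function name `build_head` and the example values are hypothetical, and the set of mandatory fields is taken from the list above.

```python
import xml.etree.ElementTree as ET

# Fields marked mandatory in the list above.
MANDATORY_HEAD = ("title", "url", "ownerName")


def build_head(fields):
    """Build a <head> element from a dict, refusing to omit mandatory fields."""
    missing = [name for name in MANDATORY_HEAD if not fields.get(name)]
    if missing:
        raise ValueError("missing mandatory head fields: " + ", ".join(missing))
    head = ET.Element("head")
    for name, value in fields.items():
        ET.SubElement(head, name).text = value
    return head


# Hypothetical values, for illustration only.
head = build_head({
    "title": "Example booklist",
    "url": "https://example.com/books.opml",
    "ownerName": "Example Owner",
})
head_xml = ET.tostring(head, encoding="unicode")
print(head_xml)
```

Building the element with an XML library rather than by string concatenation means special characters in titles or names are escaped automatically.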
The BODY section
The body section contains one or more outline elements, with a number of attributes. Each attribute can exist only once within an outline element.
type="collection": At least one is needed. A collection is a single booklist. With the following data attributes, which are all strings:
text: mandatory, the name of the booklist
author: the name of the creator of the booklist, expected
url: the URL of the collection, if it has its own URL, optional if the current file outlines books within the collection
comment: a brief description of the list, optional
type="book": A book is always part of a collection. If a collection has its own URL attribute (different from the url of the current file), it does not need to have any book within the file where the collection is listed. If a collection does not have its own URL attribute (or it is the current file’s url), it is expected to have at least one book (otherwise it’s simply an empty collection). With the following data attributes:
text: mandatory, a string “[title of book] by [name of author(s)/editor]”
name: mandatory, the title of the book
author: mandatory, the name of the author(s) or editor of the book
isbn: the ISBN number of the book, optional
comment: a short comment by the booklist owner about the inclusion of the book in the list, optional
url: an url for the book itself, optional
authorurl: the url to the website of the book’s author. This attribute is not listed as part of schema.org. Optional
referencelisturl: the url of a list by a different owner, where this list’s owner found the book. This attribute is not listed as part of schema.org. Optional.
referenceurl: the url of a posting or a person’s url that served as recommendation or motivation for the inclusion of the book in this list by its owner. This attribute is not listed as part of schema.org. Optional.
inLanguage: the language in which the book is written as ISO-639(-1/2/3) code, optional
category: a list of tags, comma separated, optional
type="rss": a booklist OPML file can point to one or more RSS feeds, optional. Multiple rss-type nodes can be grouped together, nested in a typeless outline node with only a text attribute for the name of the group. An rss node is not a node within a ‘collection’, nor a sub node of a ‘book’. An example is someone’s book reviews site and feed. These feeds are not booklists or collections but content streams, to which the booklist file owner may want to point. With the following data attributes:
text: mandatory, the name of the feed
xmlUrl: mandatory, the url of the RSS feed
htmlUrl: the url of the website the RSS feed originates from, optional
author: the author of the RSS feed, optional. I use it mostly to mark my own feeds in the XSL style sheet, so I can display them differently than feeds I myself subscribe to
type="include": points to an OPML file, preferably a booklist file, that should then be included at this point in this booklist file. In booklist files it is only to be used at the top level, not as a sub node in a ‘collection’ or ‘book’. Optional, and at this point only foreseen, not implemented. With the following attributes:
text: mandatory, description or title of the file to be included. This is what is shown in outliners and HTML renderings.
url: mandatory, the link to the opml file to be included, the linked file must be an .opml file.
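The mandatory attributes listed above lend themselves to a simple check. The following is a rough sketch in Python (standard library only) of how a booklist file might be tested against them; the function name `check_booklist`, the wording of the messages and the sample file are assumptions for illustration, and the check covers only the mandatory attributes and the need for at least one collection, not the nesting rules.

```python
import xml.etree.ElementTree as ET

# Mandatory attributes per outline type, as described in the list above.
MANDATORY = {
    "collection": ("text",),
    "book": ("text", "name", "author"),
    "rss": ("text", "xmlUrl"),
    "include": ("text", "url"),
}


def check_booklist(opml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    if body is None:
        return ["no body element"]
    nodes = list(body.iter("outline"))
    problems = []
    if not any(node.get("type") == "collection" for node in nodes):
        problems.append("no outline of type 'collection' found")
    for node in nodes:
        node_type = node.get("type")
        # Typeless grouping nodes have no mandatory attributes checked here.
        for attr in MANDATORY.get(node_type, ()):
            if not node.get(attr):
                problems.append(f"{node_type} '{node.get('text', '?')}' lacks '{attr}'")
    return problems


# Hypothetical file in which the book is missing its author attribute.
SAMPLE = """<opml version="2.0"><head><title>Example booklist</title></head><body>
<outline type="collection" text="Example shelf">
  <outline type="book" text="Example Book by Example Author" name="Example Book"/>
</outline>
</body></opml>"""

problems = check_booklist(SAMPLE)
print(problems)
```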
Good morning! While I plod on with new paving and borders for a greener, insect-friendly back garden, work in the Digital Garden continues at a slower pace. Today I am sharing some loose thoughts with you about the future of note-taking apps and newsletters. And how they can be tied together with open protocols. Let me know what your thoughts are!
Blog on!
The most clicked link in the previous edition of this newsletter was the 15-point checklist for your own newsletter. Nice!
The vein of gold in notes and newsletters.
Dave Winer is the inventor of just about everything to do with self-publishing on the web. He keeps making new software by building on what already exists and what works. One of his projects is Little Outliner. Dave has worked with outliner software for years. It is a simple way of making lists, as you can also do in any word processing app, but just as well in apps like Workflowy, Roam, Obsidian and hundreds of others. Outliners are a very good way to quickly write, organise and divide up information.
That is how, this morning, via a retweet by Dave, I ended up on Brandon Toner’s site. He is testing Little Outliner and loves it. I can imagine. In a few steps you can create and publish your own site. Clicking through Brandon’s site I arrive at his newsletter. Here he shares his thoughts on Atomic Journaling. In a few steps you can create your own habits using small chunks of pre-cooked questions. Using outliner software such as Roam Research, in Brandon’s case.
It opens a hatch again onto my thoughts about the hyperlinked newsletter. A topic I think about more and more often. By using open protocols it should be possible to link outliner pages to each other even better than with just a hyperlink. Via Webmentions it is possible to bring linked sources together and make them visible. That way this paragraph could appear as a note on Brandon’s page on Substack. But you can go further still. A subscriber to Brandon’s newsletter can subscribe to the responses to an article, or to the notes published elsewhere. Like this paragraph. Just as in Hypothesis, the open software for annotating blog posts and publishing those annotations on your own site. Or think of Ton’s ideas to share your book list as an open file and to subscribe to someone else’s reading list. Outside of Goodreads or other closed systems.
Maybe this sounds like old wine in new bottles. We already have hyperlinks. We already have comments. We have social networks where you can have conversations. That is true and that is fantastic. But this feels like a next step. Hyperlinks are still one-way traffic. I link to another site, but do not explicitly let the other site know. Whereas the other site, after validation, could perfectly well link back, because the extra information on my site may be of value. Yes, I see the problems with spam and robots running riot. Those are, I think, problems we can solve. The digital gardens we see more and more of can thus be connected to each other even better, and a larger knowledge library emerges than the one we have already built over the past 50 years. Add to that the possibility to communicate with each other across social networks from your own domain, on your own site. And make the distribution of those stories possible in multiple ways, via RSS, via OPML, via newsletters.
I may be getting a bit carried away, and the thoughts are not all neatly aligned yet. Somewhere in that combination of newsletters, notes and open protocols lies a vein of gold. I think that in the coming months we will read how someone has tapped that vein and is putting it to practical use. I can’t wait!
Even the On this Day plugin on my site yields a valuable link on this topic.
What else happened this week
Some threads on Twitter are so tiresome that my eye-roll muscle cramps up. Because, fair is fair, the start of Edwin Dorsey’s thread once again promises mountains of gold. After 6 months of a paid newsletter he brings in $300,000 a year. But his tips really are worthwhile. If only for tip 1: Be unique.
⚡️ Substack works hard to help the writers on its platform do their best work. Such as these tips for optimising your one-line pitch and “about this newsletter” page. Easy to reuse for non-Substackers too, of course. Make the most of it!
A nice look behind the scenes at the publishing strategy of Follow The Money. “Half of our new members come in via the newsletter route. We see that people who get to know our product via the newsletter are much more inclined to remain subscribers for a long time.”
Tip! Another handy list from Ghost: 20 types of lead magnets. The free trinkets that sites and makers offer you so that you buy something later. I have a love-hate relationship with them. You know you are entering a tunnel that will eventually cost you money, but the free sweets are too tempting…
From The Inevitable by Kevin Kelly. Twelve inevitable changes in our future. With thanks to Readwise
Newsletter of the week
The world’s best-known whistleblower, Edward Snowden, has started a newsletter. As he says himself: “I want to revive the original spirit of the older, pre-commercial internet, with its bulletin boards, newsgroups, and blogs — if not in form, then in function.” Snowden has both paid and free posts, in text and audio.
You can find more (inter)national newsletters at Thanks for Subscribing.
Today at 14:07 it is exactly 19 years ago that I published the first post on this blog. Back then I already mentioned how connecting to others, conversation, is the key thing I’m aiming for. I’ve always been a prolific note maker (going back to primary school even, buying my own notepads). With the launch of my weblog it became a more public thing as well as a means to engage with others.
In recent years I’ve marked the occasion by reflecting on my blogging and practices (see the 18, 17, 16 years edition), and long ago I marked the 3rd and 5th anniversary both extolling the value of the conversations and connections this blog helped create.
This year, like most of last year, was spent working from home. It meant a similarly inward-oriented focus when it comes to my note making and blogging.
I haven’t spent time on IndieWeb community organising for instance, and didn’t feel the energy for it either. I did take steps towards making this blog much less dependent on third parties:
I stopped embedding Flickr images in my blog, replacing them with locally hosted copies while linking to the original. Most postings now no longer have Flickr embeds, some 150 still do, which I am slowly bringing down to 0.
I removed all video embeds, replacing them with stills and links
I slowly replaced a number of Slideshare decks, but not all yet. There are no actual Slideshare embeds active anymore on my blog, as I deleted my account, but the now non-functional embeds still ‘call’ those web addresses. I’m self-hosting my slides on tonz.nl (Dutch), and tonz.eu (English)
I experimented with sharable bookshelves for my blog, but there’s a connection missing with my internal note taking. I’d very much like to generate my book lists and book posts directly from my own notes. I haven’t actually posted about books here since January, a fact I dislike.
That brings me to the note making part. I have completely removed myself from Evernote, replacing it with a local collection of notes in markdown. I’ve kept them separate from the notes collection I actually work with, but import specific notes when I need them. I also, based on an example from fellow Obsidian user Wouter Groeneveld, started scanning my paper notebooks from over the years, creating indexes for them, and thus making them connect to my ongoing work and notes. My use of Obsidian to maintain those markdown notes continues undiminished. The speed of creating new conceptual nodes has slowed a lot, having mined most of my old blogposts for their content. I am now slowly evolving my ways of digesting and adding new knowledge and thoughts. In terms of volume, there are now some 5k notes, of which 1k6 are conceptual, 1k are ‘collected stuff’ with just a few added remarks of why I find them interesting, and some 2k5 work related notes.
In general I would like to see a more direct connection between my notes and my blogging, and ‘wiki’ pages on this site. I’m not sure yet what I’d like so I need to experiment. In the past months I have been contributing to two GitHub hosted sites using Respec, where the site is directly created from my notes. This works really well, but as those are public pages I do keep the corresponding notes in a different place than my ‘real’ notes. I do want to maintain the difference between public and private, as it influences my writing, but I do not necessarily want to keep the public notes in a separate location from the others.
Coincidentally, around note making, I did do some outreach and hosted two ‘Dutch language Obsidian user meet-ups’. The third is due to take place in two weeks.
For the coming time this note-to-blog pipeline, and making it easier for myself to post, will be my area of attention I think. Let’s see next year around this time, when I hit the two decade mark with this blog, how that went.
How I took notes in 2006, on a locally hosted wiki