After being informed about the intention of the Royal Library to archive my website, I wondered how some aspects of my site may affect what gets collected.
Specifically:

  • Most of my postings are kept away from the front page but end up in specific categories. These postings do show up in monthly archives and on overview pages, such as those for a tag or category.
  • Some of my postings are unlisted on the site, yet are publicly available. Mostly these are postings I originally only shared through RSS, such as my week notes. They are not in overviews and don’t show up as search results, but they have public URLs, and you can navigate to them by clicking next / previous post on their surrounding posts in the timeline.

The crawler that will be used for the archiving is Heritrix, which is also used by the Internet Archive itself.
A quick test of some posts of both types above shows they are likely not in the Internet Archive. I mailed the Royal Library to ask how Heritrix may or may not deal with my site’s quirks. Or perhaps I can generate a complete site map and make that available?
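As a rough sketch of the site map idea: assuming the full list of post URLs, including the unlisted ones, can be exported from the site, a file in the sitemaps.org format could be built along these lines. The function name `build_sitemap` and the example.org URLs are made up for illustration, and whether the crawler would actually use such a file is the open question here.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each distinct URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between runs.
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs standing in for listed and unlisted postings.
    posts = [
        "https://example.org/2023/05/week-notes/",
        "https://example.org/2023/05/some-category-post/",
    ]
    print(build_sitemap(posts))
```

The point of feeding it a plain list of URLs is that the unlisted postings, which no overview page links to, would be included as long as they are in the exported list.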

I think I’ll put this up on the front page 😉

Is it possible to use Hypothes.is to annotate pages that are in the Internet Archive? My browser bookmarklet for it doesn’t work on such archived pages. I can imagine that there are several JavaScript or iframe related technical reasons for it. An information related reason may be that bringing together different annotations from different annotators is hard, as they might all be annotating a different archived version of the same page.
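To make the 'different archived versions of the same page' issue concrete: Wayback Machine addresses have the form `https://web.archive.org/web/<timestamp>/<original URL>`, so two annotators can end up on different addresses for what is really one page. The sketch below only illustrates how such addresses could be grouped by their original URL. It does not touch Hypothes.is at all, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Snapshot timestamp (up to 14 digits), an optional modifier such as "id_",
# then the original URL.
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})[a-z_]*/(.+)$"
)


def split_wayback_url(url):
    """Return (timestamp, original_url), or None if not a Wayback URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


def group_by_original(urls):
    """Map each original URL to the snapshot timestamps seen for it."""
    groups = defaultdict(list)
    for url in urls:
        parts = split_wayback_url(url)
        if parts is not None:
            timestamp, original = parts
            groups[original].append(timestamp)
    return dict(groups)
```

Given two snapshot addresses of the same posting, `group_by_original` returns a single entry keyed on the original URL, which is the kind of key annotations would need to share to be brought together.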

Yet in some cases being able to do this would be very useful. For instance, Manfred Kuehn’s blog was discontinued in 2018, and more recently removed entirely from Blogspot where it was hosted. The archived versions are the only current source for those blog postings. This means there is no ‘original’ page online anymore to gather the annotations around.

I see Chris Aldrich has annotated posts from that same weblog by Kuehn. Maybe he can shed some light on it.