Last week I changed this site to provide better language mark-up. However, even though the mark-up changed correctly, it didn’t solve the issue that made me look into it in the first place: if you click a link to a posting in my RSS feed, your browser does not detect the right language and offer to translate the posting for you.

As it turns out, Google Translate doesn’t make any real effort to detect the language or languages of a page. It only checks whether a default language is declared in the opening <html> tag of a page (which my WordPress sets to English for the entire website). Only if no such default is set does it use a machine learning model (CLD2) to detect which language was likely used, and then it picks only the most likely one. It never checks for language mark-up within the page. Nor does it consider whether multiple languages were used in a page, even though the machine learning model returns probabilities for more than one language when several are present.
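To make that behaviour concrete, here is a minimal Python sketch of the decision logic as I understand it from the description above. It is not Google’s code: the function name `assumed_page_language` and the shape of the detector output (a dictionary of language codes and scores) are assumptions made purely for illustration.

```python
from typing import Dict, Optional


def assumed_page_language(html_lang: Optional[str],
                          detector_scores: Dict[str, float]) -> Optional[str]:
    """Return the single language a translator following this logic would assume.

    html_lang: value of the lang attribute on the <html> tag, if any.
    detector_scores: hypothetical detector output, language code -> score.
    """
    # A declared default on the <html> tag wins outright;
    # the detector and any inline language mark-up are never consulted.
    if html_lang:
        return html_lang.lower()

    # Without a declared default, fall back to the detector, but keep only
    # the top-scoring language and discard all the other scores.
    if not detector_scores:
        return None
    return max(detector_scores, key=lambda code: detector_scores[code])
```

With this logic a Dutch posting on a site that declares English in its <html> tag is treated as English, whatever the detector would have said, and a page mixing two languages is always reduced to one.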

This is surprising on two levels. One, it disregards usable information even when provided (either the language mark-up, or the probabilities from the ML model). Two, it makes an entire family of wrong assumptions, the first of which is that something or someone will always be monolingual. While discussing this in a conversation with Kevin Marks, he pointed to Stephanie Booth’s presentation at Google that he helped set up 12 years ago, listing all that is wrong with the simplistic monolingual world-view of platforms and tech silos. A dozen years on it is still all true and relevant, nothing’s changed. No wonder Stephanie and I have been talking about multi-lingual blogging off and on for as long as we’ve been blogging.

Which all goes to say that my previous changes weren’t very useful. I realised that to make auto-translation of clicked links from my feed work, I needed to set the language attribute for an entire page in the <html> tag, and not try to mark up only the sections that aren’t in English. (Even if it is the wrong thing to do, because it also means I am saying that everything that isn’t content, such as menus and tags, is in the declared language. And that isn’t the case. When I write postings in Dutch or German, the entire framework of my site is still in English.)

After some web searching, I found a reference to writing a small function to change the default language setting, and calling that when writing the header of a page, which I adapted. The disadvantage is that it gets called for every page, regardless of whether it is needed (it’s only ever needed for a single post page, or the overview pages of Dutch and German postings). The advantage is that almost all language adaptations are now in a single spot in my theme. I’ve rolled back all previous changes to the single and category templates. I’ve kept only the changes to the front page template, so that there is still the correct language mark-up around front page postings that are not in English.
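The decision such a function has to make can be sketched in a few lines. This is a Python illustration of the logic only, not the PHP that lives in my functions.php: the category slugs, the language codes and the names `html_lang_for` and `language_attribute` are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical mapping from category slug to language code (assumed names).
CATEGORY_LANGUAGES = {"nederlands": "nl-NL", "deutsch": "de-DE"}
DEFAULT_LANGUAGE = "en-US"


def html_lang_for(page_type: str, categories: Iterable[str]) -> str:
    """Pick the language to declare on the <html> tag for one page."""
    # Only single post pages and category overview pages can be non-English;
    # every other page keeps the site default.
    if page_type not in ("single", "category"):
        return DEFAULT_LANGUAGE
    for slug in categories:
        if slug in CATEGORY_LANGUAGES:
            return CATEGORY_LANGUAGES[slug]
    return DEFAULT_LANGUAGE


def language_attribute(page_type: str, categories: Iterable[str]) -> str:
    """Build the attribute text that would be written into the <html> tag."""
    return f'lang="{html_lang_for(page_type, categories)}"'
```

The check runs for every page, which is the disadvantage mentioned above, but it keeps the whole decision in one place.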


The function I added to functions.php in my child theme.


An example of changed page language setting (to German), for a posting in German. (if you follow that link and do view source, you’ll see it)

[rant] Increasingly, in the contexts I operate in, I feel the distinction between data and information is something of a pre-digital, pre-networked hang-up. Yes, there’s a difference between e.g. measurements (-1, 0, 1, 2, 4) and an informative conclusion drawn from them (the world’s getting hotter), but in the common perception of both data and information as objects, there isn’t much useful distinction anymore between a database and a document. When digitised, they’re both objects that can be either, as it is in the eyes of the beholder and their use case. Context as always is key. If it was used as data, it was. If the same thing was used as information, it was. (An example is the European Commission’s documents. Information to most of us, but data for Google’s translation algorithms, as it’s the largest body of text on the planet carefully translated into 23 languages.)

There is often a difference in the difficulty of processing it with machines, yes. Most of what is called information is, in that sense, badly packaged, badly marked-up data to machines. Structured data with meta-data and expressed relations (linked data e.g.) are, in that sense, large documents that are hard to read for human eyes. But is there any practical gain in terms of agency in making the distinction between data and information, in the context of digital processes? You can make a distinction between a datum (’42’) and a collection of that datum with more of it or other stuff (‘The Hitchhiker’s Guide to the Galaxy’). But a singular datum on its own is never what occurs in real use cases where we discuss data and information as separate objects. As a pragmatist, I find I’ve mostly dropped the distinction.

Oh and please don’t extend the data-information sequence to data-information-knowledge-wisdom. The 1970s DIKW model’s been the CS/IS mantra for decades, but there is no linearity or hierarchy between those four terms, and the implication that the latter two are objectifiable is actively destructive. The D-I part served once to help explain how data was a strategic resource, which is still a very valid proposition, more than ever even, as data is a geo-political factor now. But don’t assume a wider purpose of the model than that.

[/rant]

For a while I had been getting warning messages ‘this software isn’t optimised for your Mac and needs to be updated’. It’s Orwellian for ‘we’ve made updates to the MacOS and that wrecks some of your software because we’ve declared it obsolete and legacy’. I hadn’t yet figured out what precisely was at issue, but Peter posts about how The 64 Bit MacOS Apocalypse Has Arrived.

He helpfully says where to find your list of software being declared legacy with the update to MacOS Catalina.

You can get a list of all of what Apple now refers to as “legacy software” by clicking the Apple menu, then About This Mac, System Report, and, finally, Legacy Software in the left-hand sidebar: this will show you all the 32 bit applications currently installed on your Mac that won’t run if you update to Catalina.

Here’s my list:

Some of the things on it don’t worry me, such as Citrix, as I’m sure it will get updated, and I don’t currently have clients for which I need it. Similarly the Flickr Uploader is a useless tool, which I can uninstall without consequences to my workflow. I suspect Scrivener will get updated soon enough.

Scansnap does worry me. It’s what drives my extremely useful scanner with feeder. A 450 Euro machine, that is already a number of years old and no longer on the market. So I need to find out if they will update their drivers and software. Otherwise I’m left with a key piece of expensive hardware at home that doesn’t work on my laptop. [UPDATE Fujitsu will not release new software for my scanner S1500M, as its ‘support period has expired’. So no MacOS updates for me for now, as it would junk a highly capable piece of expensive hardware that is a key tool for my work.]

Google’s Chrome is not a browser, it’s advertisement delivery software. Adtech after all is where their profit is. This is incompatible with Doc Searls’ Castle doctrine of browsers, so Chrome isn’t fit for purpose.

Removing Chrome
image by Matthew Oliphant, license CC BY ND

Read Chrome to limit full ad blocking extensions to enterprise users (9to5Google)

Google shared that Chrome’s current ad blocking capabilities for extensions will soon be restricted to enterprise users. SEC filing: “New and existing technologies could affect our ability to customize ads and/or could block ads online, which would harm our business.”

Because every other release is a memory black hole that makes my laptop lift off from the madly spinning fans 60 seconds after launching the browser. It’s been like that for a decade or more. That is why I’m not using FF, and am on Opera as my default browser (as well as TOR).

Replied to It’s #Foxtober. Why aren’t you using Firefox? by Charlie Owen

It’s #Foxtober. Why aren’t you using Firefox? Seriously, why not? It’s not run by a giant corporation. It’s got a cool community. It promotes browser diversity.

I use the WP Plugin Post Kinds here, which lets me blog things like Replies, Likes, etc. This plugin has a setting that determines the order in which my own remarks with a Reply or Like and the thing I am replying to or liking are shown.

The default order is [the thing I respond to] [my response], but here in this blog I have changed that, because I like to have my own response first. This ensures for instance that my own words, and not someone else’s get posted to Twitter if I share my post directly to Twitter.

This setting does not change the way the same blogpost gets added to the RSS feed. This means that my regular readers do not get the content of a posting as I intend it, which is in the same order a website visitor sees. In addition it causes anything that consumes my feed, such as my Micro.blog account, to show the post I am responding to first (someone else’s words) and not my remarks. Below in three images is how that looks in practice:

The old version: the order is as I want it on the site.

The old version: the order is reversed for the same item in my feed

Micro.blog posts from my feed, and therefore shows not my words first but the words I’m reacting to, which makes them appear as if they are my words

I figured out where in the plugin files (in class-kind-view.php) the feed gets created and how it is different from how the posting is created for the site. Then I added the conditional code from the latter to the former. This works on my site, as shown by the following three images:

Testing the new code: on my site the item is in the right order

In the RSS feed, the content of the item is now in the right order too

And the right order now shows up in Micro.blog, showing my own words first
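The essence of the change can be sketched as follows. This is a simplified Python illustration rather than the plugin’s actual PHP, and the function names and the `response_first` flag are assumptions I made up for the example: the site renderer already honoured the ordering setting, while the feed renderer always used the default order.

```python
def build_content(response: str, cited: str, response_first: bool) -> str:
    """Join my own remarks and the cited post in the configured order."""
    parts = [response, cited] if response_first else [cited, response]
    # Skip empty parts so a Like without remarks does not get a blank line.
    return "\n".join(part for part in parts if part)


def feed_content_old(response: str, cited: str, response_first: bool) -> str:
    # Old behaviour: the feed ignored the setting and always put
    # the cited post first.
    return build_content(response, cited, response_first=False)


def feed_content_new(response: str, cited: str, response_first: bool) -> str:
    # New behaviour: the feed applies the same conditional as the site view.
    return build_content(response, cited, response_first)
```

With the setting switched on, `feed_content_new("My reply", "Their post", True)` puts my reply first, which is what feed consumers such as Micro.blog then display.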

Then I tried to let the creator of the WP plugin know I made this change, through a Pull Request on GitHub. I’ve never done this before. It’s basically a message saying ‘I changed this file here’, which the original creator can then adopt in the original code. Making that message meant engaging with concepts such as forks, branches, commits and then the Pull Request. I think I pulled it off, but I will only know when David Shanske, who makes Post Kinds, indeed incorporates it in the plugin.

Hoping I’ve submitted my first ever PR the right way