Last week I changed this site to provide better language mark-up. However, even though that changed the mark-up correctly, it didn’t solve the issue that made me look into it in the first place: if you click a link to a posting in my RSS feed, your browser does not detect the right language and offer to translate the posting for you.
As it turns out, Google Translate doesn’t make any real effort to detect the language or languages of a page. It only ever checks whether a default language is indicated in the very first <html> tag of a page (which my WordPress sets to English for the entire website). Only if no such default is set does it use a machine learning model (CLD2) to detect which language was likely used, and even then it picks only the most likely one. It never checks for language mark-up. It also never considers whether multiple languages were used in a page, even though the machine learning model returns probabilities for more than one language if present in a page.
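The decision logic described above can be sketched in a few lines. This is only a model of the behaviour as I understand it, not Google's code: the function name `assumed_page_language` and the shape of the detector scores (language code mapped to probability) are my own assumptions for illustration.

```php
<?php
/**
 * Model of the described behaviour: a declared default on <html> wins,
 * otherwise only the single most probable detected language is used.
 * Hypothetical helper, not an actual Google Translate or CLD2 API.
 */
function assumed_page_language(?string $htmlLang, array $detectorScores): ?string
{
    // A declared default language is taken at face value; nothing else is consulted.
    if ($htmlLang !== null && $htmlLang !== '') {
        return $htmlLang;
    }
    if ($detectorScores === []) {
        return null;
    }
    // Sort by probability, highest first, keeping the language codes as keys.
    arsort($detectorScores);
    // Every other candidate language is discarded.
    return array_key_first($detectorScores);
}
```

In this model a Dutch posting on a site that declares English is treated as English, however confident the detector is: `assumed_page_language('en-US', ['nl' => 0.9, 'en' => 0.1])` returns `'en-US'`.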
This is surprising on two levels. One, it disregards usable information even when provided (either the language mark-up, or the probabilities from the ML model). Two, it makes an entire family of wrong assumptions, the first of which is that something or someone will always be monolingual. While discussing this in a conversation with Kevin Marks, he pointed to Stephanie Booth’s presentation at Google that he helped set up 12 years ago, listing all that is wrong with the simplistic monolingual world-view of platforms and tech silos. A dozen years on it is still all true and relevant, nothing’s changed. No wonder Stephanie and I have been talking about multi-lingual blogging off and on for as long as we’ve been blogging.
Which all goes to say that my previous changes weren’t very useful. I realised that to make auto-translation of clicked links from my feed work, I needed to set the language attribute for an entire page in the <html> tag, and not try to mark up only the sections that aren’t in English. (Even if it is the wrong thing to do, because it also means I am saying that everything that isn’t content, such as menus, tags etc., is in the declared language. And that isn’t the case. When I write postings in Dutch or German, the entire framework of my site is still in English.) After some web searching, I found a reference to writing a small function that changes the default language setting and is called when the header of a page is written, which I adapted. The disadvantage is that this gets called for every page, regardless of whether it is needed (it’s only ever needed for a single post page, or the overview pages of Dutch and German postings). The advantage is that almost all language adaptations are now in a single spot in my theme. I’ve rolled back all previous changes to the single and category templates. I’ve kept only the changes to the front page template, so that there is still the correct language mark-up around front page postings that are not in English.
I added the function to functions.php in my child theme.
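A minimal sketch of what such a function could look like, not necessarily the exact code running on this site. It assumes the language of a posting follows from its category, and hooks into WordPress's `language_attributes` filter, which receives the attribute string printed inside the <html> tag. The category slugs `nederlands` and `deutsch`, the language codes, and the function names are all placeholders.

```php
<?php
// Hypothetical mapping from category slug to language code; adjust to your own categories.
const CATEGORY_LANGUAGES = [
    'nederlands' => 'nl-NL',
    'deutsch'    => 'de-DE',
];

/** Return the language code for the first matching category slug, or null if none match. */
function language_for_categories(array $slugs): ?string
{
    foreach ($slugs as $slug) {
        if (isset(CATEGORY_LANGUAGES[$slug])) {
            return CATEGORY_LANGUAGES[$slug];
        }
    }
    return null;
}

/** Swap the value of the lang attribute in the string WordPress prints inside <html ...>. */
function replace_lang_attribute(string $output, string $lang): string
{
    // Fall back to the untouched string if the regular expression fails.
    return preg_replace('/\blang="[^"]*"/', 'lang="' . $lang . '"', $output) ?? $output;
}

/** Filter callback: runs on every page, but only changes single posts and category overviews. */
function child_theme_language_attributes(string $output): string
{
    $slugs = [];
    if (is_singular('post')) {
        // Categories of the posting being displayed.
        $slugs = wp_list_pluck(get_the_category(get_queried_object_id()), 'slug');
    } elseif (is_category()) {
        // Category overview page: use the queried category itself.
        $slugs = [get_queried_object()->slug];
    }
    $lang = language_for_categories($slugs);
    return $lang === null ? $output : replace_lang_attribute($output, $lang);
}

// Register the hook only when running inside WordPress, so the helpers can be tried standalone.
if (function_exists('add_filter')) {
    add_filter('language_attributes', 'child_theme_language_attributes');
}
```

Keeping the mapping and the string replacement in separate small helpers means the WordPress-specific part stays limited to the one callback, which matches the idea of having all language adaptations in a single spot.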
An example of the changed page language setting (to German), for a posting in German. (If you follow that link and view the source, you’ll see it.)