Last week I changed this site to provide better language mark-up. However, even though the mark-up itself was changed correctly, it didn’t solve the issue that made me look into it in the first place: that if you click a link to a posting in my RSS feed, your browser would not detect the right language and translate the posting for you.

As it turns out, Google Translate doesn’t make any real effort to detect the language or languages of a page. It only ever checks whether a default language is indicated in the very first <html> tag of a page (which my WordPress sets to English for the entire website). Only if no such default is set does it use a machine learning model (CLD2) to detect which language was likely used, and even then it only picks the most likely one. It never checks for language mark-up. It also never considers whether multiple languages were used in a page, even though the machine learning model returns probabilities for more than one language if several are present in a page.
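The behaviour described above can be sketched in a few lines. This is a minimal illustration in Python, not Google's code: the function name `page_language`, the `en-US` value and the detector scores are all assumptions made up for the example.

```python
from typing import Dict, Optional


def page_language(html_lang: Optional[str],
                  detector_scores: Dict[str, float]) -> Optional[str]:
    """Pick one language for a page, following the behaviour described in the post."""
    # A default declared on the <html> tag wins outright; nothing else is checked.
    if html_lang:
        return html_lang
    # Only without a declaration is the statistical detector consulted,
    # and only its single most likely language is kept.
    if detector_scores:
        return max(detector_scores, key=lambda lang: detector_scores[lang])
    return None


# A Dutch posting on a site whose <html> tag declares English (hypothetical scores):
print(page_language("en-US", {"nl": 0.97, "en": 0.03}))  # en-US
# No declaration at all: the runner-up language is simply dropped.
print(page_language(None, {"nl": 0.60, "de": 0.40}))     # nl
```

Neither branch looks at inline language mark-up, which is why marking up only the non-English sections had no effect on translation.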

This is surprising on two levels. One, it disregards usable information even when provided (either the language mark-up, or the probabilities from the ML model). Two, it makes an entire family of wrong assumptions, of which the assumption that something or someone will always be monolingual is only the first. While I was discussing this with Kevin Marks, he pointed to Stephanie Booth’s presentation at Google that he helped set up 12 years ago, listing all that is wrong with the simplistic monolingual world-view of platforms and tech silos. A dozen years on it is still all true and relevant; nothing’s changed. No wonder Stephanie and I have been talking about multi-lingual blogging off and on for as long as we’ve been blogging.

Which all goes to say that my previous changes weren’t very useful. I realised that to make auto-translation of clicked links from my feed work, I needed to set the language attribute for an entire page in the <html> tag, and not try to mark up only the sections that aren’t in English. (Even if it is the wrong thing to do, because it also means I am saying that everything that isn’t content, such as menus and tags, is in the declared language. And that isn’t the case. When I write postings in Dutch or German, the entire framework of my site is still in English.) After some web searching, I found a reference to writing a small function that changes the default language setting and is called when the header of a page is written, which I adapted. The disadvantage is that this gets called for every page, regardless of whether it is needed (it’s only ever needed for a single post page, or the overview pages of Dutch and German postings). The advantage is that almost all language adaptations are now in a single spot in my theme. I’ve rolled back all previous changes to the single and category templates. I’ve kept only the changes to the front page template, so that there is still the correct language mark-up around front page postings that are not in English.
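To make the idea concrete, here is a rough sketch of what such an override does to the attributes of the <html> tag. It is written in Python purely as an illustration and is not the PHP function from my theme: the name `override_language` and the example attribute strings are assumptions.

```python
import re
from typing import Optional

# Matches a plain lang="..." attribute, but not xml:lang="...".
LANG_ATTR = re.compile(r'(?<![\w:])lang="[^"]*"')


def override_language(attributes: str, lang: Optional[str]) -> str:
    """Return the <html> attribute string with its lang value swapped for `lang`.

    `lang` is None for pages that should keep the site default,
    which is the case for most pages the function gets called on.
    """
    if lang is None:
        return attributes
    if LANG_ATTR.search(attributes):
        # Replace the existing declaration, leaving other attributes untouched.
        return LANG_ATTR.sub(lambda _match: f'lang="{lang}"', attributes, count=1)
    # No declaration yet: add one.
    return f'{attributes} lang="{lang}"'.strip()


print(override_language('lang="en-US"', "de-de"))  # lang="de-de"
print(override_language('lang="en-US"', None))     # lang="en-US"
```

The per-page decision (single posting or category overview in Dutch or German) happens before this step and is passed in here as `lang`.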


The function I added to functions.php in my child theme.


An example of the changed page language setting (to German), for a posting in German. (If you follow that link and view the source, you’ll see it.)

My site until now didn’t indicate very well in which language my postings are written. I write here mostly in English, but also sometimes use two other languages, Dutch and German.

My friend Peter pointed out to me that if he reads Frank’s blog in his feedreader and clicks on a link, his browser automatically translates the page into English. As Peter suggested, this is most likely because Frank’s site declares Dutch as its language, and mine declares English. I decided to look into it and see if I could change that.

The language declaration Peter pointed to is the very first statement in the source code for this page:

Frank’s site in the same space says his site is in Dutch.

Frank also publishes in English sometimes, and then the language setting would be factually incorrect. Peter just wouldn’t notice as he wouldn’t attempt to translate English, his native language.

My company’s website, in contrast, declares three languages, by giving a different URL for English and German, next to the regular Dutch. However, in this case it is about the same or similar pieces of content made available in different languages, which is not the same use case as my blog, where there is different content in different languages.

I concluded I needed to figure out how to declare the right language a) for the category archive pages for Dutch and German postings (because I mark any posting not in English with a separate category corresponding to its language), and b) for individual postings not in English.

First I looked at what the W3C says about indicating content languages. It turns out Frank and I both do it right: the <html> tag is the place to declare the default language of a website. In Frank’s case that is Dutch, in my case English. The W3C goes on to say that any other languages should be indicated in the location where they are used. This would, for example, allow me to indicate the correct language even if I use a non-English phrase in the middle of an otherwise English text, hetgeen een mooie oplossing is voor automatische vertaalsoftware [Dutch for: which is a nice solution for automatic translation software]. Which looks like this in html:
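As a rough sketch of that inline mark-up, the small helper below wraps a phrase in an element carrying a `lang` attribute. It is written in Python purely for illustration; the helper name `mark_language`, the choice of a `<span>` element and the `nl` value are assumptions of this example.

```python
from html import escape


def mark_language(phrase: str, lang: str) -> str:
    """Wrap a phrase in a <span> that declares the language it is written in."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(phrase)}</span>'


print(mark_language("hetgeen een mooie oplossing is", "nl"))
# <span lang="nl">hetgeen een mooie oplossing is</span>
```

The surrounding English text stays outside the element and keeps the default language declared for the page.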

This means that I needed to find, for the category archive pages for Dutch and German as well as for individual postings, the right spot in the source of a page to declare the correct language. I did this in the WordPress theme I am using, or rather in the child theme (which allows you to specify any deviations from the original theme, while keeping the rest of the theme as it was).

For both the Dutch and German category pages I created separate templates, called category-nederlands.php and category-deutsch.php, which correspond to the names of those categories in my WordPress instance. At the top of those pages I added a language indicator where the main part of the page starts.

For individual blogposts it is a bit more difficult, as you need to be able to determine first if a posting is in another language than English. I adapted the single.php template, which renders individual postings. There I added a line of code to see if the posting is in Dutch or German, by checking if it is in the corresponding category.

This results in adding either lang="nl-nl" or lang="de-de" to postings in those languages, in the same location as for the category archive pages shown above.
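The category check boils down to a small lookup. The sketch below shows that logic in Python rather than in the PHP of the actual template; the names `CATEGORY_LANG`, `post_language` and `lang_attribute`, the `<article>` element and the extra example categories are assumptions, while the category names and language values are the ones mentioned above.

```python
from typing import Iterable, Optional

# Category name -> language value, mirroring the Dutch and German categories.
CATEGORY_LANG = {"nederlands": "nl-nl", "deutsch": "de-de"}


def post_language(categories: Iterable[str]) -> Optional[str]:
    """Return the language value for a posting, or None if it is in English."""
    for name in categories:
        lang = CATEGORY_LANG.get(name.lower())
        if lang:
            return lang
    return None


def lang_attribute(categories: Iterable[str]) -> str:
    """Build the attribute text to add to the element that wraps the posting."""
    lang = post_language(categories)
    return f' lang="{lang}"' if lang else ""


print("<article" + lang_attribute(["deutsch", "indieweb"]) + ">")  # <article lang="de-de">
print("<article" + lang_attribute(["indieweb"]) + ">")             # <article>
```

A posting without either category gets no attribute at all and falls back to the default language of the page.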

Hopefully this now allows browsers to correctly detect the language of content on my site.
I’m not entirely done yet, because in some overviews, like the front page, individual postings that are not in English are not yet marked with the correct language. Only if you go to the posting itself is the language set correctly. But I assume this can be solved in a similar way. [UPDATE 2019-10-14] I’ve also edited the index.php and category.php templates to check if a posting is in the Dutch or German language category, and add a language declaration using a <div lang="nl-nl"> around the posting. For the index.php I do that only for the home page. This works, but as far as I can tell Google Translate’s ‘detect language’, for example, only checks the default language of a page. As I am not here to facilitate Google, I am currently satisfied that I at least now provide clear meta-data about the language of postings I publish.[/UPDATE]
A final step I’d like to add is to automatically insert machine translation links into my RSS feed items, although I’m still not entirely sure that would be useful.

Pleased to see that my step last week to fix my RSS feed so it shows my words first, not what I’m reacting to, actually has the hoped-for effect.

My feed goes to my micro.blog account, and there my own words are now shown first too. A post in this new form yesterday created a nice conversation involving 4 others on micro.blog. That would not have happened had my post started and ended with “Read: some url”.

I use the WP Plugin Post Kinds here, which lets me blog things like Replies, Likes, etc. This plugin has a setting that determines the order in which my own remarks with a Reply or Like and the thing I am replying to or liking are shown.

The default order is [the thing I respond to] [my response], but here in this blog I have changed that, because I like to have my own response first. This ensures for instance that my own words, and not someone else’s, get posted to Twitter if I share my post directly to Twitter.

This setting does not change the way the same blogpost gets added to the RSS feed. This means that my regular readers do not get the content of a posting as I intend it, which is in the same order a website visitor sees it.
In addition, it causes anything that consumes my feed, such as my Micro.blog account, to show the post I am responding to first (someone else’s words) and not my remarks. Below in three images is how that looks in practice:

The old version: the order is as I want it on the site.

The old version: the order is reversed for the same item in my feed

Micro.blog posts from my feed, and therefore shows not my words first but the words I’m reacting to, which makes them appear as if they are my words

I figured out where in the plugin files (in class-kind-view.php) the feed gets created and how it is different from how the posting is created for the site. Then I added the conditional code from the latter to the former. This works on my site, as shown by the following three images:

Testing the new code: on my site the item is in the right order

In the RSS feed, the content of the item is now in the right order too

And the right order now shows up in Micro.blog, showing my own words first
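The difference between the old and the new feed behaviour can be sketched as follows. This is a simplified illustration in Python, not the Post Kinds code: all function names and the example strings are hypothetical, and the real plugin builds considerably more mark-up than two lines of text.

```python
def render_item(response: str, cited: str, response_first: bool) -> str:
    """Join my own words and the thing I respond to in the configured order."""
    parts = [response, cited] if response_first else [cited, response]
    return "\n".join(part for part in parts if part)


def render_for_site(response: str, cited: str, response_first: bool) -> str:
    # The site output already honoured the ordering setting.
    return render_item(response, cited, response_first)


def render_for_feed_old(response: str, cited: str, response_first: bool) -> str:
    # Old behaviour: the setting is ignored, so the cited item always comes first.
    return render_item(response, cited, response_first=False)


def render_for_feed_new(response: str, cited: str, response_first: bool) -> str:
    # New behaviour: the feed applies the same condition as the site.
    return render_item(response, cited, response_first)


mine, theirs = "My own remarks.", "Read: some url"
print(render_for_feed_old(mine, theirs, True).splitlines()[0])  # Read: some url
print(render_for_feed_new(mine, theirs, True).splitlines()[0])  # My own remarks.
```

With the condition applied in both places, the site and the feed cannot drift apart in ordering.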

Then I tried to let the creator of the WP Plugin know I made this change, through a Pull Request on GitHub. I’ve never done this before. It’s basically a message saying ‘I changed this file here’, which the original creator can then adopt in the original code. Making that message meant engaging with concepts such as forks, branches, commits and then the Pull Request. I think I pulled it off, but I will only know when David Shanske, who makes Post Kinds, indeed incorporates it in the plugin.

Hoping I’ve submitted my first ever PR the right way

It’s rather cool to see Neil adopting parts of my information strategies. Looking forward to reading more about how it plays out for him. Several people were interested in it during last weekend’s IndieWebCamp too. Having more perspectives on this approach may help to formulate a more generic description of this process.

Liked a post by Neil Mather

Started with a really simple version of Ton’s infostrat and liking it already….The best part is avoiding anything that has an endless stream of fairly random (but tantalisingly, possibly interesting) stuff….. I’m feeling more intentional, less flighty of attention.