Until now my site didn’t indicate very clearly which language each posting is written in. I write here mostly in English, but sometimes also in two other languages, Dutch and German.
My friend Peter pointed out to me that when he reads Frank’s blog in his feed reader and clicks through to a posting, his browser automatically translates it into English. As Peter suggested, this is most likely because Frank’s site declares Dutch as its language, whereas mine declares English, even for postings written in Dutch or German. I decided to look into it and see if I could change that.
The language declaration Peter pointed to is the very first statement in the source code for this page: the opening <html> tag, whose lang attribute declares the language of the page.
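As a minimal illustration, the top of a page on an English-language WordPress site typically looks like the sketch below. The exact value is an assumption: `en-US` is what a default English install emits, and a Dutch-language site would carry `nl` or `nl-NL` in the same spot.

```html
<!DOCTYPE html>
<!-- Assumed value: lang on the root element sets the default language of the whole page -->
<html lang="en-US">
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <p>Everything here counts as English unless a nested element says otherwise.</p>
  </body>
</html>
```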
Frank’s site in the same space says his site is in Dutch.
Frank also publishes in English sometimes, and then the language setting would be factually incorrect. Peter just wouldn’t notice as he wouldn’t attempt to translate English, his native language.
My company’s website, in contrast, declares three languages, by giving a different URL for English and German next to the regular Dutch. In that case, however, the same or similar pieces of content are made available in different languages. That is not the same use case as my blog, where different content is written in different languages.
I concluded I needed to figure out how to declare the right language a) on the category archive pages for Dutch and German postings (I mark any posting not in English with a separate category corresponding to its language), and b) on individual postings that are not in English.
First I looked at what the W3C says about indicating content languages. It turns out Frank and I both do it right: the <html> element is the place to declare the default language of a website. In Frank’s case that is Dutch, in my case English. The W3C goes on to say that any other languages should be indicated in the location where they are used. This would, for example, allow me to indicate the correct language even if I use a non-English phrase in the middle of an otherwise English text, hetgeen een mooie oplossing is voor automatische vertaalsoftware (which is a nice solution for automatic translation software). In HTML this is done by giving the element that wraps the phrase its own lang attribute.
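A minimal sketch of such inline markup, using the Dutch phrase from the sentence above. The `span` element is just one common choice here; the `lang` attribute can go on any element that wraps the text.

```html
<p>
  This allows me to indicate the correct language even mid-sentence,
  <!-- lang overrides the page default for just this phrase -->
  <span lang="nl">hetgeen een mooie oplossing is voor automatische vertaalsoftware</span>.
</p>
```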
This means I needed to find the right spot in the page source to declare the correct language, both for the Dutch and German category archive pages and for individual postings. I did this in the WordPress theme I am using, or rather in its child theme (which allows you to specify any deviations from the original theme, while keeping the rest of the theme as it was).
For the Dutch and German category pages I created separate templates, called category-nederlands.php and category-deutsch.php, which correspond to the names of those categories in my WordPress instance. Near the top of those templates, where the main part of the page starts, I added a language indicator.
For individual blog posts it is a bit more difficult, as you first need to determine whether a posting is in a language other than English. I adapted the single.php template, which renders individual postings. There I added a line of code to see if the posting is in Dutch or German, by checking if it is in the corresponding category.
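A rough sketch of what such a category template could look like. The wrapper element, its `id` and `class`, and the `content` template part are assumptions about the parent theme, not taken from the actual site; only the file name and the `lang` value follow the text.

```php
<?php
/**
 * category-nederlands.php (child theme): archive template that WordPress
 * picks automatically for the category with the slug "nederlands".
 * Sketch only: the markup below is an assumed, simplified theme structure.
 */
get_header(); ?>

<!-- The lang attribute marks everything in the main area as Dutch -->
<div id="primary" class="content-area" lang="nl-nl">
	<?php
	if ( have_posts() ) {
		while ( have_posts() ) {
			the_post();
			// Render each posting with the theme's own template part.
			get_template_part( 'content', get_post_format() );
		}
	}
	?>
</div>

<?php get_footer(); ?>
```

The German template, category-deutsch.php, would be identical apart from `lang="de-de"`.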
This results in either lang="nl-nl" or lang="de-de" being added to postings in those languages, in the same location as described above for the category archive pages.
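The check for an individual posting could be sketched as follows, assuming the category slugs are `nederlands` and `deutsch` (matching the template names) and the same simplified wrapper markup as a typical theme. `in_category()` and `esc_attr()` are standard WordPress functions; the variable name `$post_lang` is made up for this example.

```php
<?php
/**
 * single.php (child theme): sketch of a per-posting language check.
 * On a single-post view WordPress has already set up the global post,
 * so in_category() can be called before the loop starts.
 */
$post_lang = ''; // Empty means: inherit the site default (English).
if ( in_category( 'nederlands' ) ) {
	$post_lang = 'nl-nl';
} elseif ( in_category( 'deutsch' ) ) {
	$post_lang = 'de-de';
}

get_header(); ?>

<div id="primary" class="content-area"<?php echo $post_lang ? ' lang="' . esc_attr( $post_lang ) . '"' : ''; ?>>
	<?php
	while ( have_posts() ) {
		the_post();
		// Assumed template part name; themes differ here.
		get_template_part( 'content', 'single' );
	}
	?>
</div>

<?php get_footer(); ?>
```

English postings get no `lang` attribute on the wrapper at all, so they simply inherit the default declared on the <html> element.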
Hopefully this now allows browsers to correctly detect the language of content on my site.
I’m not entirely done yet. In some overviews, like the front page, individual postings that are not in English are not yet marked with the correct language. Only when you go to the posting itself is the language set correctly. But this can be solved in a similar way, I assume.
[UPDATE 2019-10-14] I’ve also edited the index.php and category.php templates to check if a posting is in the Dutch or German language category, and if so to add a language declaration by wrapping the posting in a <div lang="nl-nl"> or <div lang="de-de">. For index.php I do that only for the home page. This works, but as far as I can tell the ‘detect language’ option of Google Translate, for example, only checks the default language of a page. As I am not here to facilitate Google, I am currently satisfied that I now at least provide clear metadata about the language of the postings I publish.[/UPDATE]
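One way the per-posting wrapper in an overview template might look is sketched below. The helper name `mytheme_post_lang()` is hypothetical, the use of `is_home()` to detect the home page is an assumption (a site with a static front page might need `is_front_page()` instead), and the template part is again a guess at the theme's structure.

```php
<?php
/**
 * Hypothetical helper, e.g. in the child theme's functions.php.
 * Returns a language tag for the current posting in the loop,
 * or an empty string for postings in the default language.
 */
function mytheme_post_lang() {
	if ( in_category( 'nederlands' ) ) {
		return 'nl-nl';
	}
	if ( in_category( 'deutsch' ) ) {
		return 'de-de';
	}
	return '';
}
?>

<?php // Loop as it might appear in index.php. ?>
<?php while ( have_posts() ) : the_post(); ?>
	<?php $lang = is_home() ? mytheme_post_lang() : ''; // Only on the home page. ?>

	<?php if ( $lang ) : ?>
		<div lang="<?php echo esc_attr( $lang ); ?>">
	<?php endif; ?>

	<?php get_template_part( 'content', get_post_format() ); ?>

	<?php if ( $lang ) : ?>
		</div>
	<?php endif; ?>
<?php endwhile; ?>
```

In category.php the same loop would apply without the `is_home()` condition, since a general category archive can mix postings from several language categories.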
A final step I’d like to add is to automatically insert machine translation links into my RSS feed items, although I’m still not entirely sure that would be useful.
Also see Adding Better Language Support II and Adding Better Language Support III
@ton This was very informative; thanks for writing it up!
Although I think you’ve done the right thing, in my particular case, which is full blog posts opened up in a Google Chrome window from my RSS reader on my mobile device, it doesn’t seem to have changed anything.
It looks like Chrome’s automatic offer to translate, on both desktop and mobile, is based on the language declaration in the <html> element, and that it ignores different language declarations nested more deeply.
Indeed it seems Google only uses the default language in the <html> element. When I tried it with translate.google.com using ‘detect language’, it detected English and said ‘this site is in English’ even before I hit the link to show the site. Disappointing, but it may well be that declaring language granularly, which is the only possibility for multilingual content, isn’t widespread enough for them to bother checking for it.