off the stack

software translation: real messages vs keyword messages

2014-04-27

Translation systems are a vital part of software because they let us build programs that are accessible to a broader audience. Paired with other tools (e.g. locale-based number and date formatting), they aid the internationalization of software. If you want to read more about "i18n" itself, I recommend this Mozilla article about i18n.

Translating an application correctly can be difficult, as it requires a fair amount of linguistic knowledge. Assumptions made about the visible text during the interface design phase can cause problems too (e.g. text/word length, or where and in which order values are injected into the translation).

While those issues are fairly high-level and UI/UX related, translation systems are normally integrated into our application's core. You typically invoke them directly in the source code, passing a "source string" as a parameter. The system translates the source string into the configured target language using a predefined mapping, usually read from a set of translation files or a database. The translation is then returned and used within the application.
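The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: `CATALOGS` and `translate` are hypothetical names, the catalogs are hard-coded where a real system would read them from translation files or a database, and falling back to the source string is an assumed policy.

```python
# Hypothetical in-memory catalogs, keyed by language code.
# A real system would load these from translation files or a database.
CATALOGS = {
    "de": {"Save": "Speichern"},
    "fr": {"Save": "Enregistrer"},
}


def translate(source_string, language, catalogs=CATALOGS):
    """Look up source_string in the catalog of the given language.

    Falls back to the source string itself when the language or the
    entry is missing (an assumed fallback policy).
    """
    return catalogs.get(language, {}).get(source_string, source_string)


print(translate("Save", "de"))  # Speichern
print(translate("Save", "nl"))  # Save (no Dutch catalog, so the source string comes back)
```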

There are two common types of translation definitions (besides the various translation file formats): "keyword messages" and "real messages". They differ in what the source string contains.

With "real messages", the source strings contain whole sentences (and occasionally also some markup language artifacts). Real messages are usually written in plain English and are therefore directly suitable for end users who can read the source language. A real message could be something as simple as "Click here to login." but could also contain several lines of copy.

"keyword messages" on the other hand are defined by using identifiers as source strings. Those identifiers are short strings which are not directly suitable for end users as they merely name what they refer to and where they belong inside the application. An example of a keyword message could be "login_form.login_button_label".

Even though real messages seem to be used more often, I would argue that they are a messier way to handle translations.

Why? Simply because they mix presentation with business logic and are the less semantically correct way to handle translations IMHO.

Of course, while coding up a model class, for example, or implementing some form logic, it's all too easy to just type out the field labels right there in your code, exactly as they should appear on screen. Without adding any extra translation machinery, you immediately have useful texts. Then, when translations are needed, you just wrap those English texts throughout your code in calls to your translation system (e.g. _()). Besides, everyone else does that ... why do it differently?

Well, where did you put those user-visible strings again? Right inside your business logic. The text visible to the end user is clearly presentation related, yet it now sits inside the guts of your application. For labels that may not seem like much of a problem, but think of descriptions, help texts, extra validation information... For example, think of the description for an HTML form's input field in which some words have to be bold or underlined. You would have to add some (probably HTML) markup inside the business logic too.
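Here is roughly what that pattern looks like in Python, using the standard library's gettext module. The `SignupForm` class and its texts are made up for illustration, and `NullTranslations` (which returns every message unchanged) stands in for a real catalog.

```python
import gettext

# NullTranslations hands every message back unchanged; it stands in
# for a real translation catalog in this sketch.
_ = gettext.NullTranslations().gettext


class SignupForm:
    """Hypothetical form definition with its user-visible copy inlined."""

    fields = {
        "email": {
            "label": _("E-mail address"),
            # Presentation markup ends up inside the form logic.
            "help": _("We will <b>never</b> share your address."),
        },
    }


print(SignupForm.fields["email"]["help"])
```

Both the English copy and the `<b>` tags now live in the same file as the form logic.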

Then, after a while, a typo is found in the application's interface while using the default interface language. Where would you have to fix that typo? Within the translation file? Within the business logic? Or, for cleanliness, both? You want a clean solution, don't you? But then you would have to adjust all other translation files too...?

The fact that the business logic files need to be adjusted whenever part of the UI changes means that the two are coupled.
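A small sketch of why the other translation files are affected, assuming a catalog that is keyed by the exact source string and falls back to it when no entry matches. The catalog contents and the `translate` helper are hypothetical.

```python
# Hypothetical German catalog, keyed by the original source string,
# typo included.
CATALOGS = {
    "de": {"Click here to logn.": "Hier klicken, um sich anzumelden."},
}


def translate(source_string, language):
    """Exact-match lookup that falls back to the source string."""
    return CATALOGS.get(language, {}).get(source_string, source_string)


# While the typo is still in the code, the German lookup succeeds.
before = translate("Click here to logn.", "de")

# Once the typo is fixed in the code only, the key no longer matches
# and German users silently get the English text.
after = translate("Click here to login.", "de")

print(before)
print(after)
```

Fixing one character in the code therefore means re-keying the entry in every other language's catalog as well.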

If keyword messages were used, only the translation files would have to be changed, as the text (which had a typo in the example) is defined within the translation files only. The business logic only contains identifiers which merely point to the translation. Our user-visible strings are neatly separated. If extra markup were needed within the translation (yuk), it would not litter our code. The business logic stays clean and concise.
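The same typo scenario with keyword messages might look like this. Again, a sketch under assumptions: the catalogs are plain dictionaries standing in for translation files, and `translate` and `login_button_label` are hypothetical helpers.

```python
# Hypothetical catalogs standing in for translation files.
CATALOGS = {
    "en": {"login_form.login_button_label": "Click here to logn."},  # typo
    "de": {"login_form.login_button_label": "Hier klicken, um sich anzumelden."},
}


def translate(key, language):
    """Look up an identifier; fall back to the identifier itself."""
    return CATALOGS.get(language, {}).get(key, key)


def login_button_label(language):
    """Business logic side: it only ever mentions the identifier."""
    return translate("login_form.login_button_label", language)


# The fix touches the English catalog only (here: one dictionary entry,
# in practice: one line in one translation file).
CATALOGS["en"]["login_form.login_button_label"] = "Click here to login."

print(login_button_label("en"))
print(login_button_label("de"))
```

`login_button_label` and the German catalog are untouched by the fix.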

Besides that, things that might receive the same translation at first (as is often the case with various UI buttons) can be required to have distinct translations later on. The login button on the login form just isn't the same as the login button within the application's header. The name field of a product model just is not the same as the name field of a category model. They all appear in different parts of the application, so distinct keyword message based translations are more appropriate (even though it requires some extra work). I would rather give them distinct keyword message based labels explicitly in the first place than touch my business logic related files when some part of the UI needs to change.
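As a sketch of that idea: every place in the UI gets its own identifier, even if the texts start out identical. Apart from "login_form.login_button_label", the identifiers, the texts and the `label` helper below are invented for illustration.

```python
# Hypothetical English catalog: identical texts, distinct identifiers.
CATALOG_EN = {
    "login_form.login_button_label": "Log in",
    "header.login_button_label": "Log in",
    "product.name_label": "Name",
    "category.name_label": "Name",
}


def label(key, catalog=CATALOG_EN):
    """Look up an identifier; fall back to the identifier itself."""
    return catalog.get(key, key)


# Later the header button and the product field need different wording.
# Only the catalog changes; every caller keeps using its own identifier.
CATALOG_EN["header.login_button_label"] = "Sign in"
CATALOG_EN["product.name_label"] = "Product name"

print(label("login_form.login_button_label"))  # Log in
print(label("header.login_button_label"))      # Sign in
```

With real messages, both buttons would share the single source string "Log in", and telling them apart later would mean changing the code.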