Plural and gender rules in internationalized applications

Introduction

When creating applications, developers often need to display a message in its singular or plural form depending on a number. Example sentences could be: “There is 1 file in this folder” and “There are 4 files in this folder”. The common solution is a neutral message such as “There are 4 file(s) in this folder”. It works, but it is not good enough for modern applications. The problem gets worse in languages such as Polish, Russian or Arabic, which have more than two plural forms. Polish, for example, has four plural forms, and the word that goes with the number changes in different ways depending on which form applies. Fortunately, the rules for each language are clearly classified. The list of all plural rules can be found here: http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/language_plural_rules.html. Another problem, gender, is discussed later in this article.

This article describes how to use plural and gender rules in Tizen applications to provide the best user experience. It comes with a sample application that was tested on Tizen SDK 2.2.0. The sample application uses three libraries: MessageFormat, Handlebars and jQuery 2.0.3.

Plural rules

Each language has its own plural forms. The available plural rule names are: zero, one, two, few, many and other. Every rule has a definition, and that definition can vary between languages. For example, the rule “few” in Polish is defined as: any integer whose value modulo 10 is between 2 and 4 and whose value modulo 100 is not between 12 and 14. By contrast, the “few” rule in Slovak is defined as: an integer between 2 and 4.
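To make the difference concrete, here is a minimal sketch of the two “few” definitions written as JavaScript predicates. The names isFewPolish and isFewSlovak are illustrative, and both assume a non-negative integer argument.

```javascript
// Polish "few": n mod 10 is in 2..4 and n mod 100 is not in 12..14.
function isFewPolish(n) {
    var mod10 = n % 10,
        mod100 = n % 100;

    return mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14);
}

// Slovak "few": n is in 2..4.
function isFewSlovak(n) {
    return n >= 2 && n <= 4;
}

// 3 is "few" in both languages, 13 in neither, and 22 only in Polish.
[3, 13, 22].forEach(function (n) {
    console.log(n, isFewPolish(n), isFewSlovak(n));
});
```

The same rule name therefore selects different numbers in different languages, which is why each language needs its own rule function.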

Given these definitions, we can write a function that takes a number as an argument and returns the name of the plural rule to use. The Polish plural rule function could look like this:

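// Returns the name of the Polish plural rule for the number n.
var pluralRulePl = function (n) {
    var isInteger,
        mod10,
        mod100;

    if (n === 1) {
        return 'one';
    }

    isInteger = n % 1 === 0;
    mod10     = n % 10;
    mod100    = n % 100;

    // 2-4, 22-24, 32-34, ... but not 12-14
    if (isInteger && (mod10 >= 2 && mod10 <= 4) && (mod100 < 12 || mod100 > 14)) {
        return 'few';
    }

    // 0, 5-21, 25-31, ...
    if (isInteger && ((mod10 === 0 || mod10 === 1) || (mod10 >= 5 && mod10 <= 9) || (mod100 >= 12 && mod100 <= 14))) {
        return 'many';
    }

    // Fractions such as 1.5
    return 'other';
};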

The English plural rule function is simpler:

// Returns the name of the English plural rule for the number n.
var pluralRuleEn = function (n) {
    if (n === 1) {
        return 'one';
    }

    return 'other';
};

With these plural rule functions in place, we need to prepare a version of each message for every plural rule of every language. It could look like this:

English:

  • one - “There is 1 file in this folder”
  • other - “There are # files in this folder”

Polish:

  • one - „W tym folderze jest 1 plik”
  • few - „W tym folderze są # pliki”
  • many - „W tym folderze jest # plików”
  • other - „W tym folderze jest # pliku”
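Before introducing a library, it may help to see how per-rule messages like these could be used by hand. The sketch below keeps one message per rule in an object and lets the rule function choose. The names pluralRulePl, messagesPl and formatCount are illustrative, and pluralRulePl here is a compact form of the Polish function above that is equivalent for non-negative numbers.

```javascript
// Compact form of the Polish plural rule function
// (equivalent to the longer version for non-negative numbers).
function pluralRulePl(n) {
    var isInteger = n % 1 === 0,
        mod10 = n % 10,
        mod100 = n % 100;

    if (n === 1) {
        return 'one';
    }
    if (isInteger && mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) {
        return 'few';
    }
    if (isInteger) {
        // All remaining integers: 0, 5-21, 25-31, ...
        return 'many';
    }
    // Fractions such as 1.5.
    return 'other';
}

// One message per plural rule; "#" marks where the number goes.
var messagesPl = {
    one:   'W tym folderze jest 1 plik',
    few:   'W tym folderze są # pliki',
    many:  'W tym folderze jest # plików',
    other: 'W tym folderze jest # pliku'
};

// Picks the message for n's plural rule (falling back to "other")
// and replaces every "#" with the number.
function formatCount(messages, ruleFn, n) {
    var message = messages[ruleFn(n)] || messages.other;

    return message.replace(/#/g, String(n));
}

console.log(formatCount(messagesPl, pluralRulePl, 3));  // W tym folderze są 3 pliki
console.log(formatCount(messagesPl, pluralRulePl, 12)); // W tym folderze jest 12 plików
```

This works, but every sentence has to be written out in full for each rule, even when only one word changes.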

Later in this article, I will describe how to write all these versions in a single string, so that you don’t have to repeat the entire sentence when the versions differ in only a few words.

Gender rules

Messages can also have different versions depending on gender. Take these messages as an example: “Caroline added two friends to her friends list” and “Thaddeus added two friends to his friends list”.

In English the situation is quite simple: in this example, the messages differ only in the name and the pronoun. In other languages it can be more complex. In Polish, for example, a verb in the past tense changes with gender (Karolina dodała, but Tadeusz dodał, both meaning “added”).

The gender rule itself is simple. When we deal with a male, the rule is male. When we deal with a female, the rule is female. When we can’t determine the gender, the rule is other.
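As a minimal sketch, the gender rule and a per-rule message table can be combined as follows. The functions genderRule and friendsMessage and the pronouns table are illustrative assumptions; any value the application cannot map to male or female falls back to other.

```javascript
// Maps whatever the application knows about a person to a gender rule name.
function genderRule(gender) {
    if (gender === 'male' || gender === 'female') {
        return gender;
    }
    // Unknown, unset or any other value.
    return 'other';
}

// English pronouns for the example sentence, keyed by gender rule.
var pronouns = {
    male:   'his',
    female: 'her',
    other:  'their'
};

function friendsMessage(name, gender) {
    return name + ' added two friends to ' + pronouns[genderRule(gender)] + ' friends list';
}

console.log(friendsMessage('Caroline', 'female')); // Caroline added two friends to her friends list
console.log(friendsMessage('Alex', undefined));    // Alex added two friends to their friends list
```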

Later in this article, I will describe how to write different messages for different genders in one string and how to combine them with the plural versions of a message.

messageformat.js library

Fortunately, you don’t have to implement all the plural rules and the logic behind them yourself. There is a free library that does the hard part and leaves you only with the task of writing the sentences for the different rules. You can find the library and its documentation here: https://github.com/SlexAxton/messageformat.js.

Add the library to the head section of your application’s HTML file:

<script type="text/javascript" src="js/messageformat.js"></script>

Below is an example message string that takes both plural and gender rules into account. Later in this section, I will describe how it is composed. For now, let’s focus on how to use it.

{NAME} added {FRIENDS_NUM, plural,
                 one {1 friend}
               other {# friends}
             } to {GENDER, select,
                 male {his}
               female {her}
                other {their}
             } friends list

The simplest way to use it is to create a MessageFormat object, passing the locale of the language as the first parameter, and then compile the message string. compile() returns a function that formats the message.

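var message = '{NAME} added {FRIENDS_NUM, plural, one {1 friend} other {# friends}} to {GENDER, select, male {his} female {her} other {their}} friends list';

// Create a formatter that uses the English plural rules.
var mf = new MessageFormat('en');

// compile() returns a function that formats the message.
var compiled = mf.compile(message);

// Returns "Caroline added 2 friends to her friends list".
compiled({
    NAME: 'Caroline',
    FRIENDS_NUM: 2,
    GENDER: 'female'
});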

We pass the compiled() function an object containing all the data needed to format the string correctly. If one of the parameters is omitted, an exception will be thrown.

Compiling a message every time it is needed is not good practice, so it is best to cache the compiled version. You can compile messages at build time, or store each one in a cache after its first compilation and reuse it later.
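Caching after the first compilation can be as simple as a memoizing wrapper around the compile step. In this sketch, createCachedCompiler is a hypothetical helper, and compileFn stands in for something like function (message) { return mf.compile(message); }; a fake compile function is used so the example runs without the library.

```javascript
// Wraps a compile function so each message string is compiled only once.
function createCachedCompiler(compileFn) {
    var cache = {};

    return function (message) {
        if (!Object.prototype.hasOwnProperty.call(cache, message)) {
            cache[message] = compileFn(message);
        }
        return cache[message];
    };
}

// Stand-in for mf.compile: counts calls and returns a trivial formatter.
var compileCalls = 0;
function fakeCompile(message) {
    compileCalls += 1;
    return function (data) {
        return message.replace('{NAME}', data.NAME);
    };
}

var compile = createCachedCompiler(fakeCompile);
compile('{NAME} joined the conversation')({ NAME: 'Lukas' });
compile('{NAME} joined the conversation')({ NAME: 'Caroline' });
console.log(compileCalls); // 1
```

Since a compiled message depends on the plural rules of its locale, a real application would keep one such cache per locale.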

So how is this kind of message composed? A rule definition looks like this:

{VARIABLE_NAME, rule_type, options}

VARIABLE_NAME is the name of the property of the object passed to the compiled message function. rule_type selects the kind of rule: use the word plural for plural rules and the word select for gender rules. The last part, options, is a list of choices whose syntax is the same for both rule types:

rule_name_1 {rule_message_1} rule_name_2 {rule_message_2} rule_name_3 {rule_message_3}

For the plural rule, rule_name can be one of the six plural rules: zero, one, two, few, many, other.

For the gender rule, rule_name can be one of the three gender rules: male, female, other.
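To see how the options list is structured, here is a toy parser that splits it into rule names and their messages by counting braces, so that nested rules stay intact inside a message. This is an illustrative sketch, not how messageformat.js parses messages internally; parseOptions is a hypothetical name, and the sketch does not handle escaped braces.

```javascript
// Splits "one {1 friend} other {# friends}" into { one: '1 friend', other: '# friends' }.
function parseOptions(source) {
    var options = {},
        i = 0,
        name,
        depth,
        start;

    while (i < source.length) {
        // Skip whitespace between options.
        while (i < source.length && /\s/.test(source[i])) {
            i += 1;
        }
        if (i >= source.length) {
            break;
        }

        // Read the rule name up to the opening brace.
        start = i;
        while (i < source.length && source[i] !== '{') {
            i += 1;
        }
        name = source.slice(start, i).trim();
        if (i >= source.length || name === '') {
            throw new Error('Malformed options: ' + source);
        }

        // Read the message, counting braces so nested rules are kept whole.
        depth = 1;
        i += 1;
        start = i;
        while (i < source.length && depth > 0) {
            if (source[i] === '{') {
                depth += 1;
            } else if (source[i] === '}') {
                depth -= 1;
            }
            i += 1;
        }
        if (depth !== 0) {
            throw new Error('Unbalanced braces: ' + source);
        }
        options[name] = source.slice(start, i - 1);
    }

    return options;
}

var options = parseOptions('one {1 message} other {# messages}');
// options.one is '1 message' and options.other is '# messages'
```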

rule_message is the actual message, or part of a message, that corresponds to the given rule. Inside a plural rule’s message, you can use the hash sign (#) to insert the number at the appropriate place in the sentence. Consider this example:

You have {NUM, plural,
    one {1 message}
  other {# messages}
}

If you pass 1 as the value of the NUM variable, the output will be: You have 1 message. If you pass 5, you will get: You have 5 messages. Now let’s try a message with a gender rule:

That's {GENDER, select,
           male {his}
         female {her}
          other {their}
       } picture.

If you pass male as the value of the GENDER variable, the output will be: That’s his picture. The outcome for the other cases is easy to deduce.

You can also insert a variable on its own by typing {VARIABLE_NAME} without any type or options:

{NAME} joined the conversation

If you pass, for example, Lukas as the value of the NAME variable, you will get: Lukas joined the conversation.

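You can also nest one rule inside another when more complex behavior is needed. The skeleton below, with its messages left empty, shows a select rule whose other branch contains a plural rule, which in turn contains another select rule: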

{GENDER1, select,
    male {}
  female {}
   other {{NUM, plural,
              one {1}
            other {{GENDER2, select,
                       male {}
                     female {}
                      other {}
                  }}
         }}
}
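Here is a concrete, hypothetical nested message (shown in the comment) and a hand-written JavaScript function that makes the same choices, which may help when reading nested rules. Neither the message nor photosMessage comes from the sample application, and the inner plural rule is the English one (one for 1, other otherwise).

```javascript
// Hand-written equivalent of this nested message:
// {GENDER, select,
//     male {He has {NUM, plural, one {1 photo} other {# photos}}}
//   female {She has {NUM, plural, one {1 photo} other {# photos}}}
//    other {They have {NUM, plural, one {1 photo} other {# photos}}}
// }
function photosMessage(data) {
    var subject,
        count;

    // Outer rule: select on GENDER, with "other" as the fallback.
    if (data.GENDER === 'male') {
        subject = 'He has';
    } else if (data.GENDER === 'female') {
        subject = 'She has';
    } else {
        subject = 'They have';
    }

    // Inner rule: English plural on NUM ("#" becomes the number).
    count = data.NUM === 1 ? '1 photo' : data.NUM + ' photos';

    return subject + ' ' + count;
}

console.log(photosMessage({ GENDER: 'female', NUM: 1 })); // She has 1 photo
console.log(photosMessage({ GENDER: 'other', NUM: 3 }));  // They have 3 photos
```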

The sample application

The sample application demonstrates how to use the MessageFormat.js library in practice. The screenshot below shows the application.

Screenshot from the application

The application displays a few messages in three languages: English, Polish and Russian. It is divided into two main areas. The top one is a form in which you can set your name and gender. You can also set the number of notifications and the language in which the messages are displayed.

The bottom area displays the actual messages: how many applications are installed on the device, how many contacts are in the address book, how many calls were made and, finally, a simple message that uses the number from the Notifications count field.

Some of the displayed data require privileges declared in the config.xml file:

<tizen:privilege name="http://tizen.org/privilege/contact.read"/>
<tizen:privilege name="http://tizen.org/privilege/callhistory.read"/>

Besides MessageFormat.js, the application uses two JavaScript libraries: jQuery 2.0.3 and Handlebars.js. Handlebars is used to render the template and to separate the presentation layer from the logic layer; it was described in a separate article. jQuery is used only to handle events and do some basic DOM manipulation. The MessageFormat.js library needs more than one file to work. Apart from the main js/lib/messageformat.js file, it needs the plural rule definitions located in the js/lib/locales/ directory: one file for each of the three languages, English (en.js), Polish (pl.js) and Russian (ru.js).

The application logic resides in the js/main.js file. Let’s go through the most important parts of this file. At the beginning, we declare the default locale (language) and create an instance of MessageFormat, which takes the locale as an argument. We also declare an object, filled with default values, that will be passed to the template. We will discuss it in detail later in this section.

locale = 'pl';
mf = new MessageFormat(locale);
data = {
    name: 'Noname',
    gender: 'male',
    appsCount: 0,
    contactsCount: 0,
    callsCount: 0,
    notificationsCount: 0
};

Later in this file, we have the dictionary declaration (the translations are trimmed here; see the main.js file for the whole dictionary). In a real application, a better approach would be to put the dictionary in an external file and load it when the application initializes. You could also compile the messages in advance and store the compiled versions in that file to avoid runtime compilation.

dictionary = {
    en: {
        "Tizen found APPS_COUNT applications.":                      "...",
        "NAME added CONTACTS_COUNT contacts to their address book.": "...",
        "You've made CALLS_COUNT calls.":                            "...",
        "You have NOTIFICATIONS_COUNT notifications.":               "..."
    },
    pl: {
        "Tizen found APPS_COUNT applications.":                      "...",
        "NAME added CONTACTS_COUNT contacts to their address book.": "...",
        "You've made CALLS_COUNT calls.":                            "...",
        "You have NOTIFICATIONS_COUNT notifications.":               "..."
    },
    ru: {
        "Tizen found APPS_COUNT applications.":                      "...",
        "NAME added CONTACTS_COUNT contacts to their address book.": "...",
        "You've made CALLS_COUNT calls.":                            "...",
        "You have NOTIFICATIONS_COUNT notifications.":               "..."
    }
};

The tasks object contains three functions that get data from the device: installedApps(), addedContacts() and callsCount(). The render() function gets the template source, compiles it (only on the first call) and fills the template with data. The rendered template is put inside the DIV element whose ID is "stats".

render = function () {
    var source;

    if (template === undefined) {
        source = $('#template').html();
        template = Handlebars.compile(source);
    }

    $('#stats').html(template(data));
};
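The three tasks functions are not listed in this article. As a hypothetical sketch of what they might look like, the code below uses the Tizen 2.x device APIs tizen.application.getAppsInfo(), tizen.contact.getDefaultAddressBook().find() and tizen.callhistory.find(), stores the counts in the data object and re-renders. The createTasks factory and the injected api parameter are assumptions made so the code can run against a stub outside the device; counting every call history entry may also differ from what the sample application actually counts.

```javascript
// Hypothetical sketch of the data-gathering tasks. On the device, "api"
// would be the global tizen object; here it can also be a stub.
function createTasks(api, data, render) {
    function onError(error) {
        console.log('Tizen API error: ' + error.message);
    }

    return {
        installedApps: function () {
            api.application.getAppsInfo(function (apps) {
                data.appsCount = apps.length;
                render();
            }, onError);
        },
        addedContacts: function () {
            api.contact.getDefaultAddressBook().find(function (contacts) {
                data.contactsCount = contacts.length;
                render();
            }, onError);
        },
        callsCount: function () {
            api.callhistory.find(function (entries) {
                data.callsCount = entries.length;
                render();
            }, onError);
        }
    };
}

// Minimal synchronous stub standing in for the real tizen object.
var stubApi = {
    application: { getAppsInfo: function (ok) { ok([{}, {}, {}]); } },
    contact: {
        getDefaultAddressBook: function () {
            return { find: function (ok) { ok([{}]); } };
        }
    },
    callhistory: { find: function (ok) { ok([]); } }
};

var data = { appsCount: 0, contactsCount: 0, callsCount: 0 };
var renders = 0;
var tasks = createTasks(stubApi, data, function () { renders += 1; });

tasks.installedApps();
tasks.addedContacts();
tasks.callsCount();
console.log(data.appsCount, data.contactsCount, data.callsCount); // 3 1 0
```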

Every time the data in the form changes, the template is re-rendered. The crucial part of rendering is a Handlebars helper called i18n, which allows the MessageFormat library to be used inside Handlebars templates.

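Handlebars.registerHelper('i18n', function (text) {
    var options,
        compiledText;

    // Handlebars passes an options object as the last argument;
    // named arguments (VARIABLE=value) are in options.hash.
    options = arguments[arguments.length - 1];

    // Reuse the compiled message for the current locale, or compile and cache it.
    if (compiled[locale].hasOwnProperty(text)) {
        compiledText = compiled[locale][text];
    } else {
        compiledText = mf.compile(dictionary[locale][text]);
        compiled[locale][text] = compiledText;
    }

    return compiledText(options.hash);
});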

What does this helper actually do? It introduces a construction with the following syntax:

{{i18n "message_name" VARIABLE1=value1 VARIABLE2=value2}}

It takes the message named message_name from the dictionary for the current locale and passes it the given variables. If the message hasn’t been compiled yet, the helper compiles it with our MessageFormat object and caches the result in compiled[locale], an object that holds one cache of compiled messages per locale. The compiled message is then used to produce the formatted text.
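Because Handlebars calls a helper with the positional arguments first and an options object last (named arguments end up in options.hash), the helper’s logic can be exercised without Handlebars. The sketch below is a hypothetical refactoring, not the code from main.js: makeI18nHelper receives the dictionary, a compile function that takes the locale, and a getter for the current locale, and it creates compiled[locale] on first use instead of assuming it already exists.

```javascript
// Builds a function with the same contract as the i18n helper above.
function makeI18nHelper(dictionary, compile, getLocale) {
    var compiled = {};

    return function (text) {
        // Handlebars always passes the options object as the last argument.
        var options = arguments[arguments.length - 1],
            locale = getLocale(),
            cache = compiled[locale] || (compiled[locale] = {});

        if (!cache.hasOwnProperty(text)) {
            cache[text] = compile(locale, dictionary[locale][text]);
        }
        return cache[text](options.hash);
    };
}

// Exercise the helper with a stub compiler and a fake options object.
var currentLocale = 'en';
var helper = makeI18nHelper(
    { en: { 'NAME joined': '{NAME} joined the conversation' } },
    function (locale, message) {
        return function (hash) {
            return message.replace('{NAME}', hash.NAME);
        };
    },
    function () { return currentLocale; }
);

console.log(helper('NAME joined', { hash: { NAME: 'Lukas' } })); // Lukas joined the conversation
```

In an application, the returned function would be passed to Handlebars.registerHelper('i18n', ...); passing the locale to compile lets each locale use its own MessageFormat instance and therefore its own plural rules.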

Of course, you could use the MessageFormat library directly, but mixing the presentation layer with the logic layer is not good practice.

Summary

I hope this article has helped you understand what plural and gender rules are and how to handle them with MessageFormat.js. With this knowledge, you should be able to create better applications whose messages read more naturally.
