Internationalization in Java

In this article, we are going to explain Internationalization in Java.

1. What are Internationalization and Localization?

Internationalization, or I18N for short, is the broad term for the techniques and processes involved in building applications that are easily adaptable to different cultural norms and/or preferences. The process of actually adapting an application to a particular set of cultural norms is localization (or L10N).

2. Locales

Central to the Java model of internationalization is the idea of a locale, which Unicode Technical Standard #35 defines as “…an identifier (id) that refers to a set of user preferences that tend to be shared across significant swaths of the world.” The main user preference associated with a locale is, perhaps not surprisingly, language, but locales also encompass a variety of other often-shared preferences. Some of these, such as the formatting of numbers, dates, and times, tend to be tied more or less tightly to the language. Locales can even include additional preferences that are either completely extra-linguistic or not strictly language-linked, such as calendar usage or numeral style.
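To make the idea of locale-linked formatting preferences concrete, here is a minimal sketch that formats the same number for two different locales. The class name LocaleFormattingSketch and the helper formatNumber are made up for this illustration; NumberFormat and the Locale constants are standard JDK APIs.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleFormattingSketch {

    // Formats a number using the conventions of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        // Same value, different grouping and decimal separators.
        System.out.println(formatNumber(value, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```

The application logic is identical in both calls; only the Locale argument changes, which is the essence of an internationalized design.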

3. The java.util.Locale class

In Java, locales are represented by instances of the java.util.Locale class. From Java SE 7 on, the Locale class implements the concept of a language tag as defined by the IETF standard BCP 47. A BCP 47 language tag is a string consisting of one or more subtags, each no more than eight characters long and separated by hyphens, in the following order:

  • (Primary) Language: two or three letters, normally in lowercase; the broadest classification of a language; examples: ar (Arabic), zh (Chinese), en (English);
  • Script: four letters, normally in Title Case; identifies a specific writing system; examples: Cyrl (Cyrillic script), Latn (Latin script), Hans (Simplified Chinese characters);
  • Region: a country (two letters, normally in UPPERCASE) or other geographic area (three digits); Examples: CA (Canada), JP (Japan), 419 (Latin America). For historical reasons, the Locale class uses the term “country” when referring to the BCP 47 region subtag;
  • Variant: significant additional variants not adequately identified by some combination of primary language, script, and region; minimum five characters if it starts with a letter, four if it starts with a digit. Examples: VALENCIA (Valencian dialect of Catalan), 1901 and 1996 (dates of German spelling reforms);
  • Extensions: additional locale-related information which cannot be captured by some combination of Language, Script, Region, or Variant subtags. Each extension is introduced by a one-character subtag (called a singleton) and consists of all the following subtags that are longer than one character. The extension type of most relevance to Java applications is the Unicode Locale Extension, identified by the singleton 'u'. The Unicode Locale extension has the following subcomponents:
    • Unicode locale attributes: identifiers (3-8 characters) for boolean (true/false) locale properties;
    • Unicode locale keywords: each is a key/type pair of subtags; the key (2 characters), names a multi-valent locale property and the type (3-8 characters) gives the property value. Example: ca-japanese; the key is ca (calendar), the type is japanese (Japanese Imperial calendar).

Here’s an example of a language tag:

ja-JP-u-ca-japanese

It breaks down like this:

Subtag                  Value       Meaning
Language                ja          Japanese
Script                  (omitted)   Jpan, Japanese script (implied)
Region                  JP          Japan
Variant                 (omitted)   None required
Extension singleton     u           Unicode Locale extension
Unicode keyword key     ca          Calendar usage
Unicode keyword type    japanese    Japanese Imperial calendar

Table: Breakdown of language tag ja-JP-u-ca-japanese
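The same breakdown can be reproduced programmatically. The following is a small sketch, with a made-up class name LanguageTagBreakdown and helper describe, that parses the tag and reads each component back through standard Locale accessors. Note that the script comes back empty: the API reports only what the tag states explicitly and does not fill in the implied script.

```java
import java.util.Locale;

public class LanguageTagBreakdown {

    // Returns a one-line summary of the components of a language tag.
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return "language=" + locale.getLanguage()
            + ", script=" + locale.getScript()
            + ", region=" + locale.getCountry()
            + ", variant=" + locale.getVariant()
            + ", u-extension=" + locale.getExtension(Locale.UNICODE_LOCALE_EXTENSION)
            + ", calendar=" + locale.getUnicodeLocaleType("ca");
    }

    public static void main(String[] args) {
        // Omitted subtags (script, variant) are reported as empty strings.
        System.out.println(describe("ja-JP-u-ca-japanese"));
    }
}
```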

In spite of the convention of writing language subtags in lowercase, Script subtags in Title Case, and Region subtags in UPPERCASE, Locale components and language tags are always treated as case-insensitive. For example, CA, Ca, cA, and ca all represent either Catalan (when used as a Language subtag), or Canada (when used as a region subtag).
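A quick way to see this case-insensitivity in action is to parse a tag written in unconventional case and emit it again. This sketch uses a hypothetical helper named canonicalize; it relies only on Locale.forLanguageTag() and toLanguageTag().

```java
import java.util.Locale;

public class LanguageTagCaseSketch {

    // Parses a tag in any letter case and returns it in conventional case.
    static String canonicalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("SR-cyrl-me")); // sr-Cyrl-ME

        // Locales parsed from differently-cased tags compare as equal.
        Locale a = Locale.forLanguageTag("CA-es");
        Locale b = Locale.forLanguageTag("ca-ES");
        System.out.println(a.equals(b)); // true
    }
}
```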

4. Obtaining Locales

An important part of the logic of an internationalized application involves simply obtaining appropriate Locale instances and passing them to the various locale-sensitive services that need them. You have a variety of options for obtaining the Locale objects you need.

4.1 Default Locales

The JDK establishes one or more default locales, based on the configuration of the host environment in which your application is running. Some locale-sensitive methods implicitly use a default locale instead of an explicit Locale argument, but you can also obtain references to default Locales and use them just as you would any other Locale:

DemoLocale.java

        Locale defaultLocale = Locale.getDefault();
        Locale displayDefaultLocale =
            Locale.getDefault(Locale.Category.DISPLAY);
        Locale formatDefaultLocale =
            Locale.getDefault(Locale.Category.FORMAT);

Historically, the JDK has always provided (and continues to provide) a single “anonymous” default Locale. But with newer operating systems offering support for multiple locales to be configured for different uses, Java SE 7 added support for two named categories of default locale: DISPLAY and FORMAT. The default DISPLAY locale typically applies to textual components of the application UI, while the default FORMAT locale applies to the formatting of individual numbers, dates, and times. The categories are identified by member constants of the enum class Locale.Category, so that new categories could easily be added in the future if needed (as of Java 17, however, no new categories have been added).
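The split between the two categories can be observed directly. The sketch below, with an illustrative class name and helper method, temporarily changes only the FORMAT default and formats a number with String.format(), which uses the FORMAT default when no explicit Locale is passed. The previous default is restored afterwards so the rest of the program is unaffected.

```java
import java.util.Locale;

public class DefaultLocaleCategories {

    // Formats n with String.format while the FORMAT default is temporarily
    // switched to formatLocale; the previous default is restored afterwards.
    static String formatWithFormatDefault(Locale formatLocale, int n) {
        Locale previous = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(Locale.Category.FORMAT, formatLocale);
        try {
            return String.format("%,d", n);
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, previous);
        }
    }

    public static void main(String[] args) {
        Locale displayBefore = Locale.getDefault(Locale.Category.DISPLAY);
        System.out.println(formatWithFormatDefault(Locale.GERMANY, 1234567)); // 1.234.567
        System.out.println(formatWithFormatDefault(Locale.US, 1234567));      // 1,234,567
        // The DISPLAY default is not touched by changing the FORMAT default.
        System.out.println(displayBefore.equals(Locale.getDefault(Locale.Category.DISPLAY)));
    }
}
```

In production code it is usually cleaner to pass an explicit Locale to the formatting call than to change a process-wide default; the default is switched here only to show which category the formatter consults.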

4.2 Constructors

Locales needing only language, country/region, and variant subtags can be created directly with the Locale constructors:

DemoLocale.java

        Locale frenchLocale = new Locale("fr");
        Locale brazilianPortuguese = new Locale("pt", "BR");
        Locale valencianCatalan = new Locale("ca", "ES", "VALENCIA");

However, the Locale constructors are holdovers from pre-BCP 47 days when locales only had language, country, and variant components (and they have been deprecated since Java 19 in favor of the Locale.of() factory methods). To create Locales that make use of all BCP 47 features, including Script subtags and extensions, you need to use either the Locale.Builder API or the forLanguageTag() factory method.

4.3 Locale.Builder

Locale.Builder provides a fluent API that lets you build BCP 47 well-formed Locale instances programmatically from their component subtags. The Builder API lets you use all available Locale features, including Script subtags and Unicode Locale Extension subtags. Here are some examples:

DemoLocale.java

        // Serbian language (Montenegro), Cyrillic script
        Locale serbian = new Locale.Builder()
            .setLanguage("sr")
            .setScript("Cyrl")
            .setRegion("ME")
            .build();

        // Japanese language (Japan), Imperial calendar
        Locale japaneseWithImperialCalendar = new Locale.Builder()
            .setLanguage("ja")
            .setRegion("JP")
            .setUnicodeLocaleKeyword("ca", "japanese")
            .build();
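Because the Builder only produces well-formed locales, its setters reject ill-formed input by throwing IllformedLocaleException. Here is a minimal sketch of using that behavior to validate a language subtag; the class name and the isWellFormedLanguage helper are invented for this example.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class BuilderValidationSketch {

    // Returns true if the Builder accepts the string as a language subtag.
    static boolean isWellFormedLanguage(String language) {
        try {
            new Locale.Builder().setLanguage(language);
            return true;
        } catch (IllformedLocaleException e) {
            // The Builder refuses syntactically invalid subtags.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedLanguage("sr"));             // true
        System.out.println(isWellFormedLanguage("not-a-language")); // false
    }
}
```

Keep in mind that this is a purely syntactic check: a well-formed subtag is not necessarily a registered language code.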

4.4 Factory method forLanguageTag()

You can obtain a Locale instance corresponding to a BCP 47-compliant language tag string by using the forLanguageTag() factory method.

DemoLocale.java

        Locale l1 = Locale.forLanguageTag("ja-JP-u-ca-japanese");
        Locale l2 = Locale.forLanguageTag("sr-Cyrl-ME");
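Unlike the Builder, forLanguageTag() is lenient: it does not throw on ill-formed input, but ignores the first ill-formed subtag and everything after it. The following sketch (class and method names are illustrative only) parses a few tags and emits them again so you can see what survived.

```java
import java.util.Locale;

public class LenientParsingSketch {

    // Parses a tag leniently and returns the tag of the resulting Locale.
    static String parseAndReemit(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        // A well-formed tag survives the round trip unchanged.
        System.out.println(parseAndReemit("sr-Cyrl-ME"));           // sr-Cyrl-ME
        // The ill-formed trailing subtag is silently dropped.
        System.out.println(parseAndReemit("en-US-invalid_subtag")); // en-US
        // Nothing usable at all yields the root locale, tagged "und".
        System.out.println(parseAndReemit("???"));                  // und
    }
}
```

If silently dropping subtags is not acceptable for your application, one option is to compare the re-emitted tag with the input, or to use Locale.Builder.setLanguageTag(), which throws IllformedLocaleException instead.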

4.5 Constants

The Locale class provides manifest constants corresponding to ready-made Locale instances for a few chosen languages and regions:

DemoLocale.java

        System.out.println("Locale.ENGLISH: " + Locale.ENGLISH.toLanguageTag());
        System.out.println("Locale.US: " + Locale.US.toLanguageTag());
        System.out.println("Locale.UK: " + Locale.UK.toLanguageTag());

Output

Locale.ENGLISH: en
Locale.US: en-US
Locale.UK: en-GB

You can find a complete list of available Locale constants in the Javadoc for Locale.
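The constants are ordinary Locale instances, so they compare equal to locales obtained in any other way. A short sketch, using a hypothetical helper sameAsTag, illustrates this:

```java
import java.util.Locale;

public class LocaleConstantsSketch {

    // Checks whether a Locale constant equals the locale parsed from a tag.
    static boolean sameAsTag(Locale constant, String tag) {
        return constant.equals(Locale.forLanguageTag(tag));
    }

    public static void main(String[] args) {
        System.out.println(sameAsTag(Locale.UK, "en-GB"));            // true
        System.out.println(sameAsTag(Locale.CANADA_FRENCH, "fr-CA")); // true
        System.out.println(sameAsTag(Locale.US, "en-GB"));            // false
    }
}
```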

5. Locale methods

Once you have a Locale instance, you can query it for the values of its component fields, as well as other interesting information.

5.1 Accessors and Queries

The four language-defining fields of a Locale can be accessed with the methods getLanguage(), getScript(), getCountry(), and getVariant(). The empty string ("") is returned for missing fields.

DemoLocale.java

        Locale l = Locale.forLanguageTag("sr-Cyrl-ME");
        System.out.println("Locale: " + l.toLanguageTag());
        System.out.println("Language: \"" + l.getLanguage() + "\"");
        System.out.println("Script: \"" + l.getScript() + "\"");
        System.out.println("Country/region: \"" + l.getCountry() + "\"");
        System.out.println("Variant: \"" + l.getVariant() + "\"");

Output

Locale: sr-Cyrl-ME
Language: "sr"
Script: "Cyrl"
Country/region: "ME"
Variant: ""

5.2 Extension accessors

You can access the data of any BCP 47 extensions present in a Locale using the methods boolean hasExtensions(), Set<Character> getExtensionKeys(), and String getExtension(char):

DemoLocale.java

        Locale l = Locale.forLanguageTag(
            "ja-JP-u-ca-japanese-x-lvariant-JP");
        for (char c : l.getExtensionKeys()) {
            String ext = l.getExtension(c);
            System.out.printf("%c - %s%n", c, ext);
        }

Output

u - ca-japanese
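Notice that the private use portion of the tag (x-lvariant-JP) does not show up in that output. The Locale class treats a private use subtag beginning with lvariant specially and converts what follows into the Locale's variant field instead of keeping it as an extension. The sketch below, whose class and helper names are made up for illustration, contrasts an ordinary private use extension with the lvariant case and also shows hasExtensions() and stripExtensions().

```java
import java.util.Locale;

public class ExtensionAccessSketch {

    // Lists the extensions of a tag as "key=value" pairs, or "none".
    static String describeExtensions(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        if (!locale.hasExtensions()) {
            return "none";
        }
        StringBuilder sb = new StringBuilder();
        for (char key : locale.getExtensionKeys()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(key).append('=').append(locale.getExtension(key));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeExtensions("en-US"));          // none
        System.out.println(describeExtensions("en-US-x-custom")); // x=custom

        // "lvariant" private use data ends up in the variant field.
        Locale posix = Locale.forLanguageTag("en-US-x-lvariant-POSIX");
        System.out.println(posix.getVariant()); // POSIX

        // stripExtensions() returns a copy without any extensions.
        Locale japanese = Locale.forLanguageTag("ja-JP-u-ca-japanese");
        System.out.println(japanese.stripExtensions().toLanguageTag()); // ja-JP
    }
}
```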

5.3 Unicode Locale extension access

The methods Set<String> getUnicodeLocaleAttributes(), Set<String> getUnicodeLocaleKeys(), and String getUnicodeLocaleType(String) give you direct access to Unicode Locale extension data.

DemoLocale.java

        Locale l = Locale.forLanguageTag("en-US-u-attr1-attr2-ca-japanese-nu-thai");
        System.out.println("Unicode Locale attributes: "
            + String.join(",", l.getUnicodeLocaleAttributes()));
        for (String key : l.getUnicodeLocaleKeys()) {
            String type = l.getUnicodeLocaleType(key);
            System.out.println("Unicode Locale keyword: key=" + key + ", type="
                + type);
        }

Output

Unicode Locale attributes: attr1,attr2
Unicode Locale keyword: key=ca, type=japanese
Unicode Locale keyword: key=nu, type=thai
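Unicode Locale keywords are not just metadata; locale-sensitive services can act on them. As one illustration, the java.time API consults the ca keyword when choosing a calendar system. This sketch, with an invented class name and helper, asks Chronology.ofLocale() which chronology it would select for a given tag.

```java
import java.time.chrono.Chronology;
import java.util.Locale;

public class CalendarKeywordSketch {

    // Returns the ID of the chronology that java.time selects for a tag.
    static String chronologyIdFor(String tag) {
        return Chronology.ofLocale(Locale.forLanguageTag(tag)).getId();
    }

    public static void main(String[] args) {
        // With the ca keyword, the Japanese Imperial calendar is chosen.
        System.out.println(chronologyIdFor("ja-JP-u-ca-japanese")); // Japanese
        // Without it, the standard ISO calendar system is used.
        System.out.println(chronologyIdFor("ja-JP"));               // ISO
    }
}
```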

5.4 User-friendly names for Locale components

The getDisplayLanguage(), getDisplayScript(), getDisplayCountry() and getDisplayVariant() methods return user-friendly names for the corresponding Locale fields, localized (if possible) to the current default DISPLAY locale. getDisplayName() constructs a displayable name for the complete locale. Each of these methods also has a corresponding overloaded version that accepts a Locale instance and returns a name localized (if possible) for the specified locale.

DemoLocale.java

        Locale usLocale = Locale.forLanguageTag("en-US");
        System.out.printf("Language = %s (%s)%n", usLocale.getLanguage(),
            usLocale.getDisplayLanguage());
        System.out.printf("Region = %s (%s)%n", usLocale.getCountry(),
            usLocale.getDisplayCountry());
        System.out.printf("Language = %s (%s)%n", usLocale.getLanguage(),
            usLocale.getDisplayLanguage(Locale.FRENCH));
        System.out.printf("Region = %s (%s)%n", usLocale.getCountry(),
            usLocale.getDisplayCountry(Locale.FRENCH));

Output

Language = en (English)
Region = US (United States)
Language = en (anglais)
Region = US (États-Unis)
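A common use of the overloads that take a Locale is a language picker in which every entry is shown in its own language, so that users can find their language regardless of the current UI language. The following is a rough sketch of that idea; the class name and both helper methods are hypothetical, and the non-English names printed depend on the locale data installed with your JDK.

```java
import java.util.Locale;

public class DisplayNameSketch {

    // Full display name of target, localized for displayLocale.
    static String nameIn(Locale target, Locale displayLocale) {
        return target.getDisplayName(displayLocale);
    }

    // The language of a locale, named in that same language.
    static String selfName(Locale locale) {
        return locale.getDisplayLanguage(locale);
    }

    public static void main(String[] args) {
        // Passing the display locale explicitly avoids depending on defaults.
        System.out.println(nameIn(Locale.US, Locale.ENGLISH));      // English (United States)
        System.out.println(nameIn(Locale.GERMANY, Locale.ENGLISH)); // German (Germany)

        // Each language named in itself, as a language picker might show it.
        for (Locale l : new Locale[] {Locale.ENGLISH, Locale.GERMAN, Locale.FRENCH, Locale.JAPANESE}) {
            System.out.println(l.toLanguageTag() + ": " + selfName(l));
        }
    }
}
```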

5.5 Other handy methods

5.5.1 Getting the available Locales

The static method Locale[] getAvailableLocales() returns an array of all the Locales for which support has been installed.

DemoLocale.java

        Locale[] allLocales = Locale.getAvailableLocales();
        for (Locale l : allLocales) {
            System.out.println(l.toLanguageTag() + ": " + l.getDisplayName());
        }

Output

und: 
nds: Low German
ti-ET: Tigrinya (Ethiopia)
ta-SG: Tamil (Singapore)
lv: Latvian
en-NU: English (Niue)
zh-Hans-SG: Chinese (Simplified, Singapore)
en-JM: English (Jamaica)
 ...
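The array comes back in no particular order, so in practice you will often want to sort it or test it for a specific locale. Here is a small sketch along those lines; the class name and the helpers isAvailable and sortedTags are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Locale;

public class AvailableLocaleSketch {

    // Returns true if the given locale is among the installed locales.
    static boolean isAvailable(Locale locale) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
    }

    // Returns the language tags of all installed locales, sorted.
    static String[] sortedTags() {
        return Arrays.stream(Locale.getAvailableLocales())
            .map(Locale::toLanguageTag)
            .sorted()
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println("en-US available: " + isAvailable(Locale.US));
        String[] tags = sortedTags();
        System.out.println(tags.length + " locales installed");
        // Print only the first few to keep the output short.
        for (int i = 0; i < Math.min(5, tags.length); i++) {
            System.out.println(tags[i]);
        }
    }
}
```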

5.5.2 Returning a Language Tag for a Locale

Use the toLanguageTag() method to return the BCP 47 language tag for a Locale:

DemoLocale.java

        Locale serbian = new Locale.Builder()
            .setLanguage("sr")
            .setScript("Cyrl")
            .setRegion("ME")
            .build();
        System.out.println(serbian.toLanguageTag());

Output

sr-Cyrl-ME
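It is worth contrasting toLanguageTag() with toString(), which produces an older underscore-separated form intended mainly for debugging. The sketch below (class name chosen for illustration) prints both and confirms that the language tag round-trips through forLanguageTag().

```java
import java.util.Locale;

public class TagVersusToString {

    // Returns true if a locale survives a toLanguageTag/forLanguageTag round trip.
    static boolean roundTrips(Locale locale) {
        return Locale.forLanguageTag(locale.toLanguageTag()).equals(locale);
    }

    public static void main(String[] args) {
        Locale serbian = Locale.forLanguageTag("sr-Cyrl-ME");
        System.out.println(serbian.toLanguageTag()); // sr-Cyrl-ME
        System.out.println(serbian.toString());      // sr_ME_#Cyrl
        System.out.println(roundTrips(serbian));     // true
    }
}
```

When you need to store a locale or exchange it with other systems, the language tag is generally the safer choice, since it is a standardized format.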

6. Additional Reading

7. Download the source code

You can download the full source code of this example here:
Internationalization in Java

Kevin Anderson

Kevin has been tinkering with computers for longer than he cares to remember.