Internationalization in Java
In this article, we are going to explain Internationalization in Java.
1. What are Internationalization and Localization?
Internationalization, or I18N for short, is the broad term for the techniques and processes involved in building applications that are easily adaptable to different cultural norms and/or preferences. The process of actually adapting an application to a particular set of cultural norms is localization (or L10N).
2. Locales
Central to the Java model of internationalization is the idea of a locale, which Unicode Technical Standard #35 defines as “…an identifier (id) that refers to a set of user preferences that tend to be shared across significant swaths of the world.” The main user preference associated with a locale is, perhaps not surprisingly, language, but locales also encompass a variety of other often-shared preferences. Some of these, such as the formatting of numbers, dates, and times, tend to be tied more or less tightly to the language. Locales can also include preferences that are either completely extra-linguistic or not strictly language-linked, such as calendar usage or numeral style.
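To make the idea concrete, here is a minimal sketch of how the same number is rendered under two different locales. The class name `LocaleFormattingSketch` and the helper `format` are made up for this illustration; `NumberFormat.getNumberInstance(Locale)` is the standard JDK call. The German output assumes a JDK that ships its usual locale data.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleFormattingSketch {

    // Formats the value using the number conventions of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(format(value, Locale.US));      // 1,234,567.891
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```

Only the `Locale` argument changes between the two calls; the grouping and decimal separators follow from it.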
3. The java.util.Locale class
In Java, locales are represented by instances of the java.util.Locale class. From Java SE 7 on, the Locale class implements the concept of a language tag as defined by the IETF standard BCP 47. A BCP 47 language tag is a string consisting of one or more subtags, each no more than eight characters long and separated by hyphens, in the following order:
- (Primary) Language: two or three letters, normally in lowercase; the broadest classification of a language. Examples: ar (Arabic), zh (Chinese), en (English);
- Script: four letters, normally in Title Case; identifies a specific writing system. Examples: Cyrl (Cyrillic script), Latn (Latin script), Hans (Simplified Chinese characters);
- Region: a country (two letters, normally in UPPERCASE) or other geographic area (three digits). Examples: CA (Canada), JP (Japan), 419 (Latin America). For historical reasons, the Locale class uses the term “country” when referring to the BCP 47 region subtag;
- Variant: significant additional variants not adequately identified by some combination of primary language, script, and region; minimum five characters if it starts with a letter, four if it starts with a digit. Examples: VALENCIA (Valencian dialect of Catalan), 1901 and 1996 (dates of German spelling reforms);
- Extensions: additional locale-related information that cannot be captured by some combination of Language, Script, Region, or Variant subtags. Each extension is introduced by a one-character subtag (called a singleton) and consists of all the following subtags that are longer than one character. The extension type of most relevance to Java applications is the Unicode Locale Extension, identified by the singleton 'u'. The Unicode Locale extension has the following subcomponents:
  - Unicode locale attributes: identifiers (3-8 characters) for boolean (true/false) locale properties;
  - Unicode locale keywords: each is a key/type pair of subtags; the key (2 characters) names a multi-valent locale property and the type (3-8 characters) gives the property value. Example: ca-japanese; the key is ca (calendar), the type is japanese (Japanese Imperial calendar).
Here’s an example of a language tag:
ja-JP-u-ca-japanese
It breaks down like this:
Subtag | Value | Meaning |
---|---|---|
Language | ja | Japanese |
Script | (omitted) | Jpan, Japanese script (implied) |
Region | JP | Japan |
Variant | (omitted) | None required |
Extension singleton | u | Unicode Locale extension |
Unicode Keyword key | ca | Calendar usage |
Unicode Keyword type | japanese | Japanese Imperial calendar |
In spite of the convention of writing language subtags in lowercase, Script subtags in Title Case, and Region subtags in UPPERCASE, Locale components and language tags are always treated as case-insensitive. For example, CA, Ca, cA, and ca all represent either Catalan (when used as a Language subtag) or Canada (when used as a Region subtag).
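The following sketch illustrates this case-insensitivity using the forLanguageTag() and toLanguageTag() methods covered later in this article. The class name `CaseInsensitiveTags` and the helper `canonical` are invented for the example.

```java
import java.util.Locale;

public class CaseInsensitiveTags {

    // Parses a language tag and returns it in conventional BCP 47 casing.
    static String canonical(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(canonical("CA-es"));      // ca-ES
        System.out.println(canonical("SR-cyrl-me")); // sr-Cyrl-ME

        // Locales parsed from differently cased tags compare as equal.
        Locale a = Locale.forLanguageTag("ca-ES");
        Locale b = Locale.forLanguageTag("CA-es");
        System.out.println(a.equals(b));             // true
    }
}
```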
4. Obtaining Locales
An important part of the logic of an internationalized application involves simply obtaining appropriate Locale instances and passing them to the various locale-sensitive services that perform the locale-specific work. You have a variety of options for obtaining the Locale objects you need.
4.1 Default Locales
The JDK establishes one or more default locales, based on the configuration of the host environment in which your application is running. Some locale-sensitive methods implicitly use a default locale instead of an explicit Locale argument, but you can also obtain references to default Locales and use them just as you would any other Locale:
DemoLocale.java
    Locale defaultLocale = Locale.getDefault();
    Locale displayDefaultLocale = Locale.getDefault(Locale.Category.DISPLAY);
    Locale formatDefaultLocale = Locale.getDefault(Locale.Category.FORMAT);
Historically, the JDK has always provided (and continues to provide) a single “anonymous” default Locale. But with newer operating systems allowing different locales to be configured for different uses, Java SE 7 added support for two named categories of default locale: DISPLAY and FORMAT. The default DISPLAY locale typically applies to textual components of the application UI, while the default FORMAT locale applies to the formatting of individual numbers, dates, and times. The categories are identified by constants of the enum class Locale.Category, so that new categories could easily be added in the future if needed (as of Java 17, however, no new categories have been added).
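As a rough illustration of how the two categories act independently, the sketch below sets them to different locales and then restores the original values. The class `DefaultLocaleCategories` and the helper `formatWithDefault` are hypothetical names; `Locale.setDefault(Locale.Category, Locale)` and `String.format` (which formats with the default FORMAT locale) are standard JDK calls. The commented outputs assume a JDK with its usual locale data.

```java
import java.util.Locale;

public class DefaultLocaleCategories {

    // String.format without an explicit Locale uses the default FORMAT locale.
    static String formatWithDefault(double value) {
        return String.format("%,.2f", value);
    }

    public static void main(String[] args) {
        Locale savedFormat = Locale.getDefault(Locale.Category.FORMAT);
        Locale savedDisplay = Locale.getDefault(Locale.Category.DISPLAY);
        try {
            Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);
            Locale.setDefault(Locale.Category.DISPLAY, Locale.US);

            // Number formatting follows the FORMAT default.
            System.out.println(formatWithDefault(1234.5));          // 1.234,50
            // Display names follow the DISPLAY default.
            System.out.println(Locale.GERMANY.getDisplayCountry()); // Germany
        } finally {
            // Restore the original defaults so other code is unaffected.
            Locale.setDefault(Locale.Category.FORMAT, savedFormat);
            Locale.setDefault(Locale.Category.DISPLAY, savedDisplay);
        }
    }
}
```

Changing defaults affects the whole JVM, so in application code it is usually safer to pass an explicit Locale to each locale-sensitive call.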
4.2 Constructors
Locales needing only language, country/region, and variant subtags can be created with the Locale constructors:
DemoLocale.java
    Locale frenchLocale = new Locale("fr");
    Locale brazilianPortuguese = new Locale("pt", "BR");
    Locale valencianCatalan = new Locale("ca", "ES", "VALENCIA");
However, the Locale constructors are holdovers from pre-BCP 47 days, when locales had only language, country, and variant components (they were deprecated in Java 19 in favor of the Locale.of() factory methods). To create Locales that use all BCP 47 features, including Script subtags and extensions, you need to use either the Locale.Builder API or the forLanguageTag() factory method.
4.3 Locale.Builder
Locale.Builder provides a fluent API that lets you build well-formed BCP 47 Locale instances programmatically from their component subtags. The Builder API lets you use all available Locale features, including Script subtags and Unicode Locale Extension subtags. Here are some examples:
DemoLocale.java
    // Serbian language (Montenegro), Cyrillic script
    Locale serbian = new Locale.Builder()
        .setLanguage("sr")
        .setScript("Cyrl")
        .setRegion("ME")
        .build();

    // Japanese language (Japan), Imperial calendar
    Locale japaneseWithImperialCalendar = new Locale.Builder()
        .setLanguage("ja")
        .setRegion("JP")
        .setUnicodeLocaleKeyword("ca", "Japanese")
        .build();
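Because the builder only produces well-formed locales, its setters reject ill-formed subtags by throwing `IllformedLocaleException`. The sketch below shows one way to handle that; the class `BuilderValidation` and the helper `buildTag` are invented for this example, and the exact exception message is not shown because it is an implementation detail.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class BuilderValidation {

    // Returns the language tag for the given subtags, or a short message
    // explaining that the builder rejected one of them.
    static String buildTag(String language, String script, String region) {
        try {
            return new Locale.Builder()
                    .setLanguage(language)
                    .setScript(script)
                    .setRegion(region)
                    .build()
                    .toLanguageTag();
        } catch (IllformedLocaleException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildTag("sr", "Cyrl", "ME"));     // sr-Cyrl-ME
        // "Cyrillic" is not a four-letter script subtag, so it is rejected.
        System.out.println(buildTag("sr", "Cyrillic", "ME"));
    }
}
```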
4.4 Factory method forLanguageTag()
You can obtain a Locale instance corresponding to a BCP 47-compliant language tag string by using the forLanguageTag() factory method.
DemoLocale.java
    Locale l1 = Locale.forLanguageTag("ja-JP-u-ca-Japanese");
    Locale l2 = Locale.forLanguageTag("sr-Cyrl-ME");
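Unlike the builder, forLanguageTag() does not throw on ill-formed input: per its Javadoc, the first ill-formed subtag and everything after it are ignored. The sketch below round-trips a few tags to show the effect; the class `LenientParsing` and the helper `roundTrip` are names made up for this illustration.

```java
import java.util.Locale;

public class LenientParsing {

    // Parses a tag and converts the resulting Locale back to a language tag.
    static String roundTrip(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sr-Cyrl-ME"));        // sr-Cyrl-ME
        // The last subtag is longer than eight characters, so it is dropped.
        System.out.println(roundTrip("en-US-abcdefghijk")); // en-US
        // An empty tag yields the root locale, whose tag is "und".
        System.out.println(roundTrip(""));                  // und
    }
}
```

If silently dropping subtags is not acceptable, comparing the round-tripped tag with the input (ignoring case) is one way to detect it.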
4.5 Constants
The Locale class provides manifest constants corresponding to ready-made Locale instances for a few chosen languages and regions:
DemoLocale.java
    System.out.println("Locale.ENGLISH: " + Locale.ENGLISH.toLanguageTag());
    System.out.println("Locale.US: " + Locale.US.toLanguageTag());
    System.out.println("Locale.UK: " + Locale.UK.toLanguageTag());
Output
    Locale.ENGLISH: en
    Locale.US: en-US
    Locale.UK: en-GB
You can find a complete list of available Locale constants in the Javadoc for Locale.
5. Locale methods
Once you have a Locale instance, you can query it for the values of its component fields, as well as other interesting information.
5.1 Accessors and Queries
The four language-defining fields of a Locale can be accessed with the methods getLanguage(), getScript(), getCountry(), and getVariant(). The empty string ("") is returned for missing fields.
DemoLocale.java
    Locale l = Locale.forLanguageTag("sr-Cyrl-ME");
    System.out.println("Locale: " + l.toLanguageTag());
    System.out.println("Language: \"" + l.getLanguage() + "\"");
    System.out.println("Script: \"" + l.getScript() + "\"");
    System.out.println("Country/region: \"" + l.getCountry() + "\"");
    System.out.println("Variant: \"" + l.getVariant() + "\"");
Output
    Locale: sr-Cyrl-ME
    Language: "sr"
    Script: "Cyrl"
    Country/region: "ME"
    Variant: ""
5.2 Extension accessors
    boolean hasExtensions()
    Set<Character> getExtensionKeys()
    String getExtension(char key)
You can access the data of any BCP 47 extensions present in a Locale using the methods boolean hasExtensions(), Set<Character> getExtensionKeys(), and String getExtension(char):
DemoLocale.java
    Locale l = Locale.forLanguageTag("ja-JP-u-ca-japanese-x-lvariant-JP");
    for (char c : l.getExtensionKeys()) {
        String ext = l.getExtension(c);
        System.out.printf("%c - %s%n", c, ext);
    }
Output
    u - ca-japanese
5.3 Unicode Locale extension access
The methods Set<String> getUnicodeLocaleAttributes(), Set<String> getUnicodeLocaleKeys(), and String getUnicodeLocaleType(String) give you direct access to Unicode Locale extension data.
DemoLocale.java
    Locale l = Locale.forLanguageTag("en-US-u-attr1-attr2-ca-japanese-nu-thai");
    System.out.println("Unicode Locale attributes: "
        + String.join(",", l.getUnicodeLocaleAttributes()));
    for (String key : l.getUnicodeLocaleKeys()) {
        String type = l.getUnicodeLocaleType(key);
        System.out.println("Unicode Locale keyword: key=" + key + ", type=" + type);
    }
Output
    Unicode Locale attributes: attr1,attr2
    Unicode Locale keyword: key=ca, type=japanese
    Unicode Locale keyword: key=nu, type=thai
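These keywords are not just metadata: locale-sensitive services in the JDK may act on them. As a small sketch, the code below asks `java.util.Calendar` which calendar system it selects for a locale with and without the ca keyword. The class `CalendarKeywordSketch` and the helper `calendarTypeFor` are hypothetical names for this example; `Calendar.getCalendarType()` requires Java 8 or later.

```java
import java.util.Calendar;
import java.util.Locale;

public class CalendarKeywordSketch {

    // Returns the calendar type that Calendar.getInstance() picks for the tag.
    static String calendarTypeFor(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return Calendar.getInstance(locale).getCalendarType();
    }

    public static void main(String[] args) {
        System.out.println(calendarTypeFor("ja-JP"));               // gregory
        System.out.println(calendarTypeFor("ja-JP-u-ca-japanese")); // japanese
    }
}
```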
5.4 User-friendly names for Locale components
The getDisplayLanguage(), getDisplayScript(), getDisplayCountry(), and getDisplayVariant() methods return user-friendly names for the corresponding Locale fields, localized (if possible) to the current default DISPLAY locale. getDisplayName() constructs a displayable name for the complete locale. Each of these methods also has a corresponding overloaded version that accepts a Locale instance and returns a name localized (if possible) for the specified locale.
DemoLocale.java
    Locale usLocale = Locale.forLanguageTag("en-US");
    System.out.printf("Language = %s (%s)%n",
        usLocale.getLanguage(), usLocale.getDisplayLanguage());
    System.out.printf("Region = %s (%s)%n",
        usLocale.getCountry(), usLocale.getDisplayCountry());
    System.out.printf("Language = %s (%s)%n",
        usLocale.getLanguage(), usLocale.getDisplayLanguage(Locale.FRENCH));
    System.out.printf("Region = %s (%s)%n",
        usLocale.getCountry(), usLocale.getDisplayCountry(Locale.FRENCH));
Output
    Language = en (English)
    Region = US (United States)
    Language = en (anglais)
    Region = US (États-Unis)
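One common use of the Locale-accepting overloads is a language picker in which each locale is labeled in its own language. The sketch below passes each locale to its own getDisplayName(Locale); the class `SelfNames`, the helper `selfName`, and the chosen tags are assumptions for the example, and the exact names printed depend on the locale data installed with your JDK.

```java
import java.util.Locale;

public class SelfNames {

    // Returns the locale's display name, localized for that same locale.
    static String selfName(Locale locale) {
        return locale.getDisplayName(locale);
    }

    public static void main(String[] args) {
        String[] tags = {"en-US", "fr-FR", "de-DE", "ja-JP"};
        for (String tag : tags) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + " -> " + selfName(locale));
        }
    }
}
```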
5.5 Other handy methods
5.5.1 Getting the available Locales
The static method Locale[] getAvailableLocales() returns an array of all the Locales for which support has been installed.
DemoLocale.java
    Locale[] allLocales = Locale.getAvailableLocales();
    for (Locale l : allLocales) {
        System.out.println(l.toLanguageTag() + ": " + l.getDisplayName());
    }
Output
    und:
    nds: Low German
    ti-ET: Tigrinya (Ethiopia)
    ta-SG: Tamil (Singapore)
    lv: Latvian
    en-NU: English (Niue)
    zh-Hans-SG: Chinese (Simplified, Singapore)
    en-JM: English (Jamaica)
    ...
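The array comes back in no particular order, so in practice you will often filter and sort it. Here is a minimal sketch that collects the tags of all installed locales for one language; the class `AvailableLocaleFilter` and the helper `tagsForLanguage` are invented for this illustration, and the result depends on the locale data present in your runtime.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AvailableLocaleFilter {

    // Returns the sorted language tags of all available locales
    // whose language subtag matches the given code.
    static List<String> tagsForLanguage(String language) {
        return Arrays.stream(Locale.getAvailableLocales())
                .filter(l -> l.getLanguage().equalsIgnoreCase(language))
                .map(Locale::toLanguageTag)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints every installed English locale, one tag per line.
        tagsForLanguage("en").forEach(System.out::println);
    }
}
```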
5.5.2 Returning a Language Tag for a Locale
Use the toLanguageTag() method to return the BCP 47 language tag for a Locale:
DemoLocale.java
    Locale serbian = new Locale.Builder()
        .setLanguage("sr")
        .setScript("Cyrl")
        .setRegion("ME")
        .build();
    System.out.println(serbian.toLanguageTag());
Output
    sr-Cyrl-ME
6. Additional Reading
- BCP 47, which combines two documents, RFC 5646 (Tags for Identifying Languages) and RFC 4647 (Matching of Language Tags);
- RFC 6067 (BCP 47 Extension U), which defines the general syntax of the Unicode Locale extension;
- UTS (Unicode Technical Standard) #35, which specifies the valid attribute, key, and type values for use with Extension U.