Convert a Unicode-Encoded String to Letters in Java

Dealing with Unicode-encoded strings is a common task in Java programming, especially in multilingual applications that handle text in many scripts and languages. Sometimes text arrives with its characters written as Unicode escape sequences, such as \u0048\u0065\u006C\u006C\u006F for "Hello", and we need to convert it into a human-readable string of letters. This article explores several ways to do that in Java.

1. Understanding Unicode Encoding

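Unicode is a standard for encoding the characters used in text across platforms and languages. Each character is assigned a unique code point, conventionally written in hexadecimal as U+ followed by at least four digits (for example, U+0048 is the letter H). Java strings are sequences of UTF-16 code units (char values): characters in the Basic Multilingual Plane (BMP) occupy one char, while characters above U+FFFF occupy two (a surrogate pair). Java source code can also spell a character as a Unicode escape \uXXXX, but the compiler replaces these escapes before compiling, so the literal "\u0048\u0069" is simply "Hi". A string contains escape sequences at run time only when the backslash itself is part of the text, for example when the text is read from a file or written in source as "\\u0048\\u0069".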
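
To make the difference between chars, code points, and compile-time escapes concrete, here is a small sketch. The class name CodePointDemo and the helper describe() are illustrative names, not part of any library; the helper lists each code point as U+XXXX together with the string's char and code-point counts.

```java
class CodePointDemo {
    // Lists each code point of s as U+XXXX, then reports the UTF-16 char count and code-point count
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        // codePoints() yields whole code points, merging each surrogate pair into one value
        s.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.append("| chars=").append(s.length())
                  .append(", codePoints=").append(s.codePointCount(0, s.length()))
                  .toString();
    }

    public static void main(String[] args) {
        // The compiler replaces these escapes before compiling, so this literal is just "Hi"
        System.out.println(describe("\u0048\u0069"));
        // A doubled backslash keeps the escape as six ordinary characters at run time
        System.out.println(describe("\\u0048"));
        // U+1F600 lies outside the Basic Multilingual Plane, so it needs a surrogate pair
        System.out.println(describe(new String(Character.toChars(0x1F600))));
    }
}
```

Running it should print "U+0048 U+0069 | chars=2, codePoints=2" for the compiled literal, six separate code points starting with U+005C (the backslash) for the doubled-backslash literal, and "U+1F600 | chars=2, codePoints=1" for the emoji.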

2. Converting Unicode Encoded Strings to Letters

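To convert a Unicode-encoded string to a string of letters, we can use classes from the Java standard library or a third-party library. The task usually involves two steps: decoding \uXXXX escape sequences into the characters they represent, and keeping only the letters of the result. Three approaches are shown below: regular expressions (section 2.1), the Character class (section 2.2), and Apache Commons Text (section 3).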

2.1 Using Regular Expressions

One approach is to use a regular expression that matches each escape sequence: a backslash, the letter u, and four hexadecimal digits. Java's Pattern and Matcher classes find each match, Integer.parseInt(..., 16) converts the hex digits into a char, and appendReplacement() and appendTail() rebuild the string with the decoded characters. Here's an example:

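import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringToLetters {

    // Decodes every backslash-u escape (a backslash, 'u', and four hex digits) and keeps other text as-is
    public static String unicodeToString(String unicodeString) {
        // Regex for a literal backslash, 'u', and exactly four hex digits
        Pattern pattern = Pattern.compile("\\\\u[0-9a-fA-F]{4}");
        Matcher matcher = pattern.matcher(unicodeString);
        StringBuilder builder = new StringBuilder();
        while (matcher.find()) {
            String unicodeSequence = matcher.group();
            // Skip the leading backslash and 'u', then parse the four hex digits as a char
            char unicode = (char) Integer.parseInt(unicodeSequence.substring(2), 16);
            // quoteReplacement keeps a decoded '$' or '\' from being treated as a group reference
            matcher.appendReplacement(builder, Matcher.quoteReplacement(Character.toString(unicode)));
        }
        // Copy the text after the last match (the StringBuilder overloads need Java 9+)
        matcher.appendTail(builder);
        return builder.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes in the string at run time;
        // with single backslashes the compiler would already have decoded them
        String unicodeString = "\\u0048\\u0065\\u006C\\u006C\\u006F \\u0057\\u006F\\u0072\\u006C\\u0064";
        String letters = unicodeToString(unicodeString);
        System.out.println(letters); // Output: Hello World
    }
}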
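
On Java 9 and later, Matcher.replaceAll() also accepts a function that computes each replacement from the current match, which removes the explicit find/appendReplacement loop. The following is a sketch of the same decoding under that assumption; UnicodeEscapeDecoder and decode() are hypothetical names. A capture group holds the four hex digits, so no substring() call is needed. Because a character outside the BMP is written as two escapes (a surrogate pair), decoding each escape to one char still reassembles it correctly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Group 1 captures the four hex digits that follow a literal backslash and 'u'
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        // replaceAll(Function) builds each replacement from the match (Java 9+);
        // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
        return ESCAPE.matcher(input).replaceAll(match -> Matcher.quoteReplacement(
                String.valueOf((char) Integer.parseInt(match.group(1), 16))));
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0048\\u0069, \\u0024 and \\u00E9"));
    }
}
```

For the input in main, this should print "Hi, $ and é".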


2.2 Using Java’s Character Class

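Java's Character class offers methods for classifying individual characters, including any Unicode letter. Another approach is to iterate through each character in the string and keep it only if Character.isLetter() returns true; in this example, each whitespace character is replaced with a plain space so that words stay separated. Note that this approach does not decode escape sequences: it works here because the compiler has already turned the escaped literal in main into "Hello World", so the method simply filters out everything that is not a letter or whitespace.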

public class StringToLetters {

    // Keeps letters, replaces each whitespace character with a space, and drops everything else
    public static String unicodeToString(String unicodeString) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < unicodeString.length(); i++) {
            char c = unicodeString.charAt(i);
            if (Character.isLetter(c)) {
                builder.append(c);
            } else if (Character.isWhitespace(c)) {
                builder.append(' ');
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The compiler turns these escapes into "Hello World" before the program runs
        String unicodeString = "\u0048\u0065\u006C\u006C\u006F \u0057\u006F\u0072\u006C\u0064";
        String letters = unicodeToString(unicodeString);
        System.out.println(letters); // Output: Hello World
    }
}
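
Character.isLetter(char) examines one UTF-16 unit at a time, so a letter outside the Basic Multilingual Plane, which is stored as a surrogate pair, is silently dropped. A sketch that iterates over code points instead, using the int overloads of isLetter() and isWhitespace(), might look like this; LetterFilter and lettersOnly() are hypothetical names.

```java
class LetterFilter {
    // Same rules as the char-based loop, but iterates over whole code points
    static String lettersOnly(String input) {
        StringBuilder builder = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (Character.isLetter(cp)) {
                builder.appendCodePoint(cp); // writes both halves of a surrogate pair
            } else if (Character.isWhitespace(cp)) {
                builder.append(' ');
            }
        });
        return builder.toString();
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A is a letter stored as two chars
        String boldA = new String(Character.toChars(0x1D400));
        System.out.println(Character.isLetter(boldA.charAt(0)));       // lone high surrogate
        System.out.println(lettersOnly(boldA + "1 b!").equals(boldA + " b"));
    }
}
```

Here main should print false (a lone high surrogate is not a letter) followed by true, because the code-point version keeps the bold A intact.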


3. Using Apache Commons Text

The Apache Commons Text library (artifact org.apache.commons:commons-text) provides the utility method StringEscapeUtils.unescapeJava(), which converts Java escape sequences, including \uXXXX, into their corresponding characters. The resulting string can then be filtered for letters, for example with the Character-based loop from section 2.2. Here's an example:

import org.apache.commons.text.StringEscapeUtils;

public class UnicodeToStringConverter {

    public static String unicodeToString(String unicodeString) {
        // Decodes Java-style escapes (backslash-u sequences, \n, \t, ...) into real characters
        return StringEscapeUtils.unescapeJava(unicodeString);
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes in the string so unescapeJava has work to do
        String unicodeString = "\\u0048\\u0065\\u006C\\u006C\\u006F \\u0057\\u006F\\u0072\\u006C\\u0064";
        String letters = unicodeToString(unicodeString);
        System.out.println(letters); // Output: Hello World
    }
}


The output is:

Fig 1: Output from converting a Unicode-encoded string to letters in Java

4. Conclusion

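Converting Unicode-encoded strings to letters in Java usually involves two steps: decoding \uXXXX escape sequences, either with a regular expression or with Apache Commons Text's StringEscapeUtils.unescapeJava(), and filtering the decoded text with Character.isLetter(). Keep in mind that escapes written directly in Java source are decoded by the compiler, so run-time decoding is only needed for text that actually contains the backslash sequences. With these techniques, we can reliably process Unicode-encoded text in our Java applications.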

