
Java Parse String Example

In this example, we look at different techniques for parsing a String in Java. We demonstrate the StringTokenizer class and the String.split() method, with code examples for each.

1. Introduction

Wiktionary defines “parse” as:

“To split a file or other input into pieces of data that can be easily stored or manipulated.”

The JDK provides different mechanisms to parse a string:

  • The split method provided by the String class
  • The java.util.StringTokenizer class

We will see both of these approaches in detail in this example.

2. The StringTokenizer

StringTokenizer is a legacy class that has been available since JDK 1.0. It allows an application to break a string into tokens. The characters that separate the tokens are known as delimiters. The delimiters are passed as a String and can be specified either when the StringTokenizer is created or on a per-token basis.

2.1. StringTokenizer constructors and methods

Let’s look at the constructors and methods that StringTokenizer offers, with a few examples.

2.1.1. public StringTokenizer(String str)

The StringTokenizer constructed by the above constructor uses the default delimiter set " \t\n\r\f", namely the space, tab, newline, carriage-return, and form-feed characters. If you wish to separate substrings of a string on any of these characters, use this constructor.

Note: These delimiter characters are not treated as tokens by the tokenizer.

2.1.2. public StringTokenizer(String str, String delim)

The StringTokenizer constructed by the above constructor uses each character in the delim parameter as a delimiter for separating tokens. For example, to separate the words in a string that uses the colon “:” as a separator, pass “:” as delim.

Note: Like in the previous constructor, these characters are also not treated as tokens by the tokenizer.

The hasMoreTokens() method of the StringTokenizer tests whether another token is available in the input string and returns a boolean. A return value of true guarantees that the next call to nextToken() will return a token. The example below demonstrates the above two constructors.

Example 1 – StringTokenizer

	public static void testNoDelimiters() {
		String sourceString = "This is a\nsample of\nnew big line-with\ttabs and\rcarriage-returns";
		System.out.println("Source String is " + sourceString);
		// uses default set of characters as delimiters
		StringTokenizer st = new StringTokenizer(sourceString);
		while (st.hasMoreTokens()) {
			System.out.println("testNoDelimiters : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
		// uses space character as a delimiter; this
		// will disregard the default delimiter character set
		st = new StringTokenizer(sourceString, " ");
		while (st.hasMoreTokens()) {
			System.out.println("testSpaceDelimiter : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

2.1.3. public String nextToken(String delim)

On invocation, this method first replaces the set of characters that the StringTokenizer treats as delimiters with the characters in the delim argument. It then returns the next token in the string after the current position. The new delimiter set remains in effect for subsequent calls.

Example 2 – StringTokenizer

	public static void testNextTokenWithDelim() {
		String sourceString = "This-String_Example-has space , hyphen-and_hyphen-and_underscores";
		StringTokenizer st = new StringTokenizer(sourceString);
		System.out.println("Source String is " + sourceString);
		if (st.hasMoreTokens()) {
			// nextToken with delimiter _
			System.out.println("testNextTokenWithDelim | Delimiter _ : Next-Token = " + st.nextToken("_"));
		}
		while (st.hasMoreTokens()) {
			// nextToken with delimiter -
			System.out.println("testNextTokenWithDelim | Delimiter - : Next-Token = " + st.nextToken("-"));
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

2.1.4. public StringTokenizer(String str, String delim, boolean returnDelims)

This constructor takes three arguments: the input string, the delimiters, and a boolean. Each character in the delim argument separates tokens.

If the third argument is true, the delimiter characters are also returned as tokens by nextToken(), each as a string of length one. If it is false, the delimiter characters are skipped and serve only as separators between tokens.

Use this constructor when you want to separate words joined by a delimiter and optionally get the delimiters back as well. The example below demonstrates this.

Example 3 – StringTokenizer

	public static void testDelimiterColon() {
		String sourceString = "Computer Science:Programming:Java:String Tokenizer:Example";
		StringTokenizer st = new StringTokenizer(sourceString, ":");
		System.out.println("Source String is " + sourceString + " | Delimiter is : ");
		System.out.println("testCountTokens : " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testDelimiterColon : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
		st = new StringTokenizer(sourceString, ":", true);
		System.out.println("testReturnDelimiters : Count-Tokens " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testReturnDelimiters : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

The countTokens() method returns the number of tokens that remain in the source string at the moment of the call, using the current delimiter set. It does not advance the tokenizer's position.
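
As a small sketch of how the count shrinks while tokens are consumed, the class below records countTokens() before every nextToken() call. The class name CountTokensDemo and the helper remainingCounts are illustrative names, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CountTokensDemo {

	// Hypothetical helper: records countTokens() before each nextToken() call.
	static List<Integer> remainingCounts(String source, String delim) {
		StringTokenizer st = new StringTokenizer(source, delim);
		List<Integer> counts = new ArrayList<>();
		while (st.hasMoreTokens()) {
			counts.add(st.countTokens()); // tokens still available at this point
			st.nextToken();               // consume one token
		}
		counts.add(st.countTokens());     // 0 once the tokenizer is exhausted
		return counts;
	}

	public static void main(String[] args) {
		// prints [3, 2, 1, 0]
		System.out.println(remainingCounts("Java:String:Tokenizer", ":"));
	}
}
```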

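Note: The delim argument can contain more than one character, but it is treated as a set of single-character delimiters, not as one multi-character delimiter. In the example below, the string “_Java” makes each of the characters '_', 'J', 'a' and 'v' a delimiter. The sample input happens to produce the same tokens either way, because none of its other words contain those characters.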
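
To make the way StringTokenizer reads a multi-character delim string visible, here is a minimal sketch that tokenizes a different input, chosen for this illustration, and compares the result with String.split. The class name DelimiterSetDemo, the helper tokenize and the input string are assumptions and do not come from the original example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSetDemo {

	// Collects every token produced by a StringTokenizer into a list.
	static List<String> tokenize(String source, String delim) {
		StringTokenizer st = new StringTokenizer(source, delim);
		List<String> tokens = new ArrayList<>();
		while (st.hasMoreTokens()) {
			tokens.add(st.nextToken());
		}
		return tokens;
	}

	public static void main(String[] args) {
		String source = "Guava_Java_Kotlin"; // sample input chosen for this sketch

		// '_', 'J', 'a' and 'v' each act as a delimiter: prints [Gu, Kotlin]
		System.out.println(tokenize(source, "_Java"));

		// split treats "_Java" as one pattern: prints [Guava, _Kotlin]
		System.out.println(Arrays.asList(source.split("_Java")));
	}
}
```

StringTokenizer cuts “Guava” short because 'a' and 'v' are delimiters on their own, while split only breaks the string where the whole sequence “_Java” occurs.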

Example 4 – StringTokenizer

	public static void testLongStringDelimiter() {
		String sourceString = "Anmol_Deep_Java_Code_Geeks_Java_Author";
		System.out.println("Source String is " + sourceString +  " | Delimiter is _Java");
		StringTokenizer st = new StringTokenizer(sourceString, "_Java");
		System.out.println("testLongStringDelimiter : Count-Tokens " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testLongStringDelimiter : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

The complete class, combining the StringTokenizer examples above, is:

import java.util.StringTokenizer;

public class StringTokenizerSample {

	public static void testNoDelimiters() {
		String sourceString = "This is a\nsample of\nnew big line-with\ttabs and\rcarriage-returns";
		System.out.println("Source String is " + sourceString);
		// uses default set of characters as delimiters
		StringTokenizer st = new StringTokenizer(sourceString);
		while (st.hasMoreTokens()) {
			System.out.println("testNoDelimiters : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
		// uses space character as a delimiter; this
		// will disregard the default delimiter character set
		st = new StringTokenizer(sourceString, " ");
		while (st.hasMoreTokens()) {
			System.out.println("testSpaceDelimiter : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

	public static void testDelimiterColon() {
		String sourceString = "Computer Science:Programming:Java:String Tokenizer:Example";
		StringTokenizer st = new StringTokenizer(sourceString, ":");
		System.out.println("Source String is " + sourceString + " | Delimiter is : ");
		System.out.println("testCountTokens : " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testDelimiterColon : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
		st = new StringTokenizer(sourceString, ":", true);
		System.out.println("testReturnDelimiters : Count-Tokens " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testReturnDelimiters : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

	public static void testNextTokenWithDelim() {
		String sourceString = "This-String_Example-has space , hyphen-and_hyphen-and_underscores";
		StringTokenizer st = new StringTokenizer(sourceString);
		System.out.println("Source String is " + sourceString);
		if (st.hasMoreTokens()) {
			// nextToken with delimiter _
			System.out.println("testNextTokenWithDelim | Delimiter _ : Next-Token = " + st.nextToken("_"));
		}
		while (st.hasMoreTokens()) {
			// nextToken with delimiter -
			System.out.println("testNextTokenWithDelim | Delimiter - : Next-Token = " + st.nextToken("-"));
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

	public static void testLongStringDelimiter() {
		String sourceString = "Anmol_Deep_Java_Code_Geeks_Java_Author";
		System.out.println("Source String is " + sourceString +  " | Delimiter is _Java");
		StringTokenizer st = new StringTokenizer(sourceString, "_Java");
		System.out.println("testLongStringDelimiter : Count-Tokens " + st.countTokens());
		while (st.hasMoreTokens()) {
			System.out.println("testLongStringDelimiter : Next-Token = " + st.nextToken());
		}
		System.out.println(" ------------------------------------------------------------------------------ ");
	}

	public static void main(String[] args) {
		testNoDelimiters();
		testDelimiterColon();
		testNextTokenWithDelim();
		testLongStringDelimiter();
	}
}

3. The String.split method

The split method of the String class was introduced in JDK 1.4. It leaves the original string unmodified and returns an array of its substrings.

The split method takes a regular expression, passed as a String, and splits the source string around the matches of that expression. If the regular expression does not match any part of the input string, the returned array has a single element: the whole string.
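
The no-match case can be checked with a short sketch like the following. The class name SplitNoMatchDemo and the helper splitOn are hypothetical names used only for this illustration.

```java
public class SplitNoMatchDemo {

	// Thin wrapper around String.split, used here only so the result can be inspected.
	static String[] splitOn(String source, String regex) {
		return source.split(regex);
	}

	public static void main(String[] args) {
		String source = "no-commas-here";
		String[] parts = splitOn(source, ","); // the regex "," never matches

		System.out.println(parts.length);            // prints 1
		System.out.println(parts[0].equals(source)); // prints true
	}
}
```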

For more details on Regular Expressions, refer to this example. The String class provides two flavors of the split method. Both throw a PatternSyntaxException if the regular expression is invalid. Let’s discuss each with an example.
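
As a minimal sketch of the exception, an unescaped “[” is an invalid regular expression because it opens a character class that is never closed. The class name InvalidRegexDemo and the helper isRejected are illustrative names.

```java
import java.util.regex.PatternSyntaxException;

public class InvalidRegexDemo {

	// Returns true when split rejects the regex with a PatternSyntaxException.
	static boolean isRejected(String source, String regex) {
		try {
			source.split(regex);
			return false;
		} catch (PatternSyntaxException e) {
			return true;
		}
	}

	public static void main(String[] args) {
		System.out.println(isRejected("a[b", "["));    // unclosed character class: prints true
		System.out.println(isRejected("a[b", "\\[")); // escaped bracket is valid: prints false
	}
}
```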

3.1. public String[] split(String regex, int limit)

The above method, in addition to the regex, takes an integer argument “limit”. The limit impacts the length of the resulting array by limiting the number of times the regex pattern is applied to the source string.

Let k be the value of the limit:

  • k > 0 will apply the pattern at most k-1 times. This means the returned array’s length cannot be greater than k. The string at the end of the array contains all of the input string after the last match of the delimiter.
  • k < 0 will apply the pattern as many times as possible and the returned array can be of any length.
  • k = 0 will also apply the pattern as many times as possible and the returned array can be of any length. However, in this case the trailing empty strings will be discarded.
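
The three cases can be compared side by side on a short input. This is a sketch under assumed names (SplitLimitDemo, splitOnHyphen) with an input string picked to end in two delimiters.

```java
import java.util.Arrays;

public class SplitLimitDemo {

	// Splits on "-" with the given limit.
	static String[] splitOnHyphen(String source, int limit) {
		return source.split("-", limit);
	}

	public static void main(String[] args) {
		String source = "a-b-c--"; // ends with two delimiters

		// k = 2: the pattern is applied once, the rest stays in the last element
		System.out.println(Arrays.toString(splitOnHyphen(source, 2)));  // prints [a, b-c--]

		// k < 0: all splits, trailing empty strings are kept (array length 5)
		System.out.println(splitOnHyphen(source, -1).length);           // prints 5

		// k = 0: all splits, trailing empty strings are discarded
		System.out.println(Arrays.toString(splitOnHyphen(source, 0)));  // prints [a, b, c]
	}
}
```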

Example 1 – String.split

	public static void splitWithLimit() {
		String source = "705-103-102-456-123 : 112 _343-1 789----";
		System.out.println("Source String is " + source + " | Regex is -");
		// split on - at most 3 times, so the array has at most 4 entries
		for (String x : source.split("-", 4)) {
			System.out.println("splitWithLimit (limit = 4) : Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
		// split with - and return all splits including trailing empty strings
		for (String x : source.split("-", -3)) {
			System.out.println("splitWithLimit (limit = -3): Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
		// split with - and return all splits discard the trailing empty strings
		for (String x : source.split("-", 0)) {
			System.out.println("splitWithLimit (limit = 0): Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
	}

The above example demonstrates a split on a “-” delimiter with a positive, a negative, and a zero limit.

3.2. public String[] split(String regex)

Invoking this method has the same effect as invoking the two-argument split method with a limit of zero, so trailing empty strings are discarded.

Example 2 – String.split

	public static void splitPolynomial() {
		String expr = "10*a^2 + 1/3*b - c^2";
		System.out.println("Source String is " + expr);
		// delimiters are: + * / ^ space -
		String regex = "[+*/^ \\-]+";
		System.out.println("Splitting with Regex - " + regex);
		for (String str : expr.split(regex)) {
			System.out.println("splitPolynomial - " + str);
		}
	}

The above example uses a regular expression that treats the arithmetic operators (+ [sum], - [difference], * [multiplication], / [division] and ^ [exponentiation]) and the space as delimiters. The delimiters are listed inside a character class [...]. Within a character class, “-” normally denotes a range, so it is escaped with a backslash, written as \\- in Java source. The trailing + makes a run of consecutive delimiters count as a single separator. As a result, the split returns only the operands.

Example 3 – String.split

	public static void splitEmail() {
		String sourceString = "poohpool@signet.co,swd@websource.co.in, jobs@websource.co.in, info@rupizxpress.com, mla@mla-india.com";
		System.out.println(" Source String is " + sourceString);
		for (String email : sourceString.split(",")) {
			for (String details : email.split("@")) {
				System.out.println("Details are  " + details);
			}
			System.out.println(" --------- NEXT - RECORD -------- ");
		}
	}

The example above uses the split method to separate the domain names from a list of email addresses. It splits twice: first on “,” to separate the email addresses, and then on “@” to separate the identifier and the domain name of each address.
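
One detail of the email example: splitting on “,” alone leaves the space that follows each comma at the start of the next entry. A possible variation, sketched here with the assumed regex "\\s*,\\s*" and the hypothetical names SplitTrimDemo and splitAddresses, lets the delimiter absorb that whitespace.

```java
public class SplitTrimDemo {

	// Splits on a comma plus any whitespace around it (regex assumed for this sketch).
	static String[] splitAddresses(String list) {
		return list.trim().split("\\s*,\\s*");
	}

	public static void main(String[] args) {
		String list = "swd@websource.co.in, jobs@websource.co.in";

		// Plain "," split: the second entry keeps its leading space.
		System.out.println("[" + list.split(",")[1] + "]");      // prints [ jobs@websource.co.in]

		// Whitespace-tolerant split: the space is consumed by the delimiter.
		System.out.println("[" + splitAddresses(list)[1] + "]"); // prints [jobs@websource.co.in]
	}
}
```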

The complete class, combining the split examples above, is:

public class SplitExample {

	public static void splitWithLimit() {
		String source = "705-103-102-456-123 : 112 _343-1 789----";
		System.out.println("Source String is " + source + " | Regex is - ");
		// split on - at most 3 times, so the array has at most 4 entries
		for (String x : source.split("-", 4)) {
			System.out.println("splitWithLimit (limit = 4) : Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
		// split with - and return all splits including trailing empty strings
		for (String x : source.split("-", -3)) {
			System.out.println("splitWithLimit (limit = -3): Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
		// split with - and return all splits discard the trailing empty strings
		for (String x : source.split("-", 0)) {
			System.out.println("splitWithLimit (limit = 0): Split Item -> " + x);
		}
		System.out.println(" ---------------------------------------------------------------- ");
	}

	public static void splitPolynomial() {
		String expr = "10*a^2 + 1/3*b - c^2";
		System.out.println("Source String is " + expr);
		// delimiters are: + * / ^ space -
		String regex = "[+*/^ \\-]+";
		System.out.println("Splitting with Regex - " + regex);
		for (String str : expr.split(regex)) {
			System.out.println("splitPolynomial - " + str);
		}
	}

	public static void splitEmail() {
		String sourceString = "poohpool@signet.co,swd@websource.co.in, jobs@websource.co.in, info@rupizxpress.com, mla@mla-india.com";
		System.out.println("Source String is " + sourceString);
		for (String email : sourceString.split(",")) {
			for (String details : email.split("@")) {
				System.out.println("Details are  " + details);
			}
			System.out.println(" --------- NEXT - RECORD -------- ");
		}
	}

	public static void main(String[] args) {
		splitWithLimit();
		splitPolynomial();
		splitEmail();
	}
}

4. StringTokenizer vs. split

The following details should help you to decide which one to use for parsing a string in Java.

  • The Oracle documentation describes StringTokenizer as a legacy class that is retained for compatibility reasons, and discourages its use in new code: “It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.”
  • The split method splits on regular expression matches, which the StringTokenizer class does not support, so it gives you more control over how a string is split.
  • The StringTokenizer returns one token at a time through the nextToken method. You have to call hasMoreTokens before each call to nextToken, because nextToken throws a NoSuchElementException when there are no more tokens in the tokenizer’s string.
  • The split method, on the other hand, returns a String[], which is easier to program with.
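
The difference in calling style can be sketched as follows. The class name TokenizerVsSplitDemo and the helper throwsWhenExhausted are hypothetical and serve only to show the NoSuchElementException next to the array-based loop.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenizerVsSplitDemo {

	// Consumes all tokens, then asks for one more and reports whether nextToken() threw.
	static boolean throwsWhenExhausted(String source, String delim) {
		StringTokenizer st = new StringTokenizer(source, delim);
		while (st.hasMoreTokens()) {
			st.nextToken();
		}
		try {
			st.nextToken(); // no tokens left
			return false;
		} catch (NoSuchElementException e) {
			return true;
		}
	}

	public static void main(String[] args) {
		System.out.println(throwsWhenExhausted("a:b:c", ":")); // prints true

		// With split, the array bounds the loop, so no guard call is needed.
		for (String part : "a:b:c".split(":")) {
			System.out.println(part);
		}
	}
}
```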

5. Java Parse String – Summary

In this tutorial, we covered two ways to parse a string in Java, with examples of each, and looked at which approach to prefer in new code.

6. Download the source code

You can download the full source code of this example here: Java Parse String Example

Last updated on Oct 12th, 2021

Anmol Deep

Anmol Deep is a senior engineer currently working with a leading identity security company as a Web Developer. He has 8 years of programming experience in Java and related technologies (including functional programming and lambdas), Python, SpringBoot, RESTful architectures, shell scripts, and both relational (MySQL, H2) and NoSQL (OrientDB, MongoDB) databases. He is passionate about researching all aspects of software development, including technology, design patterns, automation, best practices, methodologies and tools, and loves traveling and photography when not coding.