Java Regex – Regular Expressions Tutorial

In this post, we will explain the Java Regex (Regular Expressions) through examples.

1. What is a regular expression?

A regular expression (regex) is a sequence of characters that defines a search pattern. In Java, it can be used to search, edit, or manipulate text and data. Regular expressions have their own syntax, which you need to learn in order to write them. A common use is to define a constraint on strings, for example in password validation and email validation.

Java provides the regex API in the java.util.regex package, which contains three classes: Pattern, Matcher, and PatternSyntaxException.

1.1. What is Pattern?

A Pattern is a compiled representation of a regular expression. A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern can then be used to create a Matcher object.

Pattern p = Pattern.compile("\\d");

Instances of Pattern class are immutable and are thread-safe.

1.2. What is a Matcher?

A matcher is created from a pattern by invoking the pattern’s matcher method.

Matcher matcher = p.matcher("Regular expression tutorial with 9 examples!");

Instances of the Matcher class are not thread safe.
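
As a minimal sketch of how the two classes fit together: because a Pattern is immutable, it can be compiled once and shared, while each call creates its own Matcher. The class name DigitFinder and the method containsDigit are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitFinder {
    // Compiled once; safe to share between threads because Pattern is immutable.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Each call creates its own Matcher, so no Matcher is shared across threads.
    public static boolean containsDigit(String input) {
        Matcher matcher = DIGIT.matcher(input);
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(containsDigit("Regular expression tutorial with 9 examples!")); // true
        System.out.println(containsDigit("no digits here")); // false
    }
}
```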

1.3. PatternSyntaxException

PatternSyntaxException is an unchecked exception that is thrown when the syntax of a regular expression is incorrect.
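
The following sketch shows one way to detect an invalid expression by catching the exception at compile time. The class name SyntaxCheck and the method isValidRegex are hypothetical helpers introduced only for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxCheck {
    // Returns true if the regex compiles, false if its syntax is invalid.
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() report what went wrong and where.
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d+")); // true
        System.out.println(isValidRegex("(abc")); // false: the group is never closed
    }
}
```

Since the exception is unchecked, the compiler does not force you to catch it. Catching it is mainly useful when the expression comes from user input or configuration.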

1.4. Regular Expression Predefined Characters

Predefined character classes work as shorthand and make a regular expression easier to read. They are sometimes loosely referred to as metacharacters.

Regular Expression    Description
\d                    Any digit, short for [0-9]
\D                    Any non-digit, short for [^0-9]
\s                    Any whitespace character, short for [ \t\n\x0B\f\r]
\S                    Any non-whitespace character, short for [^\s]
\w                    Any word character, short for [a-zA-Z_0-9]
\W                    Any non-word character, short for [^\w]
\b                    A word boundary
\B                    A non-word boundary
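
To see a few of these classes in action, here is a small sketch. The class name PredefinedClasses and its helper methods are hypothetical, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

public class PredefinedClasses {
    // \D matches any non-digit, so removing those leaves only the digits.
    public static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // \b anchors the match at word boundaries, so "cat" inside "concat" is skipped.
    public static String replaceWord(String input) {
        return input.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("a1b22c"));               // 122
        System.out.println(replaceWord("cat concat"));          // dog concat
        System.out.println(Pattern.matches("\\w+", "user_01")); // true
        System.out.println(Pattern.matches("\\w+", "user 01")); // false: the space is not a word character
    }
}
```

Note that each backslash is doubled in the Java source, because the backslash is also the escape character in Java string literals.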

1.5. Regular Expression Quantifiers

Quantifiers specify how many times a character, group, or character class must occur in the input string.

Regular Expression    Description
a?                    a occurs once or not at all
a*                    a occurs zero or more times
a+                    a occurs one or more times
a{n}                  a occurs exactly n times
a{n,}                 a occurs n or more times
a{n,m}                a occurs at least n times but not more than m times
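
The quantifiers above can be checked with a short sketch like the following. The class name Quantifiers and the helper fullMatch are hypothetical, and the inputs are illustrative only.

```java
import java.util.regex.Pattern;

public class Quantifiers {
    // Returns true only if the entire input matches the regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));  // true: u occurs zero times
        System.out.println(fullMatch("colou?r", "colour")); // true: u occurs once
        System.out.println(fullMatch("a*", ""));            // true: zero occurrences are allowed
        System.out.println(fullMatch("a+", ""));            // false: at least one a is required
        System.out.println(fullMatch("\\d{4}", "2021"));    // true: exactly four digits
        System.out.println(fullMatch("\\d{2,}", "7"));      // false: fewer than two digits
        System.out.println(fullMatch("\\d{2,3}", "1234"));  // false: more than three digits
    }
}
```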

1.6. Regular Expression Common Symbols

Regular Expression    Description
.                     Any character
^                     The beginning of a line
$                     The end of a line
[abc]                 a, b, or c (simple class)
[^abc]                Any character except a, b, or c
(a)                   a, as a capturing group
\\                    The backslash character
a|b                   Either a or b
\t                    The tab character
\n                    The newline character
\r                    The carriage-return character
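
A few of these symbols behave differently depending on the flags used to compile the pattern. The sketch below illustrates this with the Pattern.MULTILINE flag, which is not covered in the table and is brought in here only for illustration. The class name CommonSymbols and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommonSymbols {
    // Counts how many times the pattern is found in the input.
    public static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo";
        // By default, ^ matches only at the start of the whole input.
        System.out.println(countMatches(Pattern.compile("^\\w+"), text)); // 1
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // 2
        System.out.println(Pattern.matches("[^abc]", "d")); // true
        System.out.println(Pattern.matches("a|b", "b"));    // true
        System.out.println(Pattern.matches(".", "\n"));     // false: by default . does not match a line terminator
    }
}
```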

2. How to use Java Regex

Let’s start with some examples of the Pattern class and how it works.

2.1. split

Pattern pattern = Pattern.compile("\\d+");
String[] st = pattern.split("20 potato, 10 tomato, 5 bread");
// The input starts with a match, so st[0] is an empty string; start at index 1.
for (int i = 1; i < st.length; i++) {
   System.out.println("recipe ingredient" + i + " : " + st[i]);
}

Output

recipe ingredient1 : potato,
recipe ingredient2 : tomato,
recipe ingredient3 : bread

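split() splits the given input string around matches of the pattern. In the above example, the pattern \d+ matches any run of one or more digits, so the input is split at each number. Because the input starts with a match, the first array element is an empty string, which is why the loop starts at index 1.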

2.2. flags

A Pattern can be created with flags that change how it is matched against the input string. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.

Pattern pattern = Pattern.compile("abc$", Pattern.CASE_INSENSITIVE);
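
As a minimal sketch of the effect of the flag, the example below compares the same expression compiled with and without Pattern.CASE_INSENSITIVE. The class name FlagsExample and the helper endsWithAbc are hypothetical names used only here.

```java
import java.util.regex.Pattern;

public class FlagsExample {
    // Returns true if the input ends with "abc"; case is ignored when requested.
    public static boolean endsWithAbc(String input, boolean ignoreCase) {
        Pattern pattern = ignoreCase
                ? Pattern.compile("abc$", Pattern.CASE_INSENSITIVE)
                : Pattern.compile("abc$");
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithAbc("xyzABC", true));  // true: case is ignored
        System.out.println(endsWithAbc("xyzABC", false)); // false: "ABC" is not "abc"
        System.out.println(endsWithAbc("ABCxyz", true));  // false: $ requires the match at the end
    }
}
```

Several flags can be combined with the bitwise OR operator, for example Pattern.CASE_INSENSITIVE | Pattern.MULTILINE.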

2.3. matches

The Pattern class has a static matches method that takes a regular expression and an input string as arguments and returns a boolean indicating whether the entire input matches the expression.

System.out.println("Matches: " + Pattern.matches(".*", "abcd654xyz00")); // true

If you only need to check once whether an input string matches a pattern, you can use the String matches method instead of Pattern.matches.

String str = "abcd654xyz00";
str.matches(".*"); //true

Tip
A pattern is applied to a string from left to right, and each part of the string that is used in a match cannot be reused. For example, the regex “234” matches “34234656723446” only twice, as “__234____234__”.
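
The tip can be verified with a short sketch that records where each match starts. The class name NonOverlapping and the helper matchStarts are hypothetical, and the second input ("aaaa") is an extra illustrative case that is not part of the tip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonOverlapping {
    // Collects the start index of every match found by repeated find() calls.
    public static List<Integer> matchStarts(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("234", "34234656723446")); // [2, 9]
        // "aa" could start at index 0, 1, or 2, but consumed characters are not reused.
        System.out.println(matchStarts("aa", "aaaa")); // [0, 2]
    }
}
```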

2.4. Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups: ((A)(B(C))), (A), (B(C)), (C).

To find out how many groups are present in a regular expression, you can call groupCount() on a matcher object. The groupCount() method returns an int showing the number of capturing groups present in the matcher’s pattern. For example, ((ab)(c)) contains three capturing groups: ((ab)(c)), (ab), and (c).

There is also a special group, group zero, which always represents the entire expression. This group is not included in the total reported by groupCount().

Pattern p = Pattern.compile("(cd)(\\d+\\w)(.*)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("abCD45ee EE54dcBA");
if(m.find()) {
    System.out.println("Group0: " + m.group(0));
    System.out.println("Group1: " + m.group(1));
    System.out.println("Group2: " + m.group(2));
    System.out.println("Group3: " + m.group(3));
}
 
System.out.println("Group count: " + m.groupCount());

And here is the output:

Group0: CD45ee EE54dcBA
Group1: CD
Group2: 45e
Group3: e EE54dcBA
Group count: 3

The part of the input string that matches a capturing group is saved in memory and can be recalled later with a backreference. In a regular expression, a backreference is written as a backslash (\) followed by the number of the group to be recalled.

System.out.println(Pattern.matches("(\\d\\w)\\1", "2x2x")); //true
System.out.println(Pattern.matches("(\\d\\w)\\1", "2x2z")); //false
System.out.println(Pattern.matches("(A\\d)(bcd)\\2\\1", "A4bcdbcdA4")); //true
System.out.println(Pattern.matches("(A\\d)(bcd)\\2\\1", "A4bcdbcdA5")); // false

In the first example, the capturing group is (\d\w). When it is matched against the input string “2x2x”, the group captures “2x” and saves it in memory. The backreference \1 then refers to “2x”, so the rest of the input must be “2x” again, and the match returns true. By the same analysis, the second example returns false, because “2z” differs from the captured “2x”. Now it is your turn to analyze the capturing groups in examples 3 and 4.

2.5. Other Matcher methods

Matcher has some other methods to work with regular expressions.

2.5.1 lookingAt and matches

The matches and lookingAt methods both match an input string against a pattern, starting at the beginning of the input. The difference between them is that matches requires the entire input string to be matched, while lookingAt does not.

Pattern pattern = Pattern.compile("dd");
Matcher matcher = pattern.matcher("dd3435dd");
System.out.println("lookingAt(): " + matcher.lookingAt()); // true
System.out.println("matches(): " + matcher.matches()); // false

2.5.2. start and end

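The start() and end() methods report where the match was found in the input string: start() returns the index of the first matched character, and end() returns the index just after the last matched character.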

Pattern p = Pattern.compile("(cd)(\\d+\\w)(.*)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("abCD45ee EE54dcBA");
if(m.find()) {
    System.out.println("start(): " + m.start()); //2
    System.out.println("end(): " + m.end()); //17
}

2.5.3. replaceAll and replaceFirst

replaceAll and replaceFirst return a new string in which matches in the input are replaced with the replacement string. replaceFirst replaces only the first occurrence, and replaceAll replaces all occurrences.

public static void main(String[] args) {
   Pattern pt = Pattern.compile("Lion");
   Matcher mt = pt.matcher("Lion is the strongest animal in jungle. Lion is smart.");
   String s1 = mt.replaceFirst("Bear");
   System.out.println("replaceFirst(): " + s1);
   String s2 = mt.replaceAll("Tiger");
   System.out.println("replaceAll()" + s2);
}

Output

replaceFirst(): Bear is the strongest animal in jungle. Lion is smart.
replaceAll()Tiger is the strongest animal in jungle. Tiger is smart.

Regular expressions come up often in Java interview questions, so they are worth practicing.

3. Download the source code

This was a tutorial on Java regular expressions.

Download
You can download the full source code of this example here: Java Regex – Regular Expressions Tutorial

Last updated on Nov. 10th, 2021

Ima Miri

Ima is a Senior Software Developer in enterprise application design and development. She is experienced in high traffic websites for e-commerce, media and financial services. She is interested in new technologies and innovation area along with technical writing. Her main focus is on web architecture, web technologies, java/j2ee, Open source and mobile development for android.