Java Regex – Regular Expressions Tutorial
In this post, we will explain the Java Regex (Regular Expressions) through examples.
1. What is a regular expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. In Java it can be used to search, edit, or manipulate text and data. Regular expressions have their own syntax, which you need to learn in order to write them. They are commonly used to define constraints on strings, for example in password and email validation.
Java provides the regex API in the java.util.regex package, which contains three classes: Pattern, Matcher, and PatternSyntaxException.
1.1. What is a Pattern?
A Pattern is a compiled representation of a regular expression. A regular expression that is specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern can then be used to create a Matcher object.
Pattern p = Pattern.compile("\\d");
Instances of the Pattern class are immutable and thread-safe.
1.2. What is a Matcher?
A Matcher is created from a pattern by invoking the pattern's matcher method.
Matcher matcher = p.matcher("Regular expression tutorial with 9 examples!");
Instances of the Matcher class are not thread-safe.
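To make the relationship between the two classes concrete, here is a minimal sketch that compiles a Pattern once and creates a fresh Matcher for each input. The class name DigitFinder and the method findDigits are made up for this illustration and are not part of the Java API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitFinder {
    // Compile once: a Pattern is immutable and can be shared between threads.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns every single-digit match found in the input, in order.
    static List<String> findDigits(String input) {
        // A Matcher holds per-search state, so create a new one for each call.
        Matcher matcher = DIGIT.matcher(input);
        List<String> digits = new ArrayList<>();
        while (matcher.find()) {
            digits.add(matcher.group());
        }
        return digits;
    }

    public static void main(String[] args) {
        // prints [9]
        System.out.println(findDigits("Regular expression tutorial with 9 examples!"));
    }
}
```

Keeping the Pattern in a static final field avoids recompiling it on every call, while the Matcher stays local because it is not thread-safe.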
1.3. PatternSyntaxException
PatternSyntaxException is an unchecked exception that is thrown when the syntax of a regular expression is incorrect.
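Because the exception is unchecked, the compiler does not force you to handle it. When a pattern comes from user input, it can still be worth catching it explicitly. The following sketch shows one way to do that. The class RegexValidator and the method isValidRegex are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexValidator {
    // Returns true if the string compiles as a Java regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() report what went wrong and where.
            System.out.println("Invalid regex: " + e.getDescription()
                    + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d+")); // true
        System.out.println(isValidRegex("(abc")); // false, the group is never closed
    }
}
```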
1.4. Regular Expression Predefined Characters
Predefined character classes work as shorthand and make patterns easier to read. They are sometimes loosely referred to as metacharacters.
Regular Expression | Description |
\d | Any digit, short for [0-9] |
\D | Any non-digit, short for [^0-9] |
\s | Any whitespace character, short for [ \t\n\x0B\f\r] |
\S | Any non-whitespace character, short for [^\s] |
\w | Any word character, short for [a-zA-Z_0-9] |
\W | Any non-word character, short for [^\w] |
\b | A word boundary |
\B | A non-word boundary |
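The table above can be tried out with a few small helpers. This is only a sketch. The class and method names are invented for the example, and note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassesDemo {
    // \b\w+\b : a run of word characters delimited by word boundaries.
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // \d+ : the whole string must consist of one or more digits.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // \s+ : replace every run of whitespace with a single space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Counts the runs of word characters in the input.
    static int countWords(String s) {
        Matcher m = WORD.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2021"));                      // true
        System.out.println(isAllDigits("20a1"));                      // false
        System.out.println(collapseWhitespace("  20 \t potato\n"));   // 20 potato
        System.out.println(countWords("Regular expression tutorial")); // 3
    }
}
```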
1.5. Regular Expression Quantifiers
Quantifiers specify how many times a character, character class, or group may occur in the input string.
Regular Expression | Description |
a? | a occurs once or not at all |
a* | a occurs zero or more times |
a+ | a occurs one or more times |
a{n} | a occurs exactly n times |
a{n,} | a occurs n or more times |
a{n,m} | a occurs at least n times but not more than m times |
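Each row of the quantifier table can be checked with Pattern.matches, which succeeds only if the entire input matches. The wrapper class QuantifierDemo and its full method are hypothetical and exist only to keep the example short.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only when the whole input matches the regular expression.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("a?", ""));         // true:  zero occurrences allowed
        System.out.println(full("a?", "aa"));       // false: at most one
        System.out.println(full("a*", ""));         // true:  zero or more
        System.out.println(full("a+", ""));         // false: at least one required
        System.out.println(full("a{2}", "aa"));     // true:  exactly two
        System.out.println(full("a{2,}", "aaaa"));  // true:  two or more
        System.out.println(full("a{2,3}", "aaaa")); // false: more than three
    }
}
```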
1.6. Regular Expression common symbols
Regular Expression | Description |
. | Any character (line terminators are matched only when the DOTALL flag is set) |
^ | The beginning of a line |
$ | The end of a line |
[abc] | a, b, or c (simple character class) |
[^abc] | Any character except a, b, or c |
(a) | a, as a capturing group |
\\ | The backslash character |
a|b | Either a or b |
\t | The tab character |
\n | The newline character |
\r | The carriage-return character |
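A short sketch of some of these symbols in action follows. One detail worth knowing is that, by default, ^ and $ anchor at the start and end of the whole input, and they match at every line only when Pattern.MULTILINE is set. The class SymbolsDemo and its methods are illustrative names, not library API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolsDemo {
    // a|b : the whole input must be either "cat" or "dog".
    static boolean isCatOrDog(String s) {
        return Pattern.matches("cat|dog", s);
    }

    // [^abc] : returns the first character that is not a, b, or c, or null.
    static String firstNonAbc(String s) {
        Matcher m = Pattern.compile("[^abc]").matcher(s);
        return m.find() ? m.group() : null;
    }

    // ^ : counts positions where a line (or the input) starts with a word character.
    static int countLineStarts(String text, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher m = Pattern.compile("^\\w", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isCatOrDog("dog"));                        // true
        System.out.println(isCatOrDog("hotdog"));                     // false
        System.out.println(Pattern.matches("c.t", "cot"));            // true
        System.out.println(firstNonAbc("abcabd"));                    // d
        System.out.println(countLineStarts("one\ntwo\nthree", false)); // 1
        System.out.println(countLineStarts("one\ntwo\nthree", true));  // 3
    }
}
```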
2. How to use Java Regex
Let's start with some examples of the Pattern class and how it works.
2.1. split
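Pattern pattern = Pattern.compile("\\d+");
String[] st = pattern.split("20 potato, 10 tomato, 5 bread");
// st[0] is the empty string in front of the leading "20", so start at index 1
for (int i = 1; i < st.length; i++) {
    // trim() drops the spaces left around each piece
    System.out.println("recipe ingredient" + i + " : " + st[i].trim());
}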
Output
recipe ingredient1 : potato,
recipe ingredient2 : tomato,
recipe ingredient3 : bread
split() splits the given input string around matches of the pattern. In the example above, the pattern matches every run of one or more digits, and the text between those matches is returned as an array.
2.2. flags
A Pattern can be created with flags that change how it is matched against the input string. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
Pattern pattern = Pattern.compile("abc$", Pattern.CASE_INSENSITIVE);
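The effect of a flag is easiest to see by running the same pattern with and without it. In the sketch below, the helper endsWithAbc is a hypothetical method that takes the flags as a parameter. Several flags can be combined with the bitwise OR operator.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Searches the input for "abc" at an end anchor, using the given flags.
    static boolean endsWithAbc(String input, int flags) {
        return Pattern.compile("abc$", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;
        System.out.println(endsWithAbc("xyzABC", 0));   // false: case differs
        System.out.println(endsWithAbc("xyzABC", ci));  // true
        // Without MULTILINE, $ does not match before the newline in the middle.
        System.out.println(endsWithAbc("ABC\nxyz", ci));                     // false
        System.out.println(endsWithAbc("ABC\nxyz", ci | Pattern.MULTILINE)); // true
    }
}
```

The same case-insensitive behaviour can also be requested inside the expression itself with the embedded flag (?i), as in "(?i)abc$".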
2.3. matches
The Pattern class has a static matches method that takes a regular expression and an input string as arguments and returns a boolean indicating whether the entire input matches the expression.
System.out.println("Matches: " + Pattern.matches(".*", "abcd654xyz00")); // true
If you only need to test whether a whole string matches a regular expression, you can call the String matches method instead of going through Pattern.
String str = "abcd654xyz00";
str.matches(".*"); // true
A pattern is applied to a string from left to right, and each part of the string that is consumed by a match cannot be reused in a later match. For example, the regex "234" matches "34234656723446" only twice, as "__234____234__".
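The left-to-right, non-overlapping behaviour can be observed by counting how often find() succeeds. This is a small sketch with a made-up helper called countMatches. The second call shows the rule more clearly: "aa" occurs at three positions in "aaaa", but only two matches are reported because matched characters are not reused.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingDemo {
    // Counts successive, non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        // Each find() resumes right after the end of the previous match.
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("234", "34234656723446")); // 2
        System.out.println(countMatches("aa", "aaaa"));            // 2
    }
}
```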
2.4. Groups and capturing
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups: ((A)(B(C))), (A), (B(C)), and (C).
To find out how many groups are present in a regular expression, you can call groupCount on a matcher object. The groupCount() method returns an int giving the number of capturing groups in the matcher's pattern. For example, ((ab)(c)) contains 3 capturing groups: ((ab)(c)), (ab), and (c).
There is also a special group, group zero, which always represents the entire expression. This group is not included in the total reported by groupCount().
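The numbering rule can be verified by listing group zero and every capturing group for the ((A)(B(C))) expression. The helper groups below is a hypothetical method written for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns group 0 followed by every capturing group, or an empty list
    // if the input does not match the expression as a whole.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            // groupCount() excludes group 0, hence the <= in the loop condition.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ABC, ABC, A, BC, C]
        System.out.println(groups("((A)(B(C)))", "ABC"));
        // groupCount() depends only on the pattern, not on the input: prints 3
        System.out.println(Pattern.compile("((ab)(c))").matcher("").groupCount());
    }
}
```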
Pattern p = Pattern.compile("(cd)(\\d+\\w)(.*)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("abCD45ee EE54dcBA");
if (m.find()) {
    System.out.println("Group0: " + m.group(0));
    System.out.println("Group1: " + m.group(1));
    System.out.println("Group2: " + m.group(2));
    System.out.println("Group3: " + m.group(3));
}
System.out.println("Group count: " + m.groupCount());
And here is the output:
Group0: CD45ee EE54dcBA
Group1: CD
Group2: 45e
Group3: e EE54dcBA
Group count: 3
The part of the input string that matches a capturing group is saved in memory and can be recalled with a backreference. A backreference is written in a regular expression as a backslash (\) followed by the number of the group to be recalled.
System.out.println(Pattern.matches("(\\d\\w)\\1", "2x2x")); // true
System.out.println(Pattern.matches("(\\d\\w)\\1", "2x2z")); // false
System.out.println(Pattern.matches("(A\\d)(bcd)\\2\\1", "A4bcdbcdA4")); // true
System.out.println(Pattern.matches("(A\\d)(bcd)\\2\\1", "A4bcdbcdA5")); // false
In the first example, the capturing group is (\d\w). When it is matched against the input string "2x2x", the group captures "2x" and saves it in memory. The backreference \1 then refers to "2x", which is exactly what follows in the input, so the result is true. By the same analysis, the second example results in false, because "2z" does not repeat the captured "2x". Now it is your turn to analyze the capturing groups in examples 3 and 4.
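A common practical use of backreferences is spotting an accidentally repeated word. The sketch below assumes words are separated by whitespace. The class BackreferenceDemo and the method firstRepeatedWord are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // (\w+) captures a word, \s+ skips whitespace, \1 requires the same word again.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first word that is immediately repeated, or null if none is.
    static String firstRepeatedWord(String text) {
        Matcher m = REPEATED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        System.out.println(firstRepeatedWord("no repeats here"));   // null
    }
}
```

The word boundaries keep the "is" inside "this" from being treated as the start of a repeated word.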
2.5. Other Matcher methods
Matcher has some other methods to work with regular expressions.
2.5.1. lookingAt and matches
The matches and lookingAt methods both match an input string against a pattern, starting at the beginning of the input. The difference between them is that matches requires the entire input string to match, while lookingAt only requires a match at the start.
Pattern pattern = Pattern.compile("dd");
Matcher matcher = pattern.matcher("dd3435dd");
System.out.println("lookingAt(): " + matcher.lookingAt()); // true
System.out.println("matches(): " + matcher.matches()); // false
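For comparison, the sketch below puts matches, lookingAt, and find side by side, since find is the only one of the three that may succeed anywhere in the input. The describe helper is a hypothetical method. The matcher is reset before find so that the search starts again from the beginning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    // Reports how the three matching methods behave for one regex and input.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();    // entire input must match
        boolean prefix = m.lookingAt(); // a match must begin at index 0
        m.reset();                      // otherwise find() resumes after the last match
        boolean anywhere = m.find();    // a match may begin at any index
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("dd", "dd3435dd")); // matches=false lookingAt=true find=true
        System.out.println(describe("dd", "35dd"));     // matches=false lookingAt=false find=true
        System.out.println(describe("dd", "dd"));       // matches=true lookingAt=true find=true
    }
}
```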
2.5.2. start and end
The start() and end() methods report where the match was found in the input string. start() returns the index of the first matched character, and end() returns the index just after the last matched character.
Pattern p = Pattern.compile("(cd)(\\d+\\w)(.*)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("abCD45ee EE54dcBA");
if (m.find()) {
    System.out.println("start(): " + m.start()); // 2
    System.out.println("end(): " + m.end()); // 17
}
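When a pattern can match several times, start() and end() can be read after each successful find(). The following sketch collects the text and position of every match. The positions helper and its "text@start-end" output format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositionsDemo {
    // Returns one "text@start-end" entry per match; end is exclusive.
    static List<String> positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [20@0-2, 10@11-13, 5@22-23]
        System.out.println(positions("\\d+", "20 potato, 10 tomato, 5 bread"));
    }
}
```

Because end() is exclusive, input.substring(m.start(), m.end()) always equals m.group().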
2.5.3. replaceAll and replaceFirst
The replaceAll and replaceFirst methods replace the parts of the input string that match the pattern with a replacement string. replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
public static void main(String[] args) {
    Pattern pt = Pattern.compile("Lion");
    Matcher mt = pt.matcher("Lion is the strongest animal in jungle. Lion is smart.");
    String s1 = mt.replaceFirst("Bear");
    System.out.println("replaceFirst(): " + s1);
    String s2 = mt.replaceAll("Tiger");
    System.out.println("replaceAll(): " + s2);
}
Output
replaceFirst(): Bear is the strongest animal in jungle. Lion is smart.
replaceAll(): Tiger is the strongest animal in jungle. Tiger is smart.
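The replacement string can also refer to capturing groups with $1, $2, and so on. As a sketch, the hypothetical helper toDayFirst below rewrites dates from a year-month-day layout to day/month/year. The date format is an assumption made for the example.

```java
import java.util.regex.Pattern;

class ReplaceGroupsDemo {
    // Three capturing groups: year, month, day.
    private static final Pattern ISO_DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every yyyy-MM-dd date in the text as dd/MM/yyyy.
    static String toDayFirst(String text) {
        // $3, $2 and $1 insert the text captured by groups 3, 2 and 1.
        return ISO_DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        // prints: Last updated on 10/11/2021
        System.out.println(toDayFirst("Last updated on 2021-11-10"));
    }
}
```

Since $ and \ have a special meaning in replacement strings, a replacement that should be inserted literally can be wrapped with Matcher.quoteReplacement.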
Regular expressions come up frequently in Java interview questions, so they are worth practicing.
3. Download the source code
This was a tutorial on Java regular expressions.
You can download the full source code of this example here: Java Regex – Regular Expressions Tutorial
Last updated on Nov. 10th, 2021