Python RegEx Tutorial
In this article, we will be looking at the REGEX (Regular Expression) module in Python.
1. What is REGEX?
REGEX stands for regular expression. A regular expression is a string of characters that specifies a search pattern, and we use regexes for locating, matching, and manipulating text. Regular expressions describe regular languages, a concept that the mathematician Stephen Cole Kleene formalized in 1951. The full regex syntax that Python supports is documented in the official Python documentation for the re module.
2. Regex module in Python
Python provides all of its regex functionality through the built-in re module, which we need to import before using it. The re module provides regular expression matching operations similar to those found in Perl, and its complete documentation is available in the official Python documentation.
To understand how to use the re module, let us look at a simple example that uses one of its functions, search(). The first step is to import re:
```python
import re

re.search('1111', 'foofighters1111')
```
Here, 1111 is the regex pattern and foofighters1111 is the input string.
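search() returns a Match object rather than a plain string. The sketch below shows one way to inspect that result using the same pattern and input; group() and span() are standard Match methods, and the second pattern, '2222', is made up to show the no-match case.

```python
import re

# Same pattern and input string as above
result = re.search('1111', 'foofighters1111')

# search() returns None when nothing matches, so check before using the result
if result is not None:
    print(result.group())  # the matched text: 1111
    print(result.span())   # (start, end) indices of the match: (11, 15)

# A pattern that does not occur in the input gives None
print(re.search('2222', 'foofighters1111'))  # None
```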
3. Regex Functions
The re module has many functions; the complete list is available in the documentation under “Module Contents”. Here we will look at some of the most commonly used ones.
3.1 findall()
findall() returns all the non-overlapping matches of the pattern in the string as a list, scanning from left to right. If it finds no matches, it returns an empty list. Its syntax is:
re.findall(pattern, string, flags=0)
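A quick sketch of findall() on a made-up input string. It also shows a detail of the return value: when the pattern contains one capturing group, findall() returns the group's text instead of the whole match, and with several groups it returns a list of tuples.

```python
import re

text = "cat 12, dog 7, bird 345"  # sample input chosen for illustration

# No groups: each list element is the whole match
numbers = re.findall(r"\d+", text)
print(numbers)  # ['12', '7', '345']

# One group: only the group's text is returned
names = re.findall(r"([a-z]+) \d+", text)
print(names)  # ['cat', 'dog', 'bird']

# Two groups: a list of tuples, one tuple per match
pairs = re.findall(r"([a-z]+) (\d+)", text)
print(pairs)  # [('cat', '12'), ('dog', '7'), ('bird', '345')]

# No match: an empty list, never None
print(re.findall(r"fish", text))  # []
```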
3.2 search()
This function scans the input string for the first location where the regex pattern matches. It returns a Match object if it finds a match and None if there are no matches. The syntax of search is:
re.search(pattern, string, flags=0)
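The sketch below, on an illustrative input, shows that search() stops at the first match even when the pattern occurs more than once, and that the match does not have to start at position 0.

```python
import re

text = "error at line 3, error at line 9"  # illustrative input

# search() scans the whole string but returns only the first match
first = re.search(r"line (\d+)", text)
if first:
    print(first.group(0))  # 'line 3' (the whole match)
    print(first.group(1))  # '3' (the first capturing group)

# The match can start anywhere; start() gives its index in the string
print(re.search(r"\d+", text).start())  # 14
```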
3.3 split()
The split function splits the string at each match of the given regex and returns a list of the resulting substrings. If the pattern does not match anywhere, it returns a list containing the whole string as its only element. Its syntax is:
re.split(pattern, string, maxsplit=0, flags=0)
Besides the pattern and the string, we can also pass maxsplit, the maximum number of splits we want. Once that many splits have been made, the rest of the string is returned as-is as the last element of the list. The default, 0, means there is no limit.
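A small sketch of split() on a made-up comma-separated string with uneven spacing, covering maxsplit, the no-match case, and one extra detail: if the pattern contains a capturing group, the separators themselves are kept in the result.

```python
import re

line = "a, b,c ,  d"  # illustrative input with uneven spacing

# Split on a comma surrounded by optional whitespace
print(re.split(r"\s*,\s*", line))  # ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits; the remainder stays in the last element
print(re.split(r"\s*,\s*", line, maxsplit=1))  # ['a', 'b,c ,  d']

# No match: a one-element list holding the original string
print(re.split(r";", line))  # ['a, b,c ,  d']

# A capturing group keeps the separators in the result
print(re.split(r"(-)", "1-2-3"))  # ['1', '-', '2', '-', '3']
```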
3.4 sub()
To replace the matched pattern with another string, we use the sub() function. It has the following arguments:
re.sub(pattern, repl, string, count=0, flags=0)
We can optionally set count, i.e., the maximum number of occurrences to replace. If count is not specified (or is 0), sub replaces all non-overlapping occurrences, scanning the string from left to right.
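A sketch of sub() on made-up date strings. Besides count (passed here as a keyword argument for clarity), it shows two things repl can do: refer back to capturing groups with \1, \2, \3, or be a function that receives each Match object and returns the replacement text.

```python
import re

text = "2021-05-18 and 2020-01-02"  # illustrative dates

# Replace every digit with '#'
print(re.sub(r"\d", "#", text))  # '####-##-## and ####-##-##'

# count=1 replaces only the leftmost match
print(re.sub(r"\d{4}", "YYYY", text, count=1))  # 'YYYY-05-18 and 2020-01-02'

# \1, \2, \3 in the replacement refer to the pattern's capturing groups
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text))
# '18/05/2021 and 02/01/2020'

# repl may also be a function that receives each Match object
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears"))
# '6 apples, 20 pears'
```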
3.5 match()
The match function checks whether the pattern matches at the beginning of the string. If it does, a Match object is returned; otherwise it returns None. Unlike search(), it does not look for a match further along the string. The syntax is as follows:
re.match(pattern, string, flags=0)
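The difference between match() and search() is easiest to see side by side. In this sketch, on an illustrative input, match() only tries position 0, so a pattern that appears later in the string is found by search() but not by match(); adding $ makes match() check the whole string.

```python
import re

text = "Python 3.9"  # illustrative input

# match() only tries position 0
print(re.match(r"Python", text))   # a Match object
print(re.match(r"\d", text))       # None: the string does not start with a digit

# search() finds the digit further along
print(re.search(r"\d", text).group())  # '3'

# Anchoring with $ makes match() require the whole string to match
print(re.match(r"Python \d\.\d$", text) is not None)  # True
```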
4. Metacharacters
Before we look at examples of the functions, we need to understand metacharacters. Metacharacters are roughly like keywords for a regex pattern: they are special characters with specific meanings that we combine to build the patterns we search for. The metacharacters that Python uses are as follows:
Character | Description | Example |
---|---|---|
[] | A set of characters; matches any one character from the set | “[m-p]” or “[A-D]” |
\ | Signals a special sequence, or escapes a special character so it is matched literally | “\d” or “\.” |
. | Any character except a newline | “b..lding” |
^ | Matches at the start of the string | “^c” |
* | Zero or more occurrences of the preceding element | “bid*” |
$ | Matches at the end of the string | “goodbye$” |
+ | One or more occurrences of the preceding element | “aid+” |
{} | The exact number of occurrences of the preceding element (a range such as {2,4} also works) | “b{2}” |
\| | Either-or; matches the expression on either side | “up\|down” |
() | Groups a subpattern and captures the matched text | “(ab)+” |
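The sketch below runs each metacharacter from the table against a short made-up input. The * example is written as bid* (zero or more d's after "bi"), because a pattern that starts with * has nothing to repeat and re rejects it with re.error.

```python
import re

# [] : one character from a range
print(re.findall(r"[m-p]", "lemon"))          # ['m', 'o', 'n']
# \ : escape; \d is any digit, \. is a literal dot
print(re.findall(r"\d\.\d", "v1.2 or 3x4"))   # ['1.2']
# . : any character except newline
print(re.findall(r"b..lding", "building"))    # ['building']
# ^ and $ : start and end of the string
print(bool(re.search(r"^c", "cat")), bool(re.search(r"goodbye$", "say goodbye")))  # True True
# * : zero or more of the preceding character
print(re.findall(r"bid*", "bi bid bidd"))     # ['bi', 'bid', 'bidd']
# + : one or more of the preceding character
print(re.findall(r"aid+", "ai aid aidd"))     # ['aid', 'aidd']
# {} : exact count; matches do not overlap
print(re.findall(r"b{2}", "abbbb"))           # ['bb', 'bb']
# | : either-or
print(re.findall(r"up|down", "up, down, around"))  # ['up', 'down']
# () : group and capture
m = re.search(r"(\w+)@(\w+)", "mail: alice@example")
print(m.group(1), m.group(2))                 # alice example
```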
5. Special Sequences
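Besides metacharacters, we also use special sequences. A special sequence is the \ metacharacter followed by one of the characters listed below. It is best to write patterns that contain them as raw strings (for example r"\b"), so that Python does not treat the backslash as a string escape first; in a normal string, "\b" is a backspace character. The sequences available are as follows: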
Sequence | Description |
---|---|
\A | Matches only at the start of the string |
\b | Matches at the beginning or end of a word (a word boundary) |
\B | Matches where \b does not, i.e., not at the beginning or end of a word |
\d | Matches any decimal digit, such as 0 to 9 |
\D | Matches any character that is not a digit |
\s | Matches any whitespace character (space, tab, newline, and so on) |
\S | Matches any character that is not whitespace |
\w | Matches any word character: letters (a to z, A to Z), digits 0 to 9, and the underscore |
\W | Matches any character that is not a word character |
\Z | Matches only at the end of the string |
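A sketch that tries each special sequence on a short made-up input. All patterns are raw strings so the backslashes reach the re module unchanged. Note how \b and \B are about word boundaries, not the start or end of the whole string.

```python
import re

text = "the other theme"  # illustrative input

print(re.findall(r"\bthe", text))   # ['the', 'the']: from "the" and "theme"
print(re.findall(r"\Bthe", text))   # ['the']: from inside "other"
print(re.findall(r"\AThe", "The end. The start."))  # ['The']: only at position 0
print(re.findall(r"end\Z", "the end"))              # ['end']
print(re.findall(r"\d", "a1b22"))   # ['1', '2', '2']
print(re.findall(r"\D", "a1b22"))   # ['a', 'b']
print(re.split(r"\s+", "one  two\tthree"))    # ['one', 'two', 'three']
print(re.findall(r"\S+", "one  two\tthree"))  # ['one', 'two', 'three']
print(re.findall(r"\w+", "snake_case, kebab-case!"))  # ['snake_case', 'kebab', 'case']
print(re.findall(r"\W", "a-b c!"))  # ['-', ' ', '!']
```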
6. Sets
Besides special sequences, we also have sets. A set is a group of characters enclosed in [], and it matches any single character from that group. Some commonly used sets are:
Set | Description |
---|---|
[bdf] | Matches any one of the characters b, d, or f |
[a-n] | Matches any lowercase character between a and n |
[^are] | Matches any character except a, r, and e |
[0123] | Matches any of the digits 0, 1, 2, or 3 |
[0-9] | Matches any digit between 0 and 9 |
[0-7][0-9] | Matches any two-digit number from 00 to 79 |
[a-zA-Z] | Matches any letter, lowercase or uppercase |
[+] | Matches a literal + sign; inside a set, + has no special meaning |
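The sketch below applies each set from the table to a short made-up string, which makes the ranges and the ^ negation easy to check by eye.

```python
import re

print(re.findall(r"[bdf]", "abcdef"))            # ['b', 'd', 'f']
print(re.findall(r"[a-n]", "Hello"))             # ['e', 'l', 'l'] (H is uppercase, o is after n)
print(re.findall(r"[^are]", "rare cat"))         # [' ', 'c', 't']
print(re.findall(r"[0123]", "2024"))             # ['2', '0', '2']
print(re.findall(r"[0-9]", "a1b2"))              # ['1', '2']
print(re.findall(r"[0-7][0-9]", "05 42 85 79"))  # ['05', '42', '79']
print(re.findall(r"[a-zA-Z]+", "Hi there 42"))   # ['Hi', 'there']
print(re.findall(r"[+]", "1+2+3"))               # ['+', '+']
```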
7. Examples
The examples below use the functions, metacharacters, special sequences, and sets we looked at above. We have put all of them in a single Python script called regex_examples.py.
regex_examples.py
```python
import re

# findall method
txt = "By the pricking of my thumbs, Something wicked my way comes. Open, locks, Whoever knocks!"

# findall with sets
# Only lowercase characters will be considered
lowChar = re.findall("[p-t]", txt)
print("findall with lowercase::", lowChar)
print(" ")

# To ignore case, we can add the re.I (IGNORECASE) flag
ignoreCaseChar = re.findall("[p-t]", txt, flags=re.I)
print("findall with the Ignore case flag:: ", ignoreCaseChar)
print("\n")

# search
searchString = re.search("my", txt)
print("Search output: ", searchString)
print("\n")

# split
splitString = re.split(r"\s", txt)
print("Split String on whitespaces output: ", splitString)
print("\n")

# split with maxsplit. The rest of the string is returned as-is
splitMaxNum = re.split(r"\s", txt, maxsplit=3)
print("Split String with whitespace and max number output: ", splitMaxNum)
print("\n")

inputString = "2004-959-559 # Thorin Oakenshield # The King Under the Mountain"

# Substitute characters
# The r prefix makes this a raw string, so backslashes are passed to re unchanged
substituteString = re.sub(r'#.*$', "", inputString)
print("Substituted String :: ", substituteString)
print("\n")

# Replace everything other than digits
onlyNumbers = re.sub(r'\D', "", inputString)
print("Replace everything except numbers : ", onlyNumbers)
print("\n")

newString = "The lady doth protest too much, methinks.The better part of valor is discretion.The course of true love never did run smooth."

# findall using special sequences: \A matches only at the start of the string
startOfString = re.findall(r"\AThe", newString)
print("findall with only at the start special sequences: ", startOfString)
print("\n")

# \B matches only where there is no word boundary.
# No result here, because every "The" in newString starts a word.
neitherStartnorEnd = re.findall(r'\BThe', newString)
print("find all Not at Start or end, no output: ", neitherStartnorEnd)
print("\n")

newInput = "Lord, what fools these mortals be!.The fault, dear Brutus, lies not within the stars, but in ourselves, that we are underlings."

# Returns every 'es' that does not start a word (from 'these', 'lies', and 'ourselves')
neitherStartnorEnd1 = re.findall(r"\Bes", newInput)
print("Not at Start or end: ", neitherStartnorEnd1)
print("\n")

# match function
testString = 'Brevity is the soul of wit.'
# Returns None: the match is case-sensitive (lowercase b), and '^b...y$'
# would also require the whole string to be exactly five characters long
matchResult = re.match('^b...y$', testString)
print("Match function output: ", matchResult)
print("\n")

# To match the pattern regardless of case, use the re.I flag
matchResultIgnoreCase = re.match(r"^b\w+", testString, flags=re.I)
print("Match function without case", matchResultIgnoreCase)
print("\n")
```
8. Summary
In this article, we looked at the regex support that Python provides through the re module. There is also a third-party module called regex, available on PyPI, which is compatible with re and adds extra functionality.
9. Download the Source Code
Above we saw examples of using regex in Python.
You can download the full source code of this example here: Python RegEx Tutorial
Last updated on May 18th, 2021