What are regular expressions?

Regular expressions, also known as "regex" or "regexp", are sequences of characters that define a search pattern. They are used to match, extract and manipulate text within a larger body of text, and are common in programming and data mining. Typical applications include data validation, search and replace, and text analysis.

Character selection and character classes in RegEx

Symbol    Explanation
[abc]     Matches any single character in the set (a, b or c)
[^abc]    Matches any single character that is not in the set (a, b or c)
{n}       Matches exactly n occurrences of the preceding character or group
{n,}      Matches n or more occurrences of the preceding character or group
{n,m}     Matches at least n and at most m occurrences of the preceding character or group
^         Matches the beginning of the string (or of each line in multiline mode)
$         Matches the end of the string (or of each line in multiline mode)
.         Matches any single character except a newline
*         Matches zero or more occurrences of the preceding character or group
+         Matches one or more occurrences of the preceding character or group
?         Matches zero or one occurrence of the preceding character or group
\d        Matches any digit (equivalent to [0-9])
\D        Matches any non-digit
\w        Matches any word character (alphanumeric characters and the underscore)
\W        Matches any non-word character
\s        Matches any whitespace character (spaces, tabs, line breaks)
\S        Matches any non-whitespace character
|         Matches either the expression before it or the expression after it
()        Groups the enclosed pattern so that a following quantifier applies to the whole group
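
A few of these symbols in action, as a minimal sketch using Python's standard re module. The sample string and variable names are made up for illustration.

```python
import re

text = "Order 66 shipped to bay_7 on 2024-01-15"

# \d+ : one or more digits in a row
digits = re.findall(r"\d+", text)

# [^\w\s] : a single character that is neither a word character nor whitespace
# (the underscore counts as a word character, so only the hyphens qualify)
punct = re.findall(r"[^\w\s]", text)

# ^ : anchors the match at the beginning of the string
starts_with_order = re.search(r"^Order", text) is not None

# (ab)+ : the quantifier applies to the whole group "ab"
grouped = re.fullmatch(r"(ab)+", "ababab") is not None

print(digits)  # ['66', '7', '2024', '01', '15']
print(punct)   # ['-', '-']
```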

Examples of regular expressions from practice

Validation of e-mail addresses

A regular expression can be used to check whether a given string is a valid email address by comparing it to a pattern that defines the structure of a valid email address.

Example syntax:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
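
As a rough sketch, the pattern could be wrapped in a small Python helper. The function name looks_like_email is hypothetical, and a match only means the string has the expected shape, not that the address exists.

```python
import re

# The e-mail pattern from the article, compiled once for reuse.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(candidate: str) -> bool:
    """Return True if the whole string has the shape local@domain.tld."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return EMAIL_RE.fullmatch(candidate) is not None

print(looks_like_email("jane.doe+news@example.co.uk"))  # True
print(looks_like_email("user@localhost"))               # False (no top-level domain)
```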

Parsing URLs

A regular expression can be used to extract the different parts of a URL, e.g. the protocol, the host name and the path.

Example syntax:

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)\/?$
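
A minimal Python sketch of reading the parts out of the capture groups. The path group carries a single `*` here, since a nested `(...*)*` matches the same strings but can backtrack heavily on input that does not match. The function name parse_url and the dictionary keys are assumptions made for this example.

```python
import re
from typing import Optional

URL_RE = re.compile(r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)\/?$")

def parse_url(url: str) -> Optional[dict]:
    """Split a simple lowercase URL into protocol, host and path, or return None."""
    match = URL_RE.match(url)
    if match is None:
        return None
    protocol, name, tld, path = match.groups()
    return {
        "protocol": protocol,      # None when the URL has no scheme
        "host": f"{name}.{tld}",   # groups 2 and 3 joined back together
        "path": path,              # empty string when there is no path
    }

print(parse_url("https://www.example.com/docs/index.html"))
# {'protocol': 'https://', 'host': 'www.example.com', 'path': '/docs/index.html'}
```

The pattern does not cover ports, query strings or fragments, so a dedicated URL parser is usually the safer choice for real input.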

Extracting phone numbers from text

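A regular expression can be used to identify phone numbers in text. The pattern below is anchored with ^ and $, so it matches a string (or, in multiline mode, a line) that consists solely of a phone number; remove the anchors to search inside running text.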

Example syntax:

^(?:\+\d{1,3}|0\d{1,3}|00\d{1,2})?(?:\s?\d){9,12}$
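
A minimal Python sketch that applies the pattern to one candidate string at a time. The helper name is_phone_number and the sample candidates are made up for illustration.

```python
import re

PHONE_RE = re.compile(r"^(?:\+\d{1,3}|0\d{1,3}|00\d{1,2})?(?:\s?\d){9,12}$")

def is_phone_number(candidate: str) -> bool:
    """Return True if the whole string is an optional prefix plus 9 to 12 digits."""
    return PHONE_RE.fullmatch(candidate) is not None

candidates = ["+49 151 23456789", "hello", "0301234567", "12345"]
phone_numbers = [c for c in candidates if is_phone_number(c)]

print(phone_numbers)  # ['+49 151 23456789', '0301234567']
```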

Removing HTML tags from a string

A regular expression can be used to remove all HTML tags from a string, leaving only the plain text content.

Example syntax:

<\/?[^>]+>
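
A short Python sketch of tag removal using the pattern `<\/?[^>]+>` with re.sub. The function name strip_tags is hypothetical. This works for simple markup only; comments, scripts or attribute values containing ">" need a real HTML parser.

```python
import re

# "<", an optional "/", at least one character that is not ">", then ">".
TAG_RE = re.compile(r"<\/?[^>]+>")

def strip_tags(html: str) -> str:
    """Replace every tag-like substring with the empty string."""
    return TAG_RE.sub("", html)

print(strip_tags("<p>Hello <b>world</b>!</p>"))  # Hello world!
```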

Search for specific words or patterns in a character string

A regular expression can be used to quickly search for a specific word or pattern in a string.

Example syntax:

\b(Word1|Word2|Word3)\b
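
As an illustration in Python, Word1 to Word3 are replaced here by three made-up keywords. The case-insensitive flag is an added assumption, not part of the pattern above.

```python
import re

KEYWORDS = ["error", "warning", "fatal"]  # hypothetical search terms

# Build \b(error|warning|fatal)\b; re.escape guards against special characters.
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def find_keywords(text: str) -> list:
    """Return every whole-word keyword occurrence, in order of appearance."""
    return KEYWORD_RE.findall(text)

print(find_keywords("Warning: 2 errors, 1 error, no fatal issues"))
# ['Warning', 'error', 'fatal']  ("errors" is skipped because of the \b boundary)
```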

Replace text

A regular expression can be used to replace all occurrences of a particular word or pattern in a string with another word or pattern.

Example syntax:

(Word1|Word2|Word3)
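
A minimal Python sketch of such a replacement with re.sub, where `\1` in the replacement text refers back to the captured word. The word list and the function name emphasize are assumptions made for this example.

```python
import re

WORDS_RE = re.compile(r"(cat|dog|bird)")  # stand-ins for Word1|Word2|Word3

def emphasize(text: str) -> str:
    """Wrap every occurrence of a listed word in asterisks."""
    return WORDS_RE.sub(r"*\1*", text)

print(emphasize("The cat chased the dog."))  # The *cat* chased the *dog*.
print(emphasize("category"))                 # *cat*egory
```

The second call shows that, without `\b` word boundaries, the pattern also matches inside longer words.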

Extract data from structured files

A regular expression can be used to extract specific data from structured files such as CSV, JSON and logs. The pattern below, for instance, captures key=value pairs whose value is a number.

Example syntax:

(\w+)=(\d+)
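
Applied to a log line, the two capture groups can be collected into a dictionary. This is a sketch; the log format and the function name parse_pairs are invented for illustration.

```python
import re

PAIR_RE = re.compile(r"(\w+)=(\d+)")

def parse_pairs(line: str) -> dict:
    """Collect every key=<digits> pair in the line as {key: int(value)}."""
    return {key: int(value) for key, value in PAIR_RE.findall(line)}

line = "2024-01-15 INFO status=200 bytes=5120 user=alice"
print(parse_pairs(line))  # {'status': 200, 'bytes': 5120}
```

The pair user=alice is skipped because `\d+` requires a numeric value.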

Validation of credit card numbers

A regular expression can be used to check whether a given string has the structure of a credit card number, that is, a known issuer prefix followed by the right number of digits. It does not verify the checksum or that the card exists.

Example syntax:

^(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$
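
A minimal Python sketch of a format check with this pattern. Stripping spaces and hyphens first is an added assumption, the function name looks_like_card_number is hypothetical, and the numbers shown are widely published test numbers.

```python
import re

CARD_RE = re.compile(
    r"^(?:4[0-9]{12}(?:[0-9]{3})?|[25][1-7][0-9]{14}"
    r"|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}"
    r"|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$"
)

def looks_like_card_number(raw: str) -> bool:
    """Check issuer prefix and length after removing spaces and hyphens."""
    digits = re.sub(r"[ -]", "", raw)  # assumption: separators are allowed in input
    return CARD_RE.fullmatch(digits) is not None

print(looks_like_card_number("4111 1111 1111 1111"))  # True
print(looks_like_card_number("1234567890123456"))     # False
```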

Tokenisation of a sentence

Regular expressions can be used to break down a sentence into words and punctuation marks.

Example syntax:

\w+|[^\w\s]+
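
In Python, re.findall returns the tokens in order. This is a minimal sketch; the function name tokenize is made up for illustration.

```python
import re

# Either a run of word characters, or a run of characters
# that are neither word characters nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")

def tokenize(sentence: str) -> list:
    """Split a sentence into word tokens and punctuation tokens."""
    return TOKEN_RE.findall(sentence)

print(tokenize("Hello, world! How are you?"))
# ['Hello', ',', 'world', '!', 'How', 'are', 'you', '?']
```

Consecutive punctuation marks such as "..." come out as a single token because of the `+` quantifier.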

Data extraction from natural language text

A regular expression can be used to extract specific information such as names, dates, prices, etc. from natural text.

Example syntax:

\b
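
Independently of the pattern above, this kind of extraction can be sketched in Python with two illustrative patterns, one for ISO-style dates and one for dollar prices. Both patterns, the sample sentence and the variable names are assumptions made for this example, not the article's own.

```python
import re

# Hypothetical patterns, for illustration only.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")   # e.g. 2024-03-01
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")      # e.g. $5 or $19.99

text = "The invoice dated 2024-03-01 lists $19.99 for the cable and $5 for shipping."

dates = DATE_RE.findall(text)
prices = PRICE_RE.findall(text)

print(dates)   # ['2024-03-01']
print(prices)  # ['$19.99', '$5']
```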