Google Sheets Function: REGEXMATCH (2/3)
Quantity of Each Character
Quantifiers specify how many times the same character can appear. To specify a quantity or a range, add {}
after the concerned character.
For example:
- {2}: the character must appear 2 times
- {3,}: the character must appear 3 times or more
- {1,4}: the character must appear between 1 and 4 times
The regex "composer{0,}"
checks here if the text contains this word with none, one or multiple "r":
There are also 3 shortcuts for quantifying a character:
- *: the character must appear 0 times or more (equivalent to
{0,}
) - +: the character must appear 1 time or more (equivalent to
{1,}
) - ?: the character must appear between 0 and 1 time (equivalent to
{0,1}
)
The regex "composer*"
is thus identical to the previous one "composer{0,}"
:
Testing a Structure
To test if the structure of a text (like a reference, phone number, URL, etc.) matches what is expected, you will need to use several elements seen so far in a single expression.
For example, to validate the regex "^SP-[0-9]{3}-"
the text must:
- ^SP-: begin with "SP-"
- [0-9]{3}: be followed by 3 characters in the range of 0 to 9
- -: be followed by a "-"
Special Characters
Metacharacters are special characters with a specific role (you're already familiar with ^ $ [ ] | ? * + { }
).
To negate the effect of a metacharacter and treat it as a simple character, precede it with a \
(except when the metacharacter is within a character class).
Thanks to \
, the regex " \?$"
can check if the text ends with " ?":
\ ^ $ . [ ] | ( ) ? * + { }
.\t
stands for a tabulation, \r
for a carriage return, and \n
for a newline.Class Shortcuts
As you now know, character classes define which characters are allowed or not. There are some practical shortcuts to simplify this:
- \d: a digit (equivalent to
[0-9]
) - \D: anything but a digit (equivalent to
[^0-9]
) - \w: an alphanumeric character or an "_" (equivalent to
[a-zA-Z0-9_]
) - \W: anything but an alphanumeric character or an "_" (equivalent to
[^a-zA-Z0-9_]
) - \s: a whitespace character, namely a space, newline, carriage return, or tabulation
[ \n\r\t]
- \S: anything but a whitespace character (equivalent to
[^ \n\r\t]
) - .: any character except newline (equivalent to
[^\n]
)
The regex "^[A-Z]{2}.\w{3}$"
verifies here if the text starts with 2 characters from the A-Z range, followed by any character (except newline), followed (and ending with) 3 alphanumeric characters:
Case-Insensitive Search
To search for the word "sheets" without considering uppercase or lowercase, add (?i)
at the beginning of the regex "(?i)sheets"
: