<img width="50" height="50" alt="front-page port 80-shoopyu" src="https://tryhackme-images.s3.amazonaws.com/room-icons/cddd3c214db8ff7df5d524af3cfad5e4.png"> Regular Expressions| Tryhackme |

Introduction

Regular expressions (or Regex) are patterns of text that you define to search documents and match exactly what you’re looking for.

Charsets

A charset is defined by enclosing in [ square brackets ] the character(s), or range of characters that we want to match. Then, it finds every occurrence of the pattern we have defined in the file/text we are searching.

To exclude certain characters from a charset, we use the ^ hat symbol. For example [^k]ing will match ring, sing, $ing, but not king.

If we want to exclude charsets, not just single characters, we can do it just like this : [^a-c]at will match fat and hat, but not bat or cat.

Match all of the following characters: c, o, g

[cog]

Match all of the following words: cat, fat, hat

[cfh]at

Match all of the following words: Cat, cat, Hat, hat

[cChH]at

Match all of the following filenames: File1, File2, file3, file4, file5, File7, file9

[fF]ile[1-9]

Match all of the filenames of question 4, except “File7”

[fF]ile[^7]

Wildcards and optional characters

The wildcard that is used to match any single character (except the line break) is the . dot. That means that a.c will match aac, abc, a0c, a!c, and so on.

Also, we can set a character as optional in our pattern using the ? question mark. That means that abc? will match ab and abc, since the c is optional.

If we want to search for . a literal dot, we have to escape it with a \ reverse slash. That means that a.c will match a.c, but also abc, a@c, and so on. But a\.c will match just a.c.

Match all of the following words: Cat, fat, hat, rat

.at

Match all of the following words: Cat, cats

[cC]ats?

Match the following domain name: cat.xyz

cat\.xyz

Match all of the following domain names: cat.xyz, cats.xyz, hats.xyz

[ch]ats?\.xyz

Match every 4-letter string that doesn’t end in any letter from n to z

...[^n-z]

Match bat, bats, hat, hats, but not rat or rats

[^r]ats?

Metacharacters and repetitions

There are easier ways to match bigger charsets. For example, \d is used to match any single digit. Here’s a reference:
\d matches a digit, like 9
\D matches a non-digit, like A or @
\w matches an alphanumeric character, like a or 3
\W matches a non-alphanumeric character, like ! or #
\s matches a whitespace character (spaces, tabs, and line breaks)
\S matches everything else (alphanumeric characters and symbols)

Underscores _ are included in the \w metacharacter and not in \W. That means that \w will match every single character in test_file.

Often we want a pattern that matches many characters of a single type in a row, and we can do that with repetitions. For example, {2} is used to match the preceding character (or metacharacter, or charset) two times in a row. That means that z{2} will match exactly zz.

Here’s a reference for each repetition along with how many times it matches the preceding pattern:

{12} - exactly 12 times.
{1,5} - 1 to 5 times.
{2,} - 2 or more times.
* - 0 or more times.
+ - 1 or more times.

Match the following word: catssss

cats{4}

Match all of the following words (use the * sign): Cat, cats, catsss

[cC]ats*

Match all of the following sentences (use the + sign): regex go br, regex go brrrrrr

regex go br+

Match all of the following filenames: ab0001, bb0000, abc1000, cba0110, c0000 (don’t use a metacharacter)

[abc]{1,3}[01]{4}

Match all of the following filenames: File01, File2, file12, File20, File99

[fF]ile\d{1,2}

Match all of the following folder names: kali tools, kali tools

kali\s+tools

Match all of the following filenames: notes~, stuff@, gtfob#, lmaoo!

\w{5}\W

Match the string in quotes (use the * sign and the \s, \S metacharacters): “2f0h@f0j0%! a)K!F49h!FFOK”

\S*\s*\S*

Match every 9-character string (with letters, numbers, and symbols) that doesn’t end in a “!” sign

\S{8}[^!]

Match all of these filenames (use the + symbol): .bash_rc, .unnecessarily_long_filename, and note1

\.?\w+

Starts with/ ends with, groups, and either/ or

Sometimes it’s very useful to specify that we want to search by a certain pattern in the beginning or the end of a line. We do that with these characters:
^ - starts with
$ - ends with

So for example, if we want to search for a line that starts with abc, we can use ^abc.
If you want to search for a line that ends with xyz, you can use xyz$.

Note that the ^ hat symbol is used to exclude a charset when enclosed in [square brackets], but when it is not, it is used to specify the beginning of a word.

We can also define groups by enclosing a pattern in (parentheses). We will use it to define an either/ or pattern, and also to repeat patterns. To say “or” in Regex, we use the | pipe.

For an “either/or” pattern example, the pattern during the (day|night) will match both of these sentences: during the day and during the night.

Match every string that starts with “Password:” followed by any 10 characters excluding “0”

^Password:[^0]{10}

Match “username: “ in the beginning of a line (note the space!)

^username:\s

Match every line that doesn’t start with a digit (use a metacharacter)

^\d

Match this string at the end of a line: EOF$

EOF\$$

Match all of the following sentences:

I use nano
I use vim
I use (nano|vim)

Match all lines that start with $, followed by any single digit, followed by $, followed by one or more non-whitespace characters

\$\d\$\S+

Match every possible IPv4 IP address (use metacharacters and groups)

(\d{1,3}\.){3}\d{1,3}

Match all of these emails while also adding the username and the domain name (not the TLD) in separate groups (use \w): hello@tryhackme.com, username@domain.com, dummy_email@xyz.com

(\w+)@(\w+)\.com

Regular Expressions| Tryhackme |

Introduction

Charsets

Wildcards and optional characters

Metacharacters and repetitions

Starts with/ ends with, groups, and either/ or

Further Reading

What the Shell? | Tryhackme

Simple CTF| Tryhackme | Easy

Work From Home| Tryhackme | Easy