LFCS ADMIN EXAM PREPARATION GUIDE – Analyze a text using basic regular expressions

The main page of the LFCS Admin Exam preparation guide series can be found here.

This post is part of the Essential Commands domain in the exam's domain competency list. The full list can be found via the link in the paragraph above or on the Linux Foundation page here.

A regular expression (regex) is a sequence of characters that defines a search pattern. We can use one to find every occurrence of a pattern, or to search when we know only part of what we are looking for.

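Symbol  Meaning
.       Matches any single character
^       Matches the start of a line (Shift+6)
$       Matches the end of a line (Shift+4)
|       Alternation: matches the expression on either side
\       Escapes the special character that follows
?       Matches the preceding item zero or one time
{n}     Matches the preceding item exactly n times

With GNU grep, the symbols |, ? and {n} work as written only in extended mode (grep -E); in basic mode they are written \|, \? and \{n\}.
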
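
As a rough illustration of the symbols listed above, the following sketch runs one grep -E command per symbol against a small word list. The word list and the variable names are made up for this example and are not part of the original post.

```shell
# Hypothetical sample data, written to a temporary file for illustration only.
sample=$(mktemp)
printf '%s\n' cat cut cart color colour concat coool a.b axb > "$sample"

# .  matches any single character: "c", one character, "t"
dot=$(grep -E '^c.t$' "$sample")

# ^ and $ anchor the match to the start and end of the line
anchored=$(grep -E '^cat$' "$sample")

# |  matches the expression on either side
either=$(grep -E '^(cut|cart)$' "$sample")

# ?  makes the preceding item optional (zero or one time)
optional=$(grep -E '^colou?r$' "$sample")

# {n} matches the preceding item exactly n times in a row
repeated=$(grep -E 'o{3}' "$sample")

# \  turns the dot into a literal dot, so "axb" no longer matches
escaped=$(grep -E 'a\.b' "$sample")

printf '%s\n' "$dot" "$anchored" "$either" "$optional" "$repeated" "$escaped"
rm -f "$sample"
```

Anchoring most patterns with ^ and $ keeps each result small, which makes it easier to see what a single symbol contributes.
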
Regular expression examples

Let us use regular expressions to find some text in a file. First, let us find the lines that start with 'The'. The command looks like this: grep '^The ' intro-linux.txt. Note the space after The inside the quotes, which keeps words such as 'Then' or 'There' out of the results. As we can see in the example below, we get all the lines starting with 'The'.

grep command with regular expressions
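
For readers without the original file, here is a minimal sketch of the same command. It assumes a made-up stand-in for intro-linux.txt created in a temporary directory; the file contents and variable names are hypothetical.

```shell
# Hypothetical stand-in for intro-linux.txt, created only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/intro-linux.txt" <<'EOF'
The kernel is the core of the system.
Then the shell starts.
There are many distributions.
Linux is free software.
In the beginning, The project was small.
The end.
EOF

# ^ anchors the match to the start of the line; the trailing space
# excludes words that merely begin with "The", such as "Then" or "There".
starts_with_the=$(grep '^The ' "$workdir/intro-linux.txt")
match_count=$(grep -c '^The ' "$workdir/intro-linux.txt")

printf '%s\n' "$starts_with_the"
printf 'matching lines: %s\n' "$match_count"
rm -rf "$workdir"
```

The line containing "The" in the middle is skipped because of the ^ anchor, not because of the word itself.
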

Let us try something different. We want to find lines that start with "T" but not with "The". The command looks like this: grep '^T[a-z][^e]' intro-linux.txt. Here the caret (^) anchors the match to the start of the line; it is followed by an uppercase "T" and then any lowercase letter from a to z. Inside square brackets, a leading caret has a different meaning: it negates the set, so [^e] matches any single character except "e". In other words, the pattern checks the third character of the line, not the last one. As a result we get many different lines starting with "T", but none starting with "The".
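
The sketch below shows what this pattern accepts and rejects, using a made-up stand-in for intro-linux.txt. It also adds a second pattern, not from the original post, for the related case of lines that start with "T" and do not end with "e". LC_ALL=C is set here as an assumption so that [a-z] means exactly the ASCII lowercase letters regardless of the system locale.

```shell
# Hypothetical stand-in for intro-linux.txt, created only for this demo.
workdir=$(mktemp -d)
printf '%s\n' \
  'The quick start' \
  'This is a test' \
  'Tree of life' \
  'To do' \
  'TCP port' > "$workdir/intro-linux.txt"

# Third character must not be "e": rejects "The..." and "Tree...",
# and rejects "TCP..." because "C" is not a lowercase letter.
third_not_e=$(LC_ALL=C grep '^T[a-z][^e]' "$workdir/intro-linux.txt")

# Alternative (an addition for comparison): the line starts with "T"
# and its last character is anything except "e".
no_final_e=$(LC_ALL=C grep '^T.*[^e]$' "$workdir/intro-linux.txt")

printf 'third character is not e:\n%s\n' "$third_not_e"
printf 'last character is not e:\n%s\n' "$no_final_e"
rm -rf "$workdir"
```

Comparing the two result sets makes the difference between a negated set in the third position and one anchored with $ easier to see.
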

But what if we want to find something more specific, say an email address? A regular expression can do just that. We start with grep followed by -E, which enables extended regular expressions, and -o, which prints only the matched part of each line rather than the whole line. The pattern begins with \b[A-Za-z0-9]+, which means a word boundary followed by one or more uppercase letters, lowercase letters, or digits. Next comes the @ sign and again one or more letters or digits: @[A-Za-z0-9]+. Then we put the escape symbol \ followed by a dot, so that the dot matches a literal "." instead of any character, and after it [A-Za-z]{2,6}, which means that the address should end with 2 to 6 letters. The full command looks like this: grep -E -o "\b[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z]{2,6}\b" TKYUsers.csv, and we can see that we get all the email addresses from the file.

email address regular expression
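
Here is a minimal sketch of the email search on a made-up stand-in for TKYUsers.csv; the names and addresses are hypothetical. It assumes GNU grep, since \b is a GNU extension, and it also runs a variant with an unescaped dot to show why the backslash matters.

```shell
# Hypothetical stand-in for TKYUsers.csv, created only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/TKYUsers.csv" <<'EOF'
id,name,email
1,Aiko,aiko@example.com
2,Ben,ben42@example.org
3,Chie,no-email-on-file
4,Dai,dai@example.net
5,Eve,eve@examplecom
EOF

# -E enables extended regular expressions; -o prints only the matched text.
emails=$(grep -E -o '\b[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z]{2,6}\b' "$workdir/TKYUsers.csv")

# Same pattern with an unescaped dot: "." now matches any character,
# so the malformed address on row 5 is accepted as well.
loose=$(grep -E -o '\b[A-Za-z0-9]+@[A-Za-z0-9]+.[A-Za-z]{2,6}\b' "$workdir/TKYUsers.csv")

printf 'escaped dot:\n%s\n' "$emails"
printf 'unescaped dot:\n%s\n' "$loose"
rm -rf "$workdir"
```

This pattern is deliberately simple: it does not allow dots, hyphens, or underscores in the name or domain, so real-world addresses may need a broader character set.
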

As we can see, regular expressions can get very complicated; the only way to learn them is to practice a lot.

Thank you for reading.