10 Literal Characters

We’re going to start with the simplest match of all: a literal character. A literal character match is one in which a given character such as the letter "R" matches the letter R. This type of match is the most basic type of regular expression operation: just matching plain text.

10.1 Matching Literal Characters

The following examples are extremely basic, but they will help you build a good understanding of regex.

Consider the following text stored in a character vector this_book:

The first regular expression we are going to work with is "book". This pattern is formed by a letter b, followed by a letter o, followed by another letter o, followed by a letter k. As you may guess, this pattern matches the word book in the character vector this_book. To get a visual representation of the text that the pattern actually matches, you can use the function str_view() from the package "stringr" (you may need to upgrade to a recent version of RStudio):

As you can see, the pattern "book" doesn’t match the entire content of the vector this_book; it only matches those four letters.
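Base R can report the same information as text rather than as a highlighted view. The sketch below assumes a hypothetical value for this_book (the chapter's own text is not reproduced here). It uses regexpr() to find where "book" starts and how many characters it covers, and regmatches() to extract only the matched characters. In stringr, str_locate() returns comparable start and end positions.

```r
# Hypothetical example text; any sentence containing "book" behaves the same way
this_book <- "This book is mine"

# regexpr() returns the start position of the first match (-1 if there is none)
pos <- regexpr("book", this_book)
pos[1]                       # 6
attr(pos, "match.length")    # 4: the pattern covers exactly four characters

# regmatches() extracts only the matched text, not the whole string
regmatches(this_book, pos)   # "book"
```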

It may seem really simple, but there are a couple of details worth highlighting. The first is that regex searches are case sensitive by default. This means that the pattern "Book" would not match book in this_book.

You can make the match case insensitive, but we will talk about that later.
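Case sensitivity can be checked directly with base R's grepl(), which returns TRUE when a pattern is found. The string below is a hypothetical stand-in for this_book. As a brief preview of case-insensitive matching, grepl() accepts an ignore.case argument; in stringr the equivalent is wrapping the pattern as regex(pattern, ignore_case = TRUE).

```r
# Hypothetical example text (lowercase "book" only)
this_book <- "This book is mine"

# Matching is case sensitive by default: "Book" is not the same as "book"
grepl("Book", this_book)                      # FALSE
grepl("book", this_book)                      # TRUE

# Base R's ignore.case argument turns case sensitivity off
grepl("Book", this_book, ignore.case = TRUE)  # TRUE
```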

Let’s add more text to this_book:

Let’s use str_view() to see what pieces of text are matched in this_book with the pattern "book":

As you can tell, only the first occurrence of book was matched. This reflects a basic behavior of regular expressions: scanning the text from left to right, they return a match as soon as they find one. You can think of this behavior as the “eager principle”, that is, regular expressions are eager and they will give preference to an early match. This is a minor but important detail, and we will come back to this behavior of regular expressions.
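The eager behavior is easy to see by contrasting two base R functions on a hypothetical string with two occurrences of "book". regexpr() stops at the first (leftmost) match, while gregexpr() resumes scanning after each match and reports every occurrence. The stringr counterparts are str_extract() and str_extract_all().

```r
# Hypothetical text containing "book" twice
this_book <- "This book is mine, and that book is yours"

# regexpr() reports only the first, leftmost match
first <- regexpr("book", this_book)
first[1]                                  # 6

# gregexpr() keeps searching after each match and finds all of them
all_matches <- gregexpr("book", this_book)
as.vector(all_matches[[1]])               # 6 29
regmatches(this_book, all_matches)[[1]]   # "book" "book"
```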

All the letters of the English alphabet, as well as the digits 0 to 9, are literal characters. They are called literal because they match themselves.
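Digits behave exactly like letters in a pattern: each one matches itself. The vector codes below is a hypothetical example mixing letters and digits, and grepl() reports which elements contain the pattern. The last line is a reminder that case sensitivity still applies to the letters in a mixed pattern.

```r
# Hypothetical strings mixing letters and digits
codes <- c("R2D2", "C3PO", "BB8")

# Letters and digits are literal: each character in the pattern matches itself
grepl("2", codes)    # TRUE FALSE FALSE
grepl("B8", codes)   # FALSE FALSE TRUE

# Case sensitivity still applies to the letters
grepl("r2", codes)   # FALSE FALSE FALSE
```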

Here is another example:

The first pattern to test is the letter "a":

When you execute the previous command, you should be able to see that the letter "a" is highlighted in the words car, boat and airplane.
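The same check can be done without the visual widget. The vector transport below is hypothetical: it contains the words car, boat and airplane from the text plus "bus", added as a word without an "a". grepl() tells which elements contain the letter, and regexpr() gives the position of the first "a" in each one. Note that airplane contains a second "a" that regexpr() does not report, another instance of the eager principle.

```r
# Hypothetical vector; "bus" is included as a word with no letter "a"
transport <- c("car", "bus", "boat", "airplane")

# Which elements contain at least one "a"?
grepl("a", transport)                 # TRUE FALSE TRUE TRUE
transport[grepl("a", transport)]      # "car" "boat" "airplane"

# Position of the first "a" in each element (-1 means no match)
as.vector(regexpr("a", transport))    # 2 -1 3 1
```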