10 Literal Characters
We’re going to start with the simplest match of all: a literal character.
A literal character match is one in which a given character, such as the letter "R", matches the letter R. This type of match is the most basic type of regular expression operation: just matching plain text.
10.1 Matching Literal Characters
The following examples are extremely basic, but they will help you get a good understanding of regex.
Consider the following text stored in a character vector this_book:
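As a minimal sketch, this_book could hold a short sentence like the one below. The exact text is an assumption chosen for illustration; any string containing the lowercase word book would serve.

```r
# Hypothetical example text; the exact sentence is an assumption.
this_book <- "This book is mine"

# A single string is a character vector of length one.
is.character(this_book)
length(this_book)
```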
The first regular expression we are going to work with is "book".
This pattern is formed by a letter b, followed by a letter o, followed by another letter o, followed by a letter k. As you may guess, this pattern matches the word book in the character vector this_book.
To have a visual representation of the actual pattern that is matched, you should use the function str_view() from the package "stringr" (you may need to upgrade to a recent version of RStudio):
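A call along these lines shows the match. This is a sketch, assuming the hypothetical sentence below as the content of this_book. The str_view() call is skipped if "stringr" is not installed, and base R's regexpr() is used alongside it (a swapped-in check, not part of "stringr") to report where the match starts and how long it is.

```r
# Hypothetical example text; the exact sentence is an assumption.
this_book <- "This book is mine"

# str_view() renders the match; skipped when stringr is not installed.
if (requireNamespace("stringr", quietly = TRUE)) {
  stringr::str_view(this_book, "book")
}

# Base R check of the same match: start position, length, and matched text.
m <- regexpr("book", this_book)
match_start <- as.integer(m)
match_length <- attr(m, "match.length")
matched_text <- regmatches(this_book, m)

match_start
match_length
matched_text
```

With this particular sentence, the match starts at the sixth character and spans four characters.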
As you can tell, the pattern "book" doesn’t match the entire content in the vector this_book; it just matches those four letters.
It may seem really simple, but there are a couple of details to be highlighted.
The first is that regex searches are case sensitive by default. This means that the pattern "Book" would not match book in this_book.
You can change the matching task so that it is case insensitive, but we will talk about that later.
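Case sensitivity is easy to check directly. The sketch below uses base R's grepl() rather than str_view(), so each result is a plain TRUE or FALSE, and it again assumes a hypothetical sentence for this_book. The last line previews the ignore.case argument of grepl() only to show that the default can be overridden.

```r
# Hypothetical example text; the exact sentence is an assumption.
this_book <- "This book is mine"

# Patterns are case sensitive by default.
lower_hit <- grepl("book", this_book)   # lowercase pattern matches
upper_hit <- grepl("Book", this_book)   # capitalized pattern does not

# Optional override of the default in base R.
ignore_hit <- grepl("Book", this_book, ignore.case = TRUE)

c(lower_hit, upper_hit, ignore_hit)
```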
Let’s add more text to this_book:
Let’s use str_view() to see what pieces of text are matched in this_book with the pattern "book":
As you can tell, only the first occurrence of book was matched. This is common behavior for regular expressions: they return a match as soon as one is found. You can think of this as the “eager principle”: regular expressions are eager and give preference to an early match. This is a minor but important detail, and we will come back to it.
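The first-match behavior can be seen with base R, independently of how the viewer draws it. Depending on the installed "stringr" version, str_view() may highlight every occurrence rather than only the first (recent releases do so by default), so the sketch below uses regexpr(), which reports only the first match, and gregexpr(), which reports all of them. The extended sentence is a hypothetical stand-in for the longer this_book.

```r
# Hypothetical extended text; the exact sentence is an assumption.
this_book <- "This book is mine. I wrote this book."

# regexpr() stops at the first match it finds.
first_start <- as.integer(regexpr("book", this_book))

# gregexpr() keeps searching and reports every match.
all_starts <- as.integer(gregexpr("book", this_book)[[1]])

first_start
all_starts
```

The first position in all_starts is the same one regexpr() returns: the earliest match in the string.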
All the letters of the English alphabet and all the digits are considered literal characters. They are called literal because they match themselves.
Here is another example:
The first pattern to test is the letter "a":
When you execute the previous command, you should be able to see that the letter "a" is highlighted in the words car, boat and airplane.
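The same result can be checked without the viewer. The sketch below assumes a hypothetical vector named transport holding the three words mentioned above plus one word without an "a", and uses base R's grepl() to test each element.

```r
# Hypothetical vector; its name and contents are assumptions for illustration.
transport <- c("car", "bike", "boat", "airplane")

# grepl() tests each element for the literal character "a".
has_a <- grepl("a", transport)

# Keep only the elements that contain a match.
transport[has_a]
```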