10 Literal Characters
We’re going to start with the simplest match of all: a literal character.
A literal character match is one in which a given character such as the letter
"R" matches the letter R. This type of match is the most basic
type of regular expression operation: just matching plain text.
10.1 Matching Literal Characters
The following examples are extremely basic, but they will help you build a good understanding of regex.
Consider the following text stored in a character vector named this_book.
The first regular expression we are going to work with is "book".
This pattern is formed by a letter b, followed by a letter o, followed by
another letter o, followed by a letter k. As you may guess, this pattern
matches the word book in the character vector this_book.
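As a minimal sketch of this first match, the snippet below uses base R only. The sentence assigned to this_book is a hypothetical stand-in, since the actual contents of the vector are not shown here; grepl(), regexpr() and regmatches() are standard base R functions.

```r
# Hypothetical contents: the original text of this_book is an assumption
this_book <- "This book is mine"

# grepl() reports whether the literal pattern occurs anywhere in the string
has_book <- grepl("book", this_book)

# regexpr() locates the first match, and regmatches() extracts its text
first_hit <- regexpr("book", this_book)
matched <- regmatches(this_book, first_hit)

print(has_book)
print(matched)
```

The pattern is written exactly as the text it should find: four literal characters, in order.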
To have a visual representation of the actual pattern that is matched, you
should use the function str_view() from the package stringr
(you may need to upgrade to a recent version of RStudio).
The highlighted output shows that the pattern "book" doesn’t match the
entire content in this_book; it just matches those four letters.
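To make the "just those four letters" point concrete, here is a small sketch that measures where the match starts and how long it is. The sentence in this_book is again an assumed example, and the str_view() call only runs if stringr happens to be installed.

```r
# Hypothetical contents of this_book (an assumption for illustration)
this_book <- "This book is mine"

# Base R: regexpr() returns the start of the first match,
# with the match length stored in the "match.length" attribute
hit <- regexpr("book", this_book)
match_start <- as.integer(hit)
match_length <- attr(hit, "match.length")

print(c(start = match_start, length = match_length, total = nchar(this_book)))

# stringr, if available: str_view() shows the string with the match marked
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_view(this_book, "book"))
}
```

The match length is the length of the pattern, not the length of the string that contains it.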
It may seem really simple, but there are a couple of details to be highlighted.
The first is that regex searches are case sensitive by default. This means
that the pattern "Book" would not match book in this_book.
You can change the matching task so that it is case insensitive, but we will talk about that later.
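A quick way to check the case-sensitivity claim is sketched below with base R's grepl(). The contents of this_book are assumed for illustration, and the ignore.case argument is shown only as a brief preview of one possible way to relax the default.

```r
# Hypothetical contents of this_book (an assumption for illustration)
this_book <- "This book is mine"

# Lowercase pattern: matches the word as it is written in the text
lower_match <- grepl("book", this_book)

# Capitalized pattern: "B" and "b" are different literal characters
upper_match <- grepl("Book", this_book)

# One way to ignore case in base R
insensitive_match <- grepl("Book", this_book, ignore.case = TRUE)

print(c(lower = lower_match, upper = upper_match, insensitive = insensitive_match))
```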
Let’s add more text to this_book and use str_view() to see what pieces of text are matched in this_book with the pattern "book".
As you can tell, only the first occurrence of book was matched. This is a common behavior of regular expressions: they return a match as fast as possible. You can think of this behavior as the “eager principle”, that is, regular expressions are eager and they will give preference to an early match. (Recent versions of stringr highlight every occurrence in str_view(); functions that return a single match, such as str_extract(), still stop at the first one.) This is a minor but important detail, and we will come back to this behavior of regular expressions.
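The difference between the first match and all matches can be sketched with base R, where regexpr() stops at the earliest match and gregexpr() keeps searching. The longer sentence assigned to this_book is an assumed example chosen so that the pattern occurs more than once.

```r
# Hypothetical longer text (an assumption for illustration)
this_book <- "This book is mine. I wrote this book with bookdown."

# regexpr() returns only the earliest match in the string
first_pos <- as.integer(regexpr("book", this_book))

# gregexpr() returns the start positions of every match
all_pos <- as.integer(gregexpr("book", this_book)[[1]])

print(first_pos)
print(all_pos)
```

Note that the pattern also matches inside bookdown: a literal pattern does not care about word boundaries.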
All the letters of the English alphabet and the digits 0 to 9 are considered literal characters. They are called literal because they match themselves.
Here is another example, with a character vector of words such as car, boat and airplane.
The first pattern to test is the letter "a".
When you view the matches with str_view(), you should be able to see that the
letter "a" is highlighted in the words car, boat and airplane.