4 C-style Formatting

R comes with the sprintf() function that provides string formatting like in the C language. To be more precise, this function is a wrapper for the C library function of the same name. In many other programming languages, this type of printing is known as printf which stands for print formatting. Simply put, sprintf() allows you to create strings as output using formatted data.

The function sprintf() requires using a special syntax that may look awkward the first time you use it. Here is one example:

sprintf("I woke up at %s:%s%s a.m.", 8, 0, 5)
#> [1] "I woke up at 8:05 a.m."

How does sprintf() work? The first argument of this function is a character vector of one element that contains the text to be formatted. Observe that inside the text there are various percent symbols % followed by the letter s. Each % is referred to as a slot, which is basically a placeholder for a variable that will be formatted. The rest of the inputs passed to sprintf() are the values that will be used in each of the slots.

The string in the previous example contains three slots of the same type, %s, and the subsequent arguments are numbers 8, 0, and 5. Each number is used as a value for each slot. The letter s indicates that the formatted variable is specified as a string.

Most of the times you won’t use sprintf() like in the example above. Instead, what you will pass are variables containing different values:

hour <- 8
mins1 <- 0
mins2 <- 5
sprintf("I woke up at %s:%s%s a.m.", hour, mins1, mins2)
#> [1] "I woke up at 8:05 a.m."

4.1 C-style formatting options

The string format %s is just one of a larger list of available formatting options. The following table shows the most common formatting specifications:

Notation Description
%s a string
%d an integer
%0xd an integer padded with x leading zeros
%f decimal notation with six decimals
%.xf floating point number with x digits after decimal point
%e compact scientific notation, e in the exponent
%E compact scientific notation, E in the exponent
%g compact decimal or scientific notation (with e)

4.1.1 Format Slot Syntax

The full syntax for a format slot is defined by:

%[parameter][flags][width][.precision][length]type

The percent symbol, %, as we said, indicates a placeholder or slot.

The parameter field is an optional field that can take the value n$ in which n is the number of the variable to display, allowing the variables provided to be used multiple times, using varying format specifiers or in different orders.

sprintf("The second number is %2$d, the first number is %1$d", 2, 1)
#> [1] "The second number is 1, the first number is 2"

The flags field can be zero or more (in any order) of:

  • - (minus) Left-align the output of this placeholder.
  • + (plus) Prepends a plus for positive signed-numeric types.
  • ' ' (space) Prepends a space for positive signed-numeric types.
  • 0 (zero) When the ‘width’ option is specified, prepends zeros for numeric types.
  • # (hash) Alternate form:
    • for g and G types, trailing zeros are not removed.
    • for f, F, e, E, g, G types, the output always contain a decimal point.
    • for o, x, X types, the text 0, 0x, 0X, respectively, is prepended to non-zero numbers.

The width field is an optional field that you use to specify a minimum number of characters to output, and is typically used to pad fixed-width fields in tabulated output, where the fields would otherwise be smaller, although it does not cause truncation of oversized fields.

sprintf("%*d", 5, 10)
#> [1] "   10"

The precision field usually specifies a maximum limit on the output, depending on the particular formatting type.

sprintf("%.*s", 3, "abcdef")
#> [1] "abc"

The length field is also optional, and can be any of:

The most important field is the type field.

  • %: Prints a literal % character (this type doesn’t accept any flags, width, precision, length fields).
  • d, i: integer value as signed decimal number.
  • f: double value in normal fixed-point notation.
  • e, E: double value in standard form.
  • g, G: double value in either normal or exponential notation.
  • x, X: unsigned integer as a hexadecimal number. x uses lower case, while X uses upper case.
  • o: unsigned integer in octal notation.
  • s: null terminated string.
  • a, A: double value in hexadecimal notation

4.1.2 Example: basic sprintf()

Let’s begin with a minimal example to explore the different formatting options of sprintf(). Consider a real fraction like 1/6; in R the default output of this fraction will be:

1 / 6
#> [1] 0.167

Notice that 1/6 is printed with seven decimal digits. The number 1/6 is actually an irrational number and so the computer needs to round it to some number of decimal digits. You can modify the default printing format in several ways. One option is to display only six decimal digits with the %f option:

# print 6 decimals
sprintf('%f', 1/6)
#> [1] "0.166667"

But you can also specify a different number of decimal digits, say 3. This can be achieved specifying an option of %.3f:

# print 3 decimals
sprintf('%.3f', 1/6)
#> [1] "0.167"

The table below shows six different outputs for 1/6

Notation Output
%s 0.166666666666667
%f 0.166667
%.3f 0.167
%e 1.666667e-01
%E 1.666667E-01
%g 0.166667

When would you use sprintf()? Everytime you produce output text. Some cases include:

  1. exporting output to some file.
  2. printing output on console.
  3. forming new strings.

4.1.3 Example: File Names

When working on data analysis projects, it is common to generate different files with similar names (e.g. either for creating images, or data files, or documents). Imagine that you need to generate the names of 3 data files (with .csv extension). All the files have the same prefix name but each of them has a different number: data01.csv, data02.csv, and data03.csv. One naive solution to generate a character vector with these names in R would be to write something like this:

file_names <- c('data01.csv', 'data02.csv', 'data03.csv')

Instead of writing each file name, you can generate the vector file_names in a more efficient way taking advantage of the vectorized nature of paste0():

file_names <- paste0('data0', 1:3, '.csv')

file_names
#> [1] "data01.csv" "data02.csv" "data03.csv"

Now imagine that you need to generate 100 file names numbered from 01, 02, 03, to 100. You could write a vector with 100 file names but it’s going to take you a while. A preferable solution is to use paste0() like in the approach of the previous example. In this case however, you would need to create two separate vectors—one with numbers 01 to 09, and another one with numbers 10 to 100—and then concatenate them in one single vector:

files1 <- paste0('data0', 1:9, '.csv')
files2 <- paste0('data', 10:100, '.csv')
file_names <- c(files1, files2)

Instead of using paste0() to create two vectors, you can use sprintf() with the %0xd option to indicate that an integer should be padded with x leading zeros. For instance, the first nine file names can be generated as:

sprintf('data%02d.csv', 1:9)
#> [1] "data01.csv" "data02.csv" "data03.csv" "data04.csv" "data05.csv"
#> [6] "data06.csv" "data07.csv" "data08.csv" "data09.csv"

To generate the 100 file names do:

file_names <- sprintf('data%02d.csv', 1:100)

The first nine elements in file_names will include a leading zero before the integer; the following elements will not include the leading zero.

4.1.4 Example: Fahrenheit to Celsius

This example involves working on a function to convert Fahrenheit degrees into Celsius degrees. The conversion formula is:

\[ Celsius = (Fahrenheit - 32) \times \frac{5}{9} \]

You can define a simple function to_celsius() that takes one argument, temp, which is a number representing temperature in Fahrenheit degrees. This function will return the temperature in Celsius degrees:

to_celsius <- function(temp = 1) {
  (temp - 32) * 5/9
}

You can use to_celsius() as any other function in R. Say you want to know how many Celsius degrees are 95 Fahrenheit degrees:

to_celsius(95)
#> [1] 35

To make things more interesting, let’s create another function that not only computes the temperature conversion but also prints a more informative message, something like: 95 Fahrenheit degrees = 35 Celsius degrees.

We’ll name this function fahrenheit2celsius():

fahrenheit2celsius <- function(temp = 1) {
  celsius <- to_celsius(temp)
  sprintf('%.2f Fahrenheit degrees = %.2f Celsius degrees', temp, celsius)
}

Notice that fahrenheit2celsius() makes use of to_celsius() to compute the Celsius degrees. And then sprintf() is used with the options %.2f to display the temperatures with two decimal digits. Try it out:

fahrenheit2celsius(95)
#> [1] "95.00 Fahrenheit degrees = 35.00 Celsius degrees"
fahrenheit2celsius(50)
#> [1] "50.00 Fahrenheit degrees = 10.00 Celsius degrees"

4.1.5 Example: Car Traveled Distance

Our third example is a little bit more sophisticated. The idea is to construct an object of class "car" that contains characteristics like the name of the car, its make, its year, and its fuel consumption in city, highway and combined.

Let’s consider a Mazda 3 for this example. One possible way to define a "car" object is to use a list with the following elements:

mazda3 <- list(
  name = 'mazda3', # car name
  make = 'mazda',  # car make
  year = 2015,     # year model
  city = 30,       # fuel consumption in city
  highway = 40,    # fuel consumption in highway
  combined = 33)   # fuel consumption combined (city-and-hwy)

So far we have an object mazda3 that is essentially a list. Because we want to create a print() method for objects of class "car" we need to assign this class to our mazda3:

class(mazda3) <- "car"

Now that we have our "car" object, we can create a print.car() function. In this way, everytime we type mazda3, instead of getting the typical list output, we will get a customized display:

print.car <- function(x) {
  cat("Car\n")
  cat(sprintf('name: %s\n', x$name))
  cat(sprintf('make: %s\n', x$make))
  cat(sprintf('year: %s\n', x$year))
  invisible(x)
}

Next time you type mazda3 in your console, R will display these lines:

mazda3
#> Car
#> name: mazda3
#> make: mazda
#> year: 2015

It would be nice to have a function miles() that allows you to calculate the traveled distance for a given amount of gas (in gallons), taking into account the type of fuel consumption (e.g. city, highway, combined):

miles <- function(car, fuel = 1, mpg = 'city') {
  stopifnot(class(car) == 'car')
  switch(mpg,
         'city' = car$city * fuel,
         'highway' = car$highway * fuel,
         'combined' = car$combined * fuel,
         car$city * fuel)
}

The miles() function takes three parameters: car is an object of class "car", fuel is the number of gallons, and mpg is the type of fuel consumption ('city',highway,combined’). The first command checks whether the first parameter is an object of class“car”. If it is not, then the function will stop the execution raising an error. The second command involves using the functionswitch()to compute the traveled miles. It switches to the corresponding consumption depending on the provided value ofmpg. Note that the very last switch condition is a _safety_ condition in case the user mispecifiesmpg`.

Let’s say you want to know how many miles the mazda3 could travel with 4 gallons of gas depending on the different types of consumption:

miles(mazda3, fuel = 4, 'city')
#> [1] 120
miles(mazda3, fuel = 4, 'highway')
#> [1] 160
miles(mazda3, fuel = 4, 'combined')
#> [1] 132

Again, to make things more user friendly, we are going to create a function get_distance() that prints a more informative message about the traveled distance:

get_distance <- function(car, fuel = 1, mpg = 'city') {
  distance <- miles(car, fuel = fuel, mpg = mpg) 
  cat(sprintf('A %s can travel %s miles\n',
              car$name, distance))
  cat(sprintf('with %s gallons of gas\n', fuel))
  cat(sprintf('using %s consumption', mpg))
}

And this is how the output when calling get_distance looks like:

get_distance(mazda3, 4, 'city')
#> A mazda3 can travel 120 miles
#> with 4 gallons of gas
#> using city consumption

4.1.6 Example: Coffee Prices

Consider some coffee drinks and their prices. We’ll put this information in a vector like this:

prices <- c(
  'americano' = 2, 
  'latte' = 2.75, 
  'mocha' = 3.45, 
  'capuccino' = 3.25)

What type of vector is prices? Is it a character vector? Is it numeric vector? Or is it some sort of vector with mix-data? We have seen that vectors are atomic structures, meaning that all their elements must be of the same class. So prices is definitely not a vector with mix-data. From the code chunk we can observe that each element of the vector is formed by a string, followed by the = sign, followed by some number. This way of defining a vector is not very common in R but it is perfectly valid. Each string represents the name of an element, while the numbers are the actual elements. Therefore prices is in reality a numeric vector. You can confirm this by looking at the mode (or data type):

mode(prices)
#> [1] "numeric"

Let’s say you want to list the names of the coffees and their prices. If you just simply try to print() the prices, the output will be the entire vector prices:

print(prices)
#> americano     latte     mocha capuccino 
#>      2.00      2.75      3.45      3.25

Alternatively, you can use a for loop to print() each individual element of the vector prices, but again the output is displayed in an awkward fashion:

for (p in seq_along(prices)) {
  print(prices[p])
}
#> americano 
#>         2 
#> latte 
#>  2.75 
#> mocha 
#>  3.45 
#> capuccino 
#>      3.25

To list the names of the coffees and their prices, it would be nicer to use a combination of paste0() and print(). In addition, you can be more descriptive adding some auxiliary text such that the output prints something like: “americano has a price of $2”.

for (p in seq_along(prices)) {
  print(paste0(names(prices)[p], ' has price of $', prices[p]))
}
#> [1] "americano has price of $2"
#> [1] "latte has price of $2.75"
#> [1] "mocha has price of $3.45"
#> [1] "capuccino has price of $3.25"

Another possible solution consists of combining print() and sprintf():

for (p in seq_along(prices)) {
  print(sprintf('%s has price of $%s', names(prices)[p], prices[p]))
}
#> [1] "americano has price of $2"
#> [1] "latte has price of $2.75"
#> [1] "mocha has price of $3.45"
#> [1] "capuccino has price of $3.25"

One limitation of quote() is that it won’t work inside a for loop:

for (p in seq_along(prices)) {
  noquote(sprintf('%s has price of $%s', names(prices)[p], prices[p]))
}

If what you want is to print the output wihtout quotes, then you need to use cat(); just make sure to add a newline character "\n":

for (p in seq_along(prices)) {
  cat(sprintf('%s has price of $%s\n', names(prices)[p], prices[p]))
}
#> americano has price of $2
#> latte has price of $2.75
#> mocha has price of $3.45
#> capuccino has price of $3.25

Make a donation

If you find this resource useful, please consider making a one-time donation in any amount. Your support really matters.