18 More About Functions

In this chapter you will learn more about creating functions in R.

18.1 Functions Recap

Consider a toy example with a function that squares its argument:

square = function(x) {
  x * x
}

  • the function name is "square"
  • it has one argument: x
  • the function body consists of one simple expression
  • it returns the value x * x

square() works like any other function in R:

square(10)
> [1] 100

In this case, square() is also vectorized:

square(1:5)
> [1]  1  4  9 16 25

Why is square() vectorized?
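One way to answer this question: the body of square() consists of the arithmetic operator *, and in R this operator works element by element on vectors. The short sketch below (the object name x is just for illustration) shows that the multiplication alone already produces the vectorized result:

```r
x <- 1:5

# the operator * multiplies element by element
x * x
# > [1]  1  4  9 16 25

# square() simply passes its argument to *, so it inherits this behavior
square <- function(x) {
  x * x
}
square(x)
# > [1]  1  4  9 16 25
```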

Once defined, functions can be used in other function definitions:

sum_of_squares = function(x) {
  sum(square(x))
}
sum_of_squares(1:5)
> [1] 55

18.1.1 Simple Expressions

Functions with a body consisting of a simple expression can be written with no braces (in one single line!):

square = function(x) x * x
square(10)
> [1] 100

However, as a general coding rule, you should get into the habit of writing functions using braces.

18.1.2 Nested Functions

We can also define a function inside another function:

getmax = function(a) {
  # nested function
  maxpos <- function(u) which.max(u) 
  # output
  list(position = maxpos(a),
       value = max(a))
}
getmax(c(2, -4, 6, 10, pi))
> $position
> [1] 4
> 
> $value
> [1] 10
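A nested function exists only while the outer function runs. The sketch below repeats getmax() and then checks whether maxpos() can be found from the workspace. It assumes that you have not defined another object called maxpos yourself:

```r
getmax <- function(a) {
  # nested function: visible only inside getmax()
  maxpos <- function(u) which.max(u)
  list(position = maxpos(a),
       value = max(a))
}

result <- getmax(c(2, -4, 6, 10, pi))
result$position
# > [1] 4

# maxpos() is not available outside getmax()
# (assuming no other object named maxpos exists in your session)
exists("maxpos")
# > [1] FALSE
```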

18.2 Function Output

The value of a function can be established in two ways:

  • As the last evaluated expression (in the body of the function)
  • As an explicitly returned value via return()

Here’s a basic example of a function in which the output is the last evaluated expression:

add = function(x, y) {
  x + y
}

add(2, 3)
> [1] 5

Here’s another version of add() in which the output is the last evaluated expression:

add = function(x, y) {
  z = x + y
  z
}

add(2, 3)
> [1] 5

Be careful with the form in which the last expression is evaluated:

add = function(x, y) {
  z = x + y
}

add(2, 3)

In this case, it looks like add() does not work. If you run the previous code, nothing appears in the console. Can you guess why? To help you answer this question, assign the invocation to an object and then print the object:

why <- add(2, 3)
why

add() does work. The issue has to do with the form of the last expression. Nothing gets displayed in the console because the last statement, z = x + y, is an assignment: the function still returns the value of z, but R returns it invisibly, so nothing is printed.
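The following sketch shows a few ways to confirm that the value is there. It uses withVisible(), a base R function that reports whether a value would be printed automatically:

```r
add <- function(x, y) {
  z = x + y
}

# nothing is printed, because the value of an assignment is invisible
add(2, 3)

# wrapping the call in parentheses, or calling print(), displays the value
(add(2, 3))
# > [1] 5
print(add(2, 3))
# > [1] 5

# withVisible() confirms that the value is returned invisibly
withVisible(add(2, 3))$visible
# > [1] FALSE
```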

18.2.1 The return() command

More often than not, the return() command is included to explicitly indicate the output of a function:

add = function(x, y) {
  z <- x + y
  return(z)
}

add(2, 3)
> [1] 5

I’ve seen that many users with previous programming experience in other languages prefer to use return(). The main reason is that most programming languages tend to use some sort of return statement to indicate the output of a function.

So, following good language-agnostic coding practices, we also recommend that you use return(). In this way, any reader can quickly scan the body of your functions and locate the places where a value is returned.
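Explicit return() calls are especially easy to spot when a function has more than one exit point. Here is a minimal sketch with a hypothetical function, describe_sign(), which is not part of R and is used only for illustration:

```r
# describe_sign() is a hypothetical function used only for illustration;
# it expects a single number
describe_sign <- function(x) {
  if (x < 0) {
    return("negative")   # first exit point
  }
  if (x == 0) {
    return("zero")       # second exit point
  }
  return("positive")     # default exit point
}

describe_sign(-3)
# > [1] "negative"
describe_sign(0)
# > [1] "zero"
describe_sign(7)
# > [1] "positive"
```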

18.2.2 White Spaces

  • Use plenty of white space:
      • around operators (assignment and arithmetic)
      • between function arguments and list elements
      • between matrix/array indices, in particular for missing indices
  • Split long lines at meaningful places
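The point about missing indices can be illustrated with a small matrix. In this sketch the matrix m is made up for the example; both spellings give the same result, but the spaced version makes the empty index easier to see:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# avoid this: the missing index is easy to overlook
m[1,]
m[,2]

# better: a space marks the missing index
m[1, ]
# > [1] 1 3 5
m[, 2]
# > [1] 3 4
```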

Avoid this:

a<-2
x<-3
y<-log(sqrt(x))
3*x^7-pi*x/(y-a)

Much better:

a <- 2
x <- 3
y <- log(sqrt(x))
3 * x^7 - pi * x / (y - a)

Another example:

# Avoid this
plot(x,y,col=rgb(0.5,0.7,0.4),pch='+',cex=5)

# okay
plot(x, y, col = rgb(0.5, 0.7, 0.4), pch = '+', cex = 5)

Another readability recommendation is to limit the width of your lines: break (wrap) them so that they are less than 80 columns wide.

# lines too long
histogram <- function(data){
hist(data, col = 'gray90', xlab = 'x', ylab = 'Frequency', main = 'Histogram of x')
abline(v = c(min(data), max(data), median(data), mean(data)),
col = c('gray30', 'gray30', 'orange', 'tomato'), lty = c(2,2,1,1), lwd = 3)
}

Here is the same function with the lines wrapped:

# lines with okay width
histogram <- function(data) {
  hist(data, col = 'gray90', xlab = 'x', ylab = 'Frequency', 
       main = 'Histogram of x')
  abline(v = c(min(data), max(data), median(data), mean(data)),
         col = c('gray30', 'gray30', 'orange', 'tomato'), 
         lty = c(2, 2, 1, 1), lwd = 3)
}

  • Spacing is the second important part of code indentation and formatting
  • Spacing makes the code more readable
  • Follow proper spacing throughout your code
  • Use spacing consistently

# this can be improved
stats <- c(min(x), max(x), max(x)-min(x),
  quantile(x, probs=0.25), quantile(x, probs=0.75), IQR(x),
  median(x), mean(x), sd(x)
)

Don’t be afraid of splitting one long line into individual pieces:

# much better
stats <- c(
  min(x), 
  max(x), 
  max(x) - min(x),
  quantile(x, probs = 0.25),
  quantile(x, probs = 0.75),
  IQR(x),
  median(x), 
  mean(x), 
  sd(x)
)

You can even do this:

# also OK
stats <- c(
  min    = min(x), 
  max    = max(x), 
  range  = max(x) - min(x),
  q1     = quantile(x, probs = 0.25),
  q3     = quantile(x, probs = 0.75),
  iqr    = IQR(x),
  median = median(x), 
  mean   = mean(x), 
  stdev  = sd(x)
)
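One detail to keep in mind with the named version: quantile() returns a named value by default, so c() glues that name to the one you supply. The sketch below uses a made-up vector x and a shortened version of stats to show the effect, along with one possible way around it, the names = FALSE argument of quantile():

```r
x <- c(2, 4, 4, 5, 7, 9)   # made-up data for illustration

stats <- c(
  min = min(x),
  q1  = quantile(x, probs = 0.25),
  max = max(x)
)
names(stats)
# > [1] "min"    "q1.25%" "max"

# names = FALSE drops the name that quantile() adds
stats <- c(
  min = min(x),
  q1  = quantile(x, probs = 0.25, names = FALSE),
  max = max(x)
)
names(stats)
# > [1] "min" "q1"  "max"
```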

  • All commas and semicolons must be followed by a single white space
  • All binary operators should maintain a space on either side of the operator
  • Left parenthesis should start immediately after a function name
  • All keywords like if, while, for, repeat should be followed by a single space

All binary operators should maintain a space on either side of the operator

# NOT Recommended 
a=b-c
a = b-c
a=b - c; 

# Recommended 
a = b - c

The same rule applies to the operators inside an arithmetic expression:

# Not really recommended 
z <- 6*x + 9*y

# Recommended (option 1)
z <- 6 * x + 9 * y

# Recommended (option 2)
z <- (6 * x) + (9 * y)

Left parenthesis should start immediately after a function name

# NOT Recommended 
read.table ('data.csv', header = TRUE, row.names = 1)

# Recommended 
read.table('data.csv', header = TRUE, row.names = 1)

All keywords like if, while, for, repeat should be followed by a single space.

# not bad
if(is.numeric(object)) {
  mean(object)
}

# much better
if (is.numeric(object)) {
  mean(object)
}

18.3 Indentation

  • Keep your indentation style consistent

  • There is more than one way of indenting code

  • There is no “best” style that everyone should be following

  • You can indent using spaces or tabs (but don’t mix them)

  • Indentation can help you detect errors in your code because it exposes a lack of symmetry

  • Indent systematically (the RStudio editor helps a lot)

Don’t write code like this:

# no indentation
# Don't do this!
if(!is.vector(x)) {
stop('x must be a vector')
} else {
if(any(is.na(x))){
x <- x[!is.na(x)]
}
total <- length(x)
x_sum <- 0
for (i in seq_along(x)) {
x_sum <- x_sum + x[i]
}
x_sum / total
}

Instead, write with indentation:

# better with indentation
if (!is.vector(x)) {
  stop('x must be a vector')
} else {
  if (any(is.na(x))) {
    x <- x[!is.na(x)]
  }
  total <- length(x)
  x_sum <- 0
  for (i in seq_along(x)) {
    x_sum <- x_sum + x[i]
  }
  x_sum / total
}

There are several indenting styles:

# style 1
find_roots <- function(a = 1, b = 1, c = 0) 
{
  if (b^2 - 4*a*c < 0) 
  {
    return("No real roots")
  } else 
  {
    return(quadratic(a = a, b = b, c = c))
  }
}

My preferred style is like this:

# style 2
find_roots <- function(a = 1, b = 1, c = 0) {
  if (b^2 - 4*a*c < 0) {
    return("No real roots")
  } else {
    return(quadratic(a = a, b = b, c = c))
  }
}
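Note that find_roots() calls quadratic(), which is not defined in this chapter. To run the example you need some version of it. The sketch below assumes a hypothetical quadratic() that returns the two real roots given by the quadratic formula, and that a is not zero:

```r
# hypothetical helper (an assumption, not defined in the text):
# real roots of a*x^2 + b*x + c, assuming a != 0 and a non-negative discriminant
quadratic <- function(a = 1, b = 1, c = 0) {
  disc <- sqrt(b^2 - 4 * a * c)
  # base::c() is spelled out to avoid confusion with the argument named c
  base::c((-b - disc) / (2 * a), (-b + disc) / (2 * a))
}

find_roots <- function(a = 1, b = 1, c = 0) {
  if (b^2 - 4*a*c < 0) {
    return("No real roots")
  } else {
    return(quadratic(a = a, b = b, c = c))
  }
}

find_roots(a = 1, b = -3, c = 2)
# > [1] 1 2
find_roots(a = 1, b = 1, c = 1)
# > [1] "No real roots"
```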

Benefits of code indentation:

  • Easier to read
  • Easier to understand
  • Easier to modify
  • Easier to maintain
  • Easier to enhance

18.3.1 Meaningful Names

Choose a consistent naming style for objects and functions:

  • someObject (lowerCamelCase)
  • SomeObject (UpperCamelCase)
  • some_object (underscore separation)
  • some.object (dot separation)

Avoid using names of standard R objects, for example:

  • vector
  • mean
  • list
  • data
  • c
  • colors
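To see why reusing these names can cause trouble, consider the sketch below. It deliberately defines a function called mean in the workspace (something you should not do in real code), which hides the standard mean() until the object is removed:

```r
# deliberately bad: this definition masks the standard mean()
mean <- function(x) "not the average"
mean(c(1, 2, 3))
# > [1] "not the average"

# the original function is still reachable through its package
base::mean(c(1, 2, 3))
# > [1] 2

# removing the object restores the usual behavior
rm(mean)
mean(c(1, 2, 3))
# > [1] 2
```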

If you’re thinking about using names of R objects, prefer something like this:

  • xvector
  • xmean
  • xlist
  • xdata
  • xc
  • xcolors

Better yet, add meaning like this:

  • mean_salary
  • input_vector
  • data_list
  • data_table
  • first_last
  • some_colors

Prefer Pronounceable Names

# avoid cryptic abbreviations
DtaRcrd102 <- list(
  nm = 'John Doe',
  bdg = 'Valley Life Sciences Building',
  rm = 2060
)


# prefer pronounceable names 
Customer <- list(
  name = 'John Doe',
  building = 'Valley Life Sciences Building',
  room = 2060
)

18.3.2 Syntax: Parentheses

Use parentheses for clarity even if not needed for order of operations.

a <- 2
x <- 3
y <- 4

a/y*x

# better
(a / y) * x

Another example:

# confusing
1:3^2
> [1] 1 2 3 4 5 6 7 8 9

# better
1:(3^2)
> [1] 1 2 3 4 5 6 7 8 9
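Parentheses matter even more when the default order of operations is not what you intended. The following sketch (the value of n is arbitrary) shows two common cases in which the unparenthesized expression gives a different result:

```r
n <- 5

# the colon operator is evaluated before the subtraction
1:n-1
# > [1] 0 1 2 3 4

# parentheses make the intended sequence explicit
1:(n - 1)
# > [1] 1 2 3 4

# exponentiation is evaluated before the unary minus
-2^2
# > [1] -4
(-2)^2
# > [1] 4
```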

18.4 Recommendations

  • Functions are tools and operations
  • Functions form the building blocks for larger tasks
  • Functions allow us to reuse blocks of code easily for later use
  • Use functions whenever possible
  • Try to write functions rather than carry out your work using blocks of code
  • Rules of thumb for the length of a function vary:
      • Ideal length between 2 and 4 lines of code
      • No more than 10 lines
      • No more than 20 lines
      • Should not exceed the size of the text editor window
  • Don’t write long functions
  • Rewrite long functions by converting collections of related expressions into separate functions
  • Smaller functions are easier to debug, easier to understand, and can be combined in a modular fashion
  • Functions shouldn’t be longer than one visible screen (with reasonable font)
  • Separate small functions:
      • are easier to reason about and manage
      • are easier to test and verify they are correct
      • are more likely to be reusable
  • Think about different scenarios and contexts in which a function might be used:
      • Can you generalize it?
      • Who will use it?
      • Who is going to maintain the code?
  • Use descriptive names:
      • Readers (including you) should infer the operation by looking at the call of the function
  • Functions should:
      • be modular (having a single task)
      • have a meaningful name
      • have a comment describing their purpose, inputs and outputs
  • Functions should not modify global variables:
      • except connections or environments
      • should not change global par() settings
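As a small illustration of several of these recommendations, here is a sketch in which one task is split into short, single-purpose functions, each with a descriptive name and a comment about its purpose, input and output. The functions center(), rescale() and standardize() are hypothetical names chosen for this example:

```r
# center(): subtract the mean from every element
# input:  x, a numeric vector
# output: a numeric vector with mean zero
center <- function(x) {
  x - mean(x)
}

# rescale(): divide every element by the standard deviation
# input:  x, a numeric vector
# output: a numeric vector with standard deviation one
rescale <- function(x) {
  x / sd(x)
}

# standardize(): center and then rescale a vector
# input:  x, a numeric vector
# output: a numeric vector with mean zero and standard deviation one
standardize <- function(x) {
  rescale(center(x))
}

standardize(c(1, 2, 3))
# > [1] -1  0  1
```

Each function can be tested on its own, and none of them reads or modifies objects outside its body.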