27 Understanding Functions

Let’s consider another toy example with a function that squares its argument:

square <- function(x) {
  x * x
}

the function name is "square"
it has one argument: x
the function body consists of one simple expression
it returns the value x * x

square() works like any other function in R:

square(10)
#> [1] 100

In this case, square() is also vectorized:

square(1:5)
#> [1]  1  4  9 16 25

Why is square() vectorized?
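
One way to explore this question is to try the multiplication operator on its own. The short sketch below (the vector name v is only illustrative) suggests that square() inherits its behavior from *, which works element by element on vectors:

```r
square <- function(x) {
  x * x
}

v <- 1:5

# the operator used in the body already works element by element
v * v

# so square() gives the same result without any explicit loop
identical(square(v), v * v)
```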

Once defined, functions can be used in other function definitions:

sum_of_squares <- function(x) {
  sum(square(x))
}
sum_of_squares(1:5)
#> [1] 55

27.0.1 Simple Expressions

Functions with a body consisting of a simple expression can be written with no braces (in one single line!):

square <- function(x) x * x
square(10)
#> [1] 100

However, as a general coding rule, you should get into the habit of writing functions using braces.

27.0.2 Nested Functions

We can also define a function inside another function:

getmax <- function(a) {
  # nested function
  maxpos <- function(u) which.max(u) 
  # output
  list(position = maxpos(a),
       value = max(a))
}
getmax(c(2, -4, 6, 10, pi))
#> $position
#> [1] 4
#> 
#> $value
#> [1] 10
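
A nested function is created inside the enclosing function, so it is not available from the workspace. The sketch below repeats getmax() and checks this with exists(), assuming that no object called maxpos has been defined elsewhere in your session:

```r
getmax <- function(a) {
  # nested function: only visible inside getmax()
  maxpos <- function(u) which.max(u)
  list(position = maxpos(a),
       value = max(a))
}

res <- getmax(c(2, -4, 6, 10, pi))
res$position
res$value

# maxpos() was created inside getmax(), so it is not in the workspace
exists("maxpos")
```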

27.1 Function Output

The value of a function can be established in two ways:

As the last evaluated simple expression (in the body of the function)
An explicitly returned value via return()

Here’s a basic example of a function in which the output is the last evaluated expression:

add <- function(x, y) {
  x + y
}
add(2, 3)
#> [1] 5

Here’s another version of add() in which the output is the last evaluated expression:

add <- function(x, y) {
  z <- x + y
  z
}
add(2, 3)
#> [1] 5

Be careful with the form in which the last expression is evaluated:

add <- function(x, y) {
  z <- x + y
}
add(2, 3)

In this case, it looks like add() does not work. If you run the previous code, nothing appears in the console. Can you guess why? To help you answer this question, assign the invocation to an object and then print the object:

why <- add(2, 3)
why

add() does work. The issue has to do with the form of the last expression. Nothing gets displayed in the console because the last statement, z <- x + y, is an assignment, and R returns the value of an assignment invisibly (it is computed but not printed).
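
If you want to see the value without storing it first, there are a few options. The following sketch shows three of them using base R only: wrapping the call in parentheses, calling print(), and asking withVisible() whether the result would be printed automatically:

```r
add <- function(x, y) {
  z <- x + y
}

# wrapping the call in parentheses forces the value to be printed
(add(2, 3))

# print() does the same
print(add(2, 3))

# withVisible() reports whether a value would be auto-printed
withVisible(add(2, 3))$visible
```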

27.1.1 The `return()` command

More often than not, the return() command is included to explicitly indicate the output of a function:

add <- function(x, y) {
  z <- x + y
  return(z)
}
add(2, 3)
#> [1] 5

I’ve seen that many users with previous programming experience in other languages prefer to use return(). The main reason is that most programming languages tend to use some sort of return statement to indicate the output of a function.

So, following good language-agnostic coding practices, we also recommend that you use the function return(). In this way, any reader can quickly scan the body of your functions and visually locate the places in which a return statement is being made.
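
An explicit return() is also what lets a function exit early, before reaching the last line of its body. Here is a minimal sketch; safe_ratio() is a hypothetical function made up for illustration, and returning NA for a zero denominator is just one possible design choice:

```r
# hypothetical example: divide x by y, but give up early when y is zero
safe_ratio <- function(x, y) {
  if (y == 0) {
    # early exit: the rest of the body is skipped
    return(NA_real_)
  }
  return(x / y)
}

safe_ratio(6, 3)
safe_ratio(1, 0)
```

Both exit points are easy to spot because each one is marked with return().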

27.1.2 Variance Function Example

The sample variance is given by the following formula:

\[ var(x) = \frac{1}{n-1} \sum_{i = 1}^{n} (x_i - \bar{x})^2 \]

Let’s create a variance() function that computes the sample variance. The first step should always be writing the code that will become the body of the function:

# start simple
x <- 1:10
# get working code
sum((x - mean(x)) ^ 2) / (length(x) - 1)
#> [1] 9.17
# test it: compare it to var()
var(1:10)
#> [1] 9.17
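
As a check on the arithmetic, the value above can be worked out by hand for the values 1, 2, ..., 10, where \(n = 10\) and \(\bar{x} = 5.5\). The deviations are symmetric around the mean, so the sum of squares can be computed from half of them:

\[ \sum_{i = 1}^{10} (x_i - 5.5)^2 = 2 \times (0.5^2 + 1.5^2 + 2.5^2 + 3.5^2 + 4.5^2) = 2 \times 41.25 = 82.5 \]

\[ var(x) = \frac{82.5}{10 - 1} \approx 9.17 \]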

Once you know your code works, you can encapsulate it with function():

# encapsulate your code
variance <- function(x) {
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}
# check that it works
variance(x)
#> [1] 9.17

Before making any further changes to variance(), you should test it with a handful of other (possibly extreme) cases:

# consider less simple cases
variance(runif(10))
#> [1] 0.0341
# what about atypical cases?
variance(rep(0, 10))
#> [1] 0
# what if there are missing values?
variance(c(1:9, NA))
#> [1] NA
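
Another extreme case worth trying is a vector with a single value. With the function as written above, the formula becomes 0 divided by 0. The sketch below compares the result with what the built-in var() gives for the same input:

```r
variance <- function(x) {
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}

# one observation: the numerator is 0 and so is length(x) - 1
variance(5)

# the built-in var() for the same input
var(5)

# is.nan() and is.na() help to tell the two results apart
is.nan(variance(5))
is.na(var(5))
```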

You can then start gradually adapting your function to make it more robust, more flexible, more user friendly, etc. For instance, variance() returns NA when the provided vector contains missing values. But you can include an argument that removes any missing values. Many functions in R have this feature, like sum(), mean(), median(). They all use the so-called na.rm argument to specify if missing values should be removed before any computation is done:

# adapt it gradually
variance <- function(x, na.rm = FALSE) {
  if (na.rm) {
    # removing missing values
    x <- x[!is.na(x)]
  }
  # compute sample variance
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}
# check that it works
variance(c(1:9, NA), na.rm = TRUE)
#> [1] 7.5
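
To gain more confidence in the new argument, you can compare variance() against the built-in var() on a couple of inputs. This is only a sketch: the vectors x1 and x2 are arbitrary test values, and all.equal() is used because it compares numbers up to a small tolerance:

```r
variance <- function(x, na.rm = FALSE) {
  if (na.rm) {
    # removing missing values
    x <- x[!is.na(x)]
  }
  # compute sample variance
  sum((x - mean(x)) ^ 2) / (length(x) - 1)
}

# arbitrary test vectors containing missing values
x1 <- c(1:9, NA)
x2 <- c(2.5, NA, 7.1, 3.3, NA, 9.8)

# compare with var(), which also has an na.rm argument
all.equal(variance(x1, na.rm = TRUE), var(x1, na.rm = TRUE))
all.equal(variance(x2, na.rm = TRUE), var(x2, na.rm = TRUE))

# with the default na.rm = FALSE, missing values still propagate
is.na(variance(x1))
```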

27.2 Documenting Functions

The examples of functions in this chapter are simple and fairly understandable (I hope so). However, you should strive to always include documentation for your functions. What does this mean? Documenting a function involves describing the purpose of the function, the inputs it accepts, and the output it produces.

Description: what the function does
Input(s): what are the inputs or arguments
Output: what is the output (returned value)

You can find some inspiration in the help() documentation when you search for a given function’s description.

There are several approaches for writing documentation of a function. I will show you how to use what are called roxygen comments to achieve this task. While not used by most useRs, they are great when you want to take your code and make a package out of it.

Here’s an example of documentation for standardize() using roxygen comments:

#' @title Standardize
#' @description Transforms values in standard units (i.e. standard scores)
#' @param x numeric vector
#' @param na.rm whether to remove missing values
#' @return standardized values
#' @examples
#'   standardize(rnorm(10))
standardize <- function(x, na.rm = FALSE) {
  z <- (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
  return(z)
}

Roxygen comments are R comments formed by the hash symbol immediately followed by an apostrophe: #'
You specify the label of a field with @ and a keyword: e.g. @title
The syntax highlighting of RStudio recognizes this type of comment and its labels
Typical roxygen fields are:

| label | meaning | description |
|---|---|---|
| `@title` | title | name of your function |
| `@description` | description | what the function does |
| `@param input` | parameter | describe `input` parameter |
| `@return` | output | what is the returned value |
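
The documentation says that standardize() returns standard scores, which should have mean zero and standard deviation one. The following sketch, which repeats the definition shown earlier and uses arbitrary input values, is one way to check that the code agrees with its description:

```r
standardize <- function(x, na.rm = FALSE) {
  z <- (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
  return(z)
}

z <- standardize(c(2, 4, 6, 8))

# standard scores: mean should be (numerically) zero, sd (numerically) one
all.equal(mean(z), 0)
all.equal(sd(z), 1)

# with na.rm = TRUE the missing value stays in its position in the output
standardize(c(1, 2, NA, 3), na.rm = TRUE)
```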

27.2.1 General Strategy for Writing Functions

Always start small, with test toy-values.
Get what will be the body of the function working first.
Check each step along the way.
Don’t try and do too much at once.
Create (encapsulate body) the function once everything works.
Include documentation; we suggest using Roxygen comments.
Optional: after you have a function that works, then you may worry about “elegance”, “efficiency”, “cleverness”, etc.
For beginners, it’s better to have an “ugly/inefficient” function that does the work, rather than wasting a lot of time, effort, and energy to get a “smart” function.
The more you practice, the easier it will be to create functions.
As you get more experience, making more clever and elegant functions will be less difficult, and worth your time.

27.3 Naming Functions

There are different ways to name functions. The following list provides some examples with different naming styles:

squareroot()
SquareRoot()
squareRoot()
square.root()
square_root()

I personally use the underscore style. But you may find other programmers employing a different naming format. We strongly suggest using a consistent naming style. Many programming teams define their own style guides. If you are new to programming, it usually takes time to develop a consistent style. However, the sooner you have a defined personal style, the better.

It is also important that you know which names are invalid in R:

5quareroot(): cannot begin with a number
_square(): cannot begin with an underscore
square-root(): cannot use hyphenated names

In addition, avoid using an already existing name, e.g. sqrt().
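
If you are unsure about a candidate name, you can ask R. The sketch below relies on make.names(), which converts any string into a syntactically valid name; when the string comes back unchanged, it was valid already. The helper is_valid_name() is a hypothetical function written for this illustration, not something provided by R:

```r
# hypothetical helper: TRUE when `name` is already a syntactically valid name
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("5quareroot", "_square", "square-root",
                "square_root", "squareRoot")
is_valid_name(candidates)

# exists() tells you whether a name is already taken, e.g. by base R
exists("sqrt")
```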

Sometimes you will find functions whose names start with a dot, such as .hidden(). These are hidden functions: they are not shown by default in the list of objects in your working environment.

ls()
#>   [1] "a"                    "A"                    "a1"                  
#>   [4] "a2"                   "a3"                   "add"                 
#>   [7] "amy75"                "arr"                  "avg_height_by_gender"
#>  [10] "avg_ht_female"        "avg_ht_male"          "avg_wind_pressure_75"
#>  [13] "b"                    "B"                    "b1"                  
#>  [16] "b2"                   "b3"                   "butter"              
#>  [19] "cal"                  "cm2in"                "coffee_prices"       
#>  [22] "colors1"              "colors2"              "colors3"             
#>  [25] "crazy"                "dat"                  "day"                 
#>  [28] "day_num"              "distances"            "empty_chr"           
#>  [31] "empty_str"            "evalue"               "ex1"                 
#>  [34] "ex2"                  "fem_male_height"      "first"               
#>  [37] "first_factor"         "gender_height"        "get_dist"            
#>  [40] "getmax"               "gg_world"             "gg_world2"           
#>  [43] "h"                    "height_by_gender"     "height_females"      
#>  [46] "height_males"         "hp"                   "ht10"                
#>  [49] "i"                    "IloveR"               "j"                   
#>  [52] "jelly"                "last"                 "letrs"               
#>  [55] "lets"                 "lis"                  "log_vector"          
#>  [58] "lst"                  "mat"                  "men_col"             
#>  [61] "men_dat"              "men_html"             "meters"              
#>  [64] "millions"             "mixed"                "num_day"             
#>  [67] "num_letters"          "num_vector"           "numbers"             
#>  [70] "one"                  "oski"                 "peanut"              
#>  [73] "phone"                "Phone"                "PHONE"               
#>  [76] "PI"                   "pie"                  "player"              
#>  [79] "player1"              "points1"              "position"            
#>  [82] "ppg"                  "prices"               "reordered_names"     
#>  [85] "rookie"               "rookie1"              "salary"              
#>  [88] "sandwich"             "second_factor"        "sizes"               
#>  [91] "some_colors"          "some_name"            "square"              
#>  [94] "standardize"          "states"               "stats"               
#>  [97] "storms_75_80"         "storms_per_year"      "storms_year_name"    
#> [100] "storms75"             "str_vector"           "string"              
#> [103] "strings"              "student"              "sum_of_squares"      
#> [106] "sw"                   "tbl"                  "temp_convert"        
#> [109] "text6"                "third_factor"         "times"               
#> [112] "val_rep"              "val_while"            "value"               
#> [115] "values"               "variance"             "vec"                 
#> [118] "vec1"                 "vec2"                 "vec3"                
#> [121] "weight"               "which_females"        "which_males"         
#> [124] "women_col"            "women_dat"            "women_html"          
#> [127] "world_df"             "world_map"            "x"                   
#> [130] "X"                    "x_devs"               "x_mean"              
#> [133] "x_sd"                 "y"                    "years"               
#> [136] "yummy"                "z"                    "zee"                 
#> [139] "zzz"
visible <- function(x) {
  x * 2
}

.hidden <- function(y) {
  y * 2
}
ls()
#>   [1] "a"                    "A"                    "a1"                  
#>   [4] "a2"                   "a3"                   "add"                 
#>   [7] "amy75"                "arr"                  "avg_height_by_gender"
#>  [10] "avg_ht_female"        "avg_ht_male"          "avg_wind_pressure_75"
#>  [13] "b"                    "B"                    "b1"                  
#>  [16] "b2"                   "b3"                   "butter"              
#>  [19] "cal"                  "cm2in"                "coffee_prices"       
#>  [22] "colors1"              "colors2"              "colors3"             
#>  [25] "crazy"                "dat"                  "day"                 
#>  [28] "day_num"              "distances"            "empty_chr"           
#>  [31] "empty_str"            "evalue"               "ex1"                 
#>  [34] "ex2"                  "fem_male_height"      "first"               
#>  [37] "first_factor"         "gender_height"        "get_dist"            
#>  [40] "getmax"               "gg_world"             "gg_world2"           
#>  [43] "h"                    "height_by_gender"     "height_females"      
#>  [46] "height_males"         "hp"                   "ht10"                
#>  [49] "i"                    "IloveR"               "j"                   
#>  [52] "jelly"                "last"                 "letrs"               
#>  [55] "lets"                 "lis"                  "log_vector"          
#>  [58] "lst"                  "mat"                  "men_col"             
#>  [61] "men_dat"              "men_html"             "meters"              
#>  [64] "millions"             "mixed"                "num_day"             
#>  [67] "num_letters"          "num_vector"           "numbers"             
#>  [70] "one"                  "oski"                 "peanut"              
#>  [73] "phone"                "Phone"                "PHONE"               
#>  [76] "PI"                   "pie"                  "player"              
#>  [79] "player1"              "points1"              "position"            
#>  [82] "ppg"                  "prices"               "reordered_names"     
#>  [85] "rookie"               "rookie1"              "salary"              
#>  [88] "sandwich"             "second_factor"        "sizes"               
#>  [91] "some_colors"          "some_name"            "square"              
#>  [94] "standardize"          "states"               "stats"               
#>  [97] "storms_75_80"         "storms_per_year"      "storms_year_name"    
#> [100] "storms75"             "str_vector"           "string"              
#> [103] "strings"              "student"              "sum_of_squares"      
#> [106] "sw"                   "tbl"                  "temp_convert"        
#> [109] "text6"                "third_factor"         "times"               
#> [112] "val_rep"              "val_while"            "value"               
#> [115] "values"               "variance"             "vec"                 
#> [118] "vec1"                 "vec2"                 "vec3"                
#> [121] "visible"              "weight"               "which_females"       
#> [124] "which_males"          "women_col"            "women_dat"           
#> [127] "women_html"           "world_df"             "world_map"           
#> [130] "x"                    "X"                    "x_devs"              
#> [133] "x_mean"               "x_sd"                 "y"                   
#> [136] "years"                "yummy"                "z"                   
#> [139] "zee"                  "zzz"

ls(all.names = TRUE)
#>   [1] ".hidden"              ".Random.seed"         "a"                   
#>   [4] "A"                    "a1"                   "a2"                  
#>   [7] "a3"                   "add"                  "amy75"               
#>  [10] "arr"                  "avg_height_by_gender" "avg_ht_female"       
#>  [13] "avg_ht_male"          "avg_wind_pressure_75" "b"                   
#>  [16] "B"                    "b1"                   "b2"                  
#>  [19] "b3"                   "butter"               "cal"                 
#>  [22] "cm2in"                "coffee_prices"        "colors1"             
#>  [25] "colors2"              "colors3"              "crazy"               
#>  [28] "dat"                  "day"                  "day_num"             
#>  [31] "distances"            "empty_chr"            "empty_str"           
#>  [34] "evalue"               "ex1"                  "ex2"                 
#>  [37] "fem_male_height"      "first"                "first_factor"        
#>  [40] "gender_height"        "get_dist"             "getmax"              
#>  [43] "gg_world"             "gg_world2"            "h"                   
#>  [46] "height_by_gender"     "height_females"       "height_males"        
#>  [49] "hp"                   "ht10"                 "i"                   
#>  [52] "IloveR"               "j"                    "jelly"               
#>  [55] "last"                 "letrs"                "lets"                
#>  [58] "lis"                  "log_vector"           "lst"                 
#>  [61] "mat"                  "men_col"              "men_dat"             
#>  [64] "men_html"             "meters"               "millions"            
#>  [67] "mixed"                "num_day"              "num_letters"         
#>  [70] "num_vector"           "numbers"              "one"                 
#>  [73] "oski"                 "peanut"               "phone"               
#>  [76] "Phone"                "PHONE"                "PI"                  
#>  [79] "pie"                  "player"               "player1"             
#>  [82] "points1"              "position"             "ppg"                 
#>  [85] "prices"               "reordered_names"      "rookie"              
#>  [88] "rookie1"              "salary"               "sandwich"            
#>  [91] "second_factor"        "sizes"                "some_colors"         
#>  [94] "some_name"            "square"               "standardize"         
#>  [97] "states"               "stats"                "storms_75_80"        
#> [100] "storms_per_year"      "storms_year_name"     "storms75"            
#> [103] "str_vector"           "string"               "strings"             
#> [106] "student"              "sum_of_squares"       "sw"                  
#> [109] "tbl"                  "temp_convert"         "text6"               
#> [112] "third_factor"         "times"                "val_rep"             
#> [115] "val_while"            "value"                "values"              
#> [118] "variance"             "vec"                  "vec1"                
#> [121] "vec2"                 "vec3"                 "visible"             
#> [124] "weight"               "which_females"        "which_males"         
#> [127] "women_col"            "women_dat"            "women_html"          
#> [130] "world_df"             "world_map"            "x"                   
#> [133] "X"                    "x_devs"               "x_mean"              
#> [136] "x_sd"                 "y"                    "years"               
#> [139] "yummy"                "z"                    "zee"                 
#> [142] "zzz"
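
The difference is easier to see in a small, empty environment than in a crowded workspace. The following sketch is only an illustration: it creates a fresh environment (the name e is arbitrary), defines the same two functions inside it, and lists its contents both ways:

```r
# a fresh environment, so the listing is not cluttered by other objects
e <- new.env()
assign("visible", function(x) x * 2, envir = e)
assign(".hidden", function(y) y * 2, envir = e)

# default listing: names starting with a dot are left out
ls(envir = e)

# all.names = TRUE includes them
ls(envir = e, all.names = TRUE)

# a hidden function can still be called like any other
e$.hidden(3)
```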

27.4 Recommendations

Functions are tools and operations
Functions form the building blocks for larger tasks
Functions allow us to reuse blocks of code easily for later use
Use functions whenever possible
Try to write functions rather than carry out your work using blocks of code
Ideal length between 2 and 4 lines of code
No more than 10 lines
No more than 20 lines
Should not exceed the size of the text editor window
Don’t write long functions
Rewrite long functions by converting collections of related expressions into separate functions
Smaller functions are easier to debug, easier to understand, and can be combined in a modular fashion
Functions shouldn’t be longer than one visible screen (with reasonable font)
Separate small functions
are easier to reason about and manage
are easier to test and verify they are correct
are more likely to be reusable
Think about different scenarios and contexts in which a function might be used
Can you generalize it?
Who will use it?
Who is going to maintain the code?
Use descriptive names
Readers (including you) should infer the operation by looking at the call of the function
Functions should be modular (having a single task)
Functions should have a meaningful name
Functions should have a comment describing their purpose, inputs and outputs
Functions should not modify global variables
except connections or environments
should not change global par() settings
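
When a function does need to change graphical parameters, a common idiom is to save the previous values and restore them with on.exit(). The sketch below is one possible way to do it; plot_pair() is a hypothetical function, and pdf(NULL) is used only to open a throwaway graphics device so that the example does not write a file:

```r
# hypothetical helper: draws two histograms side by side, then restores
# whatever graphical parameters were in place before the call
plot_pair <- function(x, y) {
  old_par <- par(mfrow = c(1, 2))  # par() returns the previous values
  on.exit(par(old_par))            # restore them when the function exits
  hist(x)
  hist(y)
  invisible(NULL)
}

pdf(NULL)                          # throwaway device, no file is created
before <- par("mfrow")
plot_pair(rnorm(50), runif(50))
after <- par("mfrow")

# the layout seen by the caller is unchanged
identical(before, after)
dev.off()
```

Because the restore step is registered with on.exit(), it runs even if one of the plotting calls fails.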