42 JSON Data

The goal of this chapter is to provide an introduction for handling JSON data in R.

We’ll cover the following topics:

  • JSON Basics
  • R packages for JSON data
  • Reading JSON data from the Web

42.1 JSON Basics

JSON stands for JavaScript Object Notation and it is a format for representing data. More formally, we can say that it is a text-based way to store and transmit structured data. By using a simple syntax, you can easily store anything from a single number to strings, JSON-arrays, and JSON-objects using nothing but a string of plain text. As you will see, you can also nest arrays and objects, allowing you to create complex data structures.

Let’s first talk about what JSON is and why is important.

JSON is a data representation format very similar to XML. It’s used widely across the internet for almost every single API that you will access as well as for config files and things such as games and text editors. Its popularity is based on a handful of attractive aspects:

  • It’s extremely lightweight and compact to send back and forth due to the small size file;

  • It’s easy for both computers and people to read-and-write, compared to something like XML, since it’s much cleaner and there’s not as many opening and closing tags;

  • It maps very easily onto the data structures used by most programming languages (numbers, strings, booleans, nulls, arrays and associative arrays);

  • It also integrates very nicely with javascript since JSON is just a superset of javascript which means anything you write in JSON is valid javascript, which is a language used all throughout the web for front-end or back-end of applications.

  • Also, every single major language has some form of library or packages with built-in functionality to parse JSON strings into objects or classes in that language which makes working with JSON data extremely easy inside of a programming language.

Why should we care about JSON? When working with data from the Web, we’ll inevitably find some JSON data because it is commonly used in web applications to send data from the server to the browser. As a matter of fact, in your data science career you will be using JSON quite often, whether it is consuming an API, creating an API, or creating config files for you or other people to use for your application.

42.2 Understanding JSON Syntax

Let’s now talk about the syntax used in JSON.

42.2.1 Data Types

The first thing to talk about is the data types or values that JSON can represent. As we know, JSON is a data representation format, so we need to be able to represent certain data types within it. JSON supports the following types:

  • string (in double quotes)

  • number (in any format whether they’re decimal numbers, integers, negative numbers, even numbers in scientific notation)

  • true and false (booleans)

  • null

42.2.2 Arrays

JSON also supports arrays (in JSON Sense) which are sets of data types defined within brackets, and contains a comma-separated list of values. For example [1, 3, 3] or ["computing", "with", "data"], which can be a set of any of the data types listed above.

We typically use arrays when we have a set of unnamed values, this is why some people refer to them as ordered unnamed arrays. The closest R object to a JSON array would be a vector:

  • JSON: [1, 2, 3, ... ]; -vs- R: c(1, 2, 3, ...)

  • JSON: [true, true, false, ... ]; -vs- R: c(TRUE, TRUE, FALSE, ...)

42.2.3 Objects

Another type of data container is the so-called JSON object, which is the most complex but also the most used type of object within JSON, and it allows you to represent values that are key-value pairs:

{"key": "value"}

You use curly braces to define a JSON-object, and inside the braces you put key-value pairs. The key must be surrounded by double quotes, followed by a colon, followed by the value. The value can be a single data type, but it can also be a JSON-array (which in turn can contain a JSON-object). Because you have the association of a key with its value, these JSON structures are also referred to as associative arrays.

For example, say the key is "year" and the value 2000, then a simple JSON object will look like this:

{"year": 2000}

Another example can be a key "name" and a value "Jessica":

{"name": "Jessica"}

If you have multiple key-value pairs, you separate each of them with a comma:

{
  "name1": "Nicole",
  "name2": "Pleuni",
  "name3": "Rori"
}

A more complex object might look like the following example. In this case we have JSON-object that contains three key-value pairs. Each of the keys is a "person" and the associated pair corresponds to an array which in turn contains a JSON-object with two key-value pairs: the first name, and the last name:

{
  "person1": [
    {
      "first": "Nicole",
      "last": "Adelstein"
    }
  ],
  "person2": [
    {
      "first": "Pleuni",
      "last": "Pennings"
    }
  ],
  "person3": [
    {
      "first": "Rori",
      "last": "Rohlfs"
    }
  ]
}

Because the data inside a JSON object is formed of key-value pairs, you could think of them as named arrays.

What do JSON-objects correspond to in R? Well, there’s not really a unique correspondence between a JSON-object and its equivalent in structure R. For instance, let’s bring back one of the JSON-objects previously discussed:

{
  "name1": "Nicole",
  "name2": "Pleuni",
  "name3": "Rori"
}

We could use a named R vector to store the same data:

# named vector in R
c("name1" = "Nicole", "name2" = "Pleuni", "name3" = "Rori")

But we could also use an R list:

# named list in R
list("name1" = "Nicole", "name2" = "Pleuni", "name3" = "Rori")

Keep in mind that JSON-objects can be more complex than this basic example. Because JSON objects can contain any other type of JSON data structure in them, the similar container in R to a JSON-object is a list.

42.2.4 Examples of JSON Data Containers

Here’s a series of examples involving combinations of JSON arrays and objects.

JSON containers can be nested. Here’s one example:

{
    "name": ["X", "Y", "Z"],
    "grams": [300, 200, 500], 
    "qty": [4, 5, null],
    "new": [true, false, true]
}

Here’s another example of nested containers:

[
    { "name": "X", 
      "grams": 300,
      "qty": 4,
      "new": true },
    { "name": "Y",
      "grams": 200,
      "qty": 5,
      "new": false },
    { "name": "Z",
      "grams": 500, 
      "qty": null,
      "new": true}
]

42.2.5 Data Table Toy Example

Let’s consider a less basic example with some tabular data set:

Name Gender Homeland Born Jedi
Anakin male Tatooine 41.9BBY yes
Amidala female Naboo 46BBY no
Luke male Tatooine 19BBY yes
Leia female Alderaan 19BBY no
Obi-Wan male Stewjon 57BBY yes
Han male Corellia 29BBY no
Palpatine male Naboo 82BBY no
R2-D2 unknown Naboo 33BBY no

How can we store this tabular data in JSON format? There are several ways to represent this data in JSON format. One option could be a JSON-array containing JSON-objects. Each JSON-object represents an individual:

    [
        {
         "Name": "Anakin",
         "Gender": "male", 
         "Homeworld": "Tatooine",
         "Born": "41.9BBY",
         "Jedi": "yes"
        },
        {
         "Name": "Amidala",
         "Gender": "female", 
         "Homeworld": "Naboo",
         "Born": 46BBY",
         "Jedi": "no"
        },
        ...
        {
         "Name": "R2-D2",
         "Gender": "unknown",
         "Homeworld": "Naboo",
         "Born": "33BBY",
         "Jedi": "no"
        }
    ]

Another way to represent the data in the table above is by using an object containing key-value pairs in which the keys are the names of the columns, and the pairs are arrays (the data values in each column).

{
  "Name": [ "Anakin", "Amidala", "Luke", ... , "R2-D2" ],
  "Gender": [ "male", "female", "male", ... , "unknown" ],
  "Homeworld": [ "Tatooine", "Naboo", "Tatooine", ... , "Naboo" ],
  "Born": [ "41.9BBY", "46BBY", "19BBY", ... , "33BBY" ],
  "Jedi": [ "yes", "no", "yes", ... , "no" ] 
}

42.3 JSON R packages

R has 3 packages for working with JSON data

  • "RJSONIO" by Duncan Temple Lang

  • "rjson" by Alex Couture-Beil

  • "jsonlite" by Jeroen Ooms, Duncan Temple Lang, Jonathan Wallace

All packages provide 2 main functions, toJSON() and fromJSON(), that allow conversion to and from data in JSON format, respectively. We’ll focus on the functions from "jsonlite".

There are 2 primary functions in "jsonlite":

  • toJSON() converts an R object to a string in JSON

  • fromJSON() converts JSON content to R objects

42.3.1 Function toJSON()

The function jsonlite::toJSON() converts an R object to a string in JSON.

Example: single number to JSON-array

Let’s begin with a super simple example by passing a single data value to the function toJSON():

toJSON(pi, digits = 4)
#> [3.1416]

Example: vectors to JSON-arrays

Consider the following vectors

num <- c(1, 2, 3, 4, 5)
lts <- c('a', 'b', 'c', 'd', 'e')

Applying toJSON() to the vectors num and lts produces JSON arrays:

toJSON(num)
#> [1,2,3,4,5]

toJSON(lts)
#> ["a","b","c","d","e"]

The argument pretty = TRUE allows you to obtain a JSON string with added indentation whitespace:

toJSON(num, pretty = TRUE)
#> [1, 2, 3, 4, 5]

toJSON(lts, pretty = TRUE)
#> ["a", "b", "c", "d", "e"]

What about an R vector with named elements? For example, here’s a vector vec

vec <- num
names(vec) <- lts
vec
#> a b c d e 
#> 1 2 3 4 5

Converting vec to JSON, we get:

toJSON(vec)
#> [1,2,3,4,5]

As you can tell, the names of the elements in vec are lost in translation.

Example: matrix to JSON-array

Here’s another example from an matrix to a JSON array:

mat <- matrix(9:1, nrow = 3, ncol = 3)
mat
#>      [,1] [,2] [,3]
#> [1,]    9    6    3
#> [2,]    8    5    2
#> [3,]    7    4    1

toJSON() converts an R matrix into a JSON-array

toJSON(mat)
#> [[9,6,3],[8,5,2],[7,4,1]]

Notice that the returned output arranges the values of the matrix row-by-row, also referred to as row-major. This means that when the input is an R matrix, toJSON() uses its argument matrix = "rowmajor".

You can change the arrangement to column-major by specifying the argument matrix = "columnmajor":

toJSON(mat, matrix = "columnmajor")
#> [[9,8,7],[6,5,4],[3,2,1]]

Example: data frame to JSON-object

We can also use toJSON() on data frames. Here’s an example of an assembled data frame swdf which will be converted to a JSON-object:

# toy data
sw_data <- rbind(
  c("Anakin", "male", "Tatooine", "41.9BBY",  "yes"),  
  c("Amidala", "female", "Naboo", "46BBY", "no"),
  c("Luke", "male", "Tatooine", "19BBY", "yes"),
  c("Leia", "female", "Alderaan", "19BBY", "no")
)

# convert to data.frame and add column names
swdf <- data.frame(sw_data, stringsAsFactors = FALSE)
names(swdf) <- c("Name", "Gender", "Homeworld", "Born", "Jedi")
swdf
#>      Name Gender Homeworld    Born Jedi
#> 1  Anakin   male  Tatooine 41.9BBY  yes
#> 2 Amidala female     Naboo   46BBY   no
#> 3    Luke   male  Tatooine   19BBY  yes
#> 4    Leia female  Alderaan   19BBY   no

The default output when you pass a data frame to jsonlite::toJSON() is

# convert R data.frame to JSON
sw_json = toJSON(swdf)
sw_json
#> [{"Name":"Anakin","Gender":"male","Homeworld":"Tatooine","Born":"41.9BBY","Jedi":"yes"},{"Name":"Amidala","Gender":"female","Homeworld":"Naboo","Born":"46BBY","Jedi":"no"},{"Name":"Luke","Gender":"male","Homeworld":"Tatooine","Born":"19BBY","Jedi":"yes"},{"Name":"Leia","Gender":"female","Homeworld":"Alderaan","Born":"19BBY","Jedi":"no"}]

The argument dataframe gives you more control on the output. This argument has three options:

  • "rows": each row is converted to a JSON-object with key-value pairs formed by "column_name": "row_value";
toJSON(swdf, dataframe = "rows")
#> [{"Name":"Anakin","Gender":"male","Homeworld":"Tatooine","Born":"41.9BBY","Jedi":"yes"},{"Name":"Amidala","Gender":"female","Homeworld":"Naboo","Born":"46BBY","Jedi":"no"},{"Name":"Luke","Gender":"male","Homeworld":"Tatooine","Born":"19BBY","Jedi":"yes"},{"Name":"Leia","Gender":"female","Homeworld":"Alderaan","Born":"19BBY","Jedi":"no"}]
  • "columns": each column is converted into a JSON-object with a single key for each column, and values stored as arrays;
toJSON(swdf, dataframe = "columns")
#> {"Name":["Anakin","Amidala","Luke","Leia"],"Gender":["male","female","male","female"],"Homeworld":["Tatooine","Naboo","Tatooine","Alderaan"],"Born":["41.9BBY","46BBY","19BBY","19BBY"],"Jedi":["yes","no","yes","no"]}
  • "values": the values in each column are converted to a JSON-array, and the names of the columns are lost.
toJSON(swdf, dataframe = "values")
#> [["Anakin","male","Tatooine","41.9BBY","yes"],["Amidala","female","Naboo","46BBY","no"],["Luke","male","Tatooine","19BBY","yes"],["Leia","female","Alderaan","19BBY","no"]]

42.3.2 Function fromJSON()

In practice, instead of converting R objects to JSON objects, it is more common to have data in JSON format which needs to be converted into an R object.

The function jsonlite::fromJSON() converts a JSON-object to an R object.

Example: JSON-array to R vector

json_array <- '["computing", "with", "data"]'

fromJSON(json_array)
#> [1] "computing" "with"      "data"

Example: JSON-object to R object

Consider a simple JSON-object, and its conversion to R with jsonlite::fromJSON()

json_obj1 <- '{"name": "Jessica"}'

fromJSON(json_obj1)
#> $name
#> [1] "Jessica"

Notice that the obtained object is an R list in which the key becomes the name of the list, and the value becomes the content of the list’s element.

Consider a less simple JSON-object:

json_obj2 <- '{"name1": "Nicole", "name2": "Pleuni", "name3": "Rori"}'

fromJSON(json_obj2)
#> $name1
#> [1] "Nicole"
#> 
#> $name2
#> [1] "Pleuni"
#> 
#> $name3
#> [1] "Rori"

Another example:

fromJSON('{"name": ["X", "Y"], "grams": [30, 20], "qty": [4, null],
"new": [true, false]}')
#> $name
#> [1] "X" "Y"
#> 
#> $grams
#> [1] 30 20
#> 
#> $qty
#> [1]  4 NA
#> 
#> $new
#> [1]  TRUE FALSE

Example: JSON-object to R object

Suppose you have a JSON object with the following data:

{
  "Name": ["Anakin","Amidala","Luke","Leia"],
  "Gender": ["male","female","male","female"],
  "Homeworld": ["Tatooine","Naboo","Tatooine","Alderaan"],
  "Born": ["41.9BBY","46BBY","19BBY","19BBY"],
  "Jedi": ["yes","no","yes","no"]
}

and assume that the above data is stored as a single (continuous) string in an R character vector json_sw; applying fromJSON() to this string gives you the following list:

fromJSON(json_sw)
#> $Name
#> [1] "Anakin"  "Amidala" "Luke"    "Leia"   
#> 
#> $Gender
#> [1] "male"   "female" "male"   "female"
#> 
#> $Homeworld
#> [1] "Tatooine" "Naboo"    "Tatooine" "Alderaan"
#> 
#> $Born
#> [1] "41.9BBY" "46BBY"   "19BBY"   "19BBY"  
#> 
#> $Jedi
#> [1] "yes" "no"  "yes" "no"

Can this be transformed into a data frame? Yes, by passing the obtained list to the function data.frame():

data.frame(fromJSON(json_sw))
#>      Name Gender Homeworld    Born Jedi
#> 1  Anakin   male  Tatooine 41.9BBY  yes
#> 2 Amidala female     Naboo   46BBY   no
#> 3    Luke   male  Tatooine   19BBY  yes
#> 4    Leia female  Alderaan   19BBY   no

42.3.3 Reading JSON Data

Now that we have discussed the basics of JSON, and the common ways to convert fromJSON() and toJSON(), let’s see how to read JSON data from the Web.

One of the typical ways to import JSON data from the Web to R is by passing the url directly to fromJSON(). Another way is by passing the name of the file with the JSON content as a single string to the function fromJSON().

Here’s an example reading a JSON string from the website Advice Slip. The url https://api.adviceslip.com/advice gives you a random advice (see figure below):

Random advice from Advice Slip

Figure 42.1: Random advice from Advice Slip

As you can tell, the content is a simple JSON string

advice_url <- "https://api.adviceslip.com/advice"

fromJSON(advice_url)
#> $slip
#> $slip$id
#> [1] 9
#> 
#> $slip$advice
#> [1] "True happiness always resides in the quest!"

Example: Colors in Hexadecimal Notation

The following data comes from one of Dave Eddy’s github repositories:

https://raw.githubusercontent.com/bahamas10/css-color-names/master/css-color-names.json

This is a JSON-object in which the keys are color-names, and the values are the hexadecimal digits of the corresponding color:

{
  "aliceblue": "#f0f8ff",
  "antiquewhite": "#faebd7",
  "aqua": "#00ffff",
  "aquamarine": "#7fffd4",
  "azure": "#f0ffff",
  ...
  "wheat": "#f5deb3",
  "white": "#ffffff",
  "whitesmoke": "#f5f5f5",
  "yellow": "#ffff00",
  "yellowgreen": "#9acd32"
}

We pass the url to jsonlite::fromJSON()

colors_json <- "https://raw.githubusercontent.com/bahamas10/css-color-names/master/css-color-names.json"

hex_colors <- fromJSON(colors_json)

The output in hex_colors is a list with 148 elements; the first five elements are displayed below:

hex_colors[1:5]
#> $aliceblue
#> [1] "#f0f8ff"
#> 
#> $antiquewhite
#> [1] "#faebd7"
#> 
#> $aqua
#> [1] "#00ffff"
#> 
#> $aquamarine
#> [1] "#7fffd4"
#> 
#> $azure
#> [1] "#f0ffff"