42 JSON Data
The goal of this chapter is to provide an introduction for handling JSON data in R.
We’ll cover the following topics:
- JSON Basics
- R packages for JSON data
- Reading JSON data from the Web
42.1 JSON Basics
JSON stands for JavaScript Object Notation and it is a format for representing data. More formally, we can say that it is a text-based way to store and transmit structured data. By using a simple syntax, you can easily store anything from a single number to strings, JSON-arrays, and JSON-objects using nothing but a string of plain text. As you will see, you can also nest arrays and objects, allowing you to create complex data structures.
Let’s first talk about what JSON is and why is important.
JSON is a data representation format very similar to XML. It’s used widely across the internet for almost every single API that you will access as well as for config files and things such as games and text editors. Its popularity is based on a handful of attractive aspects:
It’s extremely lightweight and compact to send back and forth due to the small size file;
It’s easy for both computers and people to read-and-write, compared to something like XML, since it’s much cleaner and there’s not as many opening and closing tags;
It maps very easily onto the data structures used by most programming languages (numbers, strings, booleans, nulls, arrays and associative arrays);
It also integrates very nicely with javascript since JSON is just a superset of javascript which means anything you write in JSON is valid javascript, which is a language used all throughout the web for front-end or back-end of applications.
Also, every single major language has some form of library or packages with built-in functionality to parse JSON strings into objects or classes in that language which makes working with JSON data extremely easy inside of a programming language.
Why should we care about JSON? When working with data from the Web, we’ll inevitably find some JSON data because it is commonly used in web applications to send data from the server to the browser. As a matter of fact, in your data science career you will be using JSON quite often, whether it is consuming an API, creating an API, or creating config files for you or other people to use for your application.
42.2 Understanding JSON Syntax
Let’s now talk about the syntax used in JSON.
42.2.1 Data Types
The first thing to talk about is the data types or values that JSON can represent. As we know, JSON is a data representation format, so we need to be able to represent certain data types within it. JSON supports the following types:
string
(in double quotes)number
(in any format whether they’re decimal numbers, integers, negative numbers, even numbers in scientific notation)true
andfalse
(booleans)null
42.2.2 Arrays
JSON also supports arrays (in JSON Sense) which are sets of data types
defined within brackets, and contains a comma-separated list of values.
For example [1, 3, 3]
or ["computing", "with", "data"]
,
which can be a set of any of the data types listed above.
We typically use arrays when we have a set of unnamed values, this is why some people refer to them as ordered unnamed arrays. The closest R object to a JSON array would be a vector:
JSON:
[1, 2, 3, ... ]
; -vs- R:c(1, 2, 3, ...)
JSON:
[true, true, false, ... ]
; -vs- R:c(TRUE, TRUE, FALSE, ...)
42.2.3 Objects
Another type of data container is the so-called JSON object, which is the most complex but also the most used type of object within JSON, and it allows you to represent values that are key-value pairs:
{"key": "value"}
You use curly braces to define a JSON-object, and inside the braces you put key-value pairs. The key must be surrounded by double quotes, followed by a colon, followed by the value. The value can be a single data type, but it can also be a JSON-array (which in turn can contain a JSON-object). Because you have the association of a key with its value, these JSON structures are also referred to as associative arrays.
For example, say the key is "year"
and the value 2000
, then a simple JSON
object will look like this:
{"year": 2000}
Another example can be a key "name"
and a value "Jessica"
:
{"name": "Jessica"}
If you have multiple key-value pairs, you separate each of them with a comma:
{
"name1": "Nicole",
"name2": "Pleuni",
"name3": "Rori"
}
A more complex object might look like the following example. In this case we
have JSON-object that contains three key-value pairs. Each of the keys is a
"person"
and the associated pair corresponds to an array which in turn
contains a JSON-object with two key-value pairs: the first name, and the
last name:
{
"person1": [
{
"first": "Nicole",
"last": "Adelstein"
}
],
"person2": [
{
"first": "Pleuni",
"last": "Pennings"
}
],
"person3": [
{
"first": "Rori",
"last": "Rohlfs"
}
]
}
Because the data inside a JSON object is formed of key-value pairs, you could think of them as named arrays.
What do JSON-objects correspond to in R? Well, there’s not really a unique correspondence between a JSON-object and its equivalent in structure R. For instance, let’s bring back one of the JSON-objects previously discussed:
{
"name1": "Nicole",
"name2": "Pleuni",
"name3": "Rori"
}
We could use a named R vector to store the same data:
But we could also use an R list:
Keep in mind that JSON-objects can be more complex than this basic example.
Because JSON objects can contain any other type of JSON data structure in them,
the similar container in R to a JSON-object is a list
.
42.2.4 Examples of JSON Data Containers
Here’s a series of examples involving combinations of JSON arrays and objects.
JSON containers can be nested. Here’s one example:
{
"name": ["X", "Y", "Z"],
"grams": [300, 200, 500],
"qty": [4, 5, null],
"new": [true, false, true]
}
Here’s another example of nested containers:
[
{ "name": "X",
"grams": 300,
"qty": 4,
"new": true },
{ "name": "Y",
"grams": 200,
"qty": 5,
"new": false },
{ "name": "Z",
"grams": 500,
"qty": null,
"new": true}
]
42.2.5 Data Table Toy Example
Let’s consider a less basic example with some tabular data set:
Name | Gender | Homeland | Born | Jedi |
---|---|---|---|---|
Anakin | male | Tatooine | 41.9BBY | yes |
Amidala | female | Naboo | 46BBY | no |
Luke | male | Tatooine | 19BBY | yes |
Leia | female | Alderaan | 19BBY | no |
Obi-Wan | male | Stewjon | 57BBY | yes |
Han | male | Corellia | 29BBY | no |
Palpatine | male | Naboo | 82BBY | no |
R2-D2 | unknown | Naboo | 33BBY | no |
How can we store this tabular data in JSON format? There are several ways to represent this data in JSON format. One option could be a JSON-array containing JSON-objects. Each JSON-object represents an individual:
[
{
"Name": "Anakin",
"Gender": "male",
"Homeworld": "Tatooine",
"Born": "41.9BBY",
"Jedi": "yes"
},
{
"Name": "Amidala",
"Gender": "female",
"Homeworld": "Naboo",
"Born": 46BBY",
"Jedi": "no"
},
...
{
"Name": "R2-D2",
"Gender": "unknown",
"Homeworld": "Naboo",
"Born": "33BBY",
"Jedi": "no"
}
]
Another way to represent the data in the table above is by using an object containing key-value pairs in which the keys are the names of the columns, and the pairs are arrays (the data values in each column).
{
"Name": [ "Anakin", "Amidala", "Luke", ... , "R2-D2" ],
"Gender": [ "male", "female", "male", ... , "unknown" ],
"Homeworld": [ "Tatooine", "Naboo", "Tatooine", ... , "Naboo" ],
"Born": [ "41.9BBY", "46BBY", "19BBY", ... , "33BBY" ],
"Jedi": [ "yes", "no", "yes", ... , "no" ]
}
42.3 JSON R packages
R has 3 packages for working with JSON data
"RJSONIO"
by Duncan Temple Lang"rjson"
by Alex Couture-Beil"jsonlite"
by Jeroen Ooms, Duncan Temple Lang, Jonathan Wallace
All packages provide 2 main functions, toJSON()
and fromJSON()
, that allow
conversion to and from data in JSON format, respectively.
We’ll focus on the functions from "jsonlite"
.
There are 2 primary functions in "jsonlite"
:
toJSON()
converts an R object to a string in JSONfromJSON()
converts JSON content to R objects
42.3.1 Function toJSON()
The function jsonlite::toJSON()
converts an R object to a string in JSON.
Example: single number to JSON-array
Let’s begin with a super simple example by passing a single data value to the
function toJSON()
:
Example: vectors to JSON-arrays
Consider the following vectors
Applying toJSON()
to the vectors num
and lts
produces JSON arrays:
The argument pretty = TRUE
allows you to obtain a JSON string with added
indentation whitespace:
toJSON(num, pretty = TRUE)
#> [1, 2, 3, 4, 5]
toJSON(lts, pretty = TRUE)
#> ["a", "b", "c", "d", "e"]
What about an R vector with named elements? For example, here’s a vector vec
Converting vec
to JSON, we get:
As you can tell, the names of the elements in vec
are lost in translation.
Example: matrix to JSON-array
Here’s another example from an matrix to a JSON array:
mat <- matrix(9:1, nrow = 3, ncol = 3)
mat
#> [,1] [,2] [,3]
#> [1,] 9 6 3
#> [2,] 8 5 2
#> [3,] 7 4 1
toJSON()
converts an R matrix into a JSON-array
Notice that the returned output arranges the values of the matrix row-by-row,
also referred to as row-major. This means that when the input is an R matrix,
toJSON()
uses its argument matrix = "rowmajor"
.
You can change the arrangement to column-major by specifying the argument
matrix = "columnmajor"
:
Example: data frame to JSON-object
We can also use toJSON()
on data frames. Here’s an example of an assembled
data frame swdf
which will be converted to a JSON-object:
# toy data
sw_data <- rbind(
c("Anakin", "male", "Tatooine", "41.9BBY", "yes"),
c("Amidala", "female", "Naboo", "46BBY", "no"),
c("Luke", "male", "Tatooine", "19BBY", "yes"),
c("Leia", "female", "Alderaan", "19BBY", "no")
)
# convert to data.frame and add column names
swdf <- data.frame(sw_data, stringsAsFactors = FALSE)
names(swdf) <- c("Name", "Gender", "Homeworld", "Born", "Jedi")
swdf
#> Name Gender Homeworld Born Jedi
#> 1 Anakin male Tatooine 41.9BBY yes
#> 2 Amidala female Naboo 46BBY no
#> 3 Luke male Tatooine 19BBY yes
#> 4 Leia female Alderaan 19BBY no
The default output when you pass a data frame to jsonlite::toJSON()
is
# convert R data.frame to JSON
sw_json = toJSON(swdf)
sw_json
#> [{"Name":"Anakin","Gender":"male","Homeworld":"Tatooine","Born":"41.9BBY","Jedi":"yes"},{"Name":"Amidala","Gender":"female","Homeworld":"Naboo","Born":"46BBY","Jedi":"no"},{"Name":"Luke","Gender":"male","Homeworld":"Tatooine","Born":"19BBY","Jedi":"yes"},{"Name":"Leia","Gender":"female","Homeworld":"Alderaan","Born":"19BBY","Jedi":"no"}]
The argument dataframe
gives you more control on the output. This argument
has three options:
"rows"
: each row is converted to a JSON-object with key-value pairs formed by"column_name": "row_value"
;
toJSON(swdf, dataframe = "rows")
#> [{"Name":"Anakin","Gender":"male","Homeworld":"Tatooine","Born":"41.9BBY","Jedi":"yes"},{"Name":"Amidala","Gender":"female","Homeworld":"Naboo","Born":"46BBY","Jedi":"no"},{"Name":"Luke","Gender":"male","Homeworld":"Tatooine","Born":"19BBY","Jedi":"yes"},{"Name":"Leia","Gender":"female","Homeworld":"Alderaan","Born":"19BBY","Jedi":"no"}]
"columns"
: each column is converted into a JSON-object with a single key for each column, and values stored as arrays;
toJSON(swdf, dataframe = "columns")
#> {"Name":["Anakin","Amidala","Luke","Leia"],"Gender":["male","female","male","female"],"Homeworld":["Tatooine","Naboo","Tatooine","Alderaan"],"Born":["41.9BBY","46BBY","19BBY","19BBY"],"Jedi":["yes","no","yes","no"]}
"values"
: the values in each column are converted to a JSON-array, and the names of the columns are lost.
42.3.2 Function fromJSON()
In practice, instead of converting R objects to JSON objects, it is more common to have data in JSON format which needs to be converted into an R object.
The function jsonlite::fromJSON()
converts a JSON-object to an R object.
Example: JSON-array to R vector
Example: JSON-object to R object
Consider a simple JSON-object, and its conversion to R with
jsonlite::fromJSON()
Notice that the obtained object is an R list in which the key becomes the name of the list, and the value becomes the content of the list’s element.
Consider a less simple JSON-object:
json_obj2 <- '{"name1": "Nicole", "name2": "Pleuni", "name3": "Rori"}'
fromJSON(json_obj2)
#> $name1
#> [1] "Nicole"
#>
#> $name2
#> [1] "Pleuni"
#>
#> $name3
#> [1] "Rori"
Another example:
Example: JSON-object to R object
Suppose you have a JSON object with the following data:
{
"Name": ["Anakin","Amidala","Luke","Leia"],
"Gender": ["male","female","male","female"],
"Homeworld": ["Tatooine","Naboo","Tatooine","Alderaan"],
"Born": ["41.9BBY","46BBY","19BBY","19BBY"],
"Jedi": ["yes","no","yes","no"]
}
and assume that the above data is stored as a single (continuous) string in
an R character vector json_sw
; applying fromJSON()
to this string gives
you the following list:
fromJSON(json_sw)
#> $Name
#> [1] "Anakin" "Amidala" "Luke" "Leia"
#>
#> $Gender
#> [1] "male" "female" "male" "female"
#>
#> $Homeworld
#> [1] "Tatooine" "Naboo" "Tatooine" "Alderaan"
#>
#> $Born
#> [1] "41.9BBY" "46BBY" "19BBY" "19BBY"
#>
#> $Jedi
#> [1] "yes" "no" "yes" "no"
Can this be transformed into a data frame? Yes, by passing the obtained list
to the function data.frame()
:
42.3.3 Reading JSON Data
Now that we have discussed the basics of JSON, and the common ways to convert
fromJSON()
and toJSON()
, let’s see how to read JSON data from the Web.
One of the typical ways to import JSON data from the Web to R is by passing
the url directly to fromJSON()
. Another way is by passing the name of the
file with the JSON content as a single string to the function fromJSON()
.
Here’s an example reading a JSON string from the website Advice Slip. The url https://api.adviceslip.com/advice gives you a random advice (see figure below):
As you can tell, the content is a simple JSON string
#> $slip
#> $slip$id
#> [1] 9
#>
#> $slip$advice
#> [1] "True happiness always resides in the quest!"
Example: Colors in Hexadecimal Notation
The following data comes from one of Dave Eddy’s github repositories:
https://raw.githubusercontent.com/bahamas10/css-color-names/master/css-color-names.json
This is a JSON-object in which the keys are color-names, and the values are the hexadecimal digits of the corresponding color:
{
"aliceblue": "#f0f8ff",
"antiquewhite": "#faebd7",
"aqua": "#00ffff",
"aquamarine": "#7fffd4",
"azure": "#f0ffff",
...
"wheat": "#f5deb3",
"white": "#ffffff",
"whitesmoke": "#f5f5f5",
"yellow": "#ffff00",
"yellowgreen": "#9acd32"
}
We pass the url to jsonlite::fromJSON()
colors_json <- "https://raw.githubusercontent.com/bahamas10/css-color-names/master/css-color-names.json"
hex_colors <- fromJSON(colors_json)
The output in hex_colors
is a list with 148 elements;
the first five elements are displayed below: