8 JSON Data

The goal of this chapter is to provide an introduction to handling JSON data in R.

We’ll cover the following topics:

  • JSON Basics
  • R packages for JSON data
  • Reading JSON data from the Web

8.1 JSON Basics

JSON stands for JavaScript Object Notation and it is a format for representing data. More formally, we can say that it is a text-based way to store and transmit structured data. By using a simple syntax, you can easily store anything from a single number to strings, JSON-arrays, and JSON-objects using nothing but a string of plain text. As you will see, you can also nest arrays and objects, allowing you to create complex data structures.

8.2 What is JSON?

Let’s first talk about what JSON is and why it is important.

JSON is a data representation format that serves much the same purpose as XML. It is used widely across the internet: most web APIs that you will access exchange data as JSON, and it is also common in config files for applications such as games and text editors. Its popularity is based on a handful of attractive aspects:

  • It’s lightweight and compact to send back and forth, because JSON files tend to be small;

  • It’s easy for both computers and people to read and write, compared to something like XML, since it’s much cleaner and there are not as many opening and closing tags;

  • It maps very easily onto the data structures used by most programming languages (numbers, strings, booleans, nulls, arrays and associative arrays);

  • It also integrates very nicely with JavaScript, since JSON syntax is essentially a subset of JavaScript: almost any valid JSON text is also a valid JavaScript expression. JavaScript is a language used throughout the web for both the front end and the back end of applications;

  • Also, every major language has some library or package that parses JSON strings into objects or classes in that language, which makes working with JSON data straightforward inside a programming language.

Why should we care about JSON? When working with data from the Web, we’ll inevitably find some JSON data because it is commonly used in web applications to send data from the server to the browser. As a matter of fact, in your data science career you will be using JSON quite often, whether you are consuming an API, creating an API, or writing config files for an application that you or other people will use.

8.3 Understanding JSON Syntax

Let’s now talk about the syntax used to store and organize data in JSON.

8.3.1 Data Types

The first thing to talk about is the data types, or values, that JSON can represent. JSON supports the following basic types:

  • string (in double quotes)

  • number (integers, decimal numbers, negative numbers, and numbers in scientific notation)

  • true and false (booleans)

  • null
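As a rough illustration of how these types map onto R values, the sketch below parses a small JSON text with fromJSON() from the jsonlite package. The choice of jsonlite and the key names are assumptions made for this example only.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

# one value of each basic JSON type (the key names are made up)
txt <- '{"s": "text", "n": -1.5e3, "b": true, "z": null}'

# an object whose values have different types comes back as a named list
x <- fromJSON(txt)

class(x$s)    # JSON string       -> character
class(x$n)    # JSON number       -> numeric
class(x$b)    # JSON true / false -> logical
is.null(x$z)  # JSON null         -> NULL
```

Note that scientific notation is only a way of writing the number: once parsed, -1.5e3 is an ordinary numeric value in R.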

8.3.2 Arrays

JSON also supports arrays (in the JSON sense), which are ordered lists of values enclosed in square brackets and separated by commas. For example, [1, 3, 3] or ["computing", "with", "data"]. The elements can be any of the data types listed above.

We typically use arrays when we have a sequence of unnamed values, which is why some people refer to them as ordered unnamed arrays. The closest R object to a JSON array would be a vector:

  • JSON: [1, 2, 3, ... ]; -vs- R: c(1, 2, 3, ...)

  • JSON: [true, true, false, ... ]; -vs- R: c(TRUE, TRUE, FALSE, ...)
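The correspondence between JSON arrays and R vectors can be tried out directly. The following is a minimal sketch that assumes the jsonlite package (not introduced so far) and its functions fromJSON() and toJSON(); the object names are hypothetical.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

# arrays whose elements share one type simplify to atomic R vectors
nums  <- fromJSON('[1, 2, 3]')
flags <- fromJSON('[true, true, false]')

# a mixed-type array can be kept as an unnamed list instead
mixed <- fromJSON('[1, "a", true]', simplifyVector = FALSE)

# going the other way: an R vector is written as a JSON array
toJSON(c(1, 2, 3))
```

Since an R vector can only hold values of a single type, a JSON array that mixes types is better represented as an R list, which is what simplifyVector = FALSE asks for.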

8.3.3 Objects

Another type of data container is the so-called JSON object, which is the most complex but also the most commonly used container in JSON. It allows you to represent data as key-value pairs:

{"key": "value"}

You use curly braces to define a JSON-object, and inside the braces you put key-value pairs. The key must be surrounded by double quotes, followed by a colon, followed by the value. The value can be any of the basic data types, but it can also be a JSON-array or another JSON-object (which in turn can contain further arrays and objects). Because each key is associated with a value, these JSON structures are also referred to as associative arrays.

For example, say the key is "year" and the value 2000, then a simple JSON object will look like this:

{"year": 2000}

Another example can be a key "name" and a value "Jessica":

{"name": "Jessica"}

If you have multiple key-value pairs, you separate each of them with a comma:

{
  "name1": "Nicole",
  "name2": "Pleuni",
  "name3": "Rori"
}
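One way to check these syntax rules is to ask a parser whether a piece of text is valid JSON. The sketch below assumes the jsonlite package and its validate() function; the malformed strings are made-up variations of the object above.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

# well-formed: double-quoted keys, colons, comma between pairs
ok <- validate('{"name1": "Nicole", "name2": "Pleuni"}')

# malformed variations (hypothetical examples)
no_quotes <- validate('{name1: "Nicole"}')                       # key not in double quotes
single_q  <- validate("{'name1': 'Nicole'}")                     # single quotes instead of double
no_comma  <- validate('{"name1": "Nicole" "name2": "Pleuni"}')   # missing comma between pairs

c(ok = isTRUE(ok), no_quotes = isTRUE(no_quotes),
  single_q = isTRUE(single_q), no_comma = isTRUE(no_comma))
```

Only the first string should be reported as valid; the other three each break one of the rules described above.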

A more complex object might look like the following example. In this case we have a JSON-object that contains three key-value pairs. Each of the keys is a "person" and the associated value is an array, which in turn contains a JSON-object with two key-value pairs: the first name and the last name:

{
  "person1": [
    {
      "first": "Nicole",
      "last": "Adelstein"
    }
  ],
  "person2": [
    {
      "first": "Pleuni",
      "last": "Pennings"
    }
  ],
  "person3": [
    {
      "first": "Rori",
      "last": "Rohlfs"
    }
  ]
}
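To see how such nesting can be navigated once the data is in R, here is a small sketch. It assumes the jsonlite package; the argument simplifyVector = FALSE is used so that every JSON container stays an R list, and the variable names are hypothetical.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

txt <- '{
  "person1": [{"first": "Nicole", "last": "Adelstein"}],
  "person2": [{"first": "Pleuni", "last": "Pennings"}],
  "person3": [{"first": "Rori",   "last": "Rohlfs"}]
}'

# keep the nesting as plain lists: objects become named lists,
# arrays become unnamed lists
people <- fromJSON(txt, simplifyVector = FALSE)

names(people)                        # the three keys
people$person1[[1]]$first            # object -> first array element -> "first"
people[["person3"]][[1]][["last"]]   # same idea, using [[ ]] throughout
```

Each step of the extraction mirrors one level of the JSON text: a key selects a value from an object, and a position selects an element from an array.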

Because the data inside a JSON object is formed of key-value pairs, you could think of them as named arrays.

What do JSON-objects correspond to in R? Well, there’s not really a unique correspondence between a JSON-object and an equivalent structure in R. For instance, let’s bring back one of the JSON-objects previously discussed:

{
  "name1": "Nicole",
  "name2": "Pleuni",
  "name3": "Rori"
}

We could use a named R vector to store the same data:

# named vector in R
c("name1" = "Nicole", "name2" = "Pleuni", "name3" = "Rori")

But we could also use an R list:

# named list in R
list("name1" = "Nicole", "name2" = "Pleuni", "name3" = "Rori")

Keep in mind that JSON-objects can be more complex than this basic example. Because JSON-objects can contain any other type of JSON data structure, the most similar container in R to a JSON-object is a list.
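The relationship between a JSON-object, a named list, and a named vector can be sketched as follows. This assumes the jsonlite package; auto_unbox is one of the arguments of its toJSON() function, and the variable names are hypothetical.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

txt <- '{"name1": "Nicole", "name2": "Pleuni", "name3": "Rori"}'

# a JSON-object is parsed into a named list
as_list <- fromJSON(txt)

# because every value here is a single string, the list can be
# flattened into a named character vector
as_vector <- unlist(as_list)

# auto_unbox = TRUE writes length-one vectors as plain values
# rather than as one-element arrays
back <- toJSON(as_list, auto_unbox = TRUE)
back
```

The flattening step only works because all values share one type; for objects with mixed or nested values the list is the representation to keep.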

8.3.4 Examples of JSON Data Containers

Here’s a series of examples involving combinations of JSON arrays and objects.

JSON containers can be nested. Here’s one example:

{
    "name": ["X", "Y", "Z"],
    "grams": [300, 200, 500], 
    "qty": [4, 5, null],
    "new": [true, false, true]
}

Here’s another example of nested containers:

[
    { "name": "X", 
      "grams": 300,
      "qty": 4,
      "new": true },
    { "name": "Y",
      "grams": 200,
      "qty": 5,
      "new": false },
    { "name": "Z",
      "grams": 500, 
      "qty": null,
      "new": true}
]
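The two examples above hold the same data in two layouts: an object of arrays and an array of objects. The sketch below, which assumes the jsonlite package and uses hypothetical variable names, shows what each layout might look like after parsing in R.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

by_column <- '{
  "name":  ["X", "Y", "Z"],
  "grams": [300, 200, 500],
  "qty":   [4, 5, null],
  "new":   [true, false, true]
}'

by_row <- '[
  {"name": "X", "grams": 300, "qty": 4,    "new": true},
  {"name": "Y", "grams": 200, "qty": 5,    "new": false},
  {"name": "Z", "grams": 500, "qty": null, "new": true}
]'

cols <- fromJSON(by_column)   # named list of vectors, one per key
rows <- fromJSON(by_row)      # data frame, one row per JSON-object

# the JSON null shows up as a missing value (NA) in both results
cols$qty
rows$qty

# the list of columns can be converted to a data frame as well
cols_df <- as.data.frame(cols, stringsAsFactors = FALSE)
```

With the default simplification, an array of objects that share the same keys is turned into a data frame, which is often the most convenient form for analysis.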

8.3.5 Data Table Toy Example

Let’s consider a less basic example with a small tabular data set:

Name       Gender   Homeworld  Born     Jedi
Anakin     male     Tatooine   41.9BBY  yes
Amidala    female   Naboo      46BBY    no
Luke       male     Tatooine   19BBY    yes
Leia       female   Alderaan   19BBY    no
Obi-Wan    male     Stewjon    57BBY    yes
Han        male     Corellia   29BBY    no
Palpatine  male     Naboo      82BBY    no
R2-D2      unknown  Naboo      33BBY    no

How can we store this tabular data in JSON format? There are several ways to do it. One option could be a JSON-array containing JSON-objects, in which each JSON-object represents an individual:

    [
        {
         "Name": "Anakin",
         "Gender": "male", 
         "Homeworld": "Tatooine",
         "Born": "41.9BBY",
         "Jedi": "yes"
        },
        {
         "Name": "Amidala",
         "Gender": "female", 
         "Homeworld": "Naboo",
         "Born": "46BBY",
         "Jedi": "no"
        },
        ...
        {
         "Name": "R2-D2",
         "Gender": "unknown",
         "Homeworld": "Naboo",
         "Born": "33BBY",
         "Jedi": "no"
        }
    ]

Another way to represent the data in the table above is by using an object containing key-value pairs in which the keys are the names of the columns, and the values are arrays (the data values in each column).

{
  "Name": [ "Anakin", "Amidala", "Luke", ... , "R2-D2" ],
  "Gender": [ "male", "female", "male", ... , "unknown" ],
  "Homeworld": [ "Tatooine", "Naboo", "Tatooine", ... , "Naboo" ],
  "Born": [ "41.9BBY", "46BBY", "19BBY", ... , "33BBY" ],
  "Jedi": [ "yes", "no", "yes", ... , "no" ] 
}
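Both layouts can be produced from an R data frame. The following sketch assumes the jsonlite package and the dataframe argument of its toJSON() function; it uses only the first two rows of the table to keep the output short, and the name sw is hypothetical.

```r
# Sketch: assumes the jsonlite package is installed
library(jsonlite)

# first two rows of the table above
sw <- data.frame(
  Name      = c("Anakin", "Amidala"),
  Gender    = c("male", "female"),
  Homeworld = c("Tatooine", "Naboo"),
  Born      = c("41.9BBY", "46BBY"),
  Jedi      = c("yes", "no"),
  stringsAsFactors = FALSE
)

# array of objects: one JSON-object per row
by_row <- toJSON(sw, dataframe = "rows")

# object of arrays: one JSON-array per column
by_column <- toJSON(sw, dataframe = "columns")

# pretty-print both layouts for inspection
prettify(by_row)
prettify(by_column)
```

Which layout to choose depends on how the data will be consumed: the row layout keeps each individual together, while the column layout avoids repeating the keys for every record.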