3.1 Types of Variables

To illustrate some of the ideas presented in this chapter, I’m going to use a toy example with data from the characters of the Star Wars universe. You can find the corresponding CSV file in the data/ folder of the book’s GitHub repository.

               name gender height weight     species     jedi          weapon
1  Anakin Skywalker   male   1.88   84.0       human yes_jedi      lightsaber
2     Padme Amidala female   1.65   45.0       human  no_jedi         unarmed
3    Luke Skywalker   male   1.72   77.0       human yes_jedi      lightsaber
4       Leia Organa female   1.50   49.0       human  no_jedi         blaster
5      Qui-Gon Jinn   male   1.93   88.5       human yes_jedi      lightsaber
6    Obi-Wan Kenobi   male   1.82   77.0       human yes_jedi      lightsaber
7          Han Solo   male   1.80   80.0       human  no_jedi         blaster
8   Sheev Palpatine   male   1.73   75.0       human  no_jedi force-lightning
9             R2-D2   male   0.96   32.0       droid  no_jedi         unarmed
10            C-3PO   male   1.67   75.0       droid  no_jedi         unarmed
11             Yoda   male   0.66   17.0        yoda yes_jedi      lightsaber
12       Darth Maul   male   1.75   80.0 dathomirian  no_jedi      lightsaber
13            Dooku   male   1.93   86.0       human yes_jedi      lightsaber
14        Chewbacca   male   2.28  112.0     wookiee  no_jedi       bowcaster
15            Jabba   male   3.90     NA        hutt  no_jedi         unarmed
16 Lando Calrissian   male   1.78   79.0       human  no_jedi         blaster
17        Boba Fett   male   1.83   78.0       human  no_jedi         blaster
18       Jango Fett   male   1.83   79.0       human  no_jedi         blaster
19         Grievous   male   2.16  159.0     kaleesh  no_jedi     slugthrower
20     Chief Chirpa   male   1.00   50.0        ewok  no_jedi           spear

The table consists of 20 rows and 7 columns. The rows correspond to individuals and the columns correspond to variables. Although this data set is a toy example, it contains variables of different types commonly found in real data sets.

Interestingly, we can classify variables in a couple of different ways.

The most basic and usual way to classify variables is into two distinct types: quantitative variables and categorical (or qualitative) variables.

The variables height and weight are examples of quantitative variables because their values represent quantities. That is, they can be measured numerically on some sort of interval scale.

In turn, variables such as name, gender, species, jedi, and weapon are categorical or qualitative variables because their values represent categories (or qualities). More formally, they describe a quality of an individual, and allow you to place that individual into a category or group, such as male or female.
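To make the distinction concrete, here is a minimal Python sketch that labels each column of a few rows copied from the table above. The helper name `variable_type` is hypothetical, and the rule it applies (a column is quantitative when all of its non-missing values are numbers) is only a storage-based heuristic that I'm assuming for illustration: numeric codes used purely as labels would be misclassified by it.

```python
# Three rows copied from the table above; the missing weight (NA) becomes None.
rows = [
    {"name": "Anakin Skywalker", "gender": "male", "height": 1.88, "weight": 84.0,
     "species": "human", "jedi": "yes_jedi", "weapon": "lightsaber"},
    {"name": "Leia Organa", "gender": "female", "height": 1.50, "weight": 49.0,
     "species": "human", "jedi": "no_jedi", "weapon": "blaster"},
    {"name": "Jabba", "gender": "male", "height": 3.90, "weight": None,
     "species": "hutt", "jedi": "no_jedi", "weapon": "unarmed"},
]


def variable_type(values):
    """Heuristic: 'quantitative' if every non-missing value is a number."""
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "quantitative" if present and is_numeric else "categorical"


# Apply the heuristic column by column.
types = {col: variable_type([row[col] for row in rows]) for col in rows[0]}

for col, kind in types.items():
    print(f"{col}: {kind}")
```

With these rows, height and weight come out as quantitative and the remaining five columns as categorical, which agrees with the classification described above.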

The division between categorical and quantitative variables is not the only one. Often, data scientists further classify categorical variables as nominal or ordinal. Likewise, quantitative variables can be classified as discrete or continuous. This next level of classification is chiefly based on the notion of scales of measurement of the variables.

Figure 3.2: Further classification of variables

3.1.1 Nominal Variable

A categorical variable is nominal when it results from naming or labeling values that don’t have a natural order. An example of a nominal variable is weapon, which has the following values:

[1] "blaster"         "bowcaster"       "force-lightning" "lightsaber"     
[5] "slugthrower"     "spear"           "unarmed"        

Can you order the categories in a “natural” way? Not really. The term nominal, according to the dictionary, means “existing in name only”. Thus, nominal values are just that: names. There is no reason why blaster is better or greater than lightsaber. You could say that you prefer a blaster over a lightsaber, but that’s a different variable: personal preference.
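As a rough illustration of what you can and cannot do with a nominal variable, the following Python sketch takes the weapon column from the table above, lists its distinct categories, and counts how often each one occurs. The variable names are my own choice. Note that the alphabetical sorting is just a display convention, not a natural order of the categories.

```python
from collections import Counter

# The weapon column, copied row by row from the table above.
weapon = [
    "lightsaber", "unarmed", "lightsaber", "blaster", "lightsaber",
    "lightsaber", "blaster", "force-lightning", "unarmed", "unarmed",
    "lightsaber", "lightsaber", "lightsaber", "bowcaster", "unarmed",
    "blaster", "blaster", "blaster", "slugthrower", "spear",
]

# Distinct categories; sorting alphabetically is only for display.
categories = sorted(set(weapon))
print(categories)

# Counting is meaningful for nominal data, and so is the most frequent category.
counts = Counter(weapon)
most_common_weapon, frequency = counts.most_common(1)[0]
print(most_common_weapon, frequency)
```

Frequencies and the most frequent category make sense here, whereas questions like "which weapon is the median?" do not, because there is no order to rely on.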

Other typical examples of nominal variables are:

  • the sex of a newborn child: e.g. female or male

  • the ethnicity of an individual: e.g. Native-American, African-American, Asian, White

  • ice cream flavors: e.g. chocolate, vanilla, strawberry

  • the numbers on the players’ jerseys of a soccer team: numbers used as identifiers

3.1.2 Ordinal Variable

A categorical variable is ordinal when it results from ordering values into a series of categories when no appropriate numerical scale is available. For example, consider a variable “usage frequency” measured with values never, sometimes, and always. In this case we can order the categories from less usage to more usage, or vice versa.

Some examples of ordinal variables are:

  • size of clothes: extra-small, small, medium, large, extra-large

  • college year: freshman, sophomore, junior, senior

  • spiciness: none, mild, moderate, very

  • Jedi ranks: youngling, padawan, knight, master, and grand master
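One way to sketch an ordinal variable in Python is to store the categories in their natural order and compare them by position. The example below uses the Jedi ranks listed above; the `observed` values are made up for illustration, and the helper `outranks` is a hypothetical name.

```python
# Categories listed from lowest to highest rank.
JEDI_RANKS = ["youngling", "padawan", "knight", "master", "grand master"]

# Position of each rank; it encodes order only, not a measurable distance.
RANK_POSITION = {rank: i for i, rank in enumerate(JEDI_RANKS)}


def outranks(a, b):
    """Return True when rank a is higher than rank b."""
    return RANK_POSITION[a] > RANK_POSITION[b]


# Hypothetical observations, not taken from the data set.
observed = ["master", "padawan", "knight", "youngling", "knight"]

# Sorting by rank is meaningful, and so is the middle (median) category.
ordered = sorted(observed, key=lambda rank: RANK_POSITION[rank])
median_rank = ordered[len(ordered) // 2]

print(outranks("master", "padawan"))
print(ordered)
print(median_rank)
```

The positions 0 to 4 should not be treated as quantities: nothing says the step from padawan to knight is as large as the step from master to grand master.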

3.1.3 Discrete Variable

A quantitative variable is discrete when it results from counting. To be more precise, a discrete variable takes on zero or a positive integer value. Some examples of discrete variables are:

  • the number of male ewoks in a family with four children (0, 1, 2, 3, or 4).

  • the number of robots per Imperial Star Destroyer

  • the number of moons orbiting around a planet
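To see why a count can only take a handful of isolated values, here is a small Python sketch for the ewok family example above. It enumerates every possible combination of male and female children in a family of four and counts the males in each one. The variable names are illustrative.

```python
from itertools import product

# Every possible gender combination for four children (2**4 = 16 families).
families = list(product(["male", "female"], repeat=4))

# The discrete variable: number of males in each family.
male_counts = [family.count("male") for family in families]

# The distinct values the variable can take.
possible_values = sorted(set(male_counts))
print(possible_values)  # [0, 1, 2, 3, 4]
```

No family can have 2.5 male children, which is what separates a count from a measurement.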

3.1.4 Continuous Variable

A quantitative variable is continuous when it results from measuring. More technically, a continuous variable can theoretically take on an infinite number of possible values; however, its reported values are subject to the precision or accuracy of the measurement device. Some examples of continuous variables are:

  • the height of an individual
  • the weight of a robot
  • the speed of a starship
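The effect of measurement precision can be sketched in a few lines of Python. The value of `true_height` below is a made-up number standing in for an exact height in meters, and each call to `round()` plays the role of a measuring device with a given number of decimals.

```python
# Hypothetical "exact" height in meters (an assumption for illustration).
true_height = 1.8796

# The same quantity as reported by devices of increasing precision.
reported = [round(true_height, decimals) for decimals in (0, 1, 2)]
print(reported)  # [2.0, 1.9, 1.88]
```

The underlying variable is continuous in all three cases; only the reported value changes with the device.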

3.1.5 Caveat

Keep in mind that not all variables fit neatly and unambiguously into one of the previous classes. For example, the age of an individual could be considered a discrete variable when it gets reported in (whole) number of years. However, age could also be considered to be continuous when measured on a more granular scale: e.g. days, or hours, or seconds. Moreover, sometimes age is reported in ordered categories such as 0 to 5 years, 6 to 10, 11 to 15, and so on. These values would turn age into an ordinal variable.
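The three views of age can be sketched in Python as follows. Everything here is an assumption made for illustration: the starting value `age_in_days`, the approximate conversion factor of 365.25 days per year, and the hypothetical helper `age_group`, which extends the 0 to 5, 6 to 10, 11 to 15 pattern in steps of five years.

```python
# Hypothetical age measured on a fine-grained scale.
age_in_days = 4200

# Continuous view: years as a decimal number (365.25 is an approximation).
age_years_continuous = age_in_days / 365.25

# Discrete view: completed whole years.
age_years_discrete = int(age_years_continuous)


def age_group(years):
    """Ordinal view: map whole years to the bins 0-5, 6-10, 11-15, ..."""
    if years <= 5:
        return "0 to 5"
    lower = ((years - 6) // 5) * 5 + 6
    return f"{lower} to {lower + 4}"


print(round(age_years_continuous, 2))
print(age_years_discrete)
print(age_group(age_years_discrete))
```

The same individual ends up with a decimal number, a whole number, or an ordered label, depending only on how we choose to record the age.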