14 Counting More Systems

In the two previous chapters you learned how to count the number of tropical systems in each year, and how to graph a barchart of such counts.

In this chapter, we continue obtaining counts. Specifically, we describe how to get the yearly frequencies of particular kinds of systems, such as tropical depressions, tropical storms, hurricanes, or major hurricanes.

14.1 Counting Kinds of Storms

Recall that tropical systems are classified into different categories depending on their wind speed. This classification is based on the famous Saffir-Simpson wind scale, given in the following table.

Table: Saffir-Simpson scale
Category               Scale   Wind (knots)   Wind (mph)
Tropical Depression     -1     <= 33          <= 38
Tropical Storm           0     34 - 63        39 - 73
Hurricane category 1     1     64 - 82        74 - 95
Hurricane category 2     2     83 - 95        96 - 110
Hurricane category 3     3     96 - 112       111 - 129
Hurricane category 4     4     113 - 136      130 - 156
Hurricane category 5     5     >= 137         >= 157

Hurricanes of categories 3, 4, and 5 are considered to be major hurricanes.
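The table can also be expressed as a small lookup in R. The function below, `wind_to_category()`, is a hypothetical helper. It is not part of dplyr or of the storms data. It uses base R's `cut()` to map a wind speed in knots to the class shown in the table. The sketch assumes whole-number knot values, like those in column `wind`.

```r
# Hypothetical helper, not part of dplyr: classify wind speeds (knots)
# using the thresholds from the Saffir-Simpson table above
wind_to_category <- function(wind_kn) {
  cut(
    wind_kn,
    breaks = c(-Inf, 33, 63, 82, 95, 112, 136, Inf),
    labels = c("Tropical Depression", "Tropical Storm",
               "Hurricane category 1", "Hurricane category 2",
               "Hurricane category 3", "Hurricane category 4",
               "Hurricane category 5"),
    right = TRUE  # intervals are (lower, upper], so 33 is a depression
  )
}

wind_to_category(c(30, 34, 64, 100, 140))
```

`cut()` builds right-closed intervals here, so exactly 33 knots falls in the depression class and 34 knots falls in the tropical storm class, matching the table. Major hurricanes are the last three classes, that is, 96 knots or more.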

14.1.1 Counting Tropical Depressions

We may start by getting the count of tropical depressions per year. Knowing that tropical depressions have winds no greater than 33 knots, we can use filter() to keep the rows whose wind value is 33 knots or less.

library(dplyr)  # provides the storms data, filter(), and count()

# trying to count tropical depressions per year
depression_counts_per_year = storms %>% 
  filter(wind <= 33) %>%
  count(year, name) %>%
  count(year)

# inspect a few rows
head(depression_counts_per_year)
## # A tibble: 6 × 2
##    year     n
##   <dbl> <int>
## 1  1975     7
## 2  1976     7
## 3  1977     6
## 4  1978    10
## 5  1979     8
## 6  1980    10

According to this output, it seems that there were 7 tropical depressions in 1975, 7 in 1976, 6 in 1977, etc.

What about the number of tropical storms? Well, to obtain this count we should filter wind values between 34 and 63 knots, and then compute the count():

# trying to count tropical storms per year
storm_counts_per_year = storms %>% 
  filter(wind >= 34 & wind < 64) %>%
  count(year, name) %>%
  count(year)

# inspect a few rows
head(storm_counts_per_year)
## # A tibble: 6 × 2
##    year     n
##   <dbl> <int>
## 1  1975     8
## 2  1976     7
## 3  1977     6
## 4  1978    11
## 5  1979     8
## 6  1980    11

Based on this output, it seems that there were 8 tropical storms in 1975, 7 in 1976, 6 in 1977, etc.

But wait … We know, from the preceding chapter, that the counts of systems per year are:

system_counts_per_year <- storms %>% 
  count(year, name) %>% 
  count(year)

head(system_counts_per_year)
## # A tibble: 6 × 2
##    year     n
##   <dbl> <int>
## 1  1975     8
## 2  1976     7
## 3  1977     6
## 4  1978    11
## 5  1979     8
## 6  1980    11
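Here is a sketch that shows the discrepancy in one place. It recomputes the three naive per-year counts and joins them by year. The helper `count_systems()` and the column names `systems`, `depressions`, and `trop_storms` are illustrative choices, not names used elsewhere in the book.

```r
library(dplyr)

# Count distinct (year, name) systems per year, as done in this chapter
count_systems <- function(data) {
  data |> count(year, name) |> count(year)
}

naive <- count_systems(storms) |>
  rename(systems = n) |>
  left_join(
    count_systems(filter(storms, wind <= 33)) |> rename(depressions = n),
    by = "year"
  ) |>
  left_join(
    count_systems(filter(storms, wind >= 34 & wind < 64)) |>
      rename(trop_storms = n),
    by = "year"
  )

# Years in which depressions + tropical storms exceed the number of systems
naive |> filter(depressions + trop_storms > systems)
```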

How is it that in 1975 there were 7 tropical depressions and 8 tropical storms, but a total of only 8 systems?

Likewise, how is it that in 1976 there were 7 tropical depressions and 7 tropical storms, but a total of only 7 systems?

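Okay, okay. You may already know the answer to this apparent conundrum. The explanation for these seemingly contradictory results has to do with the life cycle of tropical systems. Technically speaking, they all start as baby storms or disturbances that can get bigger and eventually reach tropical depression status. Under the right weather conditions, they can continue to grow into a tropical storm, a category 1 hurricane, or an even stronger category. The storms data has one row per observation of a system, so a system that starts as a depression and later strengthens has some rows with wind values of 33 knots or less and other rows with higher values. The row-level filters above therefore count that same system both as a tropical depression and as a tropical storm.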
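The mechanism is easy to reproduce on made-up data. The sketch below builds a hypothetical system, `Toy`, observed four times as its wind rises from 25 to 50 knots. The row-level filters count it once as a depression and once as a tropical storm. Summarising its maximum wind first classifies it only once.

```r
library(dplyr)

# Made-up data: one hypothetical system observed four times as it
# strengthens from a depression into a tropical storm
toy <- tibble(year = 2000, name = "Toy", wind = c(25L, 30L, 40L, 50L))

# Row-level filters, as in the first attempt: the same system passes both
n_depressions <- toy |> filter(wind <= 33) |> count(year, name) |> nrow()
n_trop_storms <- toy |> filter(wind >= 34 & wind < 64) |> count(year, name) |> nrow()

# Summarising first gives a single peak wind value per system
toy_peak <- toy |>
  group_by(year, name) |>
  summarise(wind_max = max(wind), .groups = "drop")
```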

I know that the above outputs and the interpretations I’m providing for them may seem a bit silly to (some of) you. Yes, I’m doing it on purpose: to walk you through the type of exploration, sanity checks, and questions that you should be asking yourself along the way when analyzing data. This is especially vital when you are working with a data set from a topic or field in which you don’t have much experience.

14.1.2 Counts Based on Maximum Wind Speed

A better approach for identifying tropical depressions consists of first obtaining the maximum wind speed of each system, and then keeping only the systems whose maximum is 33 knots or less.

# identifying tropical depressions
depressions = storms %>%
  group_by(year, name) %>%
  summarise(wind_max = max(wind), .groups = "drop") %>%
  filter(wind_max <= 33)

slice_head(depressions, n = 10)
## # A tibble: 10 × 3
##     year name     wind_max
##    <dbl> <chr>       <int>
##  1  1991 AL041991       30
##  2  1991 AL101991       25
##  3  1992 AL021992       30
##  4  1992 AL031992       30
##  5  1992 AL081992       30
##  6  1993 AL101993       30
##  7  1994 AL021994       30
##  8  1994 AL051994       30
##  9  1994 AL081994       30
## 10  1994 AL091994       30

Having obtained the data that exclusively contains tropical depressions, we can then count their number in each year:

# counting tropical depressions per year
depression_counts_per_year = depressions %>%
  count(year)

slice_head(depression_counts_per_year, n = 10)
## # A tibble: 10 × 2
##     year     n
##    <dbl> <int>
##  1  1991     2
##  2  1992     3
##  3  1993     1
##  4  1994     5
##  5  1995     2
##  6  1997     1
##  7  1999     4
##  8  2000     4
##  9  2001     2
## 10  2002     2
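The output above jumps from 1995 to 1997 and from 1997 to 1999, because `count()` only returns years that have at least one depression. You may want a barchart in which such years appear as zeros. One option is to join the counts against the full range of years. This is a sketch using dplyr only, and `all_years` and `depression_counts_filled` are hypothetical names.

```r
library(dplyr)

# Depressions per year, identified by maximum wind (as in this subsection)
depression_counts <- storms |>
  group_by(year, name) |>
  summarise(wind_max = max(wind), .groups = "drop") |>
  filter(wind_max <= 33) |>
  count(year)

# Every year covered by the data, whether or not it had a depression
all_years <- tibble(year = seq(min(storms$year), max(storms$year), by = 1))

# Years missing from the counts get n = 0 instead of NA
depression_counts_filled <- all_years |>
  left_join(depression_counts, by = "year") |>
  mutate(n = coalesce(n, 0L))
```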

14.1.3 Counting Tropical Storms

Tropical storms are tropical systems whose maximum wind speed is between 39 and 73 mph or, equivalently, between 34 and 63 knots. With this information, we proceed in the same way as in the previous subsection: first we identify the set that exclusively contains tropical storms, and then we count them by year.

# identifying tropical storms
trop_storms = storms %>%
  group_by(year, name) %>%
  summarise(wind_max = max(wind), .groups = "drop") %>%
  filter(wind_max >= 34 & wind_max < 64)

slice_head(trop_storms, n = 10)
## # A tibble: 10 × 3
##     year name   wind_max
##    <dbl> <chr>     <int>
##  1  1975 Amy          60
##  2  1975 Hallie       45
##  3  1976 Dottie       45
##  4  1977 Frieda       50
##  5  1978 Amelia       45
##  6  1978 Bess         45
##  7  1978 Debra        50
##  8  1978 Hope         55
##  9  1978 Irma         45
## 10  1978 Juliet       45

# counting tropical storms per year
storm_counts_per_year = trop_storms %>%
  count(year)

slice_head(storm_counts_per_year, n = 10)
## # A tibble: 10 × 2
##     year     n
##    <dbl> <int>
##  1  1975     2
##  2  1976     1
##  3  1977     1
##  4  1978     6
##  5  1979     3
##  6  1980     2
##  7  1981     4
##  8  1982     3
##  9  1983     1
## 10  1984     7
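Depressions and tropical storms follow the same identify-then-count pattern. That pattern can be wrapped in a function and reused for hurricanes and major hurricanes. `count_by_peak_wind()` is a hypothetical helper, not a dplyr function. Its `lower` and `upper` bounds are inclusive and expressed in knots, following the table at the beginning of the chapter.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): number of systems per year
# whose maximum wind speed, in knots, lies between lower and upper (inclusive)
count_by_peak_wind <- function(data, lower = -Inf, upper = Inf) {
  data |>
    group_by(year, name) |>
    summarise(wind_max = max(wind), .groups = "drop") |>
    filter(wind_max >= lower, wind_max <= upper) |>
    count(year)
}

depression_counts <- count_by_peak_wind(storms, upper = 33)
trop_storm_counts <- count_by_peak_wind(storms, lower = 34, upper = 63)
hurricane_counts  <- count_by_peak_wind(storms, lower = 64)
major_counts      <- count_by_peak_wind(storms, lower = 96)
```

With inclusive bounds, the calls mirror the table directly: up to 33 knots for depressions, 34 to 63 for tropical storms, 64 and above for hurricanes, and 96 and above for major hurricanes.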

So far, so good. Things seem to make sense, and we are making nice progress in our data exploration journey.

But before moving on with other counts, let’s do a quick sanity check (just in case). For example, we could take into account the column category, which contains numeric codes for the different categories of the Saffir-Simpson wind scale.

To be more precise: in addition to wind_max, let’s also include the maximum category value when identifying the set of tropical storms. If everything is correct, all the entries should have a category value of 0, which is the numeric code for tropical storms.

# identifying tropical storms
trop_storms = storms |>
  filter(wind >= 34) |>
  group_by(year, name) |>
  summarise(wind_max = max(wind)) |>
  filter(wind_max < 64)
## `summarise()` has grouped output by 'year'. You can override using
## the `.groups` argument.
slice_head(trop_storms, n = 10)
## # A tibble: 264 × 3
## # Groups:   year [47]
##     year name   wind_max
##    <dbl> <chr>     <int>
##  1  1975 Amy          60
##  2  1975 Hallie       45
##  3  1976 Dottie       45
##  4  1977 Frieda       50
##  5  1978 Amelia       45
##  6  1978 Bess         45
##  7  1978 Debra        50
##  8  1978 Hope         55
##  9  1978 Irma         45
## 10  1978 Juliet       45
## # ℹ 254 more rows

This output only displays the first ten rows, but you are welcome to inspect the values in column name.

Did you observe anything special? Was there anything that caught your attention?

If your answer is “No”, then go back and carefully inspect the contents of trop_storms.

If your answer is “Yes”, because you noticed some weird storm names such as AL031987 or AL061988, then let me say to you: “good job!”

Indeed, there seem to be a few tropical storms that don’t have a proper name. In theory, once a tropical system reaches tropical storm status, it receives a proper name made of letters, such as Amy, Amelia, or Bess, instead of an alphanumeric identifier. To detect all the entries in trop_storms that don’t have a proper name, you can use the following command:

# unnamed storms (str_starts() comes from the stringr package)
library(stringr)
trop_storms %>%
  filter(str_starts(name, pattern = "A(l|L)\\d+"))
## # A tibble: 5 × 3
## # Groups:   year [5]
##    year name     wind_max
##   <dbl> <chr>       <int>
## 1  1987 AL031987       40
## 2  1988 AL061988       50
## 3  1993 AL011993       35
## 4  2006 AL022006       45
## 5  2011 Al202011       40

As you can tell, this command is a filtering operation. However, the code inside filter() is somewhat advanced. It uses the str_starts() function to match those names that start with either "AL" or "Al" followed by one or more digits.
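To get a feel for the pattern, you can try it on a few strings. The sketch below uses base R's `grepl()` with an explicit `^` anchor, which behaves like `str_starts()` for this pattern. The example strings are two proper names and two identifiers from the outputs in this chapter. They also include "Alberto", a proper name that starts with "Al" and shows why the `\\d+` part matters.

```r
# Example names: two proper names and two IDs from the outputs above,
# plus "Alberto", a proper name that also starts with "Al"
x <- c("Amy", "Amelia", "AL031987", "Al202011", "Alberto")

# str_starts(x, "A(l|L)\\d+") only matches at the start of each string;
# in base R the same test is grepl() with an explicit "^" anchor
is_unnamed <- grepl("^A(l|L)\\d+", x)
is_unnamed
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```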

To be honest, I’m not sure why we have these mismatched values. It could be that the category values for these systems are incorrect. Or it could be the opposite: the category is okay, but the problem is with the name values.
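The code used to build trop_storms reports only wind_max. One way to follow up on the category check described earlier is to pull every observation of the five flagged systems and look at their status and category values next to their wind. This is a minimal sketch that assumes the system IDs shown in the output above. Depending on your version of dplyr, category may use numeric codes such as -1 and 0 (as in the table at the start of the chapter), or it may be NA for systems that never became hurricanes.

```r
library(dplyr)

# System IDs returned by the str_starts() filter above
unnamed <- c("AL031987", "AL061988", "AL011993", "AL022006", "Al202011")

# All observations of those systems, keeping the columns needed to compare
# the recorded status and category against the wind speed
unnamed_obs <- storms |>
  filter(name %in% unnamed) |>
  select(year, month, day, hour, name, status, category, wind)

# One row per system: peak wind plus every status/category value recorded
unnamed_summary <- unnamed_obs |>
  group_by(year, name) |>
  summarise(
    wind_max   = max(wind),
    statuses   = toString(unique(as.character(status))),
    categories = toString(unique(as.character(category))),
    .groups    = "drop"
  )

unnamed_summary
```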