14 Counting More Systems
In the two previous chapters you learned how to count the number of tropical systems in each year, and how to graph a bar chart of such counts.
In this chapter, we continue discussing how to obtain more counts. Specifically, we describe how to get the frequencies of, say, tropical depressions, tropical storms, hurricanes, or major hurricanes.
14.1 Counting Kinds of Storms
Recall that tropical systems are classified into different categories depending on their wind speed. This classification is based on the famous Saffir-Simpson wind scale, given in the following table.
Category | Scale | knots (kn) | mph |
---|---|---|---|
Tropical Depression | -1 | \(\leq\) 33 | \(\leq\) 38 |
Tropical Storm | 0 | 34 - 63 | 39 - 73 |
Hurricane category 1 | 1 | 64 - 82 | 74 - 95 |
Hurricane category 2 | 2 | 83 - 95 | 96 - 110 |
Hurricane category 3 | 3 | 96 - 112 | 111 - 129 |
Hurricane category 4 | 4 | 113 - 136 | 130 - 156 |
Hurricane category 5 | 5 | \(\geq\) 137 | \(\geq\) 157 |
Hurricanes of categories 3, 4, and 5 are considered to be major hurricanes.
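As a minimal sketch of how the thresholds in the table can be turned into code, the following base R snippet maps wind speeds in knots to category labels with cut(). The helper name wind_to_category() is hypothetical (it is not part of dplyr or of the storms data), and the sketch assumes wind speeds are whole numbers of knots.

```r
# Hypothetical helper (an assumption for illustration, not an existing function):
# map wind speeds in knots to the category labels used in the table.
wind_to_category <- function(wind_knots) {
  cut(
    wind_knots,
    # upper bound of each class; with right = TRUE the intervals are
    # (-Inf, 33], (33, 63], (63, 82], (82, 95], (95, 112], (112, 136], (136, Inf)
    breaks = c(-Inf, 33, 63, 82, 95, 112, 136, Inf),
    labels = c("Tropical Depression", "Tropical Storm",
               "Hurricane category 1", "Hurricane category 2",
               "Hurricane category 3", "Hurricane category 4",
               "Hurricane category 5"),
    right = TRUE
  )
}

# a few made-up wind speeds, including values on the class boundaries
kinds <- wind_to_category(c(30, 34, 63, 64, 100, 140))
as.character(kinds)
```

Under this scheme, a major hurricane corresponds to a wind speed of 96 knots or more.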
14.1.1 Counting Tropical Depressions
We may start by getting the count of tropical depressions per year. Knowing that tropical depressions have winds no greater than 33 knots, we can filter() column wind for this numeric value.
# trying to count tropical depressions per year
depression_counts_per_year = storms %>%
  filter(wind <= 33) %>%
  count(year, name) %>%
  count(year)
# inspect a few rows
head(depression_counts_per_year)
## # A tibble: 6 × 2
## year n
## <dbl> <int>
## 1 1975 7
## 2 1976 7
## 3 1977 6
## 4 1978 10
## 5 1979 8
## 6 1980 10
According to this output, it seems that there were 7 tropical depressions in 1975, 7 in 1976, 6 in 1977, etc.
What about the number of tropical storms? Well, to obtain this count we should filter wind values between 34 and 63 knots, and then compute the count():
# trying to count tropical storms per year
storm_counts_per_year = storms %>%
  filter(wind >= 34 & wind < 64) %>%
  count(year, name) %>%
  count(year)
# inspect a few rows
head(storm_counts_per_year)
## # A tibble: 6 × 2
## year n
## <dbl> <int>
## 1 1975 8
## 2 1976 7
## 3 1977 6
## 4 1978 11
## 5 1979 8
## 6 1980 11
Based on this output, it seems that there were 8 tropical storms in 1975, 7 in 1976, 6 in 1977, etc.
But wait … We know, from the preceding chapter, that the counts of systems per year are:
system_counts_per_year <- storms %>%
  count(year, name) %>%
  count(year)
head(system_counts_per_year)
## # A tibble: 6 × 2
## year n
## <dbl> <int>
## 1 1975 8
## 2 1976 7
## 3 1977 6
## 4 1978 11
## 5 1979 8
## 6 1980 11
How is it that in 1975 there were seven tropical depressions and eight tropical storms, but a total of only eight systems?
Likewise, how is it that in 1976 there were seven tropical depressions and seven tropical storms, but a total of only seven systems?
Okay, okay. You may know the answer to this apparent conundrum. The explanation for these seemingly contradictory results has to do with the life cycle of tropical systems. Technically speaking, they all start as baby storms or disturbances that can get bigger, eventually reaching tropical depression status, and under the right weather conditions, they can continue to grow, reaching tropical storm category, or hurricane of category 1, or other bigger categories. Because storms records each system many times over its lifetime, filtering wind row by row keeps any system that was at that strength at some point, so the same system can be counted both as a tropical depression and as a tropical storm.
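The double counting can be reproduced on a tiny, made-up data set. This is only an illustrative sketch: the data frame toy and its values are invented, and distinct() followed by nrow() is used here as a stand-in for the two-step count() of the chapter.

```r
library(dplyr)

# Made-up observations: "Amy" grows from a depression into a tropical
# storm, while "Blanche" never exceeds depression strength.
toy <- data.frame(
  year = c(1975, 1975, 1975, 1975, 1975),
  name = c("Amy", "Amy", "Amy", "Blanche", "Blanche"),
  wind = c(25, 30, 45, 25, 30)
)

# Row-wise filters: a system is kept if ANY of its rows qualifies
n_depressions <- toy %>%
  filter(wind <= 33) %>%
  distinct(year, name) %>%
  nrow()

n_storms <- toy %>%
  filter(wind >= 34 & wind < 64) %>%
  distinct(year, name) %>%
  nrow()

# Total number of distinct systems
n_systems <- toy %>%
  distinct(year, name) %>%
  nrow()

c(depressions = n_depressions, storms = n_storms, systems = n_systems)
```

In this toy example "Amy" appears in both filtered sets, so the two counts add up to more than the number of systems.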
I know that the above outputs and the interpretations I’m providing for them may seem a bit silly to (some of) you. Yes, I’m doing it on purpose: to walk you through the type of exploration, sanity checks, and questions that you should be asking yourself along the way when analyzing data. This is especially vital when you are working with a data set from a topic or field in which you don’t have much experience.
14.1.2 Counts Based on Maximum Wind Speed
A more adequate approach for identifying tropical depressions consists of first obtaining the maximum wind speed of each system, and then filtering for values of 33 knots or less.
# identifying tropical depressions
depressions = storms %>%
  group_by(year, name) %>%
  summarise(wind_max = max(wind), .groups = "drop") %>%
  filter(wind_max <= 33)
slice_head(depressions, n = 10)
## # A tibble: 10 × 3
## year name wind_max
## <dbl> <chr> <int>
## 1 1991 AL041991 30
## 2 1991 AL101991 25
## 3 1992 AL021992 30
## 4 1992 AL031992 30
## 5 1992 AL081992 30
## 6 1993 AL101993 30
## 7 1994 AL021994 30
## 8 1994 AL051994 30
## 9 1994 AL081994 30
## 10 1994 AL091994 30
Having obtained the data that exclusively contains tropical depressions, we can then count their number in each year:
# counting tropical depressions per year
depression_counts_per_year = depressions %>%
  count(year)
slice_head(depression_counts_per_year, n = 10)
## # A tibble: 10 × 2
## year n
## <dbl> <int>
## 1 1991 2
## 2 1992 3
## 3 1993 1
## 4 1994 5
## 5 1995 2
## 6 1997 1
## 7 1999 4
## 8 2000 4
## 9 2001 2
## 10 2002 2
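One way to check that the maximum-wind approach counts every system exactly once is to label each system by its peak wind speed and compare the total with the number of systems. The sketch below does this on made-up data; the data frame toy, the column kind, and the coarse three-way labelling are assumptions for illustration only.

```r
library(dplyr)

# Made-up observations for three systems in a single year
toy <- data.frame(
  year = rep(1975, 7),
  name = c("Amy", "Amy", "Amy", "Blanche", "Blanche", "Caroline", "Caroline"),
  wind = c(25, 30, 45, 25, 30, 50, 70)
)

# One row per system, labelled by its maximum wind speed (in knots)
kinds_per_year <- toy %>%
  group_by(year, name) %>%
  summarise(wind_max = max(wind), .groups = "drop") %>%
  mutate(kind = case_when(
    wind_max <= 33 ~ "tropical depression",
    wind_max <= 63 ~ "tropical storm",
    TRUE ~ "hurricane"
  )) %>%
  count(year, kind)

kinds_per_year

# sanity check: the per-kind counts add up to the number of systems
sum(kinds_per_year$n) == nrow(distinct(toy, year, name))
```

Since each system has a single maximum wind speed, it falls into exactly one kind, and the counts can no longer exceed the number of systems.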
14.1.3 Counting Tropical Storms
Tropical storms are tropical systems with wind speeds between 39 mph and 73 mph or, equivalently, between 34 and 63 knots. With this information, we proceed in the same way as in the above subsection: first we identify the set that exclusively contains tropical storms, and then we count them by year.
# identifying tropical storms
trop_storms = storms %>%
  group_by(year, name) %>%
  summarise(wind_max = max(wind), .groups = "drop") %>%
  filter(wind_max >= 34 & wind_max < 64)
slice_head(trop_storms, n = 10)
## # A tibble: 10 × 3
## year name wind_max
## <dbl> <chr> <int>
## 1 1975 Amy 60
## 2 1975 Hallie 45
## 3 1976 Dottie 45
## 4 1977 Frieda 50
## 5 1978 Amelia 45
## 6 1978 Bess 45
## 7 1978 Debra 50
## 8 1978 Hope 55
## 9 1978 Irma 45
## 10 1978 Juliet 45
# counting tropical storms per year
storm_counts_per_year = trop_storms %>%
  count(year)
slice_head(storm_counts_per_year, n = 10)
## # A tibble: 10 × 2
## year n
## <dbl> <int>
## 1 1975 2
## 2 1976 1
## 3 1977 1
## 4 1978 6
## 5 1979 3
## 6 1980 2
## 7 1981 4
## 8 1982 3
## 9 1983 1
## 10 1984 7
So far, so good. Things seem to make sense, and we are making nice progress in our data exploration journey.
But before moving on with other counts, let’s do a quick sanity check (just in case). For example, we could take into consideration the column category. This column contains numeric codes for the different types of categories based on the Saffir-Simpson wind scale.
To be more precise: in addition to wind_max, let’s also include the maximum category value when identifying the set of tropical storms. If everything is correct, all the entries should have a category value of 0, which is the numeric code for tropical storms.
# identifying tropical storms
trop_storms = storms |>
  filter(wind >= 34) |>
  group_by(year, name) |>
  summarise(wind_max = max(wind)) |>
  filter(wind_max < 64)
## `summarise()` has grouped output by 'year'. You can override using
## the `.groups` argument.
slice_head(trop_storms, n = 10)
## # A tibble: 264 × 3
## # Groups: year [47]
## year name wind_max
## <dbl> <chr> <int>
## 1 1975 Amy 60
## 2 1975 Hallie 45
## 3 1976 Dottie 45
## 4 1977 Frieda 50
## 5 1978 Amelia 45
## 6 1978 Bess 45
## 7 1978 Debra 50
## 8 1978 Hope 55
## 9 1978 Irma 45
## 10 1978 Juliet 45
## # ℹ 254 more rows
This output only displays the first ten rows, but you are welcome to inspect the values in column name.
Did you observe anything special? Was there anything that caught your attention?
If your answer is “No”, then go back to carefully inspect the content of trop_storms.
If your answer is “Yes”, because you noticed some weird storm names such as AL031987 or AL061988, then let me say to you: “good job!”
Indeed, there seem to be a few tropical storms that don’t have a proper name. In theory, once a tropical system reaches tropical storm status, it receives a proper (purely alphabetic) name such as Amy or Amelia or Bess.
To detect all the entries in trop_storms that don’t have a proper name you can use the following command:
# unnamed storms
trop_storms %>%
  filter(str_starts(name, pattern = "A(l|L)\\d+"))
## # A tibble: 5 × 3
## # Groups: year [5]
## year name wind_max
## <dbl> <chr> <int>
## 1 1987 AL031987 40
## 2 1988 AL061988 50
## 3 1993 AL011993 35
## 4 2006 AL022006 45
## 5 2011 Al202011 40
As you can tell, this command is a filtering operation. However, the code inside filter() is somewhat advanced. It uses the str_starts() function to match those names that start with either "AL" or "Al" and are followed by one or more digits.
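To see what this pattern matches, here is a small sketch that applies str_starts() from the stringr package to a handful of sample names. The vector sample_names is made up for illustration; two of its entries mimic the identifier-style names found in trop_storms.

```r
library(stringr)

# Sample names: two identifier-style names and two proper names
sample_names <- c("AL031987", "Al202011", "Amy", "Alberto")

# "A(l|L)\\d+" reads: an "A", then "l" or "L", then one or more digits
is_unnamed <- str_starts(sample_names, pattern = "A(l|L)\\d+")
is_unnamed

# keep only the names that look like identifiers
sample_names[is_unnamed]
```

Note that "Alberto" is not matched: it starts with "Al", but the next character is a letter rather than a digit.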
To be honest, I’m not sure why we have these mismatched values. It could be that the category values for these systems are incorrect. Or it could be the opposite: the category is okay, but the problem is with the name values.