8 Introduction: Hurricanes Data

One of the main dishes in this book is working with tabular data (data arranged in rows and columns). We like to start with tables because tabular data is the most common format in which data is handled for most types of analysis. Even if your raw data is not in tabular format, sooner or later you’ll be handling it in tabular form in most data analysis projects. There are limitations, of course: not everything that is done in computing with data can be done with tables. We postpone the discussion of other data objects and programming concepts to later chapters in the book.

We’ll begin our discussion about data tables from the “high level” (scientist’s point of view), especially how to get to know your data (e.g. univariate, bivariate, multivariate analysis).

8.1 Installing some packages

We want you to get your hands dirty in R as quickly as possible, and perhaps the best way to do this is by working on a case study. To keep things moderately simple, we use one of the data sets that comes with "dplyr", one of the most popular R packages for manipulating tables. The main reason to start this way is to avoid having to worry about data importing issues, which we cover later in the chapter Importing Tables. The other reason is to have data that is already clean and ready to be analyzed. You will also have time to learn tools and skills for cleaning data sets in subsequent chapters.

We are assuming that you have already installed the packages "dplyr" and "ggplot2". If that’s not the case, run the command below in the console (do NOT include this command in any Rmd file):

# don't include this command in any Rmd file
# don't worry too much if you get a warning message
install.packages(c("dplyr", "ggplot2"))
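
If you are not sure whether the packages are already installed, a quick console check can tell you before you call install.packages(). The snippet below is a minimal sketch; the object names pkgs and installed are our own choices for illustration.

```r
# packages used in this chapter
pkgs <- c("dplyr", "ggplot2")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# names of the packages (if any) that still need to be installed
pkgs[!installed]
```

Like install.packages(), this check belongs in the console, not in an Rmd file.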

Remember that you only need to install a package once! After a package has been installed on your machine, there is no need to call install.packages() again on the same package. What you should always invoke, in order to use the functions in a package, is the library() function:

# (you should include these commands in your Rmd file)
library(dplyr)
library(ggplot2)

About loading packages: Another rule to keep in mind is to always load any required packages at the very top of your script files (.R or .Rmd or .Rnw files). Avoid calling the library() function in the middle of a script. Instead, load all the packages before anything else.

The package "dplyr" contains a dataset called storms, which is a subset of the NOAA Atlantic hurricane database best track data.
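
Since storms ships with "dplyr", you can take a first look at it as soon as the package is loaded. The commands below are a minimal sketch of such a first look; the exact number of rows and the range of years depend on the version of "dplyr" you have installed, so no output is shown.

```r
library(dplyr)

# storms is available as soon as dplyr is loaded; no import step is needed
dim(storms)            # number of rows and columns
glimpse(storms)        # column names, types, and first few values

# span of years covered (depends on your dplyr version)
range(storms$year)

# number of records for each storm status
count(storms, status)
```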


This database is one of several data sets available in the National Hurricane Center (NHC) Data Archive, which is part of the National Oceanic and Atmospheric Administration (NOAA). Before doing any analysis on the storms dataset, we need to learn some basic notions about hurricanes.

8.2 Hurricanes Data

Figure 8.1: NASA satellite image of hurricane Sandy, 2012 (source: wikimedia commons)

"Hurricane Sandy (unofficially referred to as Superstorm Sandy) was the deadliest and most destructive, as well as the strongest, hurricane of the 2012 Atlantic hurricane season. Inflicting nearly $70 billion (2012 USD) in damage, it was the second-costliest hurricane on record in the United States until surpassed by Hurricanes Harvey and Maria in 2017."


8.2.1 Hurricanes and Climate Change

  • Hurricanes are a natural part of our climate system.

  • Recent research suggests an increase in intense hurricane activity in the North Atlantic since the 1970s.

  • In the future, there may not necessarily be more hurricanes, but there will likely be more intense hurricanes (higher wind speeds and more precipitation).

  • The impacts of this trend are likely to be exacerbated by sea level rise and a growing population along coastlines.

“Hurricanes and Climate Change” article published here:


New research estimates that, as the Earth has warmed, the probability of a storm with precipitation levels like those of Hurricane Harvey was higher in Texas in 2017 than it was at the end of the twentieth century. Because of climate change, such a storm went from a once-in-100-years event to a once-in-16-years event over this time period.
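
To make these return periods more concrete, you can convert them into yearly probabilities: an event that happens on average once every N years has a probability of about 1/N of happening in any given year. The following sketch does this arithmetic in R. The object names are ours, and the 30-year calculation assumes, purely for illustration, that years are independent of each other.

```r
# return periods quoted in the text: average number of years between events
return_period <- c(late_20th_century = 100, year_2017 = 16)

# the annual probability is the reciprocal of the return period
annual_prob <- 1 / return_period
annual_prob   # 0.01 and 0.0625, that is, 1% and 6.25% per year

# how many times more likely the event became
ratio <- annual_prob[["year_2017"]] / annual_prob[["late_20th_century"]]
ratio         # 6.25

# chance of at least one such storm in 30 years,
# assuming (for illustration only) that years are independent
prob_30yr <- 1 - (1 - annual_prob)^30
round(prob_30yr, 2)   # about 0.26 and 0.86
```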

8.3 A little bit about Hurricanes

  • Hurricanes are the most violent storms on Earth.

  • People call these storms by other names, such as typhoons or cyclones, depending on where they occur.

  • The scientific term for all these storms is tropical cyclone.

  • Only tropical cyclones that form over the Atlantic Ocean or eastern Pacific Ocean are called “hurricanes.”

  • A tropical cyclone is a rotating low-pressure weather system that has organized thunderstorms but no fronts.

  • Hurricanes are tropical cyclones whose sustained winds have reached 74 mph.

  • At this point the hurricane reaches category 1 on the Saffir-Simpson Hurricane Wind Scale.

  • The Saffir-Simpson Hurricane Wind Scale is a 1 to 5 rating based on a hurricane’s sustained wind speed:

    • category 1: 74-95 mph; 64-82 kt; 119-153 km/h
    • category 2: 96-110 mph; 83-95 kt; 154-177 km/h
    • category 3: 111-129 mph; 96-112 kt; 178-208 km/h
    • category 4: 130-156 mph; 113-136 kt; 209-251 km/h
    • category 5: 157 mph or higher; 137 kt or higher; 252 km/h or higher

  • Major hurricanes are defined as Category 3, 4, and 5 storms.

  • The official Atlantic hurricane season runs from June through November, but occasionally storms form outside those months.

  • September is the most common month for hurricanes making landfall in the U.S., followed by August and October (based on 1851 to 2015 data).

  • A typical year has 12 named storms, six hurricanes, and three major hurricanes.

  • No hurricanes made U.S. landfall before June or after November during the period studied (1851 to 2015 data).
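
The Saffir-Simpson thresholds listed above translate directly into a small lookup in R. The function below is a minimal sketch, not part of any package: saffir_simpson() and ss_breaks are hypothetical names we introduce for illustration. It takes wind speeds in knots and returns 0 for winds below hurricane strength.

```r
# lower bounds (in knots) of categories 1 to 5, taken from the scale above
ss_breaks <- c(64, 83, 96, 113, 137)

# hypothetical helper: Saffir-Simpson category for sustained winds in knots
# (0 means the wind is below hurricane strength)
saffir_simpson <- function(wind_kt) {
  findInterval(wind_kt, ss_breaks)
}

saffir_simpson(c(30, 64, 95, 96, 112, 113, 140))
# [1] 0 1 2 3 3 4 5

# major hurricanes are category 3 or higher
saffir_simpson(c(95, 96, 140)) >= 3
# [1] FALSE  TRUE  TRUE
```

We use knots here because that is one of the three units given in the scale; the same approach works with the mph or km/h thresholds.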

8.4 Hurricane Tracks Data

The data set that we are going to analyze comes from the Hurricane Databases (HURDAT), managed by the National Hurricane Center (NHC).

  • HURDAT involves two databases: one for storms occurring in the Atlantic Ocean, and another one for storms occurring in the Eastern Pacific Ocean.

  • HURDAT contains records from 1851 to the present.

  • Keep in mind that in the past (before 1970s?), tropical depressions that did not develop into tropical storms or hurricanes were not included in the database.

From Wikipedia: around 1963, NASA’s Apollo space programme requested data on the climatological impacts of tropical cyclones on launches of space vehicles at the Kennedy Space Center. The basic data was taken from the National Weather Records North Atlantic Tropical to include data from 1886–1968. As a result of this work, a requirement for a computerized tropical cyclone database at the National Hurricane Centre (NHC) was realised.