20 Tables in Spreadsheets

What about storing data tables in spreadsheets?

A lot of people use spreadsheet software (e.g. MS Excel, Google Sheets) as their primary means for storing and manipulating data.

Spreadsheets can be convenient, and they obviously deserve a place in your toolbox, but we believe that relying on them as the main way to store and manipulate data is far too limiting and comes with major disadvantages.

However, spreadsheets are so ubiquitous that we had better teach some important concepts and good practices in case you ever find yourself working with them. Our recommendation is to minimize their use for data storage and data wrangling. As a data analyst, though, you won’t have much control over how other users handle their data. When a client, a colleague, or some other source shares a data set in spreadsheet format, you can still take care of some common issues and make your life easier down the road.

Karl Broman has written extensively about this subject, and most of what we provide in this chapter follows the recommendations of his tutorial Organizing Data in Spreadsheets.

Figure 20.1: Data stored in a spreadsheet

  • Many people enter and store their data in spreadsheets (e.g. MS Excel, Google Sheets, Apple Numbers)
  • A spreadsheet provides a nice graphical display of a table’s content
  • Spreadsheet software brings a (deceptive) sense of comfort

Spreadsheets do have a role and a place in the toolkit of a data scientist. In fact, they could be used in any stage of the Data Analysis Cycle. But keep in mind that they enormously reduce reproducibility. And they should not be used as your default data-storage option.
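To make the reproducibility point a little more concrete, here is a minimal sketch of checking a table with a script instead of by hand. It assumes the spreadsheet has already been exported to CSV text; the data values, the column names, and the helper `find_empty_cells` are all hypothetical, and looking for empty cells is just one possible check chosen for illustration.

```python
import csv
import io

# Hypothetical CSV export of a small spreadsheet (made-up values).
EXPORTED_CSV = """id,date,weight_kg
1,2024-01-15,12.5
2,2024-01-16,
3,,11.8
"""


def find_empty_cells(csv_text):
    """Return (line_number, column_name) pairs for every empty cell."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the column names
    problems = []
    # Data rows start at line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for column, value in zip(header, row):
            if value.strip() == "":
                problems.append((line_number, column))
    return problems


empty_cells = find_empty_cells(EXPORTED_CSV)
for line_number, column in empty_cells:
    print(f"line {line_number}: empty cell in column '{column}'")
```

Unlike a fix typed directly into a cell, a check like this leaves a written record: it can be rerun, unchanged, every time a new version of the file arrives.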

  • Spreadsheets are ubiquitous
  • They can be easy to work with
  • But they can also be a sloppy mess
  • Let’s discuss Karl Broman’s recommendations for organizing data in spreadsheets.

20.0.1 Summary Slides