36 Redirecting Inputs and Outputs

So far you have been working with the command line interface using basic commands to move around your file system (e.g. cd, ls, pwd), to inspect contents of files (e.g. wc, head, tail, cat, less, file), and to perform typical file operations:

  • create a directory, e.g. mkdir lab05
  • create an empty file, e.g. touch README.md
  • rename a file, e.g. mv report.Rmd document.Rmd
  • move a file to another directory, e.g. mv myscript.R lab05/
  • copy a file, e.g. cp data1.csv data2.csv
  • remove a file, e.g. rm datafile.csv

Likewise, all the commands you’ve learned so far have required you to enter information at the command line, and all have produced output on the screen. The next step involves learning how to combine existing commands in new ways. To have a working example, we’ll start with some toy directory toydir containing a handful of files:

mkdir toydir
cd toydir
touch file1.txt
touch README.md
curl -O http://web.pdx.edu/~gerbing/data/cars.csv
curl -O http://web.pdx.edu/~gerbing/data/employee.csv
curl -O http://web.pdx.edu/~gerbing/data/ppseps.csv

You should have a filestructure like this:

toydir/
   README.md
   file1.txt
   cars.csv
   employee.csv
   ppseps.csv

As you can tell, we have three CSV files. Say you want to find out which CSV file is the shortest? We can run the command wc *.csv to answer this question:

wc *.csv
     393    1027   20964 cars.csv
      37      75    1832 employee.csv
      11      28     252 ppseps.csv
     441    1130   23048 total

In this example we only have three CSV files, but what if there were 1000? Our first step toward a solution is to run the command wc to get the number of lines in each CSV file:

wc -l *.csv
     393 cars.csv
      37 employee.csv
      11 ppseps.csv
     441 total

What if you want to display the output above in increasing order? The answer is given with the following command:

wc -l *.csv | sort

and the displayed output should be:

      11 ppseps.csv
      37 employee.csv
     393 cars.csv
     441 total

To better understand the previous command, we first need to talk about some technical aspects about the terminal, the shell, files, and unix things.

36.1 Some Technical Background

When you use the terminal, you are interacting with a program called the shell. There are different shell flavors, but the most common one is the bash shell. What does the shell (e.g. bash) do? Basically, the shell interprets the commands that you type in and either executes them directly or passes them on to other programs.

For example, consider the command cat ppseps.csv which displays the contents of ppseps.csv on the screen:

cat ppseps.csv
Company,EPS,PPS
Imo Indust Inc,-3.26,6.500
Toro Co ,-1.98,13.000
Calmat Co,-0.45,22.500
Tultex Corp,0.56,8.625
Fam Dol St,1.00,17.250
Phil Sub Corp,1.23,16.000
Rtz Plc,1.50,41.375
Tandy Corp,2.24,24.500
Ok Gas & Elct,2.42,34.125
Nicor Inc ,3.83,49.750

It is the shell that finds the file ppseps.csv, and calls the cat command to ask it to print the file’s contents. In turn, the cat command calls the kernel to find ppseps.csv on the disk and print its contents as a stream of characters on the terminal (i.e. monitor).

Some commands that you type are built into the shell. For example, the cd command is built-in. That is, the shell interprets that command and changes your current directory. If you want to know where this command is stored in your computer, use which like this:

which cd

The ls command, on the other hand, is an external program typically stored in the file /bin/ls.

which ls

When you type the name of a command, the shell first checks to see if it is a built-in command and, if so, executes it.

36.2 Standard Input and Output

Most unix commands take input from your terminal and send the resulting output back to your terminal. A command normally reads its input from a place called standard input, which happens to be your keyboard by default. Similarly, a command normally writes its output to standard output, which is also your terminal by default.

If a command is executed without a filename argument, the command takes its input from standard input. One example of this type of command is the cat command. If you don’t provide the name of a file to cat, then it expects to take input from your keyboard. Here’s a toy example, type in cat and then press the Enter key, then type three sentences, and finally press the keys Ctrl+d to stop the execution of cat:

  • hi there!
  • never mind
  • see you later!

You should be able to see some lines of text like the following ones:

cat
hi there!
hi there!
never mind
never mind
see you later!
see you later!

The command sort is another example that can take input from the keyboard. In the terminal, type in four words, and the press Ctrl+d, for instance:

sort
Voldemort
Dumbledore
Potter
Granger

You should see the following sorted output in your monitor:

Dumbledore
Granger
Potter
Voldemort

Because no filename was specified to the sort command, the input was taken from standard input, i.e. the keyboard. After the fourth name was typed in, the Ctrl and d keys were pressed to signal the end of the data stream. At that point, the sort command sorted the four names and displayed the results on the standard output, i.e. your monitor.

36.3 Using Standard Input and Output

If a program’s input consists entirely of alphanumeric and punctuation characters, there is no difference between reading data from a file and reading data from a terminal. Likewise, if a program’s output consists entirely of alphanumeric characters and punctuation, there is no difference between writing to a file, writing to a terminal, and writing to the input of another program.

The standard Input/Output facility, typically referred to as I/O, provides some simple defaults for managing input/output. There are three default I/O streams: 1) standard input, 2) standard output, and 3) standard error. By convention, standard output aka stdout consists of all normal output from a command, while standard error, abbreviated stderr, consists of error messages.

Standard input (stdin) normally comes from your keyboard. Many programs ignore stdin; you name files directly on the command line. For instance, the command cat file1.csv file2.csv never reads its standard input; it reads the files directly . But without filenames on the command line, commands that need input will usually read stdin. Standard input usually comes from your keyboard, but the shell can redirect stdin from a file.

The real advantage of standard I/O is that it allows you to redirect input or output away from your terminal to a file. For example, if you want to run the command cat file1.csv file2.csv, but you want to place the output in file3.csv rather than sending it to your terminal, you have to use the following command:

# redirecting output to file3.csv
cat file1.csv file2.csv > file3.csv

This is called redirecting standard output to file3.csv. If you execute this command and look at the contents of file3.csv, you will find the contents of file1.csv, followed by the contents of file2.csv.

One of the best-known forms of redirection in unix is the pipe. The shell’s vertical bar | operator makes a pipe.

Description bash example
Send stdout to file cmd > file
Send stderr to file cmd 2> file
Take stdin from file cmd < file
Send stdout to end of file cmd >> file

Note: Keep in mind that the syntax used to redirect standard I/O depends on the shell you are using.

In the next section, you’ll learn how to use I/O redirection operators, and apply them to do basic wrangling operations on data tables.