4 ways to read a file in R... by columns
Posted on June 23, 2012
Ever wonder how to read a file in R by columns? This question comes to mind when your analysis doesn’t require importing all the data into R, especially if the file is huge.
Sometimes you just want to read some columns, do some data manipulation, and plot some graphics. How can you do that in R? I’ll show you four different ways to do it without having to use a database management system (DBMS) and SQL queries.
Toy example
For this post let’s consider a toy dataset of 12 rows and 7 columns in CSV (comma-separated values) format.
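If you want something to play with, a file of that shape could be created along these lines. This is only a sketch: the column names (col1 to col7), the integer values, and the file name toy_data.csv are all made up for illustration.

```r
# Hypothetical toy data: 12 rows, 7 columns (names and values are made up)
toy <- as.data.frame(matrix(1:84, nrow = 12, ncol = 7))
names(toy) <- paste0("col", 1:7)

# Write it as a plain CSV file in a temporary directory
toy_file <- file.path(tempdir(), "toy_data.csv")
write.csv(toy, toy_file, row.names = FALSE, quote = FALSE)

# Quick look at the first lines of the raw file
readLines(toy_file, n = 3)
```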
Option 1: cut and system
The first option consists of building a cut command with the desired columns and calling it from within the system() function. The only “problem” is that the data will be stored in a character vector, with one element per line of output. It is not the best solution if what you want is a data frame, but it can do the trick if you want to quickly inspect the columns.
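Here is a minimal sketch of this option. It assumes a Unix-like system where the cut command is available, and it uses a made-up toy file (columns col1 to col7) so that it runs on its own. Keep in mind that cut splits on every comma, so it is only safe for files without quoted fields that contain commas.

```r
# Hypothetical toy file: 12 rows, 7 columns (names and values are made up)
toy <- as.data.frame(matrix(1:84, nrow = 12, ncol = 7))
names(toy) <- paste0("col", 1:7)
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, toy_file, row.names = FALSE, quote = FALSE)

# Build the shell command: comma delimiter, fields 1, 3 and 5
cmd <- paste("cut -d',' -f1,3,5", shQuote(toy_file))

# intern = TRUE captures the output as a character vector (one line each)
col_lines <- system(cmd, intern = TRUE)
head(col_lines)
```

The first element of the vector is the header line, and each remaining element is one row of the file, still as comma-separated text.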
Option 2: cut and pipe
The second option is similar to the first one. It consists of calling a cut command, but this time from the pipe() function, which in turn is passed to the read.csv() function.
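A sketch of the same idea with pipe(), again assuming a Unix-like system with cut available and a made-up toy file. This time the result is a regular data frame, because read.csv() parses whatever the command sends through the pipe.

```r
# Hypothetical toy file: 12 rows, 7 columns (names and values are made up)
toy <- as.data.frame(matrix(1:84, nrow = 12, ncol = 7))
names(toy) <- paste0("col", 1:7)
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, toy_file, row.names = FALSE, quote = FALSE)

# Same cut command as before: comma delimiter, fields 1, 3 and 5
cmd <- paste("cut -d',' -f1,3,5", shQuote(toy_file))

# read.csv() opens the pipe connection, reads from it, and closes it
piped_df <- read.csv(pipe(cmd))
str(piped_df)
```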
Option 3: package colbycol
The third option consists of using the very handy function cbc.read.table() that comes with the package "colbycol" (by Carlos Gil).
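A rough sketch of how this could look. The toy file is made up, and the argument names just.read and sep, as well as the conversion with as.data.frame(), are written from my recollection of the package documentation, so treat them as assumptions and check ?cbc.read.table in your installed version. The code only runs if "colbycol" is available.

```r
# Hypothetical toy file: 12 rows, 7 columns (names and values are made up)
toy <- as.data.frame(matrix(1:84, nrow = 12, ncol = 7))
names(toy) <- paste0("col", 1:7)
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, toy_file, row.names = FALSE, quote = FALSE)

if (requireNamespace("colbycol", quietly = TRUE)) {
  # Assumed arguments: just.read takes the indexes of the columns to read
  cbc_data <- colbycol::cbc.read.table(toy_file, just.read = c(1, 3, 5), sep = ",")

  # Assumed conversion of the colbycol object into a regular data frame
  cbc_df <- as.data.frame(cbc_data)
  head(cbc_df)
}
```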
Option 4: package limma
The last option consists of using the function read.columns() that comes with the "limma" package (by Gordon Smyth et al). Just a small detail: "limma" is in Bioconductor, not in CRAN. In this case, you need to specify the names of the columns to be read.
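A small sketch of this option, using a made-up toy file with columns col1 to col7. The wanted columns are given by name through the required.col argument, and I pass sep = "," explicitly because the file is comma-separated. The code only runs if "limma" is installed.

```r
# Hypothetical toy file: 12 rows, 7 columns (names and values are made up)
toy <- as.data.frame(matrix(1:84, nrow = 12, ncol = 7))
names(toy) <- paste0("col", 1:7)
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, toy_file, row.names = FALSE, quote = FALSE)

# limma lives in Bioconductor; one way to get it is BiocManager::install("limma")
if (requireNamespace("limma", quietly = TRUE)) {
  # Read only the columns whose names are listed in required.col
  limma_df <- limma::read.columns(
    toy_file,
    required.col = c("col1", "col3", "col5"),
    sep = ","
  )
  head(limma_df)
}
```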
Happy data analysis!