# 5 functions to do Correspondence Analysis in R

##### Posted on July 19, 2012

In a previous post, I talked about five different ways to do Principal Components Analysis in R.

PCA is very useful and is one of the most applied multivariate techniques. However, PCA is limited to quantitative information. But what if our data comes in the form of qualitative information such as categorical data? The solution: Correspondence Analysis.

Correspondence Analysis, or CA for short, is one of the cousins of Principal Component Analysis. Both CA and PCA are multivariate techniques that help us summarize the systematic patterns of variation in the data. The difference between CA and PCA is that CA applies to categorical (i.e. qualitative) data instead of continuous (i.e. quantitative) data. More specifically, CA applies to categorical data in the form of contingency tables (aka cross-tabulations). Since CA is conceptually similar to PCA, we can use it, among other things, for visualizing multidimensional data in a lower-dimensional space.

### CA in R

In R, there are several functions from different packages that allow us to apply Correspondence Analysis. In this post I’ll show you 5 different ways to perform CA using the following functions (with their corresponding packages in parentheses):

• ca() (ca)
• CA() (FactoMineR)
• dudi.coa() (ade4)
• afc() (amap)
• corresp() (MASS)

As in PCA, no matter what function you decide to use for CA, the typical results should consist of a set of eigenvalues, a table with the row coordinates, and a table with the column coordinates. The eigenvalues provide information about the variability in the data. The row coordinates provide information about the structure of the rows in the analyzed table. The column coordinates provide information about the structure of the columns in the analyzed table.
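To make these three pieces concrete, here is a minimal base-R sketch that gets eigenvalues, row coordinates, and column coordinates from a singular value decomposition of the standardized residuals. The table `N` is made up purely for illustration, and none of the five packages is involved; it is only meant to show where the numbers come from, not how any particular package computes them internally.

```r
# A small made-up contingency table (illustration only)
N <- matrix(c(20, 10,  5,
               8, 15, 12,
               4,  9, 18),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

P  <- N / sum(N)    # correspondence matrix (relative frequencies)
rm <- rowSums(P)    # row masses
cm <- colSums(P)    # column masses

# Standardized residuals: (P - rm cm') scaled by the square roots of the masses
S <- diag(1 / sqrt(rm)) %*% (P - rm %o% cm) %*% diag(1 / sqrt(cm))

dec <- svd(S)
k   <- min(dim(N)) - 1    # number of non-trivial dimensions

eig <- dec$d[1:k]^2       # eigenvalues (principal inertias)

# Principal coordinates of rows and columns
row_coord <- diag(1 / sqrt(rm)) %*% dec$u[, 1:k] %*% diag(dec$d[1:k])
col_coord <- diag(1 / sqrt(cm)) %*% dec$v[, 1:k] %*% diag(dec$d[1:k])
rownames(row_coord) <- rownames(N)
rownames(col_coord) <- colnames(N)

eig
row_coord
col_coord
```

The eigenvalues add up to the total inertia of the table, which is the Pearson chi-square statistic divided by the grand total, so `sum(eig)` is a quick sanity check.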

### The Data

We’ll use the dataset author that already comes with the R package "ca". It’s a data matrix containing the counts of the 26 letters of the alphabet (columns of matrix) for 12 different novels (rows of matrix). Each row contains letter counts in a sample of text from each work, excluding proper nouns.
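A quick way to load the data and take a first look (assuming the package "ca" is already installed):

```r
# Load the author data without attaching the whole package
data("author", package = "ca")

dim(author)            # novels in rows, letters in columns
rownames(author)       # the text samples
author[1:4, 1:8]       # a corner of the count table
```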

### Option 1: using ca

The function ca() comes in the package of the same name, ca, by Michael Greenacre and Oleg Nenadic. I personally like this package because of Greenacre’s work and books about CA. In addition, it has a very nice function to plot results in 3D (plot3d.ca()).
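A minimal sketch of this option could look like the following. Note that `ca()` stores singular values in `sv`, so I square them to get eigenvalues, and that `rowcoord` and `colcoord` hold standard coordinates.

```r
# CA with function ca() from package "ca"
library(ca)
data("author", package = "ca")

ca1 <- ca(author)

# eigenvalues (squared singular values)
ca1$sv^2

# row coordinates (standard coordinates)
head(ca1$rowcoord)

# column coordinates (standard coordinates)
head(ca1$colcoord)

# default plot of rows and columns
plot(ca1)
```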

### Option 2: using CA

One of my favorite options is the CA() function from the package FactoMineR. What I like is that this function provides much more detailed results and assessment tools. It also comes with a number of parameters that allow you to tweak the analysis in a very nice way.
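Here is a minimal sketch, assuming FactoMineR is installed. I turn the matrix into a data frame first just to be on the safe side, and I switch off the automatic graph with `graph = FALSE`.

```r
# CA with function CA() from package "FactoMineR"
library(FactoMineR)
data("author", package = "ca")

ca2 <- CA(as.data.frame(author), graph = FALSE)

# matrix with eigenvalues and percentages of variance
ca2$eig

# row coordinates
head(ca2$row$coord)

# column coordinates
head(ca2$col$coord)
```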

### Option 3: using dudi.coa

Another option to perform CA is by using the function dudi.coa() that comes with the package ade4 (remember to install the package first).
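A minimal sketch for this option, assuming ade4 is installed. `scannf = FALSE` skips the interactive scree plot prompt, and `nf = 5` (my choice here, not a requirement) keeps five axes.

```r
# CA with function dudi.coa() from package "ade4"
library(ade4)
data("author", package = "ca")

ca3 <- dudi.coa(as.data.frame(author), scannf = FALSE, nf = 5)

# eigenvalues
ca3$eig

# row coordinates
head(ca3$li)

# column coordinates
head(ca3$co)
```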

### Option 4: using afc

Another option is to use the afc() function from the package amap (remember to install it first).
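A minimal sketch for this option, assuming amap is installed. I am not certain of the exact names of the components in the returned object, so rather than guess, the sketch lists them with `names()` and `str()`; look there for the eigenvalues and the row and column coordinates.

```r
# CA with function afc() from package "amap"
library(amap)
data("author", package = "ca")

ca4 <- afc(author)

# inspect what the function returns before pulling out results
names(ca4)
str(ca4, max.level = 1)
```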

### Option 5: using corresp

A fifth possibility is the corresp() function from the package MASS.
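A minimal sketch for this option. `corresp()` returns canonical correlations in `cor`, so I square them to get eigenvalues; `nf = 5` is just my choice for the number of factors to keep.

```r
# CA with function corresp() from package "MASS"
library(MASS)
data("author", package = "ca")

ca5 <- corresp(author, nf = 5)

# eigenvalues (squared canonical correlations)
ca5$cor^2

# row coordinates
head(ca5$rscore)

# column coordinates
head(ca5$cscore)
```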

### CA plot

The typical graphic in CA visualizes the data in a two-dimensional space using the first two extracted coordinates from both rows and columns. Although we could visualize the rows and the columns separately, the usual approach is to plot both in a single graphic to get an idea of the association between them. Most of the CA functions have their own plot command. However, we can also use the nice tools of "ggplot2". In the following example we will also use the package "stringr".
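Here is one way such a plot could be sketched, using the results of `ca()`. The data frame `ca_df` and its column names are my own choices for illustration. I rescale the standard coordinates by the singular values to get principal coordinates for both rows and columns, and I use stringr only to tidy up the labels.

```r
library(ca)
library(ggplot2)
library(stringr)

data("author", package = "ca")
ca1 <- ca(author)

# principal coordinates = standard coordinates times singular values
row_pc <- sweep(ca1$rowcoord[, 1:2], 2, ca1$sv[1:2], "*")
col_pc <- sweep(ca1$colcoord[, 1:2], 2, ca1$sv[1:2], "*")

# one data frame with rows (novels) and columns (letters) stacked
ca_df <- data.frame(
  dim1  = c(row_pc[, 1], col_pc[, 1]),
  dim2  = c(row_pc[, 2], col_pc[, 2]),
  label = c(str_to_title(rownames(author)), str_to_upper(colnames(author))),
  type  = rep(c("novel", "letter"), c(nrow(author), ncol(author))),
  row.names = NULL
)

# rows and columns in the same two-dimensional display
p <- ggplot(ca_df, aes(x = dim1, y = dim2, colour = type, label = label)) +
  geom_hline(yintercept = 0, colour = "gray70") +
  geom_vline(xintercept = 0, colour = "gray70") +
  geom_text(size = 3) +
  labs(x = "Dimension 1", y = "Dimension 2",
       title = "CA plot of the author data")

print(p)
```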
