Visually Enforced

a blog by Gaston Sanchez


Graphs from Dendrograms

Posted on June 29, 2014

R has various functions (and packages) for working with both hierarchical clustering dendrograms and graphs. The problem is that there’s almost no information on how convert a dendrogram into a graph.


Trees and Graphs

For one of my visualization projects I was faced with the following question: How to obtain a graph structure from a dendrogram structure? As simple as this question may seem, it took me a couple of days to find out a satisfactory answer.

Recall that a dendrogram is a just tree diagram used to display the arrangement of the clusters produced by hierarchical clustering. Interestingly, we can think of a dendrogram as a graph with a tree structure. The leafs of the dendrogram and the merged nodes can be seen as the graph nodes. In turn, the branches of the dendrogram can be regarded as the graph edges.


Hierarchical Clustering Dendrogram

Let’s start by generating a hierarchical clustering with hclust(). We’ll use the data USArrests for demo purposes:

# distance matrix 
dist_usarrests = dist(USArrests)

# hierarchical clustering analysis
clus_usarrests = hclust(dist_usarrests, method = "ward.D")

# plot dendrogram
plot(clus_usarrests, hang = -1)

Although "hclust" provides information about the merged nodes, it doesn’t provide all the necessary elements to build a graph. To do this, we need to use the function as.phylo() from the R package "ape". The good news about a "phylo" object is that it contains all the tree edges needed to build a graph.

# library ape
library(ape)

# convert 'hclust' to 'phylo' object
phylo_tree = as.phylo(clus_usarrests)

# get edges
graph_edges = phylo_tree$edge

Graph

Once we have the edges we can build a graph using the function graph.edgelist() from the package "igraph".

# library igraph
library(igraph)

# get graph from edge list
graph_net = graph.edgelist(graph_edges)

# plot graph
plot(graph_net)

The previous plot is kind of messy but the graph has exactly what we want: it has all the leafs, nodes and edges from the dendrogram.

Finally, we can play with the different graph layouts to get a set of x-y coordinates. For instance, we can use the layout.auto() function to get nodes nicely spread. Having the layout coordinates we can plot a graph with a tree structure using some R plotting tools:

# extract layout (x-y coords)
graph_layout = layout.auto(graph_net)

# number of observations
nobs = nrow(USArrests)

# start plot
plot(graph_layout[,1], graph_layout[,2], type = "n", axes = FALSE,
     xlab = "", ylab = "")
# draw tree branches
segments(
  x0 = graph_layout[graph_edges[,1],1], 
  y0 = graph_layout[graph_edges[,1],2],
  x1 = graph_layout[graph_edges[,2],1],
  y1 = graph_layout[graph_edges[,2],2],
  col = "#dcdcdc55", lwd = 3.5
)
# add labels
text(graph_layout[1:nobs,1], graph_layout[1:nobs,2],
     phylo_tree$tip.label, cex = 1, xpd = TRUE, font = 1)

If you’re interested in getting a pretty tree graph check the next post.


Published in categories how-to  Tagged with tree  graph  dendrogram