Pretty Tree Graph
Posted on July 05, 2014
In this post I’m sharing the R code snippet I used to get a pretty graph that visualizes dendrograms and clusters in an alternative way.

Recipe
The general recipe consists of the following steps:
- Obtain a distance matrix from your data set with dist()
- Perform a hierarchical clustering analysis with hclust()
- Examine the dendrogram to determine the number of clusters
- Cut the dendrogram to obtain clusters with cutree()
- Convert the cluster tree into a "phylo" object with as.phylo()
- Use the tree edges from the "phylo" object to obtain a graph with graph.edgelist()
- Obtain a graph layout, in this case with layout.auto()
- Plot the data with the x-y coordinates from the graph layout!
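The steps above rely on two structural facts about a binary tree with n observations: it has n tips plus n - 1 internal nodes (2n - 1 nodes in total) joined by 2n - 2 edges, and the tips occupy the first n node indices. The sketch below illustrates this in base R only, without ape or igraph. merge_to_edgelist() and tree_layout() are hypothetical helpers written for this illustration, and the layout is a simple dendrogram-style placement, not igraph's layout.auto(). It uses the documented hclust convention that negative entries in $merge are single observations and positive entries refer to earlier merge steps.

```r
# Hypothetical helper, not part of ape or igraph: convert hclust's merge
# matrix into a (parent, child) edge matrix. Tips keep indices 1..n and the
# node created at merge step i gets index n + i, so the root is 2n - 1.
# (ape's as.phylo() also numbers tips 1..n but puts the root at n + 1.)
merge_to_edgelist <- function(hc) {
  n = length(hc$order)
  edges = matrix(0L, nrow = 2 * (n - 1), ncol = 2)
  for (i in seq_len(n - 1)) {
    m = hc$merge[i, ]
    # negative entries are observations, positive ones are earlier merges
    children = ifelse(m < 0, -m, n + m)
    edges[2 * i - 1, ] = c(n + i, children[1])
    edges[2 * i, ] = c(n + i, children[2])
  }
  edges
}

# Hypothetical stand-in for a graph layout (NOT igraph's layout.auto):
# tips sit at their dendrogram position on y = 0; each internal node sits
# at its merge height, centered over its two children.
tree_layout <- function(hc) {
  n = length(hc$order)
  xy = matrix(0, nrow = 2 * n - 1, ncol = 2)
  xy[hc$order, 1] = seq_len(n)
  for (i in seq_len(n - 1)) {
    m = hc$merge[i, ]
    children = ifelse(m < 0, -m, n + m)
    xy[n + i, ] = c(mean(xy[children, 1]), hc$height[i])
  }
  xy
}

hc = hclust(dist(USArrests), method = "ward.D")
edges = merge_to_edgelist(hc)   # 2n - 2 = 98 edges for the 50 states
xy = tree_layout(hc)            # 2n - 1 = 99 rows, one per node

# each edge is drawn from its parent's row of xy to its child's row
plot(xy, type = "n", axes = FALSE, xlab = "", ylab = "")
segments(x0 = xy[edges[, 1], 1], y0 = xy[edges[, 1], 2],
         x1 = xy[edges[, 2], 1], y1 = xy[edges[, 2], 2], col = "gray70")
points(xy[1:nrow(USArrests), ], pch = 19)   # tips are rows 1..n
```

In the post's recipe the coordinates come from igraph's layout.auto() instead; the segments() indexing pattern (parent row to child row) is the same.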
Example with data “USArrests”
For this example I’m going to use the USArrests data set that comes with R. The idea is to get a dendrogram from a hierarchical clustering analysis; for illustration purposes I’m going to cut the dendrogram into 4 clusters.
# distance matrix 
dist_usarrests = dist(USArrests)
# hierarchical clustering analysis
clus_usarrests = hclust(dist_usarrests, method = "ward.D")
# plot dendrogram
plot(clus_usarrests, hang = -1)
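The dendrogram code above does not perform the cut itself (in the post that happens inside pretty_tree()). A minimal sketch for checking the 4-cluster solution on the standard dendrogram first, using only cutree() and rect.hclust() from base R's stats package; the name clusters_usarrests is just an illustrative choice:

```r
# distance matrix and Ward clustering, as above
dist_usarrests = dist(USArrests)
clus_usarrests = hclust(dist_usarrests, method = "ward.D")

# cut the dendrogram into 4 clusters: one integer label per state
clusters_usarrests = cutree(clus_usarrests, k = 4)
table(clusters_usarrests)   # cluster sizes

# redraw the dendrogram and outline the 4 clusters with rectangles
plot(clus_usarrests, hang = -1)
rect.hclust(clus_usarrests, k = 4, border = "gray50")
```

cutree() returns a vector named after the row names of USArrests, which is what lets pretty_tree() color each leaf by its cluster.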
Code in R: Pretty Tree Graph
Once we have the “not very outstanding” dendrogram, we can do some data wrangling to get a better layout and display the clusters in a more appealing way. Here’s the code snippet in R (feel free to adapt it for your own visualizations).
pretty_tree <- function(dataset, num_clusters = 2,
                        dist_method = "euclidean", clus_method = "ward.D")
{
  # required packages: ape (phylo trees) and igraph (graphs)
  for (pkg in c("ape", "igraph")) {
    if (!requireNamespace(pkg, quietly = TRUE))
      stop("package '", pkg, "' is required; install it with install.packages()")
  }

  # distance matrix
  dist_data = dist(dataset, method = dist_method)
  # hierarchical clustering
  hcluster = hclust(dist_data, method = clus_method)
  # cut dendrogram in given number of clusters
  clusters = cutree(tree = hcluster, k = num_clusters)

  # convert to phylo object: tips are nodes 1..n, internal nodes n+1..2n-1
  phylo_tree = ape::as.phylo(hcluster)
  # get edges (two-column matrix: parent node, child node)
  graph_edges = phylo_tree$edge
  # convert to graph
  # (newer igraph versions name these graph_from_edgelist() and layout_nicely())
  graph_net = igraph::graph.edgelist(graph_edges)
  # extract layout (x-y coords), one row per node
  graph_layout = igraph::layout.auto(graph_net)

  # colors like default ggplot2: n hues evenly spaced around the HCL wheel
  ggcolors <- function(n) {
    hues = seq(15, 375, length.out = n + 1)
    hcl(h = hues, l = 65, c = 100)[1:n]
  }

  # colors of labels (opaque) and points (alpha "55", about 33% opacity)
  txt_pal = ggcolors(num_clusters)
  pch_pal = paste0(txt_pal, "55")
  txt_col = txt_pal[clusters]
  pch_col = pch_pal[clusters]
  # additional params
  nobs = length(clusters)       # number of tips (observations)
  nnodes = nrow(graph_layout)   # tips + internal nodes = 2 * nobs - 1

  # start plot
  plot(graph_layout[,1], graph_layout[,2], type = "n", axes = FALSE,
       xlab = "", ylab = "")
  # draw tree branches: from parent (column 1) to child (column 2)
  segments(
    x0 = graph_layout[graph_edges[,1],1],
    y0 = graph_layout[graph_edges[,1],2],
    x1 = graph_layout[graph_edges[,2],1],
    y1 = graph_layout[graph_edges[,2],2],
    col = "#dcdcdc55", lwd = 3.5
  )
  # add tree leaves (tips are nodes 1..nobs)
  points(graph_layout[1:nobs,1], graph_layout[1:nobs,2], col = pch_col,
         pch = 19, cex = 2)
  # add empty (internal) nodes: rows nobs+1 .. 2*nobs-1 of the layout
  points(graph_layout[(nobs+1):nnodes,1], graph_layout[(nobs+1):nnodes,2],
         col = "gray90", pch = 19, cex = 0.5)
  # add node labels
  text(graph_layout[1:nobs,1], graph_layout[1:nobs,2],
       labels = phylo_tree$tip.label, col = txt_col,
       cex = 1.5, xpd = TRUE, font = 1)
  # return the cluster memberships invisibly
  invisible(clusters)
}
# plot
pretty_tree(USArrests, num_clusters = 4)
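The point colors rely on a small trick: hcl() returns opaque "#RRGGBB" strings, and appending a two-digit hex value turns them into "#RRGGBBAA" colors with transparency. A quick base-R check of that idea, assuming the ggcolors() helper is re-declared here (without an alpha argument) so the snippet runs on its own:

```r
# same hue-spacing idea as ggcolors(): n evenly spaced hues on the HCL wheel
ggcolors <- function(n) {
  hues = seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

pal = ggcolors(4)                     # opaque "#RRGGBB" strings
pal_translucent = paste0(pal, "55")   # "#RRGGBBAA" with alpha 0x55

# col2rgb() can report the alpha channel: 0x55 = 85 out of 255, about 33%
col2rgb(pal_translucent, alpha = TRUE)["alpha", ]
```

Because the hues start at 15 and span 360 degrees, dropping the last of the n + 1 values avoids repeating the first color.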