Matrix and ordered matrix visualization with {ggplot2}

Introduction

Visualization helps us see the structure in our data, especially after ordering and/or clustering. Of course, only data of an appropriate size is worth showing in a figure. This post presents some visualization possibilities without being exhaustive.

First we need a matrix to plot. Let’s create one. It has more structure than real cases usually have, so that the plots are easy to read.

A <- matrix(c(2,5,2,1,0,0,0,0,1,0,0,0,0,1,3,5,6,0,0,1,0,0,0,2,0,0,1,2,7,2,4,6,2,5,1,0,0,1,0,0,0,1,0,0,3,5,4,0,0,1,0,0,1,0,0,2,0,3,5,7,3,1,4,0,1,0,0,0,0,2,0,0,0,1,3,4,6,0,0,1), byrow=T, nrow=8, ncol=10)
colnames(A) <- letters[1:10]
rownames(A) <- LETTERS[1:8]
print(A)
##   a b c d e f g h i j
## A 2 5 2 1 0 0 0 0 1 0
## B 0 0 0 1 3 5 6 0 0 1
## C 0 0 0 2 0 0 1 2 7 2
## D 4 6 2 5 1 0 0 1 0 0
## E 0 1 0 0 3 5 4 0 0 1
## F 0 0 1 0 0 2 0 3 5 7
## G 3 1 4 0 1 0 0 0 0 2
## H 0 0 0 1 3 4 6 0 0 1

Basic figures

This figure shows the matrix without any ordering or clustering. ggplot() needs data in long format, which the melt() function from {reshape2} can easily produce. Using the fig.align="center" chunk option is suggested in R Markdown.

library(reshape2)
library(ggplot2)

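# melt() turns the matrix into long format: one row per cell (Var1, Var2, value)
longData <- melt(A)
# drop the zero cells so that they are left blank in the plot
longData <- longData[longData$value != 0, ]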

ggplot(longData, aes(x = Var2, y = Var1)) + 
  geom_raster(aes(fill=value)) + 
  scale_fill_gradient(low="grey90", high="red") +
  labs(x="letters", y="LETTERS", title="Matrix") +
  theme_bw() + theme(axis.text.x=element_text(size=9, angle=0, vjust=0.3),
                     axis.text.y=element_text(size=9),
                     plot.title=element_text(size=11))
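
To make the long format concrete, here is a minimal sketch on a small 2 x 3 matrix. The matrix M and its values are made up for illustration and are not part of the example above. For a matrix with unnamed dimensions, melt() returns one row per cell, with the row name in Var1, the column name in Var2 and the cell content in value.

```r
library(reshape2)

# Hypothetical toy matrix, only for illustration
M <- matrix(c(1, 0, 2,
              0, 3, 0), byrow = TRUE, nrow = 2,
            dimnames = list(c("A", "B"), c("a", "b", "c")))

# One row per cell, with columns Var1, Var2 and value
longM <- melt(M)
str(longM)

# Dropping the zero cells leaves only the cells that will be drawn
longM <- longM[longM$value != 0, ]
nrow(longM)
```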

Ordering the rows and columns makes the structure clearer. One of the best ways to order a matrix is the seriate() function in the {seriation} package. Let’s see how it works.

library(seriation)

set.seed(2)
o <- seriate(A, method="BEA_TSP")

# with the same longData as earlier
longData$Var1 <- factor(longData$Var1, levels=names(unlist(o[[1]][]))) 
longData$Var2 <- factor(longData$Var2, levels=names(unlist(o[[2]][])))
# levels must be the row and column names, in the seriated order

ggplot(longData, aes(x = Var2, y = Var1)) + 
  geom_raster(aes(fill=value)) + 
  scale_fill_gradient(low="grey90", high="red") +
  labs(x="letters", y="LETTERS", title="Matrix") +
  theme_bw() + theme(axis.text.x=element_text(size=9, angle=0, vjust=0.3),
                     axis.text.y=element_text(size=9),
                     plot.title=element_text(size=11))

seriate() chooses its first step randomly, which is why each run of the code can produce a different plot. If you want to get the same plot every time, set the seed, as set.seed(2) does above.
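
The effect of the seed can be checked with a short sketch. The toy matrix M is made up for illustration. get_order() is the {seriation} accessor for the permutation of one dimension, used here as an assumed alternative to the names(unlist(...)) idiom above.

```r
library(seriation)

# Hypothetical toy matrix, only for illustration
M <- matrix(c(2, 5, 0, 0,
              0, 0, 3, 6,
              4, 6, 1, 0,
              0, 1, 5, 4), byrow = TRUE, nrow = 4,
            dimnames = list(LETTERS[1:4], letters[1:4]))

# Seriate twice, resetting the seed before each run
set.seed(2)
o1 <- seriate(M, method = "BEA_TSP")
set.seed(2)
o2 <- seriate(M, method = "BEA_TSP")

# With the same seed the row and column orders should agree
identical(get_order(o1, 1), get_order(o2, 1))
identical(get_order(o1, 2), get_order(o2, 2))

# Row names in seriated order, usable as factor levels
rownames(M)[get_order(o1, 1)]
```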

Clustered matrix

Some structure can be seen in the seriated matrix. Let’s suppose that this matrix represents a bipartite graph whose nodes can be clustered, for example with the Louvain method implemented in the {igraph} package. This post focuses on graphical solutions, so the graph theory is not explained.

Our aim is to give the same colour to the cells whose row and column are in the same cluster.

First, cluster the rows and columns of the matrix.

library(igraph)
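# define a bipartite graph whose incidence (biadjacency) matrix is A
# (recent igraph releases deprecate graph.incidence() in favour of graph_from_biadjacency_matrix())
g <- graph.incidence(A, weighted = TRUE)
# cluster with the Louvain algorithm
lou <- cluster_louvain(g)
# one row per node: its name and cluster id (columns lou.names and lou.membership)
df.lou <- data.frame(lou$names,lou$membership)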

After that, join the cluster information to longData, which we want to plot.

library(dplyr)

# the same longData as earlier
longData <- left_join(longData, df.lou, by=c("Var1"="lou.names"))
colnames(longData)[4] <- "Var1_clust"
longData$Var2 <- as.factor(longData$Var2)
longData <- left_join(longData, df.lou, by=c("Var2"="lou.names"))
colnames(longData)[5] <- "Var2_clust"
longData$colour <- ifelse(longData$Var1_clust==longData$Var2_clust, longData$Var1_clust, 0)
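
The colouring rule in the last line can be tried on its own. In this small sketch the data frame cells and its cluster ids are hypothetical. A cell keeps its cluster id when its row and column belong to the same cluster, and gets 0 otherwise.

```r
# Hypothetical cluster ids for four cells (row cluster, column cluster)
cells <- data.frame(Var1_clust = c(1, 1, 2, 3),
                    Var2_clust = c(1, 2, 2, 1))

# Keep the cluster id when row and column agree, otherwise use 0
cells$colour <- ifelse(cells$Var1_clust == cells$Var2_clust,
                       cells$Var1_clust, 0)
cells$colour
```

Cells with colour 0 are the ones that connect two different clusters, which is why they get a neutral fill in the plot.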

Fill colours by clusters

Finally, plot the clustered matrix with all the cluster information.

longData$Var1 <- factor(longData$Var1, levels=unique(arrange(longData, Var1_clust)[,1]))
longData$Var2 <- factor(longData$Var2, levels=unique(arrange(longData, Var2_clust)[,2]))
# levels must be the row and column names, ordered by cluster
longData$colour <- factor(longData$colour)
# for colours the variable must be a factor (discrete scale), otherwise ggplot treats it as continuous

ggplot(longData, aes(x = Var2, y = Var1, fill=colour)) + 
  geom_raster() + 
  scale_fill_manual(values=c("grey80", "#B40404", "#0B6121", "#FFBF00")) +
  labs(x="letters", y="LETTERS", title="Matrix") +
  theme_bw() + theme(axis.text.x=element_text(size=9, angle=0, vjust=0.3),
                     axis.text.y=element_text(size=9),
                     plot.title=element_text(size=11),
                     legend.text=element_text(size=7))
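
The two code comments above about factors can be checked in base R. This is a minimal sketch with made-up vectors. It shows that explicit levels fix the order in which categories appear, and that a numeric id stops being numeric once it is converted to a factor.

```r
# Hypothetical axis categories, only for illustration
x <- c("b", "c", "a", "b")

# Default levels are sorted alphabetically
levels(factor(x))

# Explicit levels fix the order that ggplot would use along the axis
f <- factor(x, levels = c("c", "a", "b"))
levels(f)
as.integer(f)   # position of each element among the levels

# A numeric cluster id becomes a discrete variable once it is a factor
colour <- factor(c(0, 1, 2, 1))
is.numeric(colour)
nlevels(colour)
```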

Colours of axis labels by clusters

Axis labels can be coloured as well, but this needs some preparation. The values can also be indicated in the cells. But be careful with too much information in one figure.

# use the same longData as earlier

# colouring the axis labels: one colour per label, in the order of the factor levels
axis.y.colour <- (longData %>% select(Var1, Var1_clust) %>% unique %>% arrange(Var1_clust) %>% select(Var1_clust))[,1] %>% plyr::mapvalues(from=c(1:3), to=c("#B40404", "#0B6121", "#FFBF00"))
axis.x.colour <- (longData %>% select(Var2, Var2_clust) %>% unique %>% arrange(Var2_clust) %>% select(Var2_clust))[,1] %>% plyr::mapvalues(from=c(1:3), to=c("#B40404", "#0B6121", "#FFBF00"))

ggplot(longData, aes(x = Var2, y = Var1, fill=colour)) + 
  geom_raster() + 
  scale_fill_manual(values=c("grey80", "#B40404", "#0B6121", "#FFBF00")) +
  labs(x="letters", y="LETTERS", title="Matrix") +
  geom_point(aes(size=value)) +
  theme_bw() + theme(axis.text.x=element_text(size=9, angle=0, vjust=0.3),
                     axis.text.y=element_text(size=9),
                     plot.title=element_text(size=11),
                     legend.text=element_text(size=7)) +
  theme(axis.text.y=element_text(colour=axis.y.colour),
        axis.text.x=element_text(colour=axis.x.colour))
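
As an alternative to plyr::mapvalues(), the mapping from cluster id to colour can be sketched with a named vector in base R. This is not the method used above, only an assumed equivalent for illustration. The names clusterPalette and clusters and the cluster ids are hypothetical.

```r
# Hypothetical palette keyed by cluster id (same colours as in the plots above)
clusterPalette <- c("1" = "#B40404", "2" = "#0B6121", "3" = "#FFBF00")

# Hypothetical cluster ids of the axis labels, in plotting order
clusters <- c(1, 1, 2, 3, 3)

# Look the colours up by name; unname() drops the ids from the result
axis.colour <- unname(clusterPalette[as.character(clusters)])
axis.colour
```

A named vector also makes it easy to extend the palette if the clustering returns more than three clusters.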


Be happyR! :)
