Introduction to R notebook

Actually it is an Rmarkdown to compose a work flow like document. Easily changable and transparent tool to edit and run codes in chunks and edit text between them.

It supports
  • different languages: R, python, C++ (Rccp), SQL (later Perl)
  • line by line code running
  • whole chunk code running
  • inteactive graphs
Benefits
  • if a (html) document was already make it is not neccesary to rerun all document all the code chunks if some code changed with preview -->easy to iterate codes and text with generated document with preview
  • a rendered html document can viewed in editor like original Rmd file
  • there are less tradeoffs than with Jupyter notebook that support R as well
  • usefull to see the result of a chunk
Disadvantages
  • cannot enter tex like equations (yet)
  • the rendered document is a static html so uploaded data file cannot be used (yet)

And how to start?

Just use this head below as Rmd file head. This code means a notebook format to RStudio and if you want to make an html document the "html_document" section is used.

---
title: "xy"
output:
  html_notebook: default
  html_document:
    highlight: textmate
    theme: spacelab
    toc: yes
---

Or just use R notebook button in RStudio when start a document (Version 0.99.1289, RStudio Preview)

Be sure you have R version 3.3 (or higher) for proper work. And need some packages to update.

Be happyR!

See also:
https://www.rstudio.com/resources/webinars/introducing-notebooks-with-r-markdown/
http://rmarkdown.rstudio.com/r_notebooks.html
https://www.rstudio.com/products/rstudio/download/preview/

Create bipartite graph with {igraph}

Introduction

Bipartite graph is a graph whose nodes can be devided into two disjoint sets and edges connects these two (U,V) independent sets. There are weighted and unweighted graph depending on weight of edges.

If one wants to create bipartite graph the first question is what kind of dataset available.

  • a matrix that represents a bipartite graph: rows are first or primary vertex set, columns are second or secondary vertex set, cells are weights or 0/1, that is adjacency matrix
  • two column edgelist: first vertex set in first column second vertex set in second column and rows represent connections; in weighted case there are recurring pairs of nodes
  • three column edgelist (weighted graphs): first vertex set in first column second vertex set in second column and connection weights in third column

The second question is weigted or unweighted graph one need. There is many possibilities to wrangling data.

Direction parameter can be set with {igraph} if it is needed.

For use the exaples above you need to load {igraph} package.

library(igraph)
library(reshape2)

Graph from adjacency matrix

Suppose one have this matrix that represents a weighted matrix.

A <- matrix(c(2,4,0,0,1,2,1,0,0,0,4,5,0,0,1,2,0,0,6,2), byrow = T, nrow = 5, ncol = 4)
rownames(A) <- letters[c(1:nrow(A))]
colnames(A) <- LETTERS[c(1:ncol(A))]
print(A)
##   A B C D
## a 2 4 0 0
## b 1 2 1 0
## c 0 0 4 5
## d 0 0 1 2
## e 0 0 6 2

A bipratite graph can be create with {igraph} from A adjacency matrix.

g <- graph.incidence(A, weighted = T)
g
## IGRAPH UNWB 9 11 -- 
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges (vertex names):
##  [1] a--A a--B b--A b--B b--C c--C c--D d--C d--D e--C e--D

Parameters of created graph can be seen. This graph is UN-B which mean that this graph is U = undirected, N = named, W = weighted, B = bipartite, 9 = # of vertex, 11 = # of edges, and there are connections.

This function automatically set names and types of nodes. You can check them.

V(g)$type
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
V(g)$name
## [1] "a" "b" "c" "d" "e" "A" "B" "C" "D"

FALSE type means primary and TRUE means secondary vertex set.

Graph from two column edge list

The egdelist of graph above defined is the follow. Rows represent connections thats why some row repeated according to weights.

##   from to
## 1    a  A
## 2    a  A
## 3    a  B
## 4    a  B
## 5    a  B
## 6    a  B

There are many possibility to create bipartite graph from these I show two. You can choose the simplier…

First step is create an empty graph with directed=F parameter to get the same graph as above. Then create nodes with TRUE and FALSE type parameter and then add edges.

g <- graph.empty(directed = F)
node.out <- unique(edges$from) #stringsAsFactor = F in data frame
node.in <- unique(edges$to) #stringsAsFactor = F in data frame
g <- add.vertices(g,nv=length(node.out),attr=list(name=node.out),type=rep(FALSE,length(node.out)))
g <- add.vertices(g,nv=length(node.in),attr=list(name=node.in),type=rep(TRUE,length(node.in)))
edge.list.vec <- as.vector(t(as.matrix(data.frame(edges))))
g <- add.edges(g,edge.list.vec)
g
## IGRAPH UN-B 9 30 -- 
## + attr: type (v/l), name (v/c)
## + edges (vertex names):
##  [1] a--A a--A a--B a--B a--B a--B b--A b--B b--B b--C c--C c--C c--C c--C
## [15] c--D c--D c--D c--D c--D d--C d--D d--D e--C e--C e--C e--C e--C e--C
## [29] e--D e--D

The other possibility to create graph from data frame and then add nodes type

g <- graph.data.frame(edges, directed = F)
V(g)$type <- V(g)$name %in% edges[,2] #the second column of edges is TRUE type
g
## IGRAPH UN-B 9 30 -- 
## + attr: name (v/c), type (v/l)
## + edges (vertex names):
##  [1] a--A a--A a--B a--B a--B a--B b--A b--B b--B b--C c--C c--C c--C c--C
## [15] c--D c--D c--D c--D c--D d--C d--D d--D e--C e--C e--C e--C e--C e--C
## [29] e--D e--D

You can see that these graphs not real weighted according to igraph’s parameters but them have some repeated edges. But if you get adjacency matrix from graph you get the same weighted adjacency matrix as above. And other graph operations are work same as real weighted graphs.

get.incidence(g)
##   A B C D
## a 2 4 0 0
## b 1 2 1 0
## c 0 0 4 5
## d 0 0 1 2
## e 0 0 6 2

Graph from three column edge list

In this case the edge list is similar then two column case but weights are in the third column.

##    Var1 Var2 value
## 1     a    A     2
## 2     b    A     1
## 6     a    B     4
## 7     b    B     2
## 12    b    C     1
## 13    c    C     4

The process is the same as two column type but one step more because need to add weights.

g <- graph.data.frame(edges, directed = F)
V(g)$type <- V(g)$name %in% edges[,2] #the second column of edges is TRUE type
E(g)$weight <- as.numeric(edges[,3])
g
## IGRAPH UNWB 9 11 -- 
## + attr: name (v/c), type (v/l), value (e/n), weight (e/n)
## + edges (vertex names):
##  [1] a--A b--A a--B b--B b--C c--C d--C e--C c--D d--D e--D

OR if the edge data frame weight column has name exactly “weight” then {igraph} automatically read it as weights of edges.

names(edges)[3] <- "weight"
g <- graph.data.frame(edges, directed = F)
V(g)$type <- V(g)$name %in% edges[,2] #the second column of edges is TRUE type
g
## IGRAPH UNWB 9 11 -- 
## + attr: name (v/c), type (v/l), weight (e/n)
## + edges (vertex names):
##  [1] a--A b--A a--B b--B b--C c--C d--C e--C c--D d--D e--D

This results real weighted graph as you see. The adjacency matrix is same as earlier.

get.incidence(g, attr = "weight")
##   A B C D
## a 2 4 0 0
## b 1 2 1 0
## c 0 0 4 5
## d 0 0 1 2
## e 0 0 6 2

Plot graph

You are ready to plot the resulted graph.

V(g)$color <- V(g)$type
V(g)$color=gsub("FALSE","red",V(g)$color)
V(g)$color=gsub("TRUE","blue",V(g)$color)
plot(g, edge.color="gray30",edge.width=E(g)$weight, layout=layout_as_bipartite)

Be happyR!

See also:

Shizukalab: Social Networks in R

Kateto for visualization and others