February 2, 2020 (updated: 2020-09-09)

Sources

  • Matrix in R tutorial by Prof. Giovanni Petris.
  • doParallel guide by Weston and Calaway.
  • Miscellaneous internet resources.
  • Before you run a module or chunk, make sure that you have all the packages installed.
  • We need the R packages ‘Matrix’ and ‘doParallel’.

Matrices

  • To arrange values into a matrix, we use the matrix() function:
a <- matrix(1 : 6, nrow = 2, ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
  • Individual entries can be referred to using a pair of indices. For example, the element in the second row, third column can be printed as:
a[2, 3]
## [1] 6

Rows, Columns

  • An entire row or column can be retrieved by specifying its index and leaving the other index empty:
a[2, ]
## [1] 2 4 6
  • Similar to the way R works with vectors, you can use negative indices to exclude rows or columns of a matrix:
a[-2,]
## [1] 1 3 5
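  • Note that extracting a single row or column drops the matrix structure, so the result above is a plain vector. Add drop = FALSE if you want to keep a matrix; a minimal illustration:
a[-2, , drop = FALSE] # a 1 x 3 matrix rather than a vector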

Matrix indices

  • Note how the values 1,2,3,4,5,6 are used to create the matrix: the first two are used to fill the first column, then the next two to fill the second column, and so on.

  • R allows matrices to be indexed by a single number, e.g.,

a[5]
## [1] 5
  • The fifth element of a is the fifth element of the vector obtained by “unrolling” the matrix, column by column.
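  • You can verify this with as.vector(), which performs exactly this column-by-column unrolling:
as.vector(a)[5] # identical to a[5]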

Fill by row

  • Sometimes you need to fill a matrix row by row instead of column by column.
  • You can change the default behavior as follows:
a <- matrix(1 : 6, nrow = 2, ncol = 3, byrow = TRUE)
a
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Hilbert matrix

  • We have seen rbind() and cbind() before.
  • For example, we can use cbind() to construct the 3 x 3 Hilbert matrix, whose (i, j) entry is 1/(i + j - 1).
H3 <- 1 / cbind(1 : 3, 2 : 4, 3 : 5)
H3
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5000000 0.3333333
## [2,] 0.5000000 0.3333333 0.2500000
## [3,] 0.3333333 0.2500000 0.2000000

Diagonals and Determinants

  • The diagonal of a matrix can be extracted using diag().
diag(H3)
## [1] 1.0000000 0.3333333 0.2000000
  • The determinant of a square matrix can be obtained with det().
det(H3)
## [1] 0.000462963

Triangular

  • The functions lower.tri() and upper.tri() can be used to obtain the lower and upper triangular parts of matrices. The output of the functions is a matrix of logical elements, with TRUE representing the relevant triangular elements. For example,
lower.tri(H3)
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE FALSE
## [3,]  TRUE  TRUE FALSE

Triangular

  • A typical use of these functions is to set the upper or lower triangular part of a matrix to zero, thus constructing a triangular matrix.
Htri <- H3
Htri[lower.tri(Htri)] <- 0
Htri
##      [,1]      [,2]      [,3]
## [1,]    1 0.5000000 0.3333333
## [2,]    0 0.3333333 0.2500000
## [3,]    0 0.0000000 0.2000000

Multiplication

  • Matrix multiplication in R is performed by the operator %*%. For example,
A <- matrix(c(1, 2, 0, 1), 2)
B <- matrix(c(1, 3, 3, 1, 4, 5), nr = 2)
A
##      [,1] [,2]
## [1,]    1    0
## [2,]    2    1
B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    3    1    5
  • Matrices must conform, i.e., the number of columns of the first factor must equal the number of rows of the second, for the multiplication to make sense!

Multiplication

A %*% B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    5    7   13
dim(A)
## [1] 2 2
ncol(A)
## [1] 2
nrow(B)
## [1] 2

Multiplication

  • With \(A\) and \(B\) defined as above, the matrix product \(BA\) is not defined, but the product \(B^T A\) is (\(B^T\) denotes the transpose of \(B\)).
  • The transpose of a matrix can be obtained with the function t().

t(B) %*% A
##      [,1] [,2]
## [1,]    7    3
## [2,]    5    1
## [3,]   14    5

Crossprod

  • A much more efficient way of computing the same matrix product is via the function crossprod().
all.equal(crossprod(B, A), t(B) %*% A)
## [1] TRUE
  • A similar function, tcrossprod(), computes the matrix product \(AB^T\) whenever the product is well defined.
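  • A quick check using the matrices defined above (tcrossprod(B) computes \(B B^T\)):
all.equal(tcrossprod(B), B %*% t(B))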

Exercise (Try at home)

  • Consider the matrix
X <- cbind(1, seq(-1, 1, length = 11))
X
##       [,1] [,2]
##  [1,]    1 -1.0
##  [2,]    1 -0.8
##  [3,]    1 -0.6
##  [4,]    1 -0.4
##  [5,]    1 -0.2
##  [6,]    1  0.0
##  [7,]    1  0.2
##  [8,]    1  0.4
##  [9,]    1  0.6
## [10,]    1  0.8
## [11,]    1  1.0
  • Use crossprod() and tcrossprod() to find \(X^T X\) and \(X X^T\).
  • Note the dimensions of the resulting matrices.
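  • If you want to check your answer afterwards, a minimal sketch:
XtX <- crossprod(X)  # t(X) %*% X, a 2 x 2 matrix
XXt <- tcrossprod(X) # X %*% t(X), an 11 x 11 matrix
dim(XtX); dim(XXt)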

Matrix inversion

  • The inverse of a matrix can be found using solve():
Ainv <- solve(A)
Ainv
##      [,1] [,2]
## [1,]    1    0
## [2,]   -2    1
all.equal(Ainv %*% A, diag(2))
## [1] TRUE

Solve linear system

  • Matrix inversion in R goes through a matrix factorization (an LU factorization for solve(); qr.solve() offers a QR-based alternative), which we might briefly talk about at the very end (if time permits)!

  • For linear systems we don’t need to compute inverses. We can use Gaussian Elimination.

b <- c(5, 3)
solve(A, b) # use this
## [1]  5 -7
Ainv %*% b # don't use this
##      [,1]
## [1,]    5
## [2,]   -7

Time comparison

  • Compare execution times for the different methods of solving the Least Squares problem when the number of observations is large.

  • Let us start by making up our own random data: \(n \times p\) design matrix \(X\) and an \(n \times 1\) response vector \(y\).

  • Take \(n = 5 \times 10^5\).

set.seed(123)
n <- 5e5; p <- 20
X <- matrix(rnorm(n * p, mean = 1 : p, sd = 10), nr = n, nc = p, byrow = TRUE)
y <- rowSums(X) + rnorm(n)

OLS

  • The naive approach to solve the LS problem is to compute: \(\hat{\beta} = (X^T X)^{-1}X^T y\)
system.time(bHat1 <- solve(t(X) %*% X) %*% t(X) %*% y)
##    user  system elapsed 
##    0.34    0.03    0.37
  • Not so naive: use crossprod.
system.time(bHat2 <- solve(crossprod(X), crossprod(X, y)))
##    user  system elapsed 
##    0.21    0.00    0.20
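  • A third route, not timed in the original notes, is to solve the least squares problem directly from a QR decomposition of \(X\) with base R’s qr.solve(), avoiding the normal equations altogether; a sketch:
# Least squares via QR, without forming crossprod(X) explicitly
system.time(bHat3 <- qr.solve(X, y))
# bHat3 should agree with bHat1 and bHat2 up to numerical error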

More advanced topics

Logics in R

Logical (boolean) operations

Operator   Operation     Vectorized?
x | y      or            Yes
x & y      and           Yes
!x         not           Yes
x || y     or            No
x && y     and           No
xor(x,y)   exclusive or  Yes

Vectorized?

x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
  x | y
## [1] TRUE TRUE TRUE
  x || y
## [1] TRUE
  x & y
## [1] FALSE FALSE  TRUE
  x && y
## [1] FALSE

Length coercion

x = c(TRUE,FALSE,TRUE)
y = c(TRUE)
z = c(FALSE,TRUE)
  x | y
## [1] TRUE TRUE TRUE
  y | z
## [1] TRUE TRUE
  x & y
## [1]  TRUE FALSE  TRUE
  y & z
## [1] FALSE  TRUE
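Length coercion is silent only when the longer length is a multiple of the shorter; otherwise R recycles anyway but emits a warning. For example, with x (length 3) and z (length 2) from above:

  x & z # warns: longer object length is not a multiple of shorter object length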

Comparisons

Operator   Comparison                Vectorized?
x < y      less than                 Yes
x > y      greater than              Yes
x <= y     less than or equal to     Yes
x >= y     greater than or equal to  Yes
x != y     not equal to              Yes
x == y     equal to                  Yes
x %in% y   contains                  Yes (for x)

Comparisons: Length Coercion

x = c("A","B","C")
z = c("A")
  x == z
## [1]  TRUE FALSE FALSE
  x != z
## [1] FALSE  TRUE  TRUE
  x %in% z
## [1]  TRUE FALSE FALSE
  z %in% x
## [1] TRUE

Careful with Conditionals

Conditional execution is achieved via if statements.

Note that if statements are not vectorized.

x = c(3,1)
if (3 %in% x)
  "Here!"
## [1] "Here!"
if (x >= 2)
  "Now Here!"
## Warning in if (x >= 2) "Now Here!": the condition has length > 1 and only the
## first element will be used
## [1] "Now Here!"

Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: any() and all().

x = c(3,4)
  any(x >= 2)
## [1] TRUE
  all(x >= 2)
## [1] TRUE
  if (any(x >= 2))
    print("Now There!")
## [1] "Now There!"

Error Checking

stop and stopifnot

Often we want to validate user input or function arguments; if our assumptions are not met, we want to report the error and stop execution.

ok = FALSE
if (!ok)
  stop("Things are not ok.")
## Error in eval(expr, envir, enclos): Things are not ok.
stopifnot(ok)
## Error: ok is not TRUE

Note - an error (like the one generated by stop) will prevent an RMarkdown document from compiling unless error=TRUE is set for that code block.
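A common pattern is to validate arguments at the top of a function. A minimal sketch (safe_log is a made-up example):

safe_log = function(x) {
  stopifnot(is.numeric(x), all(x > 0)) # stop early if the input is invalid
  log(x)
}
safe_log(c(1, 10)) # fine
# safe_log(-1)     # would stop with: all(x > 0) is not TRUE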

More on functions

Recap

  • The two parts of a function are the arguments (formals) and the code (body).

  • It is also possible to give function arguments default values so that they don’t need to be provided every time the function is called.

  • There are two approaches to returning values: explicit (using return()) or implicit (the value of the last statement in the function body).

  • Remember scope of a variable when using functions.
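  • A minimal example pulling these pieces together (a default argument and an implicit return value):
pow = function(x, p = 2) { # p has a default value
  x^p                      # implicit return: value of the last expression
}
pow(3)    # 9
pow(3, 3) # 27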

Lazy evaluation

Arguments to R functions are lazily evaluated, meaning they are not evaluated until they are actually used.

f = function(x)
{
  cat("Hello world!\n")
  x
}

f(stop())
## Hello world!
## Error in f(stop()):

Everything is a function

`+`
## function (e1, e2)  .Primitive("+")
typeof(`+`)
## [1] "builtin"
x = 4:1
`+`(x,2)
## [1] 6 5 4 3

More matrices

Inverting a matrix using Cholesky helps

  • A symmetric positive definite matrix can be factorized as: \[ A = L L^T \]
p = 1000; A = array(rnorm(p*p), c(p,p))
A = crossprod(A)
ptm <- proc.time(); B = solve(A); proc.time()-ptm
##    user  system elapsed 
##    0.81    0.00    0.81
ptm <- proc.time(); B2 = chol2inv(chol(A)); proc.time()-ptm
##    user  system elapsed 
##    0.24    0.01    0.25
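  • Note that chol2inv() expects the Cholesky factor rather than the original matrix, hence the chol(A) in the call above. A quick sanity check:
all.equal(B, B2) # both routes should give (numerically) the same inverse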

Storage of sparse matrices

  • Sparse matrices appear in many areas of statistics and machine learning.
    1. For large Markov chains, the transition matrix becomes sparse.
    2. Large design matrices / contingency tables.
    3. Modern datasets with 0 imputed for missing values.
  • If the matrix \(A_{n\times n}\) has on average \(k\) non-zero entries per row and \(k \ll n\), then storage reduces from \(O(n^2)\) to \(O(n)\).

Storage of unstructured sparse matrices

# Demo 2: storing matrices in sparse format can save memory
library(Matrix)
m1 = matrix(0, nrow=1000, ncol=1000)
m2 = Matrix(0, nrow=1000, ncol=1000, sparse=TRUE)
object.size(m1)
## 8000216 bytes
object.size(m2)
## 5728 bytes
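  • The saving depends on the number of non-zero entries. For experimenting, rsparsematrix() (also from Matrix) generates a random sparse matrix with a given density:
# A 1000 x 1000 matrix with ~1% non-zero entries is still stored compactly
m3 = rsparsematrix(nrow = 1000, ncol = 1000, density = 0.01)
object.size(m3)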

Parallel

A gentle start to parallel computing

  • Parallel processing breaks up your task, splits it among multiple processors, and puts the components back together.
  • Useful and easy if you have a task that can be split up, especially without the different parts needing to “talk to” each other.
  • On your typical computer, implementing parallel processing in R might speed up your program by a factor of 2 to 4.
  • If you have access to computing clusters (e.g., in the cloud), you can use many, many “cores”.

How do I know if I can apply these ideas?

  • Parallel processing works best when you have a task that has to be completed many times, but each repeat is independent.
  • For instance, if you are repeating a simulation, and in each case are drawing new parameters from a distribution, that is an easily parallelizable task.
  • A good rule of thumb is that if you can wrap your task in an apply function or one of its variants, it’s a good option for parallelization.
  • In fact most implementations of parallel processing in R are versions of apply.
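  • For instance, the base parallel package provides apply-style parallelism directly; a minimal sketch using parLapply() with 2 workers:
library(parallel)
cl <- makeCluster(2)                       # start 2 worker processes
res <- parLapply(cl, 1:4, function(i) i^2) # parallel version of lapply()
stopCluster(cl)                            # release the workers
unlist(res) # 1 4 9 16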

What happens?

  • When you run a task in parallel, your computer “dispatches” each task to a CPU core.
  • This dispatching adds computational overhead.
  • So, it’s usually best to try to minimize the number of dispatches.
  • In most cases you are going to have a small number of computing cores relative to your tasks. A powerful laptop or desktop will have 2, 4, or 8 cores, and even the most powerful Amazon virtual machine has 88.

Introduction to doParallel

  • We will first write a very simple piece of R code to illustrate the mechanics of parallel processing using doParallel.

  • We use the Sys.sleep() function, which “suspends execution of R expressions for a specified time interval”.

  • First, a sequential for loop:

ptm <- proc.time()
for(i in 1:4){
  Sys.sleep(1)
}
proc.time() - ptm
##    user  system elapsed 
##       0       0       4

Now let’s do it in parallel

  • The main changes are foreach instead of for, and the %dopar% operator.

  • You will also need to tell R how many processors you want to use; you can detect this automatically with Sys.getenv() on Windows, or with parallel::detectCores() on any platform.

n = as.numeric(Sys.getenv("NUMBER_OF_PROCESSORS")) # on Windows
require(doParallel)
cl <- makeCluster(n)
registerDoParallel(cl)
ptm <- proc.time()
foreach(i=1:4) %dopar% {Sys.sleep(1)}
proc.time()-ptm

Output of the snippet

  • Here is the output!
  • Because my laptop has 4 cores and the four tasks run in parallel, the elapsed time is just about 1 second.
n = as.numeric(Sys.getenv("NUMBER_OF_PROCESSORS")) # on Windows
require(doParallel)
cl <- makeCluster(n)
registerDoParallel(cl)
ptm <- proc.time()
res <- foreach(i=1:4) %dopar% {Sys.sleep(1)}
proc.time()-ptm
##    user  system elapsed 
##    0.03    0.00    1.08
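  • One housekeeping note: when you are finished, shut the cluster down so the worker processes are released.
stopCluster(cl)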

Hello, world!

  • This is our “Hello, world” program for parallel computing.
  • It tests that everything is installed and set up properly, but don’t expect it to run faster than a sequential for loop, because it won’t!
  • With small tasks, the overhead of scheduling the task and returning the result can be greater than the time to execute the task itself, resulting in poor performance.

We will see a non-trivial example

  • Non-trivial examples need sophisticated statistical methods, and I haven’t covered these yet. So don’t worry at all if you don’t understand any of it; I just want to show you an example of the advantage of using parallel processing.

  • How long does it take to do 10,000 Bootstrap iterations using 2 cores?

Bootstrapping GLM

str(x) # x is the subset of iris defined on the "Bootstrap using parallel" slide
## 'data.frame':    100 obs. of  2 variables:
##  $ Sepal.Length: num  7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
  • Logistic regression: predict Species using Sepal.Length.
  • We want the distribution of the parameter estimates.
  • Fit the model repeatedly (say, 10,000 times) on observations randomly sampled with replacement.
  • This gives us the bootstrap distribution.
  • We will cover the bootstrap in greater detail towards the end.

Bootstrap using parallel

x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %dopar% {
  ind <- sample(100, 100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
})
ptime
##    user  system elapsed 
##    4.39    1.98   10.03

Sequential counterpart

stime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %do% {
  ind <- sample(100, 100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
})
stime
##    user  system elapsed 
##   21.96    0.05   22.17

Exercise (Try at home)

  • Can you parallelize any of the divide-and-conquer algorithms you have learned in class? A sketch of the general pattern follows.
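  • If you want a starting point, here is a sketch of the general pattern with foreach (the data and chunking are made up, and a parallel backend is assumed to be registered as above): split the input into chunks, solve each chunk in parallel, then combine.
# Divide-and-conquer sketch: split, solve each piece in parallel, combine
x <- rnorm(1e6)
chunks <- split(x, cut(seq_along(x), 4, labels = FALSE)) # divide into 4 pieces
partial <- foreach(ch = chunks, .combine = c) %dopar% sum(ch) # conquer
sum(partial) # combine; should equal sum(x)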