February 2, 2020 (updated: 2020-09-09)

Sources

  • Matrix in R tutorial by Prof. Giovanni Petris.
  • doParallel guide by Weston and Calaway.
  • Miscellaneous internet resources.
  • Before you run a module or chunk, make sure that you have all the packages installed.
  • We need the R packages ‘Matrix’ and ‘doParallel’.

Matrices

  • To arrange values into a matrix, we use the matrix() function:
a <- matrix(1 : 6, nrow = 2, ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
  • Individual entries can be referred to using a pair of indices. For example, the element in the second row, third column can be printed as:
a[2, 3]
## [1] 6

Rows, Columns

  • An entire row or column can be retrieved by specifying its index and leaving the other index empty:
a[2, ]
## [1] 2 4 6
  • Similar to the way R works with vectors, you can use negative indices to exclude rows or columns of a matrix:
a[-2,]
## [1] 1 3 5
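  • Note that extracting a single row or column drops the matrix structure, so the result above is a plain vector. Add drop = FALSE if you want to keep a matrix; a minimal illustration:
a[-2, , drop = FALSE] # a 1 x 3 matrix rather than a vector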

Matrix indices

  • Note how the values 1,2,3,4,5,6 are used to create the matrix: the first two are used to fill the first column, then the next two to fill the second column, and so on.

  • R allows matrices to be indexed by a single number, e.g.,

a[5]
## [1] 5
  • The fifth element of a is the fifth element of the vector obtained by “unrolling” the matrix, column by column.
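  • You can verify this with as.vector(), which performs exactly this column-by-column unrolling:
as.vector(a)[5] # identical to a[5]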

Fill by row

  • Sometimes you need to fill a matrix row by row instead of column by column.
  • You can change the default behavior as follows:
a <- matrix(1 : 6, nrow = 2, ncol = 3, byrow = TRUE)
a
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Hilbert matrix

  • We have seen rbind() and cbind() before.
  • For example, we can use cbind() to construct the 3 x 3 Hilbert matrix, whose (i, j) entry is 1/(i + j - 1).
H3 <- 1 / cbind(1 : 3, 2 : 4, 3 : 5)
H3
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5000000 0.3333333
## [2,] 0.5000000 0.3333333 0.2500000
## [3,] 0.3333333 0.2500000 0.2000000

Diagonals and Determinants

  • The diagonal of a matrix can be extracted using diag().
diag(H3)
## [1] 1.0000000 0.3333333 0.2000000
  • The determinant of a square matrix can be obtained with det().
det(H3)
## [1] 0.000462963

Triangular

  • The functions lower.tri() and upper.tri() can be used to obtain the lower and upper triangular parts of matrices. The output of the functions is a matrix of logical elements, with TRUE representing the relevant triangular elements. For example,
lower.tri(H3)
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE FALSE
## [3,]  TRUE  TRUE FALSE

Triangular

  • A typical use of these functions is to set the upper or lower triangular part of a matrix to zero, thus constructing a triangular matrix.
Htri <- H3
Htri[lower.tri(Htri)] <- 0
Htri
##      [,1]      [,2]      [,3]
## [1,]    1 0.5000000 0.3333333
## [2,]    0 0.3333333 0.2500000
## [3,]    0 0.0000000 0.2000000

Multiplication

  • Matrix multiplication in R is performed by the operator %*%. For example,
A <- matrix(c(1, 2, 0, 1), 2)
B <- matrix(c(1, 3, 3, 1, 4, 5), nr = 2)
A
##      [,1] [,2]
## [1,]    1    0
## [2,]    2    1
B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    3    1    5
  • Matrices must conform, i.e., the number of columns of the first factor must equal the number of rows of the second, for the multiplication to make sense!

Multiplication

A %*% B
##      [,1] [,2] [,3]
## [1,]    1    3    4
## [2,]    5    7   13
dim(A)
## [1] 2 2
ncol(A)
## [1] 2
nrow(B)
## [1] 2

Multiplication

  • With \(A\) and \(B\) defined as above, the matrix product \(BA\) is not defined, but the product \(B^T A\) is (\(B^T\) denotes the transpose of \(B\)).
  • The transpose of a matrix can be obtained with the function t().

t(B) %*% A
##      [,1] [,2]
## [1,]    7    3
## [2,]    5    1
## [3,]   14    5

Crossprod

  • A much more efficient way of computing the same matrix product is via the function crossprod().
all.equal(crossprod(B, A), t(B) %*% A)
## [1] TRUE
  • A similar function, tcrossprod(), computes the matrix product \(AB^T\) whenever the product is well defined.
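  • A quick check using the matrices defined above (tcrossprod(B) computes \(B B^T\)):
all.equal(tcrossprod(B), B %*% t(B))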

Exercise (Try at home)

  • Consider the matrix
X <- cbind(1, seq(-1, 1, length = 11))
X
##       [,1] [,2]
##  [1,]    1 -1.0
##  [2,]    1 -0.8
##  [3,]    1 -0.6
##  [4,]    1 -0.4
##  [5,]    1 -0.2
##  [6,]    1  0.0
##  [7,]    1  0.2
##  [8,]    1  0.4
##  [9,]    1  0.6
## [10,]    1  0.8
## [11,]    1  1.0
  • Use crossprod() and tcrossprod() to find \(X^T X\) and \(X X^T\).
  • Note the dimensions of the resulting matrices.
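  • If you want to check your answer afterwards, a minimal sketch:
XtX <- crossprod(X)  # t(X) %*% X, a 2 x 2 matrix
XXt <- tcrossprod(X) # X %*% t(X), an 11 x 11 matrix
dim(XtX); dim(XXt)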

Matrix inversion

  • The inverse of a matrix can be found using solve():
Ainv <- solve(A)
Ainv
##      [,1] [,2]
## [1,]    1    0
## [2,]   -2    1
all.equal(Ainv %*% A, diag(2))
## [1] TRUE

Solve linear system

  • Matrix inversion in R goes through a matrix factorization (an LU factorization for solve(); qr.solve() offers a QR-based alternative), which we might briefly talk about at the very end (if time permits)!

  • For linear systems we don’t need to compute inverses. We can use Gaussian Elimination.

b <- c(5, 3)
solve(A, b) # use this
## [1]  5 -7
Ainv %*% b # don't use this
##      [,1]
## [1,]    5
## [2,]   -7

Time comparison

  • Compare execution times for the different methods of solving the Least Squares problem when the number of observations is large.

  • Let us start by making up our own random data: \(n \times p\) design matrix \(X\) and an \(n \times 1\) response vector \(y\).

  • Take \(n = 5 \times 10^5\).

set.seed(123)
n <- 5e5; p <- 20
X <- matrix(rnorm(n * p, mean = 1 : p, sd = 10), nr = n, nc = p, byrow = TRUE)
y <- rowSums(X) + rnorm(n)

OLS

  • The naive approach to solve the LS problem is to compute: \(\hat{\beta} = (X^T X)^{-1}X^T y\)
system.time(bHat1 <- solve(t(X) %*% X) %*% t(X) %*% y)
##    user  system elapsed 
##    0.34    0.03    0.37
  • Not so naive: use crossprod.
system.time(bHat2 <- solve(crossprod(X), crossprod(X, y)))
##    user  system elapsed 
##    0.21    0.00    0.20
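  • A third route, not timed in the original notes, is to solve the least squares problem directly from a QR decomposition of \(X\) with base R’s qr.solve(), avoiding the normal equations altogether; a sketch:
# Least squares via QR, without forming crossprod(X) explicitly
system.time(bHat3 <- qr.solve(X, y))
# bHat3 should agree with bHat1 and bHat2 up to numerical error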

More advanced topics

Logics in R

Logical (boolean) operations

Operator   Operation     Vectorized?
x | y      or            Yes
x & y      and           Yes
!x         not           Yes
x || y     or            No
x && y     and           No
xor(x,y)   exclusive or  Yes

Vectorized?

x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
  x | y
## [1] TRUE TRUE TRUE
  x || y
## [1] TRUE
  x & y
## [1] FALSE FALSE  TRUE
  x && y
## [1] FALSE

Length coercion

x = c(TRUE,FALSE,TRUE)
y = c(TRUE)
z = c(FALSE,TRUE)
  x | y
## [1] TRUE TRUE TRUE
  y | z
## [1] TRUE TRUE
  x & y
## [1]  TRUE FALSE  TRUE
  y & z
## [1] FALSE  TRUE
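Length coercion is silent only when the longer length is a multiple of the shorter; otherwise R recycles anyway but emits a warning. For example, with x (length 3) and z (length 2) from above:

  x & z # warns: longer object length is not a multiple of shorter object length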

Comparisons

Operator   Comparison                Vectorized?
x < y      less than                 Yes
x > y      greater than              Yes
x <= y     less than or equal to     Yes
x >= y     greater than or equal to  Yes
x != y     not equal to              Yes
x == y     equal to                  Yes
x %in% y   contains                  Yes (for x)

Comparisons: Length Coercion

x = c("A","B","C")
z = c("A")
  x == z
## [1]  TRUE FALSE FALSE
  x != z
## [1] FALSE  TRUE  TRUE
  x %in% z
## [1]  TRUE FALSE FALSE
  z %in% x
## [1] TRUE

Careful with Conditionals

Conditional execution is achieved via if statements.

Note that if statements are not vectorized.

x = c(3,1)
if (3 %in% x)
  "Here!"
## [1] "Here!"
if (x >= 2)
  "Now Here!"
## Warning in if (x >= 2) "Now Here!": the condition has length > 1 and only the
## first element will be used
## [1] "Now Here!"

Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: any() and all().

x = c(3,4)
  any(x >= 2)
## [1] TRUE
  all(x >= 2)
## [1] TRUE
  if (any(x >= 2))
    print("Now There!")
## [1] "Now There!"

Error Checking

stop and stopifnot

Often we want to validate user input or function arguments; if our assumptions are not met, we want to report the error and stop execution.

ok = FALSE
if (!ok)
  stop("Things are not ok.")
## Error in eval(expr, envir, enclos): Things are not ok.
stopifnot(ok)
## Error: ok is not TRUE

Note - an error (like the one generated by stop) will prevent an RMarkdown document from compiling unless error=TRUE is set for that code block.
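A common pattern is to validate arguments at the top of a function. A minimal sketch (safe_log is a made-up example):

safe_log = function(x) {
  stopifnot(is.numeric(x), all(x > 0)) # stop early if the input is invalid
  log(x)
}
safe_log(c(1, 10)) # fine
# safe_log(-1)     # would stop with: all(x > 0) is not TRUE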

More on functions

Recap

  • The two parts of a function are the arguments (formals) and the code (body).

  • It is also possible to give function arguments default values so that they don’t need to be provided every time the function is called.

  • There are two approaches to returning values: explicit (using return()) or implicit (the value of the last statement in the function body).

  • Remember scope of a variable when using functions.
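  • A minimal example pulling these pieces together (a default argument and an implicit return value):
pow = function(x, p = 2) { # p has a default value
  x^p                      # implicit return: value of the last expression
}
pow(3)    # 9
pow(3, 3) # 27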

Lazy evaluation

Arguments to R functions are lazily evaluated, meaning they are not evaluated until they are actually used.

f = function(x)
{
  cat("Hello world!\n")
  x
}

f(stop())
## Hello world!
## Error in f(stop()):

Everything is a function

`+`
## function (e1, e2)  .Primitive("+")
typeof(`+`)
## [1] "builtin"
x = 4:1
`+`(x,2)
## [1] 6 5 4 3

More matrices

Inverting a matrix using Cholesky helps

  • A symmetric positive definite matrix can be factorized as: \[ A = L L^T \]
p = 1000; A = array(rnorm(p*p), c(p,p))
A = crossprod(A)
ptm <- proc.time(); B = solve(A); proc.time()-ptm
##    user  system elapsed 
##    0.81    0.00    0.81
ptm <- proc.time(); B2 = chol2inv(chol(A)); proc.time()-ptm
##    user  system elapsed 
##    0.24    0.01    0.25
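  • Note that chol2inv() expects the Cholesky factor rather than the original matrix, hence the chol(A) in the call above. A quick sanity check:
all.equal(B, B2) # both routes should give (numerically) the same inverse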

Storage of sparse matrices

  • Sparse matrices appear in many areas of statistics and machine learning.
    1. For large Markov chains, the transition matrix becomes sparse.
    2. Large design matrices / contingency tables.
    3. Modern datasets with 0 imputed for missing values.
  • If the matrix \(A_{n\times n}\) has on average \(k\) non-zero entries per row and \(k \ll n\), then storage reduces from \(O(n^2)\) to \(O(n)\).

Storage of unstructured sparse matrices

# Demo 2: storing matrices in sparse format can save memory
library(Matrix)
m1 = matrix(0, nrow=1000, ncol=1000)
m2 = Matrix(0, nrow=1000, ncol=1000, sparse=TRUE)
object.size(m1)
## 8000216 bytes
object.size(m2)
## 5728 bytes
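  • The saving depends on the number of non-zero entries. For experimenting, rsparsematrix() (also from Matrix) generates a random sparse matrix with a given density:
# A 1000 x 1000 matrix with ~1% non-zero entries is still stored compactly
m3 = rsparsematrix(nrow = 1000, ncol = 1000, density = 0.01)
object.size(m3)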

Parallel

A gentle start to parallel computing

  • Parallel processing breaks up your task, splits it among multiple processors, and puts the components back together.
  • Useful and easy if you have a task that can be split up, especially without the different parts needing to “talk to” each other.
  • On your typical computer, implementing parallel processing in R might speed up your program by a factor of 2 to 4.
  • If you have access to computing clusters (e.g., in the cloud), you can use many, many “cores”.

How do I know if I can apply these ideas?

  • Parallel processing works best when you have a task that has to be completed many times, but each repeat is independent.
  • For instance, if you are repeating a simulation, and in each case are drawing new parameters from a distribution, that is an easily parallelizable task.
  • A good rule of thumb is that if you can wrap your task in an apply function or one of its variants, it’s a good option for parallelization.
  • In fact most implementations of parallel processing in R are versions of apply.
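  • For instance, the base parallel package provides apply-style parallelism directly; a minimal sketch using parLapply() with 2 workers:
library(parallel)
cl <- makeCluster(2)                       # start 2 worker processes
res <- parLapply(cl, 1:4, function(i) i^2) # parallel version of lapply()
stopCluster(cl)                            # release the workers
unlist(res) # 1 4 9 16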

What happens?

  • When you run a task in parallel, your computer “dispatches” each task to a CPU core.
  • This dispatching adds computational overhead.
  • So, it’s usually best to try to minimize the number of dispatches.
  • In most cases you are going to have a small number of computing cores relative to your tasks. A powerful laptop or desktop will have 2, 4, or 8 cores, and even the most powerful Amazon virtual machine has 88.

Introduction to doParallel

  • We will first write a very simple piece of R code to illustrate the mechanics of parallel processing using doParallel.

  • We use the Sys.sleep() function, which “suspends execution of R expressions for a specified time interval”.

  • First, a sequential for loop:

ptm <- proc.time()
for(i in 1:4){
  Sys.sleep(1)
}
proc.time() - ptm
##    user  system elapsed 
##       0       0       4

Now let’s do it in parallel

  • The main changes are foreach instead of for, and the %dopar% operator.

  • You will also need to tell R how many processors you want to use; you can detect this automatically with Sys.getenv() on Windows, or with parallel::detectCores() on any platform.

n = as.numeric(Sys.getenv("NUMBER_OF_PROCESSORS")) # on Windows
require(doParallel)
cl <- makeCluster(n)
registerDoParallel(cl)
ptm <- proc.time()
foreach(i=1:4) %dopar% {Sys.sleep(1)}
proc.time()-ptm

Output of the snippet

  • Here is the output!
  • Because my laptop has 4 cores and the four tasks run in parallel, the elapsed time is just about 1 second.
n = as.numeric(Sys.getenv("NUMBER_OF_PROCESSORS")) # on Windows
require(doParallel)
cl <- makeCluster(n)
registerDoParallel(cl)
ptm <- proc.time()
res <- foreach(i=1:4) %dopar% {Sys.sleep(1)}
proc.time()-ptm
##    user  system elapsed 
##    0.03    0.00    1.08
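  • One housekeeping note: when you are finished, shut the cluster down so the worker processes are released.
stopCluster(cl)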

Hello, world!

  • This is our “Hello, world” program for parallel computing.
  • It tests that everything is installed and set up properly, but don’t expect it to run faster than a sequential for loop, because it won’t!
  • With small tasks, the overhead of scheduling the task and returning the result can be greater than the time to execute the task itself, resulting in poor performance.

We will see a non-trivial example

  • Non-trivial examples need sophisticated statistical methods, and I haven’t covered these yet. So don’t worry at all if you don’t understand any of it; I just want to show you an example of the advantage of using parallel processing.

  • How long does it take to do 10,000 Bootstrap iterations using 2 cores?

Bootstrapping GLM

str(x) # x is the subset of iris defined on the "Bootstrap using parallel" slide
## 'data.frame':    100 obs. of  2 variables:
##  $ Sepal.Length: num  7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
  • Logistic regression: predict Species using Sepal.Length.
  • We want the distribution of the parameter estimates.
  • Fit the model repeatedly (say, 10,000 times) on observations randomly sampled with replacement.
  • This gives us the bootstrap distribution.
  • We will cover the bootstrap in greater detail towards the end.

Bootstrap using parallel

x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %dopar% {
  ind <- sample(100, 100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
})
ptime
##    user  system elapsed 
##    4.39    1.98   10.03

Sequential counterpart

stime <- system.time({
r <- foreach(icount(trials), .combine=cbind) %do% {
  ind <- sample(100, 100, replace=TRUE)
  result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
  coefficients(result1)
}
})
stime
##    user  system elapsed 
##   21.96    0.05   22.17

Exercise (Try at home)

  • Can you parallelize any of the divide-and-conquer algorithms you have learned in class? A sketch of the general pattern follows.
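  • If you want a starting point, here is a sketch of the general pattern with foreach (the data and chunking are made up, and a parallel backend is assumed to be registered as above): split the input into chunks, solve each chunk in parallel, then combine.
# Divide-and-conquer sketch: split, solve each piece in parallel, combine
x <- rnorm(1e6)
chunks <- split(x, cut(seq_along(x), 4, labels = FALSE)) # divide into 4 pieces
partial <- foreach(ch = chunks, .combine = c) %dopar% sum(ch) # conquer
sum(partial) # combine; should equal sum(x)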